
CSCE614 Computer Architecture (Spring 2012)

Assignment #3 Due: 3/19 (Mon) 2:50PM


1. Circle only one of TRUE or FALSE:
   a. (TRUE, FALSE) The CPI term of the execution time equation can be affected by the choice of programming language (e.g., C, Java, assembly).
   b. (TRUE, FALSE) The IC term of the execution time equation can be affected by the number of pipeline stages.
   c. (TRUE, FALSE) A RISC ISA typically follows a register-memory architecture model because programs do not access memory very often.
   d. (TRUE, FALSE) Pipelining facilitates high clock frequency designs.
   e. (TRUE, FALSE) Pipelining reduces the CPI and the execution time of an individual instruction.
   f. (TRUE, FALSE) The MIPS architecture does not have any RAW hazards after adding forwarding hardware that bypasses results from the ALU output at the end of EX, the ALU output at the end of MEM, and the memory output at the end of MEM.

2. Three enhancements with the following speedups are proposed for a new architecture: Speedup_1 = 30, Speedup_2 = 20, Speedup_3 = 15. Only one enhancement is usable at a time.
   a. If enhancements 1 and 2 are each usable for 25% of the time, what fraction of the time must enhancement 3 be used to achieve an overall speedup of 10?
   b. Assume the enhancements can be used 25%, 35%, and 10% of the time for enhancements 1, 2, and 3, respectively. For what fraction of the reduced execution time is no enhancement in use?
   c. Assume we build a compiler for a 500 MHz machine for which the following measurements have been made. What is the MIPS rate?

      Instruction Type   R-type   Loads   Stores   Branches
      Frequency          30%      25%     15%      30%
      CPI                1        3       2        2
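The arithmetic behind questions 2a through 2c can be checked with a small Python sketch. It applies Amdahl's law for mutually exclusive enhancements, Speedup_overall = 1 / ((1 - sum of f_i) + sum of f_i / Speedup_i), where each f_i is assumed to be a fraction of the original (unenhanced) execution time, and the MIPS rate as clock rate / (CPI x 10^6) with CPI weighted by the instruction mix. The function names are hypothetical; passing `Fraction` values keeps the results exact.

```python
from fractions import Fraction


def overall_speedup(fractions, speedups):
    """Amdahl's law for several enhancements, only one active at a time.

    fractions[i] is the fraction of the ORIGINAL execution time during which
    enhancement i is usable; speedups[i] is that enhancement's speedup.
    """
    new_time = (1 - sum(fractions)) + sum(f / s for f, s in zip(fractions, speedups))
    return 1 / new_time


def fraction_needed(target, known_fractions, known_speedups, speedup):
    """Solve Amdahl's law for the fraction one more enhancement must cover.

    Rearranges 1/target = rest - f * (1 - 1/speedup) into
    f = (rest * target - 1) * speedup / (target * (speedup - 1)).
    """
    rest = 1 - sum(known_fractions) + sum(
        f / s for f, s in zip(known_fractions, known_speedups)
    )
    return (rest * target - 1) * speedup / (target * (speedup - 1))


def unenhanced_share(fractions, speedups):
    """Fraction of the REDUCED execution time with no enhancement in use."""
    unenhanced = 1 - sum(fractions)
    new_time = unenhanced + sum(f / s for f, s in zip(fractions, speedups))
    return unenhanced / new_time


def mips_rate(clock_hz, mix):
    """MIPS = clock rate / (CPI * 10^6); mix is a list of (frequency, cpi) pairs."""
    cpi = sum(freq * c for freq, c in mix)
    return clock_hz / (cpi * 10**6)
```

For example, `fraction_needed(10, [Fraction(1, 4), Fraction(1, 4)], [30, 20], 15)` sets up part a, and `mips_rate(500 * 10**6, [...])` takes the (frequency, CPI) rows of the table in part c.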

3. Assume a disk subsystem with the following components and MTTFs:
   - 12 disks, each rated at 1,500,000-hour MTTF
   - 1 SCSI controller, 750,000-hour MTTF
   - 1 power supply, 200,000-hour MTTF
   - 1 fan, 300,000-hour MTTF
   - 1 SCSI cable, 2,000,000-hour MTTF
   Using the simplifying assumptions that the components' lifetimes are exponentially distributed (that is, the age of a component does not affect its probability of failure) and that failures are independent, compute the MTTF of the system as a whole.
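Under the stated assumptions (exponential lifetimes, independent failures), failure rates add, so MTTF_system = 1 / (sum over components of count_i / MTTF_i). The sketch below assumes, as the question implies but does not state, that the failure of any single component takes down the whole subsystem (no redundancy). The function name `system_mttf` is hypothetical.

```python
from fractions import Fraction


def system_mttf(components):
    """MTTF of a system that fails as soon as any one component fails.

    components is a list of (count, mttf_hours) pairs. With exponentially
    distributed, independent lifetimes, each component contributes a failure
    rate of 1 / MTTF; the rates add, and the system MTTF is the reciprocal.
    """
    total_rate = sum(Fraction(count, mttf) for count, mttf in components)
    return 1 / total_rate
```

Calling `float(system_mttf([(12, 1_500_000), (1, 750_000), ...]))` with the five rows above gives the system MTTF in hours; keeping the intermediate sum as a `Fraction` avoids rounding in the per-hour rates.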

4. Suppose the following branch instructions have been executed.

   Label   Address   Branch   Taken/Not Taken
   1       101101    b1       NT
   2       101101    b1       NT
   3       101101    b1       T
   4       110011    b2       NT
   5       110011    b2       T

a. Show the prediction for each branch instruction using a tournament predictor with 2 entries. Also show the final contents of the Predictor 1 and Predictor 2 buffers. Predictor 1 and Predictor 2 are 2-bit saturating counters with 2 prediction entries. Note that Predictor 1 is a local predictor, while Predictor 2 is global. Assume all table and buffer contents are initialized to zero.

   Instruction   1   2   3   4   5
   Prediction
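A tournament predictor of this kind can be sketched as below for checking a hand trace. The assignment does not say how the two entries are indexed, so this is a minimal sketch under hypothetical choices: the caller supplies each branch's local index (for example b1 to entry 0, b2 to entry 1), the global table (Predictor 2) is indexed by the most recent branch outcome, the selector is indexed like Predictor 1, and the selector is trained only when the two predictors disagree. `SaturatingCounter`, `TournamentPredictor`, and `step` are illustrative names.

```python
class SaturatingCounter:
    """2-bit saturating counter: states 0..3, predicts taken when state >= 2."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Step toward 3 on taken, toward 0 on not taken; saturate at the ends.
        if taken:
            self.state = min(self.state + 1, 3)
        else:
            self.state = max(self.state - 1, 0)


class TournamentPredictor:
    """Hypothetical tournament predictor with `entries` counters per table.

    Selector state < 2 chooses Predictor 1 (local); >= 2 chooses Predictor 2
    (global). Assumes `entries` is a power of two so the history mask works.
    """

    def __init__(self, entries=2):
        self.entries = entries
        self.p1 = [SaturatingCounter() for _ in range(entries)]  # local
        self.p2 = [SaturatingCounter() for _ in range(entries)]  # global
        self.selector = [SaturatingCounter() for _ in range(entries)]
        self.history = 0  # recent global outcomes, kept modulo `entries`

    def step(self, branch_index, taken):
        """Predict one branch, then train all tables on its actual outcome."""
        local = self.p1[branch_index % self.entries]
        glob = self.p2[self.history % self.entries]
        chooser = self.selector[branch_index % self.entries]
        local_pred, global_pred = local.predict(), glob.predict()
        prediction = global_pred if chooser.predict() else local_pred
        # Move the selector toward whichever predictor was right,
        # but only when the two predictors disagree.
        if local_pred != global_pred:
            chooser.update(global_pred == taken)
        local.update(taken)
        glob.update(taken)
        self.history = ((self.history << 1) | int(taken)) % self.entries
        return prediction
```

Feeding the five branches from the table in order, e.g. `[tp.step(i, t) for i, t in trace]`, and then reading `[c.state for c in tp.p1]` and `tp.p2` gives the predictions and final buffer contents under these particular assumptions; a different indexing scheme will give different results.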


5. The latencies of the pipeline functional units are:

   Function Unit Type   Cycles in EX   Number of Reservation Stations
   Integer              1              5
   FP adder             5              2

Assume the following:
- There is no forwarding between functional units; results are communicated by a CDB.
- There are separate integer functional units for effective address calculation, for ALU operations, and for branch condition evaluation.
- There are two FP adder units.
- The EX stage does the effective address calculation only for loads and stores.
- The issue (IS) and write result (WB) stages each take 1 clock cycle.
- There are 5 load buffer slots and 5 store buffer slots. The load and store latencies are 3 cycles (1 for address calculation and 2 for memory access).
- The BNE takes 1 clock cycle.

Assume that branches are single-issue but that branch prediction is perfect. Fill out the timetable for a pipeline that uses a two-issue Tomasulo algorithm.

   Instruction             Issue   Execute   Memory Access   Write CDB   Comments
   L.D     F0, 0(R1)
   ADD.D   F4, F0, F2
   S.D     F4, 0(R1)
   DADDIU  R1, R1, #-8
   BNE     R1, R2, LOOP
   L.D     F0, 0(R1)
   ADD.D   F4, F0, F2
   S.D     F4, 0(R1)
   DADDIU  R1, R1, #-8
   BNE     R1, R2, LOOP

6. A pipelined microprocessor has separate integer and floating-point functional units. Use the instruction latencies in the following table:

   Instruction producing result   Instruction using result   Latency in clock cycles
   FP ALU op                      Another FP ALU op          3
   FP ALU op                      Store double               2
   Load double                    Another FP ALU op          1
   Load double                    Store double               0
   Integer ALU op                 Any integer instruction    0

Given the following source code

   for (i = 100; i > 1; i--)
       A[i] = x*B[i] + y*C[i];

and its translated MIPS code for the loop body:

   Loop: L.D     F4, 0(R1)       ; load B[i]
         MUL.D   F6, F4, F0      ; multiply x*B[i]
         L.D     F8, 0(R2)       ; load C[i]
         MUL.D   F10, F8, F2     ; multiply y*C[i]
         ADD.D   F12, F6, F10    ; add x*B[i] + y*C[i]
         S.D     F12, 0(R3)      ; store A[i]
         DADDUI  R3, R3, #-8     ; decrement A index
         DADDUI  R1, R1, #-8     ; decrement B index
         DADDUI  R2, R2, #-8     ; decrement C index
         BNE     R3, R4, Loop    ; exit loop if done

a. If the processor uses an in-order execution implementation, identify the stalls in the above code. For each stall, give the number of stall cycles and the instruction that causes it. What is the execution time of the loop body (per iteration)?

b. With a single-issue pipeline, unroll the loop two times and schedule it without any delays. Show the schedule after eliminating any redundant overhead instructions. What is the execution time of the loop body (per iteration) after unrolling and scheduling?
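A hand count of stalls for question 6 can be cross-checked by replaying the loop body through a simple in-order, single-issue issue model: each instruction issues one cycle after its predecessor unless a RAW dependence forces it to wait latency + 1 cycles after its producer, reading "latency" as the number of intervening cycles. Several assumptions here are not from the assignment: MUL.D is treated as an FP ALU op, producer/consumer pairs missing from the table default to 0 cycles, and dependences across iterations, branch delay slots, and branch penalties are ignored. `Instr`, `LATENCY`, and `schedule` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Intervening cycles from the latency table, keyed by (producer, consumer) class.
LATENCY = {
    ("fp_alu", "fp_alu"): 3,
    ("fp_alu", "store"): 2,
    ("load", "fp_alu"): 1,
    ("load", "store"): 0,
    ("int_alu", "int"): 0,
}


@dataclass
class Instr:
    text: str
    kind: str                 # "load", "store", "fp_alu", "int_alu", or "branch"
    dest: Optional[str]       # register written, or None
    srcs: List[str] = field(default_factory=list)  # registers read


def consumer_class(producer, consumer):
    # The table lists integer ALU results against any integer consumer.
    return "int" if producer.kind == "int_alu" else consumer.kind


def schedule(body):
    """Issue cycle of each instruction for in-order, single-issue execution."""
    issue = []
    last_writer = {}  # register -> index of the latest instruction writing it
    for i, ins in enumerate(body):
        cycle = issue[-1] + 1 if issue else 1
        for reg in ins.srcs:
            if reg in last_writer:
                p = last_writer[reg]
                lat = LATENCY.get((body[p].kind, consumer_class(body[p], ins)), 0)
                cycle = max(cycle, issue[p] + lat + 1)
        issue.append(cycle)
        if ins.dest:
            last_writer[ins.dest] = i
    return issue


LOOP_BODY = [
    Instr("L.D F4, 0(R1)", "load", "F4", ["R1"]),
    Instr("MUL.D F6, F4, F0", "fp_alu", "F6", ["F4", "F0"]),
    Instr("L.D F8, 0(R2)", "load", "F8", ["R2"]),
    Instr("MUL.D F10, F8, F2", "fp_alu", "F10", ["F8", "F2"]),
    Instr("ADD.D F12, F6, F10", "fp_alu", "F12", ["F6", "F10"]),
    Instr("S.D F12, 0(R3)", "store", None, ["F12", "R3"]),
    Instr("DADDUI R3, R3, #-8", "int_alu", "R3", ["R3"]),
    Instr("DADDUI R1, R1, #-8", "int_alu", "R1", ["R1"]),
    Instr("DADDUI R2, R2, #-8", "int_alu", "R2", ["R2"]),
    Instr("BNE R3, R4, Loop", "branch", None, ["R3", "R4"]),
]

if __name__ == "__main__":
    cycles = schedule(LOOP_BODY)
    for i, (ins, c) in enumerate(zip(LOOP_BODY, cycles)):
        stalls = c - (cycles[i - 1] if i else 0) - 1
        print(f"{c:>3}  {ins.text:<22} stalls={stalls}")
```

For part b, the same `schedule` function can be pointed at an unrolled, reordered body by listing its instructions (with renamed registers) in the new order; any nonzero stall count flags a remaining delay under this model.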

