CSE 490/590

Spring 2013 Homework #3


1. For each of the following instruction sequences, determine whether a hazard exists. If a hazard exists, show the critical forwarding path and determine the number of cycles by which forwarding reduces the penalty. If the penalty cannot be eliminated via forwarding, suggest another method of eliminating it. You may assume that the destination register is the first register and that the machine is a load/store machine; hence, all ALU instructions must access operands in the register file, not in data memory. In addition, SW is a store word instruction, and LW is a load word instruction.
SW   r3, 0(r2)
AND  r1, r2, r3
ADD  r1, r1, r3

LW   r1, 4(r2)
XOR  r0, r1
ADD  r1, r2, r3

[Figure: pipeline diagram with stages IF, ID, RD, ALU, MEM, WB]
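The dependence check that Problem 1 asks for can be sketched in a few lines of Python. This is only a minimal sketch, not part of the assignment: the helper names parse and raw_dependences are hypothetical, and it assumes (as the problem states) that the first register of every instruction is the destination, except that SW is treated as having no register destination because it writes memory. It reports read-after-write dependences only; whether a dependence actually stalls the pipeline, and how far forwarding shortens the stall, still has to be read off the pipeline diagram.

```python
import re

def parse(instr):
    """Split an instruction such as 'LW r1, 4(r2)' into (opcode, dest, sources)."""
    opcode, operands = instr.split(None, 1)
    regs = re.findall(r"r\d+", operands)
    if opcode == "SW":
        # Assumption: a store writes memory, so every register it names is a source.
        return opcode, None, regs
    # Assumption from the problem statement: the first register is the destination.
    return opcode, regs[0], regs[1:]

def raw_dependences(program):
    """Return (producer_index, consumer_index, register) for each RAW dependence."""
    deps = []
    for j, instr in enumerate(program):
        _, _, sources = parse(instr)
        for reg in sources:
            # Walk backwards to the most recent earlier instruction that writes reg.
            for i in range(j - 1, -1, -1):
                if parse(program[i])[1] == reg:
                    deps.append((i, j, reg))
                    break
    return deps

first_sequence = ["SW r3, 0(r2)", "AND r1, r2, r3", "ADD r1, r1, r3"]
second_sequence = ["LW r1, 4(r2)", "XOR r0, r1", "ADD r1, r2, r3"]

print(raw_dependences(first_sequence))
print(raw_dependences(second_sequence))
```

Each tuple names the producing instruction, the consuming instruction, and the register they share. The distance between the two indices, together with the stage in which the producer's result becomes available, determines the penalty with and without forwarding.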

2. Consider the pipeline discussed in the previous problem. A new stage is added between RD and ALU to support an additional addressing mode that allows the second ALU source operand to be obtained directly from memory. Hence, the second source operand can come from the register file (in stage RD) or from memory (in the newly added stage). Draw the new pipeline diagram. For the following instruction sequence, determine whether a hazard exists. If a hazard exists, show the critical forwarding path and determine the number of cycles by which forwarding reduces the penalty. Does the addition of the new stage have any impact on the external fragmentation? If so, how?
ADD  r5, r4, r3
SW   r3, 0(r2)
SUB  r6, r4, 0(r2)
LW   r1, 0(r2)

3. Equation 2.4 in the textbook describes the performance enhancement of a pipelined processor. Why is this equation an oversimplification? Is the actual performance higher or lower? Why?

4. Shown below is a nonpipelined microprocessor. Design a pipelined version of the processor that minimizes internal fragmentation and uses five stages. Assume pipeline registers have a setup time of 0.5 ns and a 1 ns propagation delay through the register (from the triggering clock event to the output). Compute the latency of the pipelined and nonpipelined instructions, the cycle time, the internal fragmentation, and the potential speedup over the nonpipelined version. Is the external fragmentation minimized? Justify your answer.

[Figure: block diagram of the nonpipelined microprocessor. Component labels and delays recoverable from the figure:]

Add (2.5 ns)
Program Counter (1 ns; 0.5 ns setup), clocked by CLK
Instruction Cache (6 ns)
Instruction Type Decoder (3 ns)
Function Decoder (2.5 ns)
Source Operand Decoder (2 ns)
Immediate Operand Decoder (2 ns)
Destination Operand Decoder (2 ns)
Register File (4 ns), with ports Read Register 1, Read Register 2, Write Register, Write Data, Operand 1, Operand 2
MUX (1 ns), two instances
ALU (4 ns)
Data Cache (6 ns)
Bus widths shown: 32, 16, 10, and 5 bits
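The timing quantities requested in Problem 4 can be computed mechanically once a stage partition has been chosen. The Python sketch below is an illustration under stated assumptions, not a solution: the function name pipeline_timing is hypothetical, the stage delays in the usage example are placeholder values rather than a partition of the figure, and the cycle time is taken to be the slowest stage plus the register overhead (setup plus propagation delay). The per-stage slack is one common way to quantify internal fragmentation; the textbook's definition may differ.

```python
def pipeline_timing(stage_delays, setup, propagation, nonpipelined_latency):
    """Compute basic timing figures for a pipeline (all times in ns)."""
    overhead = setup + propagation               # pipeline register cost per stage
    slowest = max(stage_delays)
    cycle_time = slowest + overhead              # the clock must fit the slowest stage
    pipelined_latency = len(stage_delays) * cycle_time
    # Idle time in each stage relative to the slowest stage.
    slack = [slowest - delay for delay in stage_delays]
    speedup = nonpipelined_latency / cycle_time  # ideal case: one instruction per cycle
    return {
        "cycle_time": cycle_time,
        "pipelined_latency": pipelined_latency,
        "slack": slack,
        "speedup": speedup,
    }

# Placeholder numbers only; they are NOT derived from the figure.
example = pipeline_timing(
    stage_delays=[4.0, 3.0, 5.0, 2.0, 4.0],
    setup=0.5,
    propagation=1.0,
    nonpipelined_latency=18.0,
)
print(example)
```

Substituting the delays of an actual five-stage partition of the datapath, and the corresponding nonpipelined latency, gives the figures the problem asks for. Comparing the slack of different partitions shows which one wastes the least time per cycle.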