CSCE 513
Computer Architecture
Homework: 02
Question 1: Use the following code fragment:
Loop: LD    R1, 0(R2)     ; load R1 from address 0+R2
      DADDI R1, R1, #1    ; R1=R1+1
      SD    R1, 0(R2)
      DADDI R2, R2, #4
      DSUB  R4, R3, R2    ; R4=R3-R2
      BNEZ  R4, Loop
Register   Instructions sharing the register
R1         LD        DADDI
R1         DADDI     SD
R2         LD        DADDI
R2         SD        DADDI
R2         DSUB      DADDI
R4         BNEZ      DSUB
Instructions 1 and 2 (LD, DADDI): DADDI reads R1, which LD writes, so the add must wait
until the loaded value is available. This is a RAW (read after write) hazard.
Instructions 2 and 3 (DADDI, SD): SD stores the value of R1 that DADDI computes, so the
store must wait for the add. This is also a RAW hazard.
Instruction 4 (the DADDI that updates R2): it writes R2 after LD and SD have read it, which
is a WAR (write after read) dependence on instructions 1 and 3. In the in-order 5-stage
pipeline registers are read in ID and written in WB, so this order is always preserved and
causes no stall. The LD and SD of the next iteration read the updated R2, which is a RAW
dependence carried around the loop.
Instruction 5 (DSUB): it reads R3 and R2, and R2 is written by instruction 4. Hence it is a
RAW hazard.
Instruction 6 (BNEZ): it reads R4, which DSUB writes. Hence it is a RAW hazard.
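The RAW pairs discussed above can be cross-checked mechanically. The following is a minimal sketch, not part of the original solution: the operand lists in `LOOP` are an assumption reconstructed from the text (in particular, instruction 4 is taken to be the DADDI that adds 4 to R2), and `raw_dependences` is a hypothetical helper name.

```python
# Loop body as (mnemonic, destination register or None, source registers).
# Assumption: operand lists reconstructed from the surrounding text.
LOOP = [
    ("LD",    "R1", ["R2"]),        # 1: load R1 from 0(R2)
    ("DADDI", "R1", ["R1"]),        # 2: R1 = R1 + 1
    ("SD",    None, ["R1", "R2"]),  # 3: store R1 to 0(R2)
    ("DADDI", "R2", ["R2"]),        # 4: R2 = R2 + 4
    ("DSUB",  "R4", ["R3", "R2"]),  # 5: R4 = R3 - R2
    ("BNEZ",  None, ["R4"]),        # 6: branch if R4 != 0
]


def raw_dependences(loop):
    """Return (producer, consumer, register) triples within one iteration."""
    last_writer = {}  # register -> number of the latest instruction writing it
    found = []
    for number, (_, dest, sources) in enumerate(loop, start=1):
        for reg in sources:
            if reg in last_writer:
                found.append((last_writer[reg], number, reg))
        if dest is not None:
            last_writer[dest] = number
    return found


for producer, consumer, reg in raw_dependences(LOOP):
    print(f"instruction {consumer} reads {reg} written by instruction {producer}")
```

The sketch only looks inside a single iteration, so the loop-carried dependence on R2 (instruction 4 feeding the next LD and SD) does not appear in its output.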
DEEPAK BEGRAJKA
b. Show the timing of this instruction sequence for the 5-stage RISC pipeline without any
forwarding or bypassing hardware but assuming that a register read and a write in the
same clock cycle forwards through the register file, as shown in Figure C.6. Use a
pipeline timing chart like that in Figure C.5. Assume that the branch is handled by
flushing the pipeline. If all memory references take 1 cycle, how many cycles does this
loop take to execute?
Solution: Instruction sequence for the 5-stage RISC pipeline with no forwarding or
bypassing hardware.
Assumption: a register read and a register write in the same clock cycle forward through
the register file.
In this part forwarding is done only via the register file. Hence branch outcomes and
targets are not known until the end of the execute stage. All instructions introduced into
the pipeline before this point are flushed.
Given: Initial value of R3 is R2 + 396.
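[Pipeline timing chart not recoverable from the source: one row per instruction, from
LD R1, 0(R2) to the LD R1, 0(R2) that starts the next iteration, with clock cycles as columns.]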
The initial value of R3 is R2 + 396, and each iteration of the loop adds 4 to R2.
Hence total number of iterations = 396 / 4 = 99.
Number of cycles lost to RAW hazards, including the stall of the branch instruction: 8
cycles.
2 cycles are lost after the branch, due to flushing.
Successive loop iterations therefore start 16 cycles apart (6 instructions + 8 stall cycles +
2 flushed cycles).
Hence,
Total number of cycles = 98 * 16 + 18 = 1586.
The last iteration takes 2 additional cycles (18 instead of 16) because the final pipeline
stages of its branch cannot be overlapped with a following iteration.
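The cycle counts above can be reproduced with a small scheduling sketch. This is an illustration under stated assumptions, not the original chart: the instruction list is reconstructed from the text, an instruction may decode no earlier than the cycle in which its operands are written back, the next instruction is drawn as fetched in the cycle the previous one decodes, and the branch target is fetched in the cycle after the branch's EX stage. `schedule_iteration` is a hypothetical helper name.

```python
# Loop body as (text, destination register or None, source registers).
# Assumption: reconstructed from the solution text.
LOOP = [
    ("LD R1,0(R2)",    "R1", ["R2"]),
    ("DADDI R1,R1,#1", "R1", ["R1"]),
    ("SD R1,0(R2)",    None, ["R1", "R2"]),
    ("DADDI R2,R2,#4", "R2", ["R2"]),
    ("DSUB R4,R3,R2",  "R4", ["R3", "R2"]),
    ("BNEZ R4,Loop",   None, ["R4"]),
]


def schedule_iteration(loop, first_fetch=1):
    """Return per-instruction (text, IF, ID, WB) cycles and the next fetch cycle."""
    write_back_cycle = {}  # register -> cycle of the WB that produces it
    rows = []
    fetch = first_fetch
    for text, dest, sources in loop:
        # No forwarding: ID must wait for the producer's WB (same cycle is allowed).
        operands_ready = max((write_back_cycle.get(r, 0) for r in sources), default=0)
        decode = max(fetch + 1, operands_ready)
        write_back = decode + 3  # ID, EX, MEM, WB in consecutive cycles
        if dest is not None:
            write_back_cycle[dest] = write_back
        rows.append((text, fetch, decode, write_back))
        fetch = decode  # next instruction occupies IF while this one decodes
    branch_execute = rows[-1][2] + 1
    return rows, branch_execute + 1  # target fetched after the branch resolves in EX


rows, next_fetch = schedule_iteration(LOOP)
cycles_per_iteration = next_fetch - 1
last_iteration_cycles = rows[-1][3]
iterations = 396 // 4
total = (iterations - 1) * cycles_per_iteration + last_iteration_cycles
print(cycles_per_iteration, last_iteration_cycles, total)  # 16 18 1586
```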
c. Show the timing of this instruction sequence for the 5-stage RISC pipeline with full
forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure
C.5. Assume that the branch is handled by predicting it as not taken. If all memory
references take 1 cycle, how many cycles does this loop take to execute?
Solution: This part allows normal forwarding and bypassing circuitry. Branch outcomes and
targets are known at the end of the decode stage.
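[Pipeline timing chart not recoverable from the source: one row per instruction starting with
LD R1, 0(R2), a row labeled "(incorrect instructions)" after the branch, and the LD R1, 0(R2)
of the next iteration, with clock cycles as columns.]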
Question 2: ... measured times were IF, 1 ns; ID, 1.5 ns; EX, 1 ns; MEM, 2 ns; and WB, 1.5 ns.
The pipeline register delay is 0.1 ns.
a. What is the clock cycle time of the 5-stage pipelined machine?
Solution: Given: Clock cycle time of the original machine = 7 ns.
IF = 1 ns.
ID = 1.5 ns.
EX = 1 ns.
MEM = 2 ns.
WB = 1.5 ns.
Pipeline register delay = 0.1 ns.
In a pipelined machine all stages advance in lockstep, so the clock cycle time is set by the
slowest stage (not the slowest instruction) plus the pipeline register delay.
The slowest stage in this question is MEM at 2 ns. Every other stage finishes earlier and
waits for the clock edge.
Clock cycle time of the 5-stage pipelined machine = MAX(time of individual stage) +
pipeline register delay
Clock cycle time of the 5-stage pipelined machine = time of MEM stage + pipeline register
delay.
Clock cycle time of the 5-stage pipelined machine = 2 ns + 0.1 ns = 2.1 ns.
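The same calculation as a minimal sketch. The variable names are illustrative, and `Fraction` is used only to keep the decimal values exact.

```python
from fractions import Fraction

# Stage latencies in ns, as given in the problem statement.
stage_ns = {
    "IF": Fraction(1),
    "ID": Fraction(3, 2),
    "EX": Fraction(1),
    "MEM": Fraction(2),
    "WB": Fraction(3, 2),
}
register_delay_ns = Fraction(1, 10)

# The clock must accommodate the slowest stage plus the pipeline register delay.
slowest_stage = max(stage_ns, key=stage_ns.get)
cycle_ns = stage_ns[slowest_stage] + register_delay_ns
print(slowest_stage, float(cycle_ns))  # MEM 2.1
```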
b. If there is a stall every 4 instructions, what is the CPI of the new machine?
Solution: If there is a stall every 4 instructions, each group of 4 instructions takes one
extra cycle.
Cycles required for 4 instructions with the stall = 4 cycles + 1 stall cycle = 5 cycles.
Total cycles = number of instructions + number of stall cycles.
Hence, CPI = cycles per instruction = cycles / instructions.
CPI = 5 / 4 = 1.25.
c. What is the speedup of the pipelined machine over the single cycle machine?
Solution:
Speedup = execution time (original) / execution time (new).
Execution time = instruction count * CPI * clock cycle time.
Instruction count = I (the same for both machines).
CPI new = 1.25.
CPI old = 1.
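Plugging the values from parts a and b into the speedup formula gives the following sketch. The instruction count I cancels, so only CPI and clock cycle time are needed. The names are illustrative, and the result is computed here from the stated formula rather than quoted from the original solution.

```python
from fractions import Fraction

old_cpi, old_cycle_ns = Fraction(1), Fraction(7)          # single-cycle machine
new_cpi, new_cycle_ns = Fraction(5, 4), Fraction(21, 10)  # pipelined machine


def time_per_instruction(cpi, cycle_ns):
    """Average execution time per instruction in ns."""
    return cpi * cycle_ns


speedup = time_per_instruction(old_cpi, old_cycle_ns) / time_per_instruction(
    new_cpi, new_cycle_ns
)
print(speedup, round(float(speedup), 2))  # 8/3 2.67
```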
Question 3: We will now add support for register-memory ALU operations to the
classic five-stage RISC pipeline. To offset this increase in complexity, all memory
addressing will be restricted to register indirect (i.e., all addresses are simply a value
held in a register; no offset or displacement may be added to the register value). For
example, the register-memory instruction ADD R4, R5, (R1) means add the contents
of register R5 to the contents of the memory location with address equal to the value
in register R1 and put the sum in register R4. Register-register ALU operations are
unchanged. The following items apply to the integer RISC pipeline:
a. List a rearranged order of the five traditional stages of the RISC pipeline that will
support register-memory operations implemented exclusively by register indirect
addressing.
Solution:
The original order of the 5-stage pipeline is IF, ID, EX, MEM, WB.
The instruction ADD R4, R5, (R1) takes one operand from the memory location whose
address is held in register R1. Therefore, the operand must be fetched from memory before
the ALU operation executes. Because no offset is ever added to the register, the memory
access does not need the ALU to compute an address first.
To support register-memory operations implemented exclusively by register indirect
addressing, the order of the pipeline can be rearranged to:
IF, ID, MEM, EX, WB
b. Describe what new forwarding paths are needed for the rearranged pipeline by
stating the source, destination, and information transferred on each needed new
path.
Solution:
Because data is now fetched from memory before the instruction executes, the result of the
MEM stage must be forwarded directly to the input of the ALU.
Hardware is required to detect that a memory location has been accessed and data fetched
from it, and then to forward the output of the MEM stage to the input of the ALU.
Source pipeline stage   Destination pipeline stage   Information transferred
MEM                     EX                           Result of MEM transferred to ALU input.
EX                      MEM                          ALU output to Load/Store.
MEM                     MEM                          Load/Store to Load/Store.
c. For the reordered stages of the RISC pipeline, what new data hazards are created
by this addressing mode? Give an instruction sequence illustrating each new
hazard.
Solution:
In the register-indirect pipeline there is essentially one new hazard, but two cases must
be considered.
Consider this instruction sequence as an example:
ADD R3, R2, R1
NOP              ; no operation
ADD R4, (R3), R5
The MEM stage of the last instruction must use the value produced at the end of the EX
stage two instructions earlier. Forwarding (bypassing) solves this problem: special hardware
detects the case and routes the operand directly from EX to MEM without going through the
register file. Thus a stall is avoided.
Next, consider the case in which R3 is used by the instruction immediately after the one
that computes it.
ADD R3, R2, R1
ADD R4, (R3), R5
Again, the MEM stage of the last instruction must use the value produced by the first one,
but the timing shows that the MEM stage of the second instruction would start in the same
cycle as the EX stage in which R3 is calculated. Hence there is no alternative but to stall.
Both of these cases are RAW hazards, in which a stage tries to read a value before it has
been written.
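The two cases can be summarized by counting stall cycles as a function of how far the consumer is behind the producer. This is a rough sketch that assumes the rearranged stage order from part a, an EX-to-MEM forwarding path, and one instruction issued per cycle. `stalls_needed` is a hypothetical helper name.

```python
# Rearranged stage order from part a.
STAGES = ["IF", "ID", "MEM", "EX", "WB"]


def stalls_needed(distance):
    """Stall cycles for an instruction that uses a register as a memory address
    when it issues `distance` instructions after the ALU instruction that
    computes that register (assumes EX-to-MEM forwarding)."""
    produce_offset = STAGES.index("EX")   # producer's EX, relative to its IF
    consume_offset = STAGES.index("MEM")  # consumer's MEM, relative to its IF
    earliest_mem = produce_offset + 1     # first cycle the forwarded value is usable
    unstalled_mem = distance + consume_offset
    return max(0, earliest_mem - unstalled_mem)


print(stalls_needed(2))  # 0: one instruction in between, forwarding suffices
print(stalls_needed(1))  # 1: back-to-back, one stall cycle is unavoidable
```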
d. List all of the ways that the RISC pipeline with register-memory ALU operations
can have a different instruction count for a given program than the original RISC
pipeline. Give a pair of specific instruction sequences, one for the original pipeline
and one for the rearranged pipeline, to illustrate each way.
Solution:
1st way: The given register-memory instruction is: ADD R4, R5, (R1)
Instruction Count = 1
2nd way: Another way to represent this instruction in the classical register-register
format is:
LD  R0, (R1)
ADD R4, R5, R0
1. LD R1, 0(R2)
2. DADDI R1, R1, #6
3. SD R1, 0(R2)
Chart for Original RISC pipeline (S = stall; entries after cycle 5 are not recoverable from the source)
Instruction          1  2  3  4  5  ...  10  11
LD R1, 0(R2)         F  D  X  M  W
DADDI R1, R1, #6        F  S  S  D
SD R1, 0(R2)
Now,
Chart for New RISC pipeline
Instruction
LD R1, 0(R2)
DADDI R1, R1,
#6
SD R1, 0(R2)
1
F
2
D
F
3
M
S
4
X
D
5
W
M