Homework 2

DEEPAK BEGRAJKA
CSCE 513
Computer Architecture
Homework: 02
Question 1: Use the following code fragment:
Loop: LD    R1, 0(R2)     ; load R1 from address 0+R2
      DADDI R1, R1, #1    ; R1 = R1 + 1
      SD    R1, 0(R2)     ; store R1 at address 0+R2
      DADDI R2, R2, #4    ; R2 = R2 + 4
      DSUB  R4, R3, R2    ; R4 = R3 - R2
      BNEZ  R4, Loop      ; branch to Loop if R4 != 0
Assume that the initial value of R3 is R2 + 396.

a. Data hazards are caused by data dependences in the code. Whether a dependency
causes a hazard depends on the machine implementation (i.e., number of pipeline
stages). List all of the data dependences in the code above. Record the register, source
instruction, and destination instruction; for example, there is a data dependency for
register R1 from the LD to the DADDI.
Solution: All of the data dependences in the above code are listed below. In each row the source instruction writes the register and the destination instruction reads it.
No.  Register  Source instruction   Destination instruction
1.   R1        LD                   DADDI R1, R1, #1
2.   R1        DADDI R1, R1, #1     SD
3.   R2        DADDI R2, R2, #4     LD (next iteration)
4.   R2        DADDI R2, R2, #4     SD (next iteration)
5.   R2        DADDI R2, R2, #4     DSUB
6.   R4        DSUB                 BNEZ
Dependences 1 and 2 (R1): DADDI reads the R1 loaded by LD, and SD stores the R1 produced by DADDI. Both are read-after-write (RAW) dependences: the add has to wait for the loaded value, and the store has to wait for the result of the add.
Dependences 3 and 4 (R2): LD and SD use R2 as their address base, and that value is produced by DADDI R2, R2, #4 in the previous iteration, so these RAW dependences are carried around the loop. Within one iteration LD and SD read R2 before DADDI overwrites it. That ordering is write-after-read (WAR), and it cannot cause a hazard in an in-order 5-stage pipeline, because registers are read in ID and written in WB.
Dependence 5 (R2): DSUB reads R3 and R2. R3 is never written inside the loop, but the R2 it needs is the value just written by DADDI R2, R2, #4. Hence it is a RAW dependence.
Dependence 6 (R4): BNEZ tests the R4 computed by DSUB, so the branch cannot be resolved until DSUB has produced its result. Hence it is a RAW dependence.
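The list of dependences above can be cross-checked mechanically. The following is a minimal sketch, not part of the assigned solution: the tuple encoding of the loop body and the function name `raw_dependences` are assumptions made for illustration. For each source register it searches backwards for the nearest writer, wrapping into the previous iteration, and it deliberately does not pair an instruction with its own earlier instance.

```python
# Loop body as (mnemonic, destination register or None, source registers).
# This encoding is a hypothetical one chosen for the sketch.
LOOP = [
    ("LD",    "R1", ["R2"]),        # LD    R1, 0(R2)
    ("DADDI", "R1", ["R1"]),        # DADDI R1, R1, #1
    ("SD",    None, ["R1", "R2"]),  # SD    R1, 0(R2)
    ("DADDI", "R2", ["R2"]),        # DADDI R2, R2, #4
    ("DSUB",  "R4", ["R3", "R2"]),  # DSUB  R4, R3, R2
    ("BNEZ",  None, ["R4"]),        # BNEZ  R4, Loop
]


def raw_dependences(body):
    """Return (register, producer index, consumer index, loop_carried) tuples."""
    n = len(body)
    found = []
    for c, (_, _, sources) in enumerate(body):
        for reg in sources:
            # Walk backwards to the nearest earlier writer of reg. Stepping back
            # more than c positions wraps into the previous iteration. The walk
            # stops before n steps, so an instruction is never paired with itself.
            for back in range(1, n):
                p = (c - back) % n
                if body[p][1] == reg:
                    found.append((reg, p, c, back > c))
                    break
    return found


for reg, p, c, carried in raw_dependences(LOOP):
    note = " (value from previous iteration)" if carried else ""
    print(f"{reg}: {LOOP[p][0]} (instr {p + 1}) -> {LOOP[c][0]} (instr {c + 1}){note}")
```

Under these assumptions the sketch reports six RAW dependences, two on R1, three on R2 and one on R4. R3 has no writer inside the loop, so it contributes none.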
b. Show the timing of this instruction sequence for the 5-stage RISC pipeline without any
forwarding or bypassing hardware but assuming that a register read and a write in the
same clock cycle forwards through the register file, as shown in Figure C.6. Use a
pipeline timing chart like that in Figure C.5. Assume that the branch is handled by
flushing the pipeline. If all memory references take 1 cycle, how many cycles does this
loop take to execute?
Solution: The 5-stage RISC pipeline here has no forwarding or bypassing hardware.
Assumption: a register write and a register read in the same clock cycle pass the value through the register file.
Because values reach later instructions only through the register file, a dependent instruction waits in ID until its producer reaches WB. Branch outcomes and targets are not known until the end of the execute stage, so all instructions fetched after the branch and before that point are flushed.
Given: Initial value of R3 is R2 + 396.
Pipeline timing chart (the per-cycle stage entries were lost in extraction). Its rows, in order:
LD R1, 0(R2)
DADDI R1, R1, #1
SD R1, 0(R2)
DADDI R2, R2, #4
DSUB R4, R3, R2
BNEZ R4, Loop
LD R1, 0(R2) (next iteration)
The initial value of R3 is R2 + 396, and each iteration of the loop adds 4 to R2.
Hence total number of iterations = 396 / 4 = 99.
Cycles lost to RAW hazards in each iteration, including the stall before the branch: 8.
Cycles lost after the branch due to flushing: 2.
Each iteration therefore starts 6 + 8 + 2 = 16 cycles after the previous one.
Hence,
Total number of cycles = 98 * 16 + 18 = 1586.
The last iteration takes 2 additional cycles (18 instead of 16), because the final stages of its branch cannot be overlapped with a following iteration.
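The cycle counts above can be cross-checked with a small model. This is a sketch under stated assumptions rather than a full pipeline simulator: instructions decode in order, a source register can be read in ID in the same cycle its producer is in WB, and the branch resolves in EX so the next iteration is fetched one cycle later. The tuple encoding and the name `iteration_timing` are hypothetical.

```python
# Loop body as (text, destination register or None, source registers).
LOOP = [
    ("LD R1, 0(R2)",     "R1", ["R2"]),
    ("DADDI R1, R1, #1", "R1", ["R1"]),
    ("SD R1, 0(R2)",     None, ["R1", "R2"]),
    ("DADDI R2, R2, #4", "R2", ["R2"]),
    ("DSUB R4, R3, R2",  "R4", ["R3", "R2"]),
    ("BNEZ R4, Loop",    None, ["R4"]),
]


def iteration_timing(body):
    """Return (cycles between iterations, cycles for the last iteration).

    Models a 5-stage pipeline without forwarding; cycle 1 is the fetch of the
    first instruction of the iteration.
    """
    writeback = {}     # register -> cycle in which its producer is in WB
    prev_decode = 1    # so that the first instruction decodes in cycle 2
    for _, dest, sources in body:
        # ID is delayed until every source has been written back (same cycle is fine)
        decode = max([prev_decode + 1] + [writeback.get(r, 0) for r in sources])
        if dest is not None:
            writeback[dest] = decode + 3   # ID -> EX -> MEM -> WB
        prev_decode = decode
    branch_ex = prev_decode + 1            # the last instruction is the branch
    branch_wb = prev_decode + 3
    # The next iteration's first fetch happens in cycle branch_ex + 1,
    # so consecutive iterations start branch_ex cycles apart.
    return branch_ex, branch_wb


period, last = iteration_timing(LOOP)
iterations = 396 // 4
total = (iterations - 1) * period + last
print(period, last, total)
```

With these assumptions the model gives 16 cycles between iterations, 18 cycles for the final iteration, and 1586 cycles in total, which agrees with the figures above.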
c. Show the timing of this instruction sequence for the 5-stage RISC pipeline with full
forwarding and bypassing hardware. Use a pipeline timing chart like that shown in Figure
C.5. Assume that the branch is handled by predicting it as not taken. If all memory
references take 1 cycle, how many cycles does this loop take to execute?
Solution: In this part the pipeline has the normal forwarding and bypassing circuitry. Branch outcomes and targets are known at the end of the decode stage.
Pipeline timing chart (the per-cycle stage entries were lost in extraction). Its rows, in order:
LD R1, 0(R2)
DADDI R1, R1, #1
SD R1, 0(R2)
DADDI R2, R2, #4
DSUB R4, R3, R2
BNEZ R4, Loop
(incorrect instruction)
LD R1, 0(R2) (next iteration)
Again, the initial value of R3 is R2 + 396.
Total number of iterations = 396 / 4 = 99.
Each iteration has 2 RAW stall cycles and 1 flushed cycle after the branch, since the branch is predicted not taken but is actually taken. An iteration therefore takes 6 + 2 + 1 = 9 cycles.
Therefore, total number of cycles = 98 * 9 + 12 = 894.
The last iteration takes 3 additional cycles (12 instead of 9) because its final stages cannot be overlapped with a following iteration.
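The totals for both pipeline configurations follow the same pattern: every iteration but the last costs one period, and the last costs its full latency. A minimal sketch of that arithmetic follows. The function name `loop_cycles` is hypothetical, and the stall and flush counts are taken from the solutions above rather than derived by the code.

```python
def loop_cycles(iterations, period, last_iteration):
    """Total cycles when all but the last iteration overlap with the next one."""
    return (iterations - 1) * period + last_iteration


ITERATIONS = 396 // 4        # R2 advances by 4 until it reaches R3 = R2 + 396
INSTRUCTIONS = 6             # instructions in the loop body

# Part b: no forwarding, 8 RAW stall cycles, 2 flushed cycles, last iteration 18 cycles
no_forwarding = loop_cycles(ITERATIONS, INSTRUCTIONS + 8 + 2, 18)

# Part c: full forwarding, 2 RAW stall cycles, 1 flushed cycle, last iteration 12 cycles
forwarding = loop_cycles(ITERATIONS, INSTRUCTIONS + 2 + 1, 12)

print(no_forwarding, forwarding)
```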
Question 2: We begin with a computer implemented in single-cycle implementation. When
the stages are split by functionality, the stages do not require exactly the same amount of
time. The original machine had a clock cycle time of 7 ns. After the stages were split, the
measured times were IF, 1 ns; ID, 1.5 ns; EX, 1 ns; MEM, 2 ns; and WB, 1.5 ns. The pipeline
register delay is 0.1 ns.
a. What is the clock cycle time of the 5-stage pipelined machine?
Solution: Given: Clock cycle time of the original machine = 7 ns.
IF = 1 ns.
ID = 1.5 ns.
EX = 1 ns.
MEM = 2 ns.
WB = 1.5 ns.
Pipeline register delay = 0.1 ns.
In a pipelined machine all stages advance on the same clock, so the clock cycle must be long enough for the slowest stage plus the pipeline register delay.
The slowest stage here is MEM, at 2 ns, so every other stage is clocked at that rate as well.
Clock cycle time of the 5-stage pipelined machine = MAX(time of individual stage) + pipeline register delay
Clock cycle time of the 5-stage pipelined machine = time of MEM stage + pipeline register delay
Clock cycle time of the 5-stage pipelined machine = 2 ns + 0.1 ns = 2.1 ns.
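The same rule can be written as a short calculation. This is only an illustrative sketch; the names `STAGE_NS`, `REGISTER_DELAY_NS` and `pipelined_cycle_time` are assumptions introduced here, and the numbers are the ones given in the question.

```python
# Measured stage times in nanoseconds, as given in the question
STAGE_NS = {"IF": 1.0, "ID": 1.5, "EX": 1.0, "MEM": 2.0, "WB": 1.5}
REGISTER_DELAY_NS = 0.1


def pipelined_cycle_time(stage_ns, register_delay_ns):
    """Return (slowest stage, clock cycle time) for a pipeline of these stages."""
    slowest = max(stage_ns, key=stage_ns.get)   # stage with the largest delay
    return slowest, stage_ns[slowest] + register_delay_ns


stage, cycle_ns = pipelined_cycle_time(STAGE_NS, REGISTER_DELAY_NS)
print(stage, round(cycle_ns, 2))
```

The stage times sum to the 7 ns cycle of the single-cycle machine, which is why the pipelined clock can be so much shorter than the original one.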
b. If there is a stall every 4 instructions, what is the CPI of the new machine?
Solution: If there is a stall every 4 instructions, each group of 4 instructions takes 1 extra cycle.
Cycles required for 4 instructions with the stall = 4 cycles + 1 stall = 5 cycles.
CPI = cycles per instruction = cycles / instructions.
CPI = 5 / 4 = 1.25.
c. What is the speedup of the pipelined machine over the single cycle machine?
Solution:
Speedup = Execution time old / Execution time new.
Execution time = Instruction count * CPI * Cycle time.
Instruction count = I (the same for both machines).
CPI new = 1.25.
CPI old = 1.
Un-pipelined cycle time = 7 ns.
Pipelined cycle time = 2.1 ns.
Speedup = (I * CPI old * Un-pipelined cycle time) / (I * (1 + stall cycles per instruction) * Pipelined cycle time)
Speedup = (1 * 7) / ((1 + 1/4) * 2.1)
Speedup = 7 / 2.625 = 2.67.
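As a check on the arithmetic, the CPI and speedup formulas can be evaluated directly. The helper names `cpi_with_stalls` and `speedup` are hypothetical; the instruction count cancels out of the ratio, so it does not appear.

```python
def cpi_with_stalls(instructions_per_stall):
    """Ideal CPI of 1 plus one stall cycle every `instructions_per_stall` instructions."""
    return 1 + 1 / instructions_per_stall


def speedup(old_cpi, old_cycle_ns, new_cpi, new_cycle_ns):
    """Ratio of execution times for the same instruction count."""
    return (old_cpi * old_cycle_ns) / (new_cpi * new_cycle_ns)


new_cpi = cpi_with_stalls(4)                 # one stall every 4 instructions
result = speedup(1.0, 7.0, new_cpi, 2.1)     # single-cycle vs. 5-stage pipeline
print(new_cpi, round(result, 2))
```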
d. If the pipelined machine had an infinite number of stages, what would its speedup be
over the single-cycle machine?
Solution: With an infinite number of stages the logic delay per stage goes to zero, so the cycle time is equal to the pipeline register delay, 0.1 ns. The extra stall cycles are ignored, so the CPI is taken as 1.
Speedup = Execution time old / Execution time new.
Execution time = Instruction count * CPI * Cycle time.
Speedup = (I * 1 * 7) / (I * 1 * 0.1) = 70.
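The limit can be illustrated numerically. The sketch below assumes, purely for illustration, that the 7 ns of logic is divided evenly among the stages; the question does not state this, and the function name `speedup_for_depth` is hypothetical. Stall cycles are ignored, as in the solution above.

```python
def speedup_for_depth(stages, logic_ns=7.0, register_delay_ns=0.1, cpi=1.0):
    """Speedup over the single-cycle machine for an evenly divided pipeline."""
    cycle_ns = logic_ns / stages + register_delay_ns   # per-stage logic plus register delay
    return logic_ns / (cpi * cycle_ns)


for depth in (5, 50, 500, 5000, 10**6):
    print(depth, round(speedup_for_depth(depth), 3))
```

As the depth grows, the per-stage logic delay vanishes and the speedup approaches 7 / 0.1 = 70 from below, so the pipeline register delay sets the bound.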
Question 3: We will now add support for register-memory ALU operations to the
classic five-stage RISC pipeline. To offset this increase in complexity, all memory
addressing will be restricted to register indirect (i.e., all addresses are simply a value
held in a register; no offset or displacement may be added to the register value). For
example, the register-memory instruction ADD R4, R5, (R1) means add the contents
of register R5 to the contents of the memory location with address equal to the value
in register R1 and put the sum in register R4. Register-register ALU operations are
unchanged. The following items apply to the integer RISC pipeline:
a. List a rearranged order of the five traditional stages of the RISC pipeline that will
support register-memory operations implemented exclusively by register indirect
addressing.
Solution:
The original order of the 5-stage pipeline is IF, ID, EX, MEM, WB.
The instruction ADD R4, R5, (R1) has a memory operand whose address is held in register R1. The operand must therefore be fetched from memory before the ALU operation executes. Because every address comes straight from a register, no ALU address calculation is needed ahead of the memory access.
To support register-memory operations implemented exclusively by register indirect addressing, the order of the pipeline can be rearranged to:
IF, ID, MEM, EX, WB
b. Describe what new forwarding paths are needed for the rearranged pipeline by
stating the source, destination, and information transferred on each needed new
path.
Solution:
Because data is now fetched from memory before the instruction executes, the result of the MEM stage must be forwarded directly to the input of the ALU.
Hardware is required to detect when a memory location has been accessed and data fetched from it, and to route the output of the MEM stage to the input of the ALU.
Source pipeline stage   Destination pipeline stage   Information transferred
MEM                     EX                           Result of MEM transferred to ALU input.
EX                      MEM                          ALU output to Load/Store.
MEM                     MEM                          Load/Store to Load/Store.
c. For the reordered stages of the RISC pipeline, what new data hazards are created
by this addressing mode? Give an instruction sequence illustrating each new
hazard.
Solution:
In the register indirect pipeline there is essentially one new hazard, but two cases have to be considered.
Consider this instruction sequence as an example:
ADD R3, R2, R1
NOP               ; no operation
ADD R4, (R3), R5
The MEM stage of the last instruction needs the value of R3 produced at the end of the EX stage of the instruction two places earlier. This case can be handled with forwarding/bypassing: special hardware detects it and routes the operand directly from EX to MEM without going through the register file, so the stall is avoided.
Next, consider the case in which R3 is used by the instruction immediately after the one that computes it.
ADD R3, R2, R1
ADD R4, (R3), R5
Here too the MEM stage of the second instruction needs the value produced by the first. The timing shows that the MEM stage of the second instruction would start in the same cycle as the EX stage in which R3 is calculated. Hence there is no alternative but to stall.
Both of these are RAW hazards, in which a stage tries to read a value before it has been written.
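The two cases above can be worked out from the stage positions alone. The following sketch assumes one cycle per stage and a forwarding path from the end of the producing stage to the start of the consuming stage; the names `stalls_needed`, `ORIGINAL` and `REARRANGED` are hypothetical.

```python
ORIGINAL = ["IF", "ID", "EX", "MEM", "WB"]
REARRANGED = ["IF", "ID", "MEM", "EX", "WB"]


def stalls_needed(stages, produce, consume, distance):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    The value is available at the end of the producer's `produce` stage and is
    needed at the start of the consumer's `consume` stage.
    """
    produced_at = stages.index(produce)            # cycle offset in the producer
    needed_at = distance + stages.index(consume)   # cycle offset in the consumer
    return max(0, produced_at + 1 - needed_at)


# ADD R3, ... followed immediately by ADD R4, (R3), R5
print(stalls_needed(REARRANGED, "EX", "MEM", distance=1))
# Same pair separated by a NOP
print(stalls_needed(REARRANGED, "EX", "MEM", distance=2))
# For comparison: load followed by a dependent ALU operation in the original order
print(stalls_needed(ORIGINAL, "MEM", "EX", distance=1))
```

Under these assumptions the back-to-back pair needs one stall, the pair separated by a NOP needs none, and the original pipeline's load-use case also needs one stall.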
d. List all of the ways that the RISC pipeline with register-memory ALU operations
can have a different instruction count for a given program than the original RISC
pipeline. Give a pair of specific instruction sequences, one for the original pipeline
and one for the rearranged pipeline, to illustrate each way.
Solution:
1st way: The given register-memory instruction is: ADD R4, R5, (R1)
Instruction count = 1
2nd way: The same operation in the classical register-register format is:
LD   R4, 0(R1)
ADD  R4, R5, R4
Instruction count for classical RISC pipeline = 2.
Here the single register-memory instruction has been broken into a load followed by a register-register instruction, so the instruction count changes.
CPI = cycles / instructions.
The CPI changes along with the instruction count, because the stalls differ. In the rearranged pipeline the memory operand is fetched in the MEM stage, ahead of EX, within the one instruction. In the register-register pipeline the ADD has to wait for the value loaded by the preceding LD.
Hence the two versions have different stalls and require a different number of cycles to execute.
e. Assume that all instructions take 1 clock cycle per stage. List all of the ways that
the register-memory RISC can have a different CPI for a given program as
compared to the original RISC pipeline.
Solution:
The given register-memory instruction is: ADD R4, R5, (R1)
Instruction count = 1
The same operation in the classical register-register format is:
LD   R4, 0(R1)
ADD  R4, R5, R4
Instruction count for classical RISC pipeline = 2.
CPI for the 1st = cycles / instructions = C1 / 1
CPI for the 2nd = cycles / instructions = C2 / 2
The two versions differ in the instruction count in the denominator and, because their stalls differ, in the cycle count in the numerator as well. Hence the two pipelines can have different instruction counts and different CPI for the same program.
Consider the following instruction example:
1. LD    R1, 0(R2)
2. DADDI R1, R1, #6
3. SD    R1, 0(R2)
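Chart for Original RISC pipeline
Instruction          1   2   3   4   5   ...
LD R1, 0(R2)         F   D   X   M   W
DADDI R1, R1, #6         F   S   S   D   ...
SD R1, 0(R2)                             ...
(S = stall. Only the entries for cycles 1 to 5 survive; the entries for the later cycles, through cycle 11, were lost in extraction.)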
Here, as we can see, it took 11 cycles to finish 3 instructions.
Hence,
CPI for original RISC pipeline = cycles / instructions.
CPI original = 11 / 3 = 3.67.
Now,
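Chart for New RISC pipeline
Instruction          1   2   3   4   5   ...
LD R1, 0(R2)         F   D   M   X   W
DADDI R1, R1, #6         F   S   D   M   ...
SD R1, 0(R2)                             ...
(S = stall. Only the entries for cycles 1 to 5 survive; the entries for the later cycles were lost in extraction.)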
Here, as we can see, it took 10 cycles to finish 3 instructions.
Hence,
CPI for new RISC pipeline = cycles / instructions.
CPI new = 10 / 3 = 3.33.
Hence the register-memory RISC can have a different CPI for a given program as compared to the original RISC pipeline.
