PIPELINING
Pipelining
• Pipeline concepts
• Hazards
• Example
Pipelined vs. Single-Cycle
Instruction Execution
[Figure: execution timing for three consecutive loads, lw $1, 100($0), lw $2, 200($0) and lw $3, 300($0), on a time axis from 2 to 18 ns.
Single-cycle: each lw passes through instruction fetch, Reg, ALU, data access and Reg, and the next instruction starts 8 ns after the previous one.
Pipelined: the same five steps overlap, and the next instruction starts 2 ns after the previous one.]
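The timing comparison can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes an 8 ns single-cycle clock and five pipeline stages of 2 ns each (the values shown in the figure), and the function names are made up for this example.

```python
def single_cycle_time_ns(n_instructions, cycle_ns=8):
    """Total time when each instruction takes one long clock cycle."""
    return n_instructions * cycle_ns


def pipelined_time_ns(n_instructions, n_stages=5, stage_ns=2):
    """Total time when a new instruction enters the pipeline every stage_ns.

    The first instruction needs n_stages cycles; every later instruction
    completes one cycle after the one before it (no stalls assumed).
    """
    return (n_stages + n_instructions - 1) * stage_ns


if __name__ == "__main__":
    for n in (3, 1000):
        single = single_cycle_time_ns(n)
        piped = pipelined_time_ns(n)
        print(f"{n} instructions: single-cycle {single} ns, "
              f"pipelined {piped} ns, speedup {single / piped:.2f}")
```

For the three loads this gives 24 ns against 14 ns. For long instruction sequences the ratio approaches 8 / 2 = 4, not 5, because every pipeline stage is stretched to the length of the slowest one.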
Pipeline Implementation
• Pipelining
– Goal of MIPS: CPI (clock cycles per instruction) ≤ 1
– Some instructions take longer to execute than others
– Don’t want cycle time to depend on slowest instruction
– Want 100% hardware utilization
– Split execution of each instruction into several, balanced “stages”
– Each stage is a block of combinational logic
– Latency of each stage fits within 1 clock cycle
– Insert registers between each pipeline stage to hold intermediate results
– Execute each of these steps in parallel for a sequence of
instructions
Pipelining MIPS
• MIPS characteristics make pipelining easy
– All instructions are the same length (32 bits)
• Fetch and decode stages are similar for all instructions
– Just a few instruction formats
• Simplifies instruction decode and makes it possible in one
stage
– Memory operands appear only in load/stores
• Memory access can be deferred to exactly one later stage
– Operands are aligned in memory
• One data transfer instruction requires one memory access
stage
MIPS pipeline stages
• Fetch (IF)
– Read next instruction from memory
– Increment address counter
• Decode (ID)
– Read register operands,
– Decode the instruction into control signals
– Compute branch target
• Execute (EX)
– Execute arithmetic/resolve branches
• Memory (MEM)
– Perform load/store accesses to memory
– Take branches
• Write back (WB)
– Write arithmetic results to register file
Pipelined Datapath
Recall the 5 steps in instruction execution
1. Instruction Fetch & PC Increment (IF)
2. Instruction Decode and Register Read (ID)
3. Execution or calculate address (EX)
4. Memory access (MEM)
5. Write result into register (WB)
Review - Single-Cycle
Datapath “Steps”
[Figure: single-cycle datapath (PC, instruction memory, register file, sign extend, ALU, data memory) divided into five steps]
IF: Instruction Fetch
ID: Instruction Decode
EX: Execute / Address Calc.
MEM: Memory Access
WB: Write Back
Pipelined Datapath – Key Idea
• What happens if we break the execution into multiple cycles,
but keep the extra hardware?
– Answer: We may be able to start executing a new instruction at each
clock cycle - pipelining
• …but we shall need extra registers to hold data between
cycles – pipeline registers
Pipelined Datapath
Pipeline registers are wide enough to hold the data coming in
[Figure: pipelined datapath with the pipeline registers IF/ID (64 bits), ID/EX (128 bits), EX/MEM (97 bits) and MEM/WB (64 bits)]
Hazard: a situation that would cause incorrect execution
Only data flowing right to left may cause a hazard. Why?
Bug in the Datapath
[Figure: pipelined datapath containing the bug]
[Figure: corrected datapath; the ID/EX, EX/MEM and MEM/WB pipeline registers are each 5 bits wider (IF/ID 64 bits, ID/EX 133 bits, EX/MEM 102 bits, MEM/WB 69 bits), so that the 5-bit write register number stays with its instruction]
Single-Clock-Cycle Diagrams
lw $t0, 10($t1)
sw $t3, 20($t4)
add $t5, $t6, $t7
sub $t8, $t9, $t10
Clock Cycle   IF    ID    EX    MEM   WB
2             SW    LW
3             ADD   SW    LW
4             SUB   ADD   SW    LW
5                   SUB   ADD   SW    LW
6                         SUB   ADD   SW
7                               SUB   ADD
8                                     SUB
Represent Pipelines Graphically
• Multiple instruction execution over multiple clock
cycles
– Instructions are listed in execution order from top to
bottom
– Clock cycles move from left to right
– Show the use of resources at each stage and each
cycle
Represent Pipelines Graphically
1. lw $t6, 8($s5)
2. add $s1, $s2, $s3
3. ori $s4, $t3, 7
4. sub $t5, $s2, $t3
5. sw $s2, 10($t3)
Graphically Representing Pipelines
Program execution order, top to bottom:
Time (in cycles)     CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
lw  $t6, 8($s5)      IM   Reg  ALU  DM   Reg
add $s1, $s2, $s3         IM   Reg  ALU  DM   Reg
ori $s4, $t3, 7                IM   Reg  ALU  DM   Reg
sub $t5, $s2, $t3                   IM   Reg  ALU  DM   Reg
sw  $s2, 10($t3)                         IM   Reg  ALU  DM
Instruction-Time Diagram
• Instruction-Time Diagram shows:
– Which instruction occupies which stage at each clock cycle
• Instruction flow is pipelined over the 5 stages
1. lw $t7, 8($s3)
2. lw $t6, 8($s5)
3. ori $t4, $s3, 7
4. sub $s5, $s2, $t3
5. sw $s2, 10($s3)
Instruction-Time Diagram
Time                 CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
lw  $t7, 8($s3)      IF   ID   EX   MEM  WB
lw  $t6, 8($s5)           IF   ID   EX   MEM  WB
ori $t4, $s3, 7                IF   ID   EX   –    WB
sub $s5, $s2, $t3                   IF   ID   EX   –    WB
sw  $s2, 10($s3)                         IF   ID   EX   MEM  –
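An instruction-time diagram of this kind can be generated mechanically. The following is a minimal sketch that assumes an ideal pipeline with no stalls, where instruction i enters IF in cycle i; it does not mark unused stages with "–" as the slide does, and the helper names are invented for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw  $t7, 8($s3)",
    "lw  $t6, 8($s5)",
    "ori $t4, $s3, 7",
    "sub $s5, $s2, $t3",
    "sw  $s2, 10($s3)",
]


def stage_rows(instructions):
    """One row per instruction; row[c] is the stage occupied in cycle c + 1."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for index in range(len(instructions)):
        row = [""] * total_cycles
        for offset, stage in enumerate(STAGES):
            # Instruction `index` enters IF in cycle index + 1.
            row[index + offset] = stage
        rows.append(row)
    return rows


def format_diagram(instructions):
    """Render the rows as aligned plain text with a CC1..CCn header."""
    rows = stage_rows(instructions)
    label_width = max(len(text) for text in instructions) + 2
    header = " " * label_width + "".join(
        f"{'CC' + str(c + 1):<5}" for c in range(len(rows[0]))
    )
    lines = [header.rstrip()]
    for text, row in zip(instructions, rows):
        cells = "".join(f"{cell:<5}" for cell in row)
        lines.append((text.ljust(label_width) + cells).rstrip())
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_diagram(PROGRAM))
```

Five instructions on five stages occupy 5 + 5 - 1 = 9 clock cycles, which is why the time axis runs to CC9.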
Recall Single-Cycle – ALU Control
Instruction   ALUOp   Instruction     Funct     Desired        ALU control
opcode                operation       field     ALU action     input
LW            00      load word       xxxxxx    add            010
SW            00      store word      xxxxxx    add            010
Branch eq     01      branch eq       xxxxxx    subtract       110
R-type        10      add             100000    add            010
R-type        10      subtract        100010    subtract       110
R-type        10      AND             100100    and            000
R-type        10      OR              100101    or             001
R-type        10      set on less     101010    set on less    111
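The ALU control table amounts to a small lookup. The sketch below encodes exactly the rows of the table; the function and dictionary names are assumptions made for this illustration, and any ALUOp or funct value outside the table is rejected rather than guessed.

```python
# Funct field -> ALU control input, for R-type instructions (ALUOp = 10).
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b010,  # add
    0b100010: 0b110,  # subtract
    0b100100: 0b000,  # AND
    0b100101: 0b001,  # OR
    0b101010: 0b111,  # set on less than
}


def alu_control(alu_op, funct=0):
    """Return the 3-bit ALU control input for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # lw / sw: add to compute the address
        return 0b010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b110
    if alu_op == 0b10:      # R-type: decided by the funct field
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"ALUOp {alu_op:02b} is not in the table")


if __name__ == "__main__":
    print(f"{alu_control(0b10, 0b101010):03b}")  # set on less than
```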
Effect of each control signal when deasserted (0) and when asserted (1):
• RegDst
– Deasserted: The register destination number for the Write register comes from the rt field (bits 20-16)
– Asserted: The register destination number for the Write register comes from the rd field (bits 15-11)
• RegWrite
– Deasserted: None
– Asserted: The register on the Write register input is written with the value on the Write data input
• ALUSrc
– Deasserted: The second ALU operand comes from the second register file output (Read data 2)
– Asserted: The second ALU operand is the sign-extended, lower 16 bits of the instruction
• PCSrc
– Deasserted: The PC is replaced by the output of the adder that computes the value of PC + 4
– Asserted: The PC is replaced by the output of the adder that computes the branch target
• MemRead
– Deasserted: None
– Asserted: Data memory contents designated by the address input are put on the Read data output
• MemWrite
– Deasserted: None
– Asserted: Data memory contents designated by the address input are replaced by the value of the Write data input
• MemtoReg
– Deasserted: The value fed to the register Write data input comes from the ALU
– Asserted: The value fed to the register Write data input comes from the data memory
[Figure: pipelined datapath with the control signals RegDst, ALUOp, ALUSrc, Branch, MemRead, MemWrite, RegWrite and MemtoReg]
Same control signals as the single-cycle datapath
Pipeline Control Signals
• There are five stages in the pipeline
– instruction fetch / PC increment: nothing to control, as instruction memory read and PC write are always enabled
– instruction decode / register fetch
– execution / address calculation
– memory access
– write back
              Execution/Address Calculation       Memory access stage          Write-back stage
              stage control lines                 control lines                control lines
Instruction   RegDst  ALUOp1  ALUOp0  ALUSrc      Branch  MemRead  MemWrite    RegWrite  MemtoReg
R-format      1       1       0       0           0       0        0           1         0
lw            0       0       0       1           0       1        0           1         1
sw            X       0       0       1           0       0        1           0         X
beq           X       0       1       0           1       0        0           0         X
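The control table groups the nine control lines by the stage that consumes them, and each group only has to travel as far as its stage. A minimal sketch of that idea follows; the table values are copied from the slide, while the names `CONTROL_TABLE`, `CARRIED`, `control_bits` and `register_contents` are hypothetical helpers for this example.

```python
X = None  # don't care

# Order of the control lines inside each group, as in the table.
CONTROL_LINES = {
    "EX": ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc"),
    "M": ("Branch", "MemRead", "MemWrite"),
    "WB": ("RegWrite", "MemtoReg"),
}

CONTROL_TABLE = {
    "R-format": {"EX": (1, 1, 0, 0), "M": (0, 0, 0), "WB": (1, 0)},
    "lw":       {"EX": (0, 0, 0, 1), "M": (0, 1, 0), "WB": (1, 1)},
    "sw":       {"EX": (X, 0, 0, 1), "M": (0, 0, 1), "WB": (0, X)},
    "beq":      {"EX": (X, 0, 1, 0), "M": (1, 0, 0), "WB": (0, X)},
}

# Control groups still carried by each pipeline register.
CARRIED = {
    "ID/EX": ("EX", "M", "WB"),
    "EX/MEM": ("M", "WB"),
    "MEM/WB": ("WB",),
}


def control_bits(instruction, group, dont_care="X"):
    """Bit string for one control group of one instruction class."""
    bits = CONTROL_TABLE[instruction][group]
    return "".join(dont_care if bit is X else str(bit) for bit in bits)


def register_contents(instruction, pipeline_register):
    """Control groups held for `instruction` in the given pipeline register."""
    return {group: control_bits(instruction, group)
            for group in CARRIED[pipeline_register]}


if __name__ == "__main__":
    for register in CARRIED:
        print(register, register_contents("lw", register))
```

For lw the ID/EX register holds EX = 0001, M = 010 and WB = 11, and only WB = 11 is left by the time the instruction reaches MEM/WB.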
Pipeline Control Implementation
• Pass control signals along just like the data – extend each pipeline
register to hold needed control bits for succeeding stages
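[Figure: the Control unit produces the EX, M and WB control fields from the instruction; ID/EX holds EX, M and WB, EX/MEM holds M and WB, and MEM/WB holds WB]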
• Note: The 6-bit funct field of the instruction required in the EX stage to
generate ALU control can be retrieved as the 6 least significant bits of the
immediate field which is sign-extended and passed from the IF/ID
register to the ID/EX register
Pipeline Hazards
• Situations that would cause incorrect execution
• Data flow problems that arise as a result of pipelining
– Limits the amount of parallelism, sometimes induces “penalties”
that prevent one instruction per clock cycle
• Types
– Structural hazards
– Data hazards
– Control hazards
Hazards
Draw the pipeline diagram and check whether a hazard exists.
• lw $1, 100($0)
• lw $2, 200($0)
• lw $3, 300($0)
• lw $4, 400($0)
Hazard
[Figure: pipelined execution of the four lw instructions on a time axis from 2 to 14 ns; each instruction starts 2 ns after the previous one and passes through instruction fetch, Reg, ALU, data access and Reg]
Structural Hazards
• E.g., suppose the pipeline below has a single memory with one read port, instead of separate instruction and data memories
– then there is a structural hazard between the first and fourth lw instructions
[Figure: pipelined execution of lw $1, 100($0), lw $2, 200($0), lw $3, 300($0) and lw $4, 400($0), each starting 2 ns after the previous one; the data access of the first lw and the instruction fetch of the fourth lw fall in the same 2 ns slot, labelled "Hazard if single memory"]
Structural Hazards
• Inadequate hardware to simultaneously support all instructions in
the pipeline in the same clock cycle
• lw $1, 100($0)
• lw $2, 200($0)
• lw $3, 300($0)
• lw $4, 400($0)
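The conflict for this sequence of loads can be found by comparing the cycle in which each instruction is fetched with the cycle in which an earlier load reads data memory. The sketch assumes the five-stage pipeline with one instruction issued per cycle and a single memory with one read port; `memory_conflicts` is a made-up helper name.

```python
def memory_conflicts(n_loads):
    """List (cycle, older, younger) where two loads need the single memory.

    Load k (1-based) is fetched in cycle k and accesses data memory in
    cycle k + 3 (IF, ID, EX, MEM).
    """
    conflicts = []
    for older in range(1, n_loads + 1):
        mem_cycle = older + 3
        for younger in range(older + 1, n_loads + 1):
            fetch_cycle = younger
            if fetch_cycle == mem_cycle:
                conflicts.append((mem_cycle, older, younger))
    return conflicts


if __name__ == "__main__":
    for cycle, older, younger in memory_conflicts(4):
        print(f"cycle {cycle}: lw #{older} reads data while lw #{younger} is fetched")
```

With four loads the only conflict is in cycle 4, between the first and the fourth lw; with separate instruction and data memories the two accesses go to different hardware and the hazard disappears.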
Resolving Structural Hazards
• Serious Hazard:
– Hazard cannot be ignored
– Easy to avoid
RAW Data Hazard Solutions
Example:
sub $s2, $t1, $t3
add $s4, $s2, $t5
or $s6, $t3, $s2
and $s7, $t4, $s2
sw $t8, 10($s2)
Example of a RAW Data Hazard
Time (cycles) CC1 CC2 CC3 CC4 CC5 CC6 CC7 CC8
value of $s2 10 10 10 10 10 20 20 20
Program Execution Order
add $s4, $s2, $t5 IM Reg Reg Reg Reg ALU DM Reg
• Is forwarding useful?
• If an R-type instruction following a load uses the
result of the load – called load-use data hazard
RAW Data Hazard Solutions
• Unfortunately, not all data hazards can be forwarded
– Load has a delay that cannot be eliminated by forwarding
• In the example shown below …
– The LW instruction does not read data until end of CC4
– Cannot forward data to ADD at end of CC3 - NOT possible
Program order:
Time (cycles)        CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
lw  $s2, 20($t1)     IF   Reg  ALU  DM   Reg
add $s4, $s2, $t5         IF   Reg  ALU  DM   Reg
However, load can forward data to the 2nd next and later instructions
Showing Stall Cycles
• Stall cycles can be shown on instruction-time diagram
• Hazard is detected in the Decode stage
• Stall indicates that instruction is delayed
• Instruction fetching is also delayed after a stall
• Example:
[Figure: instruction-time diagram with stall cycles, CC1 to CC10]
RAW Data Hazard Solutions
• Software Solution
– Reordering Code to Avoid Pipeline Stall
• Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)
sw $t0, 4($t1)
RAW Data Hazard Solutions
• Example:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t2, 0($t1)      # data hazard: $t2 is used right after it is loaded
sw $t0, 4($t1)
• Reordered code:
lw $t0, 0($t1)
lw $t2, 4($t1)
sw $t0, 4($t1)      # interchanged
sw $t2, 0($t1)      # interchanged
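The effect of the reordering can be checked by counting load-use pairs. This is a simplified sketch: it assumes forwarding handles every other dependence, that a load followed immediately by an instruction reading the loaded register costs exactly one stall, and that a store counts as reading its data register (as the slide's "data hazard" mark implies). The tuple encoding of instructions is an assumption of this example.

```python
def count_load_use_stalls(program):
    """Count adjacent pairs where a lw result is read by the next instruction.

    Each instruction is (opcode, destination register or None, source registers).
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, destination, _ = previous
        if opcode == "lw" and destination in current[2]:
            stalls += 1
    return stalls


ORIGINAL = [
    ("lw", "$t0", ("$t1",)),        # lw $t0, 0($t1)
    ("lw", "$t2", ("$t1",)),        # lw $t2, 4($t1)
    ("sw", None, ("$t2", "$t1")),   # sw $t2, 0($t1)
    ("sw", None, ("$t0", "$t1")),   # sw $t0, 4($t1)
]

# Same instructions with the two stores interchanged.
REORDERED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[3], ORIGINAL[2]]

if __name__ == "__main__":
    print("original: ", count_load_use_stalls(ORIGINAL), "stall(s)")
    print("reordered:", count_load_use_stalls(REORDERED), "stall(s)")
```

The original order has one load-use pair (lw $t2 followed by sw $t2); after the stores are interchanged there is none.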
Example
• Draw the pipelining execution for the following
code and detect and resolve the hazard, if any.
[Figure: pipelined datapath with control signals carried in the pipeline registers]
Data Hazards and Forwarding
• $2 = 10 before sub; $2 = -20 after sub
Time (in clock cycles)   CC 1  CC 2  CC 3  CC 4  CC 5    CC 6  CC 7  CC 8  CC 9
Value of register $2:    10    10    10    10    10/-20  -20   -20   -20   -20
Program execution order (in instructions):
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)
[Figure: multiple-clock-cycle pipeline diagram of the five instructions, each passing through IM, Reg, ALU, DM, Reg]
[Figure: a. datapath with no forwarding; b. datapath with forwarding, in which a forwarding unit compares Rs and Rt with EX/MEM.RegisterRd and MEM/WB.RegisterRd and drives the ForwardA and ForwardB multiplexors at the ALU inputs]
1. EX hazard
if ( EX/MEM.RegWrite // if there is a write…
and ( EX/MEM.RegisterRd ≠ 0 ) // to a non-$0 register…
and ( EX/MEM.RegisterRd = ID/EX.RegisterRs ) ) // matches, then
ForwardA = 10
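The EX hazard condition translates directly into code. The sketch below also includes a MEM hazard case (forwarding from MEM/WB, select value 01), which is not spelled out on the slide and is added here as an assumption based on the usual textbook formulation. The function name and argument names are illustrative only.

```python
def forward_select(source_reg, ex_mem_reg_write, ex_mem_rd,
                   mem_wb_reg_write, mem_wb_rd):
    """Return the 2-bit mux select for one ALU operand.

    Use ID/EX.RegisterRs as source_reg for ForwardA and
    ID/EX.RegisterRt for ForwardB.
    """
    # EX hazard: the instruction one ahead writes a non-$0 register we read.
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == source_reg:
        return 0b10
    # MEM hazard (assumed formulation): the instruction two ahead writes it.
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == source_reg:
        return 0b01
    return 0b00  # no forwarding: operand comes from the register file


if __name__ == "__main__":
    # sub $2, $1, $3 is in MEM while and $12, $2, $5 is in EX.
    forward_a = forward_select(2, True, 2, False, 0)   # Rs = $2
    forward_b = forward_select(5, True, 2, False, 0)   # Rt = $5
    print(f"ForwardA = {forward_a:02b}, ForwardB = {forward_b:02b}")
```

Checking the EX hazard first means the most recent result wins when both EX/MEM and MEM/WB hold a value for the same register.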
Forwarding
• Execution example, Clock 3 to Clock 6:
[Figure: datapath with forwarding unit, one snapshot per clock cycle; the forwarding unit receives Rs and Rt together with EX/MEM.RegisterRd and MEM/WB.RegisterRd]
Stage contents given in the figure captions:
Clock   IF               ID               EX               MEM              WB
4       add $9, $4, $2   or $4, $4, $2    and $4, $2, $5   sub $2, . . .    before<1>
5       after<1>         add $9, $4, $2   or $4, $4, $2    and $4, . . .    sub $2, . . .
Data Hazards and Stalls
• Load word can cause a hazard:
– An instruction tries to read a register following a load instruction that
writes to the same register
Because the dependency goes backward in time in the pipeline diagram, forwarding will not resolve this hazard
[Figure: pipeline diagram (IM, Reg, ALU, DM, Reg) including or $8, $2, $6 and add $9, $4, $2]
[Figure: pipelined datapath with a Hazard Detection Unit added; control signals come from the pipeline registers]
Hazard Detection Logic to Stall
• The hazard detection unit checks in the ID stage whether to stall. It stalls by inserting a bubble into the pipeline: the EX, MEM and WB control fields of the ID/EX pipeline register are set to 0
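The check itself is not reproduced in the text, so the sketch below uses the usual load-use condition as an assumption: stall when the instruction in EX is a load (ID/EX.MemRead) and its destination (ID/EX.RegisterRt) is one of the registers read by the instruction in ID. The signal names follow the labels in the datapath figures; the function names are hypothetical.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs the result of a load still in EX."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def stall_controls(hazard):
    """Signal values produced by the hazard detection unit for this cycle."""
    return {
        "PCWrite": 0 if hazard else 1,      # hold the PC during a stall
        "IF/IDWrite": 0 if hazard else 1,   # hold the instruction in IF/ID
        "zero ID/EX control fields": hazard,  # insert the bubble
    }


if __name__ == "__main__":
    # lw $2, 20($1) is in EX; and $4, $2, $5 is in ID (Rs = 2, Rt = 5).
    hazard = load_use_hazard(True, 2, 2, 5)
    print(hazard, stall_controls(hazard))
```

Holding the PC and the IF/ID register makes the stalled instruction decode again in the next cycle, while the zeroed control fields travel down the pipeline as a bubble that changes no state.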
[Figure: datapath with hazard detection unit and forwarding unit; the hazard detection unit takes ID/EX.MemRead, ID/EX.RegisterRt, IF/ID.RegisterRs and IF/ID.RegisterRt as inputs and drives PCWrite, IF/IDWrite and the multiplexor that zeroes the control fields (bubble)]
Stalling
• Execution example, Clock cycle 2 to Clock cycle 7:
[Figure: one datapath snapshot per clock cycle]
Stage contents given in the figure captions:
Clock cycle   IF               ID               EX               MEM             WB
3             or $4, $4, $2    and $4, $2, $5   lw $2, 20($1)    before<1>       before<2>
4             or $4, $4, $2    and $4, $2, $5   bubble           lw $2, . . .    before<1>
5             add $9, $4, $2   or $4, $4, $2    and $4, $2, $5   bubble          lw $2, . . .
6             after<1>         add $9, $4, $2   or $4, $4, $2    and $4, . . .   bubble
Control Hazards
• Need to make a decision based on the result of a previous
instruction still executing in pipeline
• Jump and Branch can cause great performance loss
• Jump instruction needs only the jump target address
• Branch instruction needs two things:
– Branch Result Taken or Not Taken
– Branch Target Address
• PC + 4 If Branch is NOT taken
• PC + 4 + 4 × immediate If Branch is Taken
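The two cases for the next PC can be written as one small function. This is only a sketch of the formula above; `next_pc` is a made-up name, and the immediate is taken as an already sign-extended word offset.

```python
def next_pc(pc, immediate, taken):
    """Address of the instruction that follows a branch located at `pc`.

    `immediate` is the sign-extended 16-bit field, counted in instructions
    (words), so it is multiplied by 4 to get a byte offset.
    """
    if taken:
        return pc + 4 + 4 * immediate
    return pc + 4


if __name__ == "__main__":
    # A beq at address 40 with immediate 7.
    print("taken:    ", next_pc(40, 7, taken=True))
    print("not taken:", next_pc(40, 7, taken=False))
```

For a beq at address 40 with immediate 7, the taken target is 40 + 4 + 28 = 72 and the fall-through address is 44.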
Control Hazards
• Solution 1 Stall the pipeline
• Control logic detects a Branch instruction in the 2nd Stage
• ALU computes the Branch outcome in the 3rd Stage
• Next1 and Next2 instructions will be fetched anyway
• Convert Next1 and Next2 into bubbles if branch is taken
[Figure: pipeline diagram over cc1 to cc7; once the branch target address is known, the target instruction at L1 goes through IF, Reg, ALU, DM]
[Figure: timing diagram with 2 ns stages for add $4, $5, $6, beq $1, $2, 40, lw $3, 300($0) and or $7, $8, $9; on prediction success lw $3, 300($0) starts 2 ns after the beq, and a 4 ns interval is also marked]
Control Hazards
Solution 3 Delayed branch: always execute the sequentially next
statement with the branch executing after one instruction delay
– compiler’s job to find a statement that can be put in the slot
that is independent of branch outcome
[Figure: timing diagram with 2 ns stages on a time axis from 2 to 14 ns; lw $3, 300($0) starts 2 ns after the instruction before it]
Delayed branch: beq is followed by an add that is independent of the branch outcome
Control (or Branch) Hazards
• Problem with branches in the pipeline we have so far is that the branch
decision is not made till the MEM stage – so what instructions, if at all,
should we insert into the pipeline following the branch instructions?
[Figure: datapath with the register comparison (=) and the branch target adder in the ID stage, together with the hazard detection unit and forwarding unit]
• IF.Flush control zeros out the instruction in the IF/ID pipeline register (which follows the branch)
• Branch decision is moved from the MEM stage to the ID stage (simplified drawing, not showing enhancements to the forwarding and hazard detection units)
• Execution example:
[Figure: datapath snapshots for Clock cycle 3 and Clock cycle 4 of a branch that is taken, with IF.Flush]
Instructions and addresses shown in the figure:
40 beq $1, $3, 7
48 or $13, $2, $6
…
72 lw $4, 50($7)
Stage contents given in the figure captions:
Clock cycle   IF                ID              EX                MEM              WB
3             and $12, $2, $5   beq $1, $3, 7   sub $10, $4, $8   before<1>        before<2>
4             lw $4, 50($7)     bubble (nop)    beq $1, $3, 7     sub $10, . . .   before<1>
Simple Example: Comparing
Performance
• Compare performance for single-cycle, multicycle, and pipelined
datapaths using the gcc instruction mix
– assume 2 ns for memory access, 2 ns for ALU operation, 1 ns for
register read or write
– assume gcc instruction mix 23% loads, 13% stores, 19% branches,
2% jumps, 43% ALU
– for pipelined execution assume
• 50% of the loads are followed immediately by an instruction that uses
the result of the load
• 25% of branches are mispredicted
• branch delay on misprediction is 1 clock cycle
• jumps always incur 1 clock cycle delay so their average time is 2 clock
cycles
Simple Example: Comparing Performance
• Single-cycle (p. 373): average instruction time 8 ns
• Multicycle (p. 397): average instruction time 8.04 ns
• Pipelined:
– loads use 1 cc (clock cycle) when no load-use dependency and 2 cc when
there is dependency – given 50% of loads are followed by dependency the
average cc per load is 1.5
– stores use 1 cc each
– branches use 1 cc when predicted correctly and 2 cc when not – given 25%
misprediction average cc per branch is 1.25
– jumps use 2 cc each
– ALU instructions use 1 cc each
– therefore, average CPI is
1.5 × 23% + 1 × 13% + 1.25 × 19% + 2 × 2% + 1 × 43% = 1.18
– therefore, average instruction time is 1.18 × 2 = 2.36 ns
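The CPI calculation can be reproduced in a few lines. The instruction mix and per-class cycle counts below are the ones stated in the example; the dictionary layout and names are assumptions of this sketch.

```python
CLOCK_NS = 2  # pipelined clock cycle time from the example

# Instruction class -> (fraction of the gcc mix, average clock cycles).
GCC_MIX = {
    "load":   (0.23, 1 + 0.50 * 1),  # 50% of loads incur a 1-cycle stall
    "store":  (0.13, 1),
    "branch": (0.19, 1 + 0.25 * 1),  # 25% mispredicted, 1-cycle penalty
    "jump":   (0.02, 2),
    "alu":    (0.43, 1),
}


def average_cpi(mix):
    """Weighted average of clock cycles per instruction over the mix."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


if __name__ == "__main__":
    cpi = average_cpi(GCC_MIX)
    print(f"average CPI = {cpi:.4f}")
    print(f"average instruction time = {cpi * CLOCK_NS:.3f} ns")
```

The unrounded CPI is 1.1825; the slide rounds it to 1.18 before multiplying by the 2 ns clock, which is where 2.36 ns comes from.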
Pipelining Advantages
• Higher maximum throughput
• Higher utilization of CPU resources
1. Rewrite the code to include the no-ops that are needed. Do not change the order of the four statements. Use as few no-ops as possible.
2. Suppose the compiler is allowed to change the order of the four statements, provided it doesn’t change the final answer. Is it possible to reduce the number of no-ops needed? Why or why not?
Tutorial Question
Draw an execution diagram that shows where forwarding and
stalling would take place, if any.
add $6,$5,$2
lw $7,0($6)
addi $7,$7,10
add $6,$4,$2
sw $7,0($6)
addi $2,$2,4
blt $2,$3,loop
add $6,$5,$2
Summary-Pipeline Hazards
• Structural hazards
– Caused by resource contention
– Two operations require a single piece of hardware e.g. Memory
– Using same resource by two instructions during the same cycle
– Structural hazards can be overcome by adding additional hardware
• Data hazards
– An instruction depends on data computed by a previous instruction that is still in the pipeline
– Hardware can detect dependencies between instructions
• Control hazards
– Caused by instructions that change control flow (branches/jumps)
• i.e. delays in changing the flow of control
– Requiring subsequent instruction fetches to be predicted
• Flushed if prediction does not hold (make sure no state change)
– Branch hazards can use dynamic prediction/speculation, branch delay slot
Refer
Patterson Chapter 6: Topics 6.1 to 6.6
End…
Pipelined Datapath with Control II
[Figure: pipelined datapath with control; PCSrc selects the next PC, and the ID/EX, EX/MEM and MEM/WB pipeline registers carry the EX, M and WB control fields]
Control signals emanate from the control portions of the pipeline registers
Pipelined Execution and Control
• Instruction sequence:
lw $10, 20($1)
sub $11, $2, $3
and $12, $2, $3
or $13, $6, $7
add $14, $8, $9
[Figure: one datapath snapshot per clock cycle, Clock cycle 1 to Clock cycle 9, showing the register numbers and the EX, M and WB control bits held in each pipeline register]
Stage contents given in the figure captions:
Clock cycle   IF                ID                EX               MEM              WB
1             lw $10, 20($1)    before<1>         before<2>        before<3>        before<4>
2             sub $11, $2, $3   lw $10, 20($1)    before<1>        before<2>        before<3>
4             or $13, $6, $7    and $12, $2, $3   sub $11, . . .   lw $10, . . .    before<1>
5             add $14, $8, $9   or $13, $6, $7    and $12, . . .   sub $11, . . .   lw $10, . . .
6             after<1>          add $14, $8, $9   or $13, . . .    and $12, . . .   sub $11, . . .
7             after<2>          after<1>          add $14, . . .   or $13, . . .    and $12, . . .
8             after<3>          after<2>          after<1>         add $14, . . .   or $13, . . .
9             after<4>          after<3>          after<2>         after<1>         add $14, . . .