Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Basic Pipelining
Cycle 2
Clk
lw
sw
multicycle clock
slower than 1/5th of
single cycle clock
due to stage register
overhead
Waste
Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 Cycle 7 Cycle 8 Cycle 9Cycle 10
lw
IFetch
Dec
Exec
Mem
WB
sw
IFetch
Dec
Exec
Mem
R-type
IFetch
Faster line movement yields more cars per hour off the
line
Key idea
Minimize delays
instruction latency (execution time, delay time, response time time from the start of an instruction to its completion) is not
reduced
Cycle 1 Cycle 2
lw
sw
R-type
IFetch
Dec
Exec
Mem
WB
IFetch
Dec
Exec
Mem
WB
IFetch
Dec
Exec
Mem
WB
Cycle 2
Clk
lw
sw
Waste
Dec
Exec
Mem
WB
sw
IFetch
Dec
Pipeline Implementation:
lw
IFetch
sw
Dec
Exec
Mem
WB
IFetch
Dec
Exec
Mem
WB
Dec
Exec
Mem
R-type IFetch
WB
Exec
Mem
R-type
IFetch
IF:IFetch
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
ID:Dec
EX:Execute
MEM:
MemAccess
WB:
WriteBack
Add
Shift
left 2
Read Addr 2Data 1
File
Write Addr
Write Data
16
System Clock
Sign
Extend
Read
Data 2
32
ALU
Exec/Mem
Register Read
Dec/Exec
Read
Address
Read Addr 1
IFetch/Dec
PC
Instruction
Memory
Add
Data
Memory
Address
Write Data
Read
Data
Mem/WB
Pipeline registers
Reg
ALU
IM
DM
Reg
IM
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
Inst 3
DM
ALU
Inst 2
Reg
ALU
Inst 1
IM
ALU
O
r
d
e
r
Inst 0
ALU
I
n
s
t
r.
Once the
pipeline is full,
one instruction
is completed
every cycle, so
CPI = 1
Inst 4
Time to fill the pipeline
Reg
Reg
Reg
Reg
DM
Reg
Inst 4
Mem
Reg
Reg
Mem
Reg
Reg
Mem
Reg
Reg
ALU
Inst 3
Reg
ALU
Inst 2
Mem
Reg
ALU
Inst 1
Reg
Mem
ALU
O
r
d
e
r
lw
ALU
I
n
s
t
r.
Mem
Mem
Mem
Mem
Reading instruction
from memory
Mem
Reg
Fix with separate instr and data memories (I$ and D$)
Inst 2
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
Inst 1
Reg
ALU
IM
ALU
O
r
d
e
r
add $1,
ALU
I
n
s
t
r.
add $2,$1,
Reg
Reg
DM
Reg
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
$8,$1,$9
IM
ALU
or
DM
ALU
and $6,$1,$7
Reg
ALU
sub $4,$1,$5
IM
ALU
add $1,
xor $4,$1,$5
Reg
Reg
Reg
Reg
DM
Reg
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
DM
IM
Reg
ALU
sub $4,$1,$5
IM
ALU
$1,4($2)
ALU
O
r
d
e
r
lw
ALU
I
n
s
t
r.
and $6,$1,$7
or
$8,$1,$9
xor $4,$1,$5
Reg
Reg
Reg
Reg
DM
Reg
Reg
DM
Reg
IM
Reg
DM
IM
Reg
ALU
IM
ALU
O
r
d
e
r
add $1,
ALU
I
n
s
t
r.
stall
stall
sub $4,$1,$5
and $6,$1,$7
Reg
DM
Reg
or
$8,$1,$9
xor $4,$1,$5
IM
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
and $6,$1,$7
DM
ALU
sub $4,$1,$5
Reg
ALU
IM
ALU
O
r
d
e
r
add $1,
ALU
I
n
s
t
r.
Reg
Reg
Reg
Reg
DM
Reg
or
$8,$1,$9
xor $4,$1,$5
IM
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
and $6,$1,$7
DM
ALU
sub $4,$1,$5
Reg
ALU
$1,4($2) IM
ALU
O
r
d
e
r
lw
ALU
I
n
s
t
r.
Reg
Reg
Reg
Reg
DM
Reg
Inst 4
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
Inst 3
Reg
ALU
lw
IM
ALU
O
r
d
e
r
beq
ALU
I
n
s
t
r.
Reg
Reg
Reg
DM
Reg
O
r
d
e
r
stall
IM
Reg
ALU
I
n
s
t
r.
DM
Fix branch
hazard by
waiting
stall but
affects CPI
Reg
stall
stall
Reg
DM
IM
Reg
ALU
Inst 3
IM
ALU
lw
Reg
DM
IF/ID
ID/EX
EX/MEM
Add
4
Shift
left 2
PC
Instruction
Memory
Read
Address
Add
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
Read
Data 2
32
MEM/WB
ALU
Address
Write Data
Read
Data
Control
Add
4
Shift
left 2
PC
Instruction
Memory
Read
Address
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
Read
Data 2
32
MEM/WB
Add
ALU
Address
Write Data
Read
Data
Reg
ALU
IM
DM
Reg
let data memory access take two cycles (and keep the same
clock rate)
IM
Reg
ALU
DM1
DM2
Reg
ARM7
IM
Reg
PC update
IM access
XScale
IM
IM1
PC update
BTB access
start IM access
Reg
IM2
ALU op
DM access
shift/rotate
commit result
(write back)
DM
Reg
Reg
SHFT
ALU
StrongARM-1
decode
reg
access
ALU
EX
decode
reg 1 access
IM access
DM1
DM write
reg write
start DM access
exception
ALU op
shift/rotate
reg 2 access
Reg
DM2
Summary
Stalling negatively affects CPI (makes CPI less than the ideal
of 1)
PC
Instruction
Memory
Read
Address
Shift
left 2
Add
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
MEM/WB
Branch
ALUSrc
ALU
Read
Data 2
Address
Read
Data
Write Data
ALU
cntrl
32
ALUOp
RegDst
MemWrite MemRead
MemtoReg
Control Settings
EX Stage
Reg
Dst
MEM Stage
WB Stage
R
lw
sw
beq
DM
Reg
IM
Reg
DM
IM
Reg
ALU
IM
ALU
O
r
d
e
r
add $1,
ALU
I
n
s
t
r.
stall
stall
sub $4,$1,$5
and $6,$7,$1
Reg
DM
Reg
or
$8,$1,$1
sw
$4,4($1)
IM
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
and $6,$7,$1
DM
ALU
sub $4,$1,$5
Reg
ALU
IM
ALU
O
r
d
e
r
add $1,
ALU
I
n
s
t
r.
Reg
Reg
Reg
DM
Reg
Take the result from the earliest point that it exists in any of
the pipeline state registers and forward it to the functional
units (e.g., the ALU) that need it that cycle
For ALU functional unit: the inputs can come from any
pipeline register rather than just from ID/EX by
EX/MEM hazard:
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd
and (EX/MEM.RegisterRd
ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd
and (EX/MEM.RegisterRd
ForwardB = 10
2.
!= 0)
= ID/EX.RegisterRs))
!= 0)
= ID/EX.RegisterRt))
Forwards the
result from the
previous instr.
to either input
of the ALU
MEM/WB hazard:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd
and (MEM/WB.RegisterRd
ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd
and (MEM/WB.RegisterRd
ForwardB = 01
!= 0)
= ID/EX.RegisterRs))
!= 0)
= ID/EX.RegisterRt))
Forwards the
result from the
second
previous instr.
to either input
of the ALU
Forwarding Illustration
sub $4,$1,$5
and $6,$7,$1
Reg
DM
IM
Reg
DM
IM
Reg
ALU
IM
ALU
O
r
d
e
r
add $1,
ALU
I
n
s
t
r.
EX/MEM hazard
forwarding
Reg
Reg
DM
Reg
MEM/WB hazard
forwarding
add $1,$1,$4
Reg
DM
IM
Reg
DM
IM
Reg
ALU
add $1,$1,$3
IM
ALU
O
r
d
e
r
add $1,$1,$2
ALU
I
n
s
t
r.
Reg
Reg
DM
Reg
MEM/WB hazard:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (EX/MEM.RegisterRd != ID/EX.RegisterRs)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd != 0)
and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
ID/EX
EX/MEM
Control
IF/ID
Add
4
Shift
left 2
PC
Instruction
Memory
Read
Address
Add
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
MEM/WB
Branch
ALU
Read
Data 2
Address
Read
Data
Write Data
ALU
cntrl
32
EX/MEM.RegisterRd
ID/EX.RegisterRt
ID/EX.RegisterRs
Forward
Unit
MEM/WB.RegisterRd
Memory-to-Memory Copies
For loads immediately followed by stores (memory-tomemory copies) can avoid a stall by adding forwarding
hardware from the MEM/WB register to the data memory
input.
sw $1,4($3)
IM
Reg
DM
IM
Reg
ALU
O
r
d
e
r
lw $1,4($2)
ALU
I
n
s
t
r.
Reg
DM
Reg
or
xor $8,$1,$9
$4,$1,$5
xor $4,$1,$5
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
DM
IM
Reg
ALU
or $6,$1,$7
$8,$1,$9
and
IM
ALU
and $4,$1,$5
$6,$1,$7
sub
DM
ALU
stall $4,$1,$5
sub
Reg
ALU
IM
$1,4($2)
ALU
O
r
d
e
r
lw
ALU
I
n
s
t
r.
Reg
Reg
Reg
Reg
Reg
DM
After this one cycle stall, the forwarding logic can handle
the remaining data hazards
Stall Hardware
Set the control bits in the EX, MEM, and WB control fields of the
ID/EX pipeline register to 0 (noop). The Hazard Unit controls the
mux that chooses between the real control values and the 0s.
Hazard
Unit
EX/MEM
Control 0
Shift
left 2
Instruction
Memory
PC
ID/EX.MemRead
IF/ID
Add
ID/EX
Read
Address
Add
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
Read
Data 2
32
ID/EX.RegisterRt
MEM/WB
Branch
ALU
Address
Read
Data
Write Data
ALU
cntrl
Forward
Unit
On control hazards
te
Wri
.
D
I
IF/
rite
PC.W
Hazard
Unit
EX/MEM
Control 0
Shift
left 2
Instruction
Memory
PC
ID/EX.MemRead
IF/ID
Add
ID/EX
Read
Address
Add
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
Read
Data 2
32
ID/EX.RegisterRt
MEM/WB
Branch
ALU
Address
Read
Data
Write Data
ALU
cntrl
Forward
Unit
Control Hazards
Exceptions
Possible approaches
PCSrc
ID/EX
Shift
left 2
IF/ID
EX/MEM
Control
Add
PC+4[31-28]
PC
Instruction
Memory
Read
Address
Shift
left 2
Add
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
Read
Data 2
32
MEM/WB
Branch
ALU
Address
Read
Data
Write Data
ALU
cntrl
Forward
Unit
j target
IM
Reg
DM
IM
Reg
ALU
O
r
d
e
r
flush
DM
ALU
Reg
ALU
I
n
s
t
r.
IM
Fix jump
hazard by
waiting
stall but
affects CPI
Reg
Reg
DM
Reg
PCSrc
ID/EX
Shift
left 2
IF/ID
EX/MEM
Control
Add
PC+4[31-28]
PC
Instruction
Memory
Read
Address
Shift
left 2
Add
Read Addr 1
Data
Memory
Register Read
File
Write Addr
Write Data
16
Sign
Extend
Read
Data 2
32
MEM/WB
Branch
ALU
Address
Read
Data
Write Data
ALU
cntrl
Forward
Unit
O
r
d
e
r
flush
IM
Reg
ALU
I
n
s
t
r.
DM
Reg
Fix branch
hazard by
waiting
stall but
affects CPI
flush
flush
IM
Reg
ALU
beq target
DM
Reg
MEM/WB forwarding
is taken care of by the
normal RegFile write
before read operation
WB
MEM
EX
ID
IF
add3
$1,
add2
$3,
add1
$4,
beq
$1,$2,Loop
next_seq_instr
WB
MEM
EX
ID
IF
add3
$3,
add2
$1,
add1
$4,
beq
$1,$2,Loop
next_seq_instr
if (IDcontrol.Branch
and (EX/MEM.RegisterRd
and (EX/MEM.RegisterRd
ForwardC = 1
if (IDcontrol.Branch
and (EX/MEM.RegisterRd
and (EX/MEM.RegisterRd
ForwardD = 1
Forwards the
result from the
second
previous instr.
!= 0)
= IF/ID.RegisterRt)) to either input
of the compare
!= 0)
= IF/ID.RegisterRs))
Bounce the beq (in ID) and next_seq_instr (in IF) in place
(ID Hazard Unit deasserts PC.Write and IF/ID.Write)
Insert a stall between the add in the EX stage and the beq in
the ID stage by zeroing the control bits going into the ID/EX
pipeline register (done by the ID Hazard Unit)
Supporting
ID Stage Branches
Branch
PCSrc
IF/ID
Add
PC
Instruction
Memory
Read
Address
IF.Flush
ID/EX
Control
Shift
left 2
Add
Read Addr 1
RegFile
Read Addr 2
Read Data 1
Write Addr
ReadData 2
Write Data
16
EX/MEM
Sign
Extend 32
MEM/WB
Compare
Hazard
Unit
Data
Memory
ALU
Read Data
Address
Write Data
ALU
cntrl
Forward
Unit
Forward
Unit
Delayed Decision
becomes
if $2=0 then
add $1,$2,$3
add $1,$2,$3
if $1=0 then
sub $4,$5,$6
sub $4,$5,$6
1.
Reg
DM
IM
Reg
IM
Reg
Reg
Reg
DM
ALU
20 or r8,$1,$9
IM
ALU
16 and $6,$1,$7
O
r
d
e
r
DM
ALU
8 flush
sub $4,$1,$5
Reg
ALU
4 beq $1,$2,2
I
n
s
t
r.
IM
Reg
DM
Reg
Branching Structures
2.
3.
PC
Instruction
Memory
0
Read
Address
1.
2.
3.
2-bit Predictors
right 9 times
wrong on loop
Taken
fall out
1 Predict 11
Taken
Taken
Not taken
Taken
Predict 1
10 Taken
right on 1st
iteration
0 Predict 01
Not Taken
Not taken
Not taken
00Predict 0
Not Taken
Taken
Not taken
BHT also
stores the
initial FSM
state
ALU
IM
DM
Reg
Stage(s)?
Synchronous?
Arithmetic overflow
EX
yes
Undefined instruction
ID
yes
IF, MEM
yes
any
no
Hardware malfunction
any
no
Reg
ALU
Inst 0
I
n
s
t
r.
DM
Reg
D$ page fault
IM
Reg
ALU
Inst 1
DM
Reg
arithmetic overflow
IM
Reg
ALU
Inst 2
O
r
d
e
r
DM
Reg
Inst 4
IM
I$ page fault
Reg
DM
IM
Reg
ALU
Inst 3
ALU
undefined instruction
Reg
DM
Reg
Summar
All modern
y day processors use pipelining for