Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Krste Asanovic
Electrical Engineering and Computer Sciences University of California at Berkeley http://www.eecs.berkeley.edu/~krste http://inst.eecs.berkeley.edu/~cs152
2/3/2009
CS152-Spring09
CPI >1 1 1
An Ideal Pipeline
stage 1 stage 2 stage 3 stage 4
All objects go through the same stages No sharing of resources between any two stages Propagation delay through all pipeline stages is equal The scheduling of an object entering the pipeline is not affected by the objects in other stages These conditions generally hold for industrial assembly lines. But can an instruction pipeline satisfy the last condition?
CS152-Spring09
2/3/2009
Pipelined MIPS To pipeline MIPS: First build MIPS without pipelining with CPI=1 Next, add pipeline registers to reduce cycle time while maintaining CPI=1
2/3/2009
CS152-Spring09
Pipelined Datapath
0x4 Add we rs1 rs2 rd1 ws wd rd2 GPRs Imm Ext
PC
addr rdata
IR
ALU
we addr
Inst. Memory
Data Memory
wdata
rdata
write fetch decode & Reg-fetch execute memory -back phase phase phase phase phase Clock period can be reduced by dividing the execution of an instruction into multiple cycles tC > max {tIM , tRF , tALU , tDM , tRW } ( = tDM probably) However, CPI will increase unless instructions are pipelined
2/3/2009 CS152-Spring09 6
Technology Assumptions
A small amount of very fast memory (caches) backed up by a large, slower memory Fast ALU (at least for integers) Multiported Register files (slower!)
A 5-stage pipelined Harvard architecture will be the focus of our detailed design
2/3/2009
CS152-Spring09
PC
addr rdata
IR
ALU
Inst. Memory
Data Memory
I-Fetch (IF)
Memory (MA)
t6 t7 ....
WB2 MA3 WB3 EX4 MA4 WB4 ID5 EX5 MA5 WB5
8
CS152-Spring09
PC
addr rdata
IR
ALU
Inst. Memory
Data Memory
I-Fetch (IF)
Resources
Memory (MA)
t6 t7 ....
time IF ID EX MA WB
I5 I4 I3
I5 I4
I5
9
2/3/2009
CS152-Spring09
Pipelined Execution:
ALU Instructions
0x4
Add
IR
IR
31
IR
PC
addr
inst IR
Inst Memory
A
ALU
we addr
GPRs
Imm Ext
wdata
Data Memory
rdata
Not quite correct! We need an Instruction Reg (IR) for each stage
2/3/2009 CS152-Spring09 10
M
IR
31
W
IR
MemWrite we addr
WBSrc
PC
addr
inst IR
Inst Memory
GPRs
Imm Ext
rdata
data hazard
Dependence may be for the next instructions address control hazard (branches, exceptions)
2/3/2009
CS152-Spring09
12
Data Hazards
r4 r1
0x4
Add
r1
IR IR
31
IR
PC
addr
inst IR
Inst Memory
A
ALU
we addr
GPRs
Imm Ext
wdata
Data Memory
rdata
... r1 r0 + 10 r4 r1 + 17 ...
2/3/2009 CS152-Spring09
r1 is stale. Oops!
13
CS152 Administrivia
PS 1 due Tuesday Feb 10 in class Section covering PS 1 on Wednesday Feb 11
Room/time TBD
Lecture 7, Tuesday Feb 17 in 320 Soda Lecture 8, Thursday Feb 19 back in 306 Soda See website for full schedule
2/3/2009
CS152-Spring09
14
Strategy 1: Wait for the result to be available by freezing earlier pipeline stages interlocks
2/3/2009
CS152-Spring09
15
FB1
FB2
FB3
FB4
stage 1
stage 2
stage 3
stage 4
Later stages provide dependence information to earlier stages which can stall (or kill) instructions
Controlling a pipeline in this manner works provided the instruction at stage i+1 can complete without any interference from instructions in stages 1 to i (otherwise deadlocks may occur)
2/3/2009 CS152-Spring09 16
0x4
Add
nop
IR
IR
31
IR
PC
addr
inst IR
Inst Memory
A
ALU
we addr
GPRs
Imm Ext
... r1 r0 + 10 r4 r1 + 17 ...
2/3/2009
wdata
Data Memory
rdata
wdata
MD1
MD2
CS152-Spring09
17
EX2 MA2 WB2 ID3 EX3 MA3 WB3 IF4 ID4 EX4 MA4 WB4 IF5 ID5 EX5 MA5 WB5
Resource Usage
IF ID EX MA WB
t2 I3 I2 I1
t3 I3 I2 nop I1
t4 I3 I2 nop nop I1
t6 I4 I3 I2 nop nop
t7 I5 I4 I3 I2 nop
.... I5 I4 I3 I2
I5 I4 I3
I5 I4
18
I5
nop
2/3/2009 CS152-Spring09
pipeline bubble
0x4
Add
nop
IR
IR
31
IR
PC
addr
inst IR
Inst Memory
A
ALU
we addr
GPRs
Imm Ext
wdata
Data Memory
rdata
wdata
MD1
MD2
Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions.
2/3/2009 CS152-Spring09 19
ws
nop
IR
IR
PC
addr
inst IR
Inst Memory
Cdest
A
ALU
we addr
GPRs
Imm Ext
wdata
Data Memory
rdata
wdata
MD1
MD2
Should we always stall if the rs field matches some rd? not every instruction writes a register we not every instruction reads a register re
2/3/2009 CS152-Spring09
20
immediate26
rd (rs) func (rt) rt (rs) op imm rt M [(rs) + imm] M [(rs) + imm] (rt) cond (rs) true: PC (PC) + imm false: PC (PC) + 4 PC (PC) + imm r31 (PC), PC (PC) + imm PC (rs) r31 (PC), PC (rs)
2/3/2009 CS152-Spring09
destination rd rt rt
31 31
21
Cstall
2/3/2009
stall = ((rsD =wsE).weE + (rsD =wsM).weM + (rsD =wsW).weW) . re1D ((rtD =wsE).weE + (rtD =wsM).weM + (rtD =wsW).weW) . re2D CS152-Spring09
t ry ! n o to is l s s l hi fu T e th
22
0x4
Add
nop
IR
IR
31
IR
PC
addr
inst IR
Inst Memory
A
ALU
we addr
GPRs
Imm Ext
wdata
Data Memory
rdata
wdata
MD1
MD2
However, the hazard is avoided because our memory system completes writes in one cycle ! Load/Store hazards are sometimes resolved in the pipeline and sometimes in the memory system itself. More on this later in the course.
2/3/2009 CS152-Spring09 24
Strategy 2: Route data as soon as possible after it is calculated to the earlier pipeline stage bypass
2/3/2009
CS152-Spring09
25
Bypassing
time (I1) r1 r0 + 10 (I2) r4 r1 + 17 (I3) (I4) (I5) t0 IF1 t1 t2 t3 t4 t5 ID1 EX1 MA1 WB1 IF2 ID2 ID2 ID2 ID2 IF3 IF3 IF3 IF3 stalled stages t6 t7 .... EX2 MA2 WB2 ID3 EX3 MA3 IF4 ID4 EX4 IF5 ID5
Each stall or kill introduces a bubble in the pipeline CPI > 1 A new datapath, i.e., a bypass, can get the data from the output of the ALU to its input
time (I1) r1 r0 + 10 (I2) r4 r1 + 17 (I3) (I4) (I5)
2/3/2009
t0
t1 IF1
t6
t7
....
WB2 MA3 WB3 EX4 MA4 WB4 ID5 EX5 MA5 WB5
26
CS152-Spring09
Adding a Bypass
stall
r4 r1...
0x4
Add
r1 ...
nop
IR
IR
M
31
IR
ASrc
PC addr
inst IR
Inst Memory
A
ALU
we addr
GPRs
Imm Ext
wdata
Data Memory
rdata
wdata
MD1
MD2
(I1) (I2)
2/3/2009
When does this bypass help? ... r1 r0 + 10 r1 M[r0 + 10] JAL 500 r4 r1 + 17 r4 r31 + 17 r4 r1 + 17 yes no no
CS152-Spring09 27
ASrc = (rsD=wsE).weE.re1D
Is this correct?
No because only ALU and ALUi instructions can benefit from this bypass Split weE into two components: we-bypass, we-stall
2/3/2009
CS152-Spring09
28
ASrc stall
= (rsD =wsE).we-bypassE . re1D = ((rsD =wsE).we-stallE + (rsD=wsM).weM + (rsD=wsW).weW). re1D +((rtD = wsE).weE + (rtD = wsM).weM + (rtD = wsW).weW). re2D
2/3/2009
CS152-Spring09
29
0x4
Add
nop
ASrc
we rs1 rs2 rd1 ws wd rd2
IR
IR
M
31
IR
PC
addr
inst IR
A
ALU
we addr
Inst Memory
GPRs
Imm Ext
BSrc
wdata
Data Memory
rdata
2/3/2009
Strategy 3: Speculate on the dependence. Two cases: Guessed correctly do nothing Guessed incorrectly kill and restart
2/3/2009
CS152-Spring09
31
2/3/2009
CS152-Spring09
32
Acknowledgements
These slides contain material developed and copyright by:
Arvind (MIT) Krste Asanovic (MIT/UCB) Joel Emer (Intel/MIT) James Hoe (CMU) John Kubiatowicz (UCB) David Patterson (UCB)
MIT material derived from course 6.823 UCB material derived from course CS252
2/3/2009
CS152-Spring09
33