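[Figure: 5-stage datapath. Stage labels: Instruction Fetch, Memory Access, Write Back. Components: Next PC mux, adder (+4) producing Next SEQ PC, instruction memory, register file (RS1, RS2, RD), sign extend (Imm), Zero? test, ALU, data memory (LMD), write-back mux (WB Data).]
Instruction fetch in register-transfer form:
IR <= mem[PC];
PC <= PC + 4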
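[Figure: pipelined datapath. Pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB separate the stages; RD is carried along through the pipeline registers. Components: adder (+4), instruction memory, register file (RS1, RS2, RD), sign extend (Imm), Zero? test, ALU, data memory, muxes, WB Data.]
Register-transfer steps for a register-register ALU instruction:
IR <= mem[PC]; PC <= PC + 4
A <= Reg[IR_rs]; B <= Reg[IR_rt]
rslt <= A op_IRop B
WB <= rslt
Reg[IR_rd] <= WB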
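[Figure: control flow by instruction class after opFetch-DCD (operand fetch and decode). Classes shown: JSR, JR, br, jmp, ST, RI, RR, LD.]
jmp: PC <= IR_jaddr
RI:  r <= A op_IRop IR_im;  WB <= r;       Reg[IR_rd] <= WB
RR:  r <= A op_IRop B;      WB <= r;       Reg[IR_rd] <= WB
LD:  r <= A + IR_im;        WB <= Mem[r];  Reg[IR_rd] <= WB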
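The register-transfer steps above for the RR, RI and LD classes can be traced with a small interpreter. This is a minimal sketch, not the slides' own code: the function name `execute`, the list-based register file, the dictionary memory and the default `alu` operation (addition) are all assumptions made for illustration.

```python
def execute(op, regs, mem, rs, rd, rt=0, imm=0, alu=lambda a, b: a + b):
    """Run one instruction through the RTL steps; op is 'RR', 'RI' or 'LD'.

    regs is a list used as the register file, mem a dict used as data memory.
    All names here are hypothetical; only the RTL steps come from the slides.
    """
    a = regs[rs]              # A <= Reg[IR_rs]
    b = regs[rt]              # B <= Reg[IR_rt]
    if op == "RR":
        r = alu(a, b)         # r <= A op B
        wb = r                # WB <= r
    elif op == "RI":
        r = alu(a, imm)       # r <= A op IR_im
        wb = r                # WB <= r
    elif op == "LD":
        r = a + imm           # r <= A + IR_im (effective address)
        wb = mem[r]           # WB <= Mem[r]
    else:
        raise ValueError(f"unsupported instruction class: {op}")
    regs[rd] = wb             # Reg[IR_rd] <= WB
    return wb


regs = [0, 10, 32, 0, 0]
mem = {14: 99}
execute("RR", regs, mem, rs=1, rt=2, rd=3)   # regs[3] = 10 + 32
execute("RI", regs, mem, rs=1, imm=5, rd=4)  # regs[4] = 10 + 5
execute("LD", regs, mem, rs=1, imm=4, rd=0)  # regs[0] = mem[10 + 4]
print(regs)  # [99, 10, 32, 42, 15]
```

Each branch of the `if` corresponds to one column of the flow diagram; the final write to `regs[rd]` is the step all three classes share.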
Visualizing Pipelining
Figure 3.3, page 133
[Figure: time in clock cycles (Cycle 1 to Cycle 7) runs left to right; instruction order runs top to bottom. Four overlapped instructions each pass through Ifetch, Reg, ALU, DMem, Reg.]
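$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$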
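As a quick numeric check of the two formulas, the following sketch evaluates them directly. The function names and the sample values (a 5-stage pipeline, 0.25 stall cycles per instruction, equal cycle times) are illustrative assumptions, not figures from the slides.

```python
def pipelined_cpi(ideal_cpi, stall_cycles_per_inst):
    """CPI_pipelined = Ideal CPI + average stall cycles per instruction."""
    return ideal_cpi + stall_cycles_per_inst


def pipeline_speedup(ideal_cpi, depth, stall_cpi,
                     cycle_unpipelined, cycle_pipelined):
    """Speedup = (Ideal CPI * depth) / (Ideal CPI + stall CPI)
                 * (unpipelined cycle time / pipelined cycle time)."""
    return ((ideal_cpi * depth) / (ideal_cpi + stall_cpi)
            * (cycle_unpipelined / cycle_pipelined))


# No stalls: the speedup equals the pipeline depth.
print(pipeline_speedup(1.0, 5, 0.0, 1.0, 1.0))   # 5.0
# 0.25 stall cycles per instruction (assumed value) cut it to 4.
print(pipeline_speedup(1.0, 5, 0.25, 1.0, 1.0))  # 4.0
```

With an ideal CPI of 1 and equal cycle times, the expression reduces to depth / (1 + stall CPI), which is why even a small stall rate visibly erodes the ideal speedup.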
Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle.
Structural hazards: the hardware cannot support this combination of instructions (a single person to fold and put clothes away).
Data hazards: an instruction depends on the result of a prior instruction still in the pipeline (a missing sock).
Control hazards: caused by the delay between fetching instructions and deciding on changes in control flow (branches and jumps).
[Figure: two pipeline diagrams with stages Ifetch, Reg, ALU, DMem, Reg. In the second diagram one instruction is delayed and a row of bubbles passes through the pipeline in its place.]
Data Hazard on R1
Figure 3.9, page 147
[Figure: pipeline diagram, instruction order top to bottom, stages Ifetch, Reg, ALU, DMem, Reg (WB). Legible instruction text: "r8,r1,r9" and "xor r10,r1,r11"; both read r1.]
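A read-after-write hazard on r1 can be detected mechanically by comparing when the producer writes back with when each consumer reads its registers. The sketch below is illustrative only: the five-instruction sequence is an assumption modeled on the usual textbook example (only the operands `r8,r1,r9` and `xor r10,r1,r11` are legible in the source), and the names `Inst` and `raw_hazards` are hypothetical. It assumes the ideal schedule with no stalls, a 5-stage pipeline where reads happen in stage 2 and writes in stage 5, and no forwarding.

```python
from typing import NamedTuple, Tuple


class Inst(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def raw_hazards(program, split_cycle_regfile=True):
    """Return (producer, consumer, register) index triples that would read a stale value.

    Instruction i is fetched in cycle i+1, reads registers in cycle i+2 (ID)
    and writes its result in cycle i+5 (WB). With a split-cycle register file
    (write in the first half, read in the second half of a cycle), a read in
    the same cycle as the write is safe.
    """
    hazards = []
    for i, producer in enumerate(program):
        wb_cycle = i + 5
        for j in range(i + 1, len(program)):
            consumer = program[j]
            id_cycle = j + 2
            if split_cycle_regfile:
                safe = id_cycle >= wb_cycle
            else:
                safe = id_cycle > wb_cycle
            if producer.dest in consumer.srcs and not safe:
                hazards.append((i, j, producer.dest))
            if consumer.dest == producer.dest:
                break  # later readers depend on this newer write instead
    return hazards


# Assumed instruction sequence; every later instruction reads r1.
program = [
    Inst("add", "r1", ("r2", "r3")),
    Inst("sub", "r4", ("r1", "r3")),
    Inst("and", "r6", ("r1", "r7")),
    Inst("or", "r8", ("r1", "r9")),
    Inst("xor", "r10", ("r1", "r11")),
]
print(raw_hazards(program))  # [(0, 1, 'r1'), (0, 2, 'r1')]
```

Under these assumptions only the two instructions immediately after the producer are affected; the fourth instruction reads r1 in the same cycle the producer writes it, and the fifth reads it one cycle later.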
[Figure: pipeline diagram of five instructions with stages Ifetch, Reg, ALU, DMem, Reg; the last instruction is xor r10,r1,r11.]
[Figure: datapath fragment with muxes in front of the ALU. Labels: Registers, Immediate, pipeline registers ID/EX, EX/MEM and MEM/WR, ALU, Data Memory.]
[Figure: pipeline diagram in which the instructions after the first are each delayed by one cycle; a bubble appears in each of their rows.]
Control Hazards
Branch instructions can cause great performance loss.
Branch instructions need two things: the branch decision (taken or not taken) and the branch target address.
[Figure sequence: beq $1,$3,100, stored at address 1000, moves through the pipelined datapath (instruction memory, registers, sign extend, adder, ALU with Zero output, main control, pipeline registers IF/ID, ID/EX, EX/MEM). In successive cycles PC = 1000, 1004, 1008 and 1012, while the instructions next_1, next_2 and next_3 are fetched behind the branch. The immediate field (Imm16) is 100 and the registers compared are $1 and $3. In the cycle with PC = 1012, Zero = 1 and Beq = 1, and the adder output is 1404. In the following cycle PC = 1404.]
[Figure: pipeline diagram for beq $1,$3,100 (stages IM, Reg, ALU, DM, Reg); the instructions fetched after the branch are turned into bubbles.]
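The numbers in the branch walkthrough (branch at address 1000, immediate 100, new PC 1404) are consistent with a target computed as PC + 4 plus the sign-extended immediate shifted left by two. The sketch below reproduces that arithmetic; the function names are hypothetical, and the word-offset, PC + 4-relative encoding is an assumption inferred from those three numbers.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(branch_pc, imm16):
    """Target = (PC + 4) + (sign-extended immediate * 4)."""
    return (branch_pc + 4) + (sign_extend16(imm16) << 2)


def next_pc(branch_pc, imm16, taken):
    """PC selected by the mux: branch target if taken, else PC + 4."""
    return branch_target(branch_pc, imm16) if taken else branch_pc + 4


print(branch_target(1000, 100))         # 1404
print(next_pc(1000, 100, taken=False))  # 1004
```

Because the walkthrough only redirects the PC after next_1, next_2 and next_3 have been fetched, a taken branch costs three cycles in that design.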
Branches can be determined earlier, in the ID stage:
The branch address calculation adder is moved to the ID stage.
A comparator in the ID stage compares the two fetched registers to determine the branch decision (whether the branch is taken or not).
Only one instruction that follows the branch will ...
[Figure: pipelined datapath with the branch resolved in ID. Labels: PC, instruction memory, register comparator (=), sign extend, muxes, ALU, data memory, forwarding unit.]
Predict Branch Not Taken
Fetch the successor instruction: PC+4 is already calculated.
Almost half of MIPS branches are not taken on average.
Flush instructions in the pipeline only if the branch is actually taken.
Predict Branch Taken
Delayed Branch
Define the branch to take place after the next instruction.
For a 1-cycle branch delay, we have one delay slot:
    branch instruction
    branch delay slot
    next instruction
    ...
    branch target (if branch taken)
[Figure: pipeline timing of a taken branch instruction, its branch delay slot (the next instruction) and the branch target; the delay-slot instruction is fetched (IF) while the branch is in ID.]
[Figure: filling the delay slot, with panels "From Before" and "From Target"; legible instruction: sub $t4,$t5,$t6.]
Zero-Delayed Branch
How can we achieve zero delay for a taken branch?
Check the PC to see if the instruction being fetched is a branch.
Store the branch target address in a table in the IF stage. Such a table is called the branch target buffer.
If the branch is predicted taken, then ...
Prediction Buffer
Stores the prediction bits for branch instructions.
The prediction bits are dynamically determined by the hardware.
[Figure: the PC is looked up in the Branch Target Buffer, whose entries hold the PC of a branch and its target address; a mux selects between the looked-up target and PC + 4. State labels: Taken, Not Taken.]
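The lookup described above can be sketched as a table keyed by branch PC. This is a hedged illustration, not the design on the slides: the class and method names are hypothetical, and the 2-bit saturating counter is an assumed choice for the "prediction bits", whose exact width and update rule the source does not state.

```python
class BranchTargetBuffer:
    """Maps a branch PC to its target address plus a prediction counter.

    Assumption: a 2-bit saturating counter (0..3) serves as the prediction
    bits; values 2 and 3 mean "predict taken".
    """

    def __init__(self):
        self.entries = {}  # branch PC -> [target address, counter]

    def predict_next_pc(self, pc):
        """IF-stage lookup: return the target on a predicted-taken hit, else PC + 4."""
        entry = self.entries.get(pc)
        if entry is not None and entry[1] >= 2:
            return entry[0]
        return pc + 4

    def update(self, pc, taken, target):
        """Called once the branch outcome is known; the hardware adjusts the bits."""
        entry = self.entries.setdefault(pc, [target, 1])  # start weakly not taken
        entry[0] = target
        if taken:
            entry[1] = min(3, entry[1] + 1)
        else:
            entry[1] = max(0, entry[1] - 1)


btb = BranchTargetBuffer()
print(btb.predict_next_pc(1000))   # 1004: no entry yet, fall through
btb.update(1000, taken=True, target=1404)
print(btb.predict_next_pc(1000))   # 1404: predicted taken, no fetch delay
```

Because the lookup uses only the PC, the predicted target is available in the same cycle the branch is fetched, which is what makes a zero-delay taken branch possible when the prediction is right.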
[Figure: pipelined datapath with two adders. Labels: Memory Access, Write Back, adder (+4), Zero? test, register file (RS1, RS2, RD), sign extend (Imm), ALU, data memory, muxes, WB Data, pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]
A 1-slot delay allows both the branch decision and the branch target address to be resolved in a 5-stage pipeline. DLX uses this.
Delayed Branch
Where to get instructions to fill the branch delay slot?
From before the branch instruction.
From the target address: only valuable when the branch is taken.
From fall-through: only valuable when the branch is not taken.
Canceling branches allow more slots to be filled.
Delayed branch downside: 7-8 stage pipelines and multiple instructions issued per clock (superscalar) lengthen the branch delay, so a single delay slot no longer covers it.
Scheduling scheme      Branch penalty (cycles)
Stall pipeline         3
Predict taken          1
Predict not taken      1
Delayed branch         0.5
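To see what these penalties mean for overall performance, they can be plugged into CPI = Ideal CPI + branch frequency × branch penalty. The sketch below does this for each scheme; the branch frequency of 14% and the ideal CPI of 1 are assumed values chosen for illustration, and the function name is hypothetical.

```python
# Branch penalties in cycles, taken from the table above.
BRANCH_PENALTY = {
    "Stall pipeline": 3,
    "Predict taken": 1,
    "Predict not taken": 1,
    "Delayed branch": 0.5,
}


def cpi_with_branches(penalty, branch_freq, ideal_cpi=1.0):
    """CPI when branches are the only source of stalls."""
    return ideal_cpi + branch_freq * penalty


BRANCH_FREQ = 0.14  # assumed fraction of instructions that are branches

for scheme, penalty in BRANCH_PENALTY.items():
    cpi = cpi_with_branches(penalty, BRANCH_FREQ)
    print(f"{scheme:18s} CPI = {cpi:.2f}")
```

Under these assumptions, stalling on every branch gives a CPI of 1.42, while a delayed branch gives 1.07, so the choice of scheme matters even when branches are a modest share of the instruction mix.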