Scheduling
8/2/2017 GGITM-CSE 1
Dynamic Scheduling
The idea:
Dynamic Scheduling
HW Schemes: Instruction Parallelism
Why in HW at run time?
Works when real dependences cannot be known at compile time
Simpler compiler
Code compiled for one machine runs well on another
Key Idea: Allow instructions behind a stall to proceed.
Key Idea: Execute instructions in parallel. There are multiple
execution units, so use them.
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
Enables out-of-order execution => out-of-order completion
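To make the key idea concrete, here is a minimal Python sketch that finds which of the three instructions above must wait on an earlier result. The tuple layout and the helper name `raw_dependences` are assumptions made for illustration, and the check only looks at true (read-after-write) dependences.

```python
# Each instruction: (opcode, destination, source1, source2).
program = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F10", "F0", "F8"),
    ("SUBD", "F12", "F8", "F14"),
]

def raw_dependences(program):
    """Map each instruction index to the earlier instructions whose result it reads."""
    deps = {}
    for i, (_, _, s1, s2) in enumerate(program):
        deps[i] = [j for j in range(i) if program[j][1] in (s1, s2)]
    return deps

deps = raw_dependences(program)
for i, ins in enumerate(program):
    if deps[i]:
        print(ins[0], "waits for", [program[j][0] for j in deps[i]])
    else:
        print(ins[0], "is free to start")
```

ADDD reads F0 and so waits for DIVD, while SUBD depends on neither instruction and could proceed past the stalled ADDD.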
Out-of-order execution divides ID stage:
1. Issue: decode instructions, check for structural hazards
2. Read operands: wait until no data hazards, then read operands
A scoreboard allows an instruction to execute whenever 1 & 2 hold, without
waiting for prior instructions.
A scoreboard is a data structure that provides the information
necessary for all pieces of the processor to work together.
We will use in-order issue, out-of-order execution, and out-of-order
commit (also called completion)
First used in the CDC 6600. Our example is modified here for DLX.
The CDC 6600 had 4 FP units, 5 memory reference units, 7 integer units.
DLX has 2 FP multipliers, 1 FP adder, 1 FP divider, 1 integer unit.
Using A Scoreboard
Dynamic Scheduling
Scoreboard Implications
Out-of-order completion => WAR, WAW hazards?
Solutions for WAR
Queue both the operation and copies of its operands
Read registers only during Read Operands stage
For WAW, must detect hazard: stall until other completes
Need to have multiple instructions in execution phase => multiple
execution units or pipelined execution units
Scoreboard keeps track of dependencies and the state of operations
Scoreboard replaces ID, EX, WB with 4 stages
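The three hazard kinds named above can be told apart by comparing register names. The following is a small sketch under stated assumptions: the tuple layout and the function name `hazards` are hypothetical, and the two instruction pairs are illustrative.

```python
def hazards(first, second):
    """Return the hazards 'second' has with respect to an earlier 'first'.

    Each instruction is (opcode, dest, src1, src2).
    """
    _, d1, a1, b1 = first
    _, d2, a2, b2 = second
    found = set()
    if d1 in (a2, b2):
        found.add("RAW")   # second reads what first writes
    if d2 in (a1, b1):
        found.add("WAR")   # second writes what first still has to read
    if d1 == d2:
        found.add("WAW")   # both write the same register
    return found

war_pair = hazards(("DIVD", "F10", "F0", "F6"), ("ADDD", "F6", "F8", "F2"))
waw_pair = hazards(("MULTD", "F0", "F2", "F4"), ("ADDD", "F0", "F8", "F2"))
print(war_pair, waw_pair)
```

With in-order completion only RAW matters; WAR and WAW become real problems once a later instruction can finish before an earlier one.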
Four Stages of Scoreboard Control
3. Execution: operate on operands (EX)
The functional unit begins execution upon receiving
operands. When the result is ready, it notifies the
scoreboard that it has completed execution.
Three Parts of the Scoreboard
Detailed Scoreboard Pipeline Control
Instruction status   Wait until                       Bookkeeping
Issue                Not Busy(FU) and not Result(D)   Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← D; Fj(FU) ← S1; Fk(FU) ← S2;
                                                      Qj ← Result(S1); Qk ← Result(S2); Rj ← not Qj; Rk ← not Qk; Result(D) ← FU
Read operands        Rj and Rk                        Rj ← No; Rk ← No
Execution complete   Functional unit done
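The issue and read-operands rows of the table can be sketched in Python as follows. This is a minimal illustration, not the CDC 6600 or DLX control logic: the function names, the dictionary layout, and the use of `None` for "no pending producer" are assumptions, and the write-result stage is left out.

```python
FUS = ["Integer", "Mult1", "Mult2", "Add", "Divide"]

def new_scoreboard():
    """Return (functional unit status, register result status), both empty."""
    fu = {name: dict(busy=False, op=None, Fi=None, Fj=None, Fk=None,
                     Qj=None, Qk=None, Rj=False, Rk=False) for name in FUS}
    return fu, {}

def try_issue(fu, result, name, op, D, S1, S2):
    """Issue: wait until not Busy(FU) and not Result(D), then do the bookkeeping."""
    if fu[name]["busy"] or D in result:
        return False
    qj, qk = result.get(S1), result.get(S2)      # units still producing S1 / S2
    fu[name].update(busy=True, op=op, Fi=D, Fj=S1, Fk=S2,
                    Qj=qj, Qk=qk, Rj=qj is None, Rk=qk is None)
    result[D] = name
    return True

def try_read_operands(fu, name):
    """Read operands: wait until Rj and Rk, then set both flags to No."""
    unit = fu[name]
    if not (unit["Rj"] and unit["Rk"]):
        return False
    unit["Rj"] = unit["Rk"] = False
    return True

fu, result = new_scoreboard()
issued_mult = try_issue(fu, result, "Mult1", "MULTD", "F0", "F2", "F4")
issued_div = try_issue(fu, result, "Divide", "DIVD", "F10", "F0", "F6")
div_reads = try_read_operands(fu, "Divide")      # F0 is still being produced
mult_reads = try_read_operands(fu, "Mult1")
print(issued_mult, issued_div, div_reads, mult_reads)
```

The divide unit issues but cannot read operands, because its Qj field points at Mult1, which has not yet produced F0.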
Scoreboard Example
This is the sample code we'll be working with in the example:
LD F6, 34(R2)
LD F2, 45(R3)
MULTD F0, F2, F4
SUBD F8, F6, F2
DIVD F10, F0, F6
ADDD F6, F8, F2
Scoreboard Example Cycle 11
Instruction status
Instruction  dest  j    k    Issue  Read operands  Execution complete  Write Result
LD           F6    34+  R2   1      2              3                   4
LD           F2    45+  R3   5      6              7                   8
MULTD        F0    F2   F4   6      9
SUBD         F8    F6   F2   7      9              11
DIVD         F10   F0   F6   8
ADDD         F6    F8   F2
Note: ADDD can't start because the add unit is busy.

Functional unit status
Time  Name     Busy  Op    Fi (dest)  Fj (S1)  Fk (S2)  Qj (FU for j)  Qk (FU for k)  Rj (Fj?)  Rk (Fk?)
      Integer  No
8     Mult1    Yes   Mult  F0         F2       F4                                     Yes       Yes
      Mult2    No
0     Add      Yes   Sub   F8         F6       F2                                     Yes       Yes
      Divide   Yes   Div   F10        F0       F6       Mult1                         No        Yes

Register result status
Clock      F0     F2  F4  F6  F8   F10     F12  ...  F30
11    FU   Mult1              Add  Divide
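The cycle 11 state can also be written down as data and queried. The sketch below copies the busy units and the register result entries from the tables above; the dictionary layout and the helper name `issue_blockers` are assumptions made for illustration.

```python
# Functional unit status at cycle 11 (only the fields needed here).
fu_status = {
    "Integer": {"busy": False},
    "Mult1":   {"busy": True, "Fi": "F0",  "Qj": None,    "Qk": None},
    "Mult2":   {"busy": False},
    "Add":     {"busy": True, "Fi": "F8",  "Qj": None,    "Qk": None},
    "Divide":  {"busy": True, "Fi": "F10", "Qj": "Mult1", "Qk": None},
}
# Register result status at cycle 11: register -> unit that will write it.
register_result = {"F0": "Mult1", "F8": "Add", "F10": "Divide"}

def issue_blockers(unit, dest):
    """List the reasons an instruction needing 'unit' and writing 'dest' cannot issue."""
    reasons = []
    if fu_status[unit]["busy"]:
        reasons.append(f"{unit} unit is busy")            # structural hazard
    if dest in register_result:
        reasons.append(f"{dest} has a pending write from {register_result[dest]}")  # WAW
    return reasons

addd_blockers = issue_blockers("Add", "F6")               # ADDD F6, F8, F2
divd_waits_on = [q for q in (fu_status["Divide"]["Qj"], fu_status["Divide"]["Qk"]) if q]
print(addd_blockers, divd_waits_on)
```

The query reproduces the note on the slide: ADDD is held back only by the busy add unit, while DIVD has issued but is waiting on Mult1 for F0.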
Scoreboard Example Cycle 12
Scoreboard Example Cycle 22
Tomasulo Algorithm vs. Scoreboard
Control & buffers are distributed with the Function Units (FU) vs.
centralized in the scoreboard;
FU buffers are called reservation stations and hold pending operands
Registers in instructions are replaced by values or pointers to
reservation stations (RS); this is called register renaming;
it avoids WAR and WAW hazards
There are more reservation stations than registers, so it can do optimizations
compilers can't
Results go to FUs from RSs, not through registers, over a Common
Data Bus that broadcasts results to all FUs
Loads and stores are treated as FUs with RSs as well
Integer instructions can go past branches, allowing
FP ops beyond the basic block in the FP queue
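Register renaming can be illustrated with a short Python sketch. It is a simplification under stated assumptions: every instruction gets a fresh tag (`RS1`, `RS2`, ...) as if reservation stations were unlimited, the helper name `rename` is hypothetical, and the three-instruction program is made up to show one WAR and one WAW case.

```python
from itertools import count

def rename(program):
    """Replace register names by producer tags so only true (RAW) dependences remain."""
    tags = count(1)
    latest = {}          # architectural register -> tag of its newest producer
    renamed = []
    for op, dest, s1, s2 in program:
        # A source is either a pointer to its producer or the register itself.
        srcs = tuple(latest.get(s, s) for s in (s1, s2))
        tag = f"RS{next(tags)}"
        latest[dest] = tag
        renamed.append((op, tag, *srcs))
    return renamed

program = [
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6", "F8", "F2"),   # WAR on F6 with DIVD
    ("SUBD", "F6", "F6", "F4"),   # WAW on F6 with ADDD, RAW on ADDD's result
]
renamed = rename(program)
for ins in renamed:
    print(ins)
```

After renaming, ADDD and SUBD write different tags, so neither the WAR nor the WAW conflict on F6 remains; only SUBD's true dependence on ADDD's result survives as a pointer to `RS2`.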
Tomasulo Organization
[Figure: Tomasulo organization, showing the FP Op Queue, FP Registers, Load Buffer, Store Buffer, FP Add and FP Mul reservation stations, and the Common Data Bus]
Reservation Station Components
Op: Operation to perform in the unit (e.g., + or -)
Vj, Vk: Values of source operands
Store buffers have a V field, the result to be stored
Qj, Qk: Reservation stations producing the source registers (value to be
written)
Note: No ready flags as in Scoreboard; Qj, Qk = 0 => ready
Store buffers only have Qi for the RS producing the result
Busy: Indicates the reservation station or FU is busy
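The fields listed above map naturally onto a small record type. The following Python sketch is an illustration only: the class and function names are assumptions, and `None` stands in for the value 0 that the slide uses to mean "operand ready".

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    Vj: Optional[float] = None    # value of first source operand
    Vk: Optional[float] = None    # value of second source operand
    Qj: Optional[str] = None      # station producing Vj (None = ready)
    Qk: Optional[str] = None      # station producing Vk (None = ready)

    def ready(self):
        """Both operands present: no ready flags, just empty Q fields."""
        return self.busy and self.Qj is None and self.Qk is None

    def capture(self, tag, value):
        """Snoop one Common Data Bus broadcast from station 'tag'."""
        if self.Qj == tag:
            self.Vj, self.Qj = value, None
        if self.Qk == tag:
            self.Vk, self.Qk = value, None

def broadcast(stations, tag, value):
    """Deliver a result to every busy station waiting on 'tag'."""
    for rs in stations:
        if rs.busy:
            rs.capture(tag, value)

add1 = ReservationStation("Add1", busy=True, op="ADDD", Vk=2.0, Qj="Mult1")
ready_before = add1.ready()
broadcast([add1], "Mult1", 6.0)
ready_after = add1.ready()
print(ready_before, ready_after, add1.Vj)
```

The station becomes ready as soon as the broadcast tag matches its Qj field, without the value ever passing through the register file.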
Review: Tomasulo
Dynamic Hardware Prediction
Basic Branch Prediction: Branch Prediction Buffers
Dynamic Hardware Prediction
Dynamic Branch Prediction
Performance = f(accuracy, cost of misprediction)
Branch History Table (BHT): lower bits of the PC address index a table of 1-bit values
Each bit says whether or not the branch was taken last time
Problem: in a loop, a 1-bit BHT causes two mispredictions:
End of loop, when the branch exits instead of looping as before
First iteration the next time the loop is entered, when it predicts exit instead
of looping
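The two mispredictions per loop execution can be counted with a short simulation. This is a sketch for a single branch with a single 1-bit entry; the function name and the loop length (taken nine times, then not taken) are assumptions chosen for illustration.

```python
def one_bit_mispredictions(outcomes, initial=False):
    """Count mispredictions of a 1-bit predictor for one branch.

    outcomes: sequence of booleans, True = taken.
    The predictor simply repeats the previous outcome.
    """
    last = initial
    misses = 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken
    return misses

loop = [True] * 9 + [False]        # hypothetical loop branch: 9 taken, then exit
three_runs = one_bit_mispredictions(loop * 3)
print(three_runs)
```

Each execution of the loop costs one miss on entry and one on exit, so a branch that is taken 90% of the time is predicted correctly only 80% of the time.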
[Figure: branch prediction buffer. Bits 13 - 2 of the branch address index a table with entries 0 through 1023, each holding a prediction.]
[Figure: 2-bit prediction scheme state diagram with two Predict Taken states and two Predict Not Taken states; transitions are labeled T (taken) and NT (not taken).]
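A 2-bit scheme with two Predict Taken and two Predict Not Taken states is often realized as a saturating counter. The sketch below assumes that variant; the exact transitions of the diagram on this slide are not recoverable from the text, so treat the update rule, the function name, and the loop length as assumptions.

```python
def two_bit_mispredictions(outcomes, state=0):
    """Count mispredictions of a 2-bit saturating counter for one branch.

    States 0 and 1 predict not taken, states 2 and 3 predict taken.
    The prediction flips only after two consecutive misses from a strong state.
    """
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return misses

loop = [True] * 9 + [False]        # hypothetical loop branch: 9 taken, then exit
three_runs = two_bit_mispredictions(loop * 3, state=3)
print(three_runs)
```

Starting from the strongly taken state, only the loop exit is mispredicted, so each execution of the loop costs one miss: the single not-taken outcome moves the counter down by one state without flipping the prediction.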
BHT Accuracy
Correlating Branches
Idea: taken/not taken of recently executed branches is
related to the behavior of the next branch (as well as the history
of that branch's behavior)
Then the behavior of recent branches selects between, say,
four predictions of the next branch,
updating just that prediction
[Figure: the branch address selects an entry of 2-bit per-branch predictors, and recent branch history selects the prediction within it]
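The idea can be sketched as an (m, n) predictor in which m bits of global history pick one of 2^m n-bit counters in the entry selected by the branch address. The class below is an illustrative assumption, not the hardware of any particular machine: the names, the index function (word-aligned PC modulo table size), and the saturating-counter update are all choices made for the sketch.

```python
class CorrelatingPredictor:
    """(m, n) predictor: m global history bits choose among 2**m n-bit counters."""

    def __init__(self, entries=1024, m=2, n=2):
        self.entries, self.m = entries, m
        self.max_count = (1 << n) - 1
        self.history = 0                          # last m outcomes, newest in bit 0
        self.table = [[0] * (1 << m) for _ in range(entries)]

    def _index(self, pc):
        return (pc >> 2) % self.entries           # assumes 4-byte instructions

    def predict(self, pc):
        counter = self.table[self._index(pc)][self.history]
        return counter > self.max_count // 2      # upper half of the range = taken

    def update(self, pc, taken):
        row = self.table[self._index(pc)]
        c = row[self.history]                     # update just the selected counter
        row[self.history] = min(c + 1, self.max_count) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & ((1 << self.m) - 1)

def count_misses(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        misses += predictor.predict(pc) != taken
        predictor.update(pc, taken)
    return misses

alternating = [True, False] * 20                  # hypothetical branch: T, NT, T, NT, ...
with_history = count_misses(CorrelatingPredictor(m=2), 0x40, alternating)
without_history = count_misses(CorrelatingPredictor(m=0), 0x40, alternating)
print(with_history, without_history)
```

On this alternating pattern the (2,2) configuration stops mispredicting after a short warm-up, while the same table with no history bits (a plain 2-bit counter) mispredicts every taken outcome.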
[Figure: Frequency of mispredictions (vertical axis 0% to 14%) for the benchmarks doducd, gcc, nasa7, eqntott, espresso, spice, fpppp, tomcatv, li, and matrix300, comparing three predictors: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; 1,024 entries (2,2)]
Basic Branch Prediction: Branch Target Buffers
Dynamic Hardware Prediction
Branch Target Buffer
Branch Target Buffer (BTB): Use the address of the branch as an index to get the prediction AND
the branch target address (if taken)
Note: must check for a branch match now, since we can't use the wrong branch address (Figure 4.22, p. 273)
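The lookup and the branch-match check can be sketched as a small direct-mapped table. This is an illustration under stated assumptions: the class name, the table size, the index function, and the policy of dropping an entry when its branch is not taken are choices made for the sketch.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB: each slot holds (branch_pc, predicted_target)."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.table = [None] * entries

    def _index(self, pc):
        return (pc >> 2) % self.entries           # assumes 4-byte instructions

    def lookup(self, pc):
        """Return the predicted target only if the stored branch address matches."""
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:
            return slot[1]
        return None                               # no entry: fall through to PC + 4

    def update(self, pc, taken, target):
        i = self._index(pc)
        if taken:
            self.table[i] = (pc, target)
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None                  # branch no longer predicted taken

btb = BranchTargetBuffer(entries=16)
btb.update(0x100, True, 0x200)
hit = btb.lookup(0x100)
alias = btb.lookup(0x140)                         # same index, different branch
print(hit, alias)
```

The second lookup maps to the same slot but fails the address comparison, which is the check the note above calls for: without it, a different instruction would be redirected to the wrong target.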
References
www.csd.uoc.gr/~hy590-25/Chapter04-Pipelining2.ppt
Hennessy & Patterson