
Dynamic Pipeline Scheduling

8/2/2017 GGITM-CSE 1
Dynamic Scheduling

Dynamic scheduling is when the hardware rearranges the order of instruction execution to reduce stalls.
Advantages:
Dependencies unknown at compile time can be handled by
the hardware.
Code compiled for one type of pipeline can be efficiently run
on another.
Disadvantages:
Hardware much more complex.

The idea:
Dynamic Scheduling
HW Schemes: Instruction Parallelism
Why in HW at run time?
Works when real dependences can't be known at compile time
Compiler simpler
Code for one machine runs well on another
Key Idea: Allow instructions behind stall to proceed.
Key Idea: Instructions executing in parallel. There are multiple
execution units, so use them.

DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
Enables out-of-order execution => out-of-order completion

Out-of-order execution divides the ID stage in two:
1. Issue: decode instructions, check for structural hazards
2. Read operands: wait until no data hazards, then read operands
Scoreboards allow an instruction to execute whenever 1 & 2 hold, without waiting for prior instructions.
A scoreboard is a data structure that provides the information
necessary for all pieces of the processor to work together.
We will use in-order issue, out-of-order execution, and out-of-order commit (also called completion).
First used in the CDC 6600. Our example is modified here for DLX.
CDC had 4 FP units, 5 memory reference units, 7 integer units.
DLX has 2 FP multiply, 1 FP adder, 1 FP divider, 1 integer.

Using A Scoreboard
Dynamic Scheduling

Scoreboard Implications
Out-of-order completion => WAR, WAW hazards?
Solutions for WAR
Queue both the operation and copies of its operands
Read registers only during Read Operands stage
For WAW, must detect hazard: stall until other completes
Need to have multiple instructions in execution phase => multiple
execution units or pipelined execution units
Scoreboard keeps track of dependencies and the state of operations
Scoreboard replaces ID, EX, WB with 4 stages


Four Stages of Scoreboard Control


1. Issue: decode instructions and check for structural hazards (ID1)
If a functional unit for the instruction is free and no other active
instruction has the same destination register (WAW), the
scoreboard issues the instruction to the functional unit and
updates its internal data structure.
If a structural or WAW hazard exists, then the instruction issue
stalls, and no further instructions will issue until these hazards
are cleared.


Four Stages of Scoreboard Control


2. Read operands: wait until no data hazards, then read operands (ID2)
A source operand is available if no earlier issued active instruction is going to write it, or if the register containing the operand is being written by a currently active functional unit.
When the source operands are available, the scoreboard tells
the functional unit to proceed to read the operands from
the registers and begin execution. The scoreboard
resolves RAW hazards dynamically in this step, and
instructions may be sent into execution out of order.

Four Stages of Scoreboard Control
3. Execution: operate on operands (EX)
The functional unit begins execution upon receiving operands. When the result is ready, it notifies the scoreboard that it has completed execution.

4. Write result: finish execution (WB)
Once the scoreboard is aware that the functional unit has completed execution, the scoreboard checks for WAR hazards. If there are none, it writes the result. If there is a WAR hazard, it stalls the instruction.
Example:
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F8,F8,F14
The scoreboard would stall SUBD's write of F8 until ADDD reads its operands (WAR hazard on F8).

Three Parts of the Scoreboard

1. Instruction status: which of the 4 steps the instruction is in

2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
Busy: indicates whether the unit is busy or not
Op: operation to perform in the unit (e.g., + or -)
Fi: destination register
Fj, Fk: source-register numbers
Qj, Qk: functional units producing source registers Fj, Fk
Rj, Rk: flags indicating when Fj, Fk are ready

3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.

Detailed Scoreboard Pipeline Control

Issue
  Wait until:  not Busy(FU) and not Result(D)
  Bookkeeping: Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← D; Fj(FU) ← S1; Fk(FU) ← S2;
               Qj ← Result(S1); Qk ← Result(S2); Rj ← not Qj; Rk ← not Qk; Result(D) ← FU
Read operands
  Wait until:  Rj and Rk
  Bookkeeping: Rj ← No; Rk ← No
Execution complete
  Wait until:  functional unit done
  Bookkeeping: (none)
Write result
  Wait until:  ∀f ((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))
  Bookkeeping: ∀f (if Qj(f) = FU then Rj(f) ← Yes); ∀f (if Qk(f) = FU then Rk(f) ← Yes);
               Result(Fi(FU)) ← 0; Busy(FU) ← No
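
The wait conditions and bookkeeping above can be sketched in a few lines of Python. This is only an illustrative model, not the CDC 6600 or DLX hardware: the class names `FU` and `Scoreboard` and their method names are hypothetical, timing is ignored, and the sketch also clears Qj/Qk when a producer writes (an assumption made so the state matches the example tables).

```python
from dataclasses import dataclass


@dataclass
class FU:
    """Functional unit status fields (hypothetical container)."""
    busy: bool = False
    op: str = ""
    fi: str = ""     # destination register
    fj: str = ""     # source register 1
    fk: str = ""     # source register 2
    qj: str = ""     # unit producing Fj ("" means none)
    qk: str = ""     # unit producing Fk ("" means none)
    rj: bool = False  # Fj ready and not yet read
    rk: bool = False  # Fk ready and not yet read


class Scoreboard:
    def __init__(self, unit_names):
        self.units = {name: FU() for name in unit_names}
        self.result = {}  # register -> name of the unit that will write it

    def can_issue(self, fu, dest):
        # Structural hazard check (unit busy) and WAW check (pending writer).
        return not self.units[fu].busy and dest not in self.result

    def issue(self, fu, op, dest, s1, s2):
        assert self.can_issue(fu, dest)
        qj, qk = self.result.get(s1, ""), self.result.get(s2, "")
        self.units[fu] = FU(True, op, dest, s1, s2, qj, qk, not qj, not qk)
        self.result[dest] = fu

    def can_read_operands(self, fu):
        # RAW hazards are resolved here: wait for both sources.
        return self.units[fu].rj and self.units[fu].rk

    def read_operands(self, fu):
        self.units[fu].rj = self.units[fu].rk = False

    def can_write_result(self, fu):
        # WAR check: no unit may still need to read the old value of Fi(FU).
        dest = self.units[fu].fi
        return all((f.fj != dest or not f.rj) and (f.fk != dest or not f.rk)
                   for f in self.units.values())

    def write_result(self, fu):
        for f in self.units.values():
            if f.qj == fu:
                f.qj, f.rj = "", True  # clearing Qj is an assumption
            if f.qk == fu:
                f.qk, f.rk = "", True
        del self.result[self.units[fu].fi]
        self.units[fu] = FU()
```

For example, after issuing a multiply that writes F0, a divide that reads F0 and F6, and an add that writes F6, `can_write_result("Add")` stays false until the divide has read its operands, which is the WAR stall described in the write-result stage.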

Scoreboard Example
This is the sample code we'll be working with in the example:

LD F6, 34(R2)
LD F2, 45(R3)
MULT F0, F2, F4
SUBD F8, F6, F2
DIVD F10, F0, F6
ADDD F6, F8, F2

What are the hazards in this code?


Latencies (clock cycles):
LD 1
MULT 10
SUBD 2
DIVD 40
ADDD 2
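
One way to answer the hazard question for the sample code is to compare register names pairwise. The sketch below is a simplification under stated assumptions: the tuple encoding of the program and the function name `find_hazards` are hypothetical, and the check is purely name-based (it does not account for an intervening write to the same register, which does not arise in this six-instruction sequence).

```python
from itertools import combinations

# Sample program: (opcode, destination, source 1, source 2).
# For the loads, the base register is listed as the only source.
PROGRAM = [
    ("LD",   "F6",  "R2", None),
    ("LD",   "F2",  "R3", None),
    ("MULT", "F0",  "F2", "F4"),
    ("SUBD", "F8",  "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6",  "F8", "F2"),
]


def find_hazards(program):
    """Return (kind, earlier index, later index, register) tuples."""
    hazards = []
    for (i, a), (j, b) in combinations(enumerate(program), 2):
        a_dst, a_src = a[1], {r for r in a[2:] if r}
        b_dst, b_src = b[1], {r for r in b[2:] if r}
        if a_dst in b_src:      # later instruction reads what earlier writes
            hazards.append(("RAW", i, j, a_dst))
        if b_dst in a_src:      # later instruction overwrites an earlier source
            hazards.append(("WAR", i, j, b_dst))
        if a_dst == b_dst:      # both write the same register
            hazards.append(("WAW", i, j, a_dst))
    return hazards


for kind, i, j, reg in find_hazards(PROGRAM):
    print(f"{kind} on {reg}: {PROGRAM[i][0]} (#{i}) -> {PROGRAM[j][0]} (#{j})")
```

Among the pairs it reports are the RAW dependence of DIVD on MULT through F0 and the WAR hazard between DIVD and ADDD on F6, both of which show up as stalls in the cycle-by-cycle tables.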

Scoreboard Example

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  |
LD     F2    45+  R3  |
MULTD  F0    F2   F4  |
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status (Fi = dest, Fj = S1, Fk = S2, Qj = FU for j, Qk = FU for k, Rj = Fj ready?, Rk = Fk ready?)
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
     | Mult1   | No
     | Mult2   | No
     | Add     | No
     | Divide  | No

Register result status (FU that will write each register)
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
      |

Scoreboard Example Cycle 1
Issue LD #1. The numbers in the instruction status table show the cycle in which each operation occurred.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1
LD     F2    45+  R3  |
MULTD  F0    F2   F4  |
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F6  |    | R2 |         |         |     | Yes
     | Mult1   | No
     | Mult2   | No
     | Add     | No
     | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
1     |       |         |    | Integer |

Scoreboard Example Cycle 2
LD #2 can't issue since the integer unit is busy. MULT can't issue because we require in-order issue.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2
LD     F2    45+  R3  |
MULTD  F0    F2   F4  |
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F6  |    | R2 |         |         |     | Yes
     | Mult1   | No
     | Mult2   | No
     | Add     | No
     | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
2     |       |         |    | Integer |

Scoreboard Example Cycle 3

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3
LD     F2    45+  R3  |
MULTD  F0    F2   F4  |
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F6  |    | R2 |         |         |     | Yes
     | Mult1   | No
     | Mult2   | No
     | Add     | No
     | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
3     |       |         |    | Integer |

Scoreboard Example Cycle 4

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  |
MULTD  F0    F2   F4  |
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F6  |    | R2 |         |         |     | Yes
     | Mult1   | No
     | Mult2   | No
     | Add     | No
     | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
4     |       |         |    | Integer |

Scoreboard Example Cycle 5
Issue LD #2 since the integer unit is now free.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5
MULTD  F0    F2   F4  |
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F2  |    | R3 |         |         |     | Yes
     | Mult1   | No
     | Mult2   | No
     | Add     | No
     | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
5     |       | Integer |

Scoreboard Example Cycle 6
Issue MULT.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6
MULTD  F0    F2   F4  | 6
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F2  |    | R3 |         |         |     | Yes
     | Mult1   | Yes  | Mult | F0  | F2 | F4 | Integer |         | No  | Yes
     | Mult2   | No
     | Add     | No
     | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
6     | Mult1 | Integer |

Scoreboard Example Cycle 7
MULT can't read its operands (F2) because LD #2 hasn't finished.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7
MULTD  F0    F2   F4  | 6
SUBD   F8    F6   F2  | 7
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F2  |    | R3 |         |         |     | Yes
     | Mult1   | Yes  | Mult | F0  | F2 | F4 | Integer |         | No  | Yes
     | Mult2   | No
     | Add     | Yes  | Sub  | F8  | F6 | F2 |         | Integer | Yes | No
     | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
7     | Mult1 | Integer |    |         | Add |

Scoreboard Example Cycle 8a
DIVD issues. MULT and SUBD are both waiting for F2.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7
MULTD  F0    F2   F4  | 6
SUBD   F8    F6   F2  | 7
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | Yes  | Load | F2  |    | R3 |         |         |     | Yes
     | Mult1   | Yes  | Mult | F0  | F2 | F4 | Integer |         | No  | Yes
     | Mult2   | No
     | Add     | Yes  | Sub  | F8  | F6 | F2 |         | Integer | Yes | No
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
8     | Mult1 | Integer |    |         | Add | Divide |

Scoreboard Example Cycle 8b
LD #2 writes F2.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6
SUBD   F8    F6   F2  | 7
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
     | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
     | Add     | Yes  | Sub  | F8  | F6 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
8     | Mult1 |         |    |         | Add | Divide |

Scoreboard Example Cycle 9
Now MULT and SUBD can both read F2. How can both instructions do this at the same time?

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
10   | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
2    | Add     | Yes  | Sub  | F8  | F6 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
9     | Mult1 |         |    |         | Add | Divide |


Scoreboard Example Cycle 11
ADDD can't start because the add unit is busy.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
8    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
0    | Add     | Yes  | Sub  | F8  | F6 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
11    | Mult1 |         |    |         | Add | Divide |

Scoreboard Example Cycle 12
SUBD finishes. DIVD is waiting for F0.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  |

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
7    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
     | Add     | No
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
12    | Mult1 |         |    |         |     | Divide |

Scoreboard Example Cycle 13
ADDD issues.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
6    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
     | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
13    | Mult1 |         |    | Add     |     | Divide |

Scoreboard Example Cycle 14

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13    | 14

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
5    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
2    | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
14    | Mult1 |         |    | Add     |     | Divide |

Scoreboard Example Cycle 15

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13    | 14

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
4    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
1    | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
15    | Mult1 |         |    | Add     |     | Divide |

Scoreboard Example Cycle 16

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13    | 14            | 16

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
3    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
0    | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
16    | Mult1 |         |    | Add     |     | Divide |

Scoreboard Example Cycle 17
ADDD can't write F6 because DIVD has not yet read it: a WAR hazard.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13    | 14            | 16

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
2    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
     | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
17    | Mult1 |         |    | Add     |     | Divide |

Scoreboard Example Cycle 18
Nothing happens!

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13    | 14            | 16

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
1    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
     | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
18    | Mult1 |         |    | Add     |     | Divide |

Scoreboard Example Cycle 19
MULT completes execution.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9             | 19
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13    | 14            | 16

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
0    | Mult1   | Yes  | Mult | F0  | F2 | F4 |         |         | Yes | Yes
     | Mult2   | No
     | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 | Mult1   |         | No  | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
19    | Mult1 |         |    | Add     |     | Divide |

Scoreboard Example Cycle 20
MULT writes.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9             | 19                 | 20
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8
ADDD   F6    F8   F2  | 13    | 14            | 16

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
     | Mult1   | No
     | Mult2   | No
     | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | Yes | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
20    |       |         |    | Add     |     | Divide |

Scoreboard Example Cycle 21
DIVD loads operands.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9             | 19                 | 20
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8     | 21
ADDD   F6    F8   F2  | 13    | 14            | 16

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
     | Mult1   | No
     | Mult2   | No
     | Add     | Yes  | Add  | F6  | F8 | F2 |         |         | Yes | Yes
     | Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | Yes | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
21    |       |         |    | Add     |     | Divide |


Scoreboard Example Cycle 22
Now ADDD can write since the WAR hazard is removed.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9             | 19                 | 20
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8     | 21
ADDD   F6    F8   F2  | 13    | 14            | 16                 | 22

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
     | Mult1   | No
     | Mult2   | No
     | Add     | No
40   | Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | Yes | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
22    |       |         |    |         |     | Divide |

Scoreboard Example Cycle 61
DIVD completes execution.

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9             | 19                 | 20
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8     | 21            | 61
ADDD   F6    F8   F2  | 13    | 14            | 16                 | 22

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
     | Mult1   | No
     | Mult2   | No
     | Add     | No
0    | Divide  | Yes  | Div  | F10 | F0 | F6 |         |         | Yes | Yes

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
61    |       |         |    |         |     | Divide |

Scoreboard Example Cycle 62
DONE!

Instruction status
Instruction  j    k   | Issue | Read operands | Execution complete | Write result
LD     F6    34+  R2  | 1     | 2             | 3                  | 4
LD     F2    45+  R3  | 5     | 6             | 7                  | 8
MULTD  F0    F2   F4  | 6     | 9             | 19                 | 20
SUBD   F8    F6   F2  | 7     | 9             | 11                 | 12
DIVD   F10   F0   F6  | 8     | 21            | 61                 | 62
ADDD   F6    F8   F2  | 13    | 14            | 16                 | 22

Functional unit status
Time | Name    | Busy | Op   | Fi  | Fj | Fk | Qj      | Qk      | Rj  | Rk
     | Integer | No
     | Mult1   | No
     | Mult2   | No
     | Add     | No
0    | Divide  | No

Register result status
Clock | F0    | F2      | F4 | F6      | F8  | F10    | F12 | ... | F30
62    |

Another Dynamic Algorithm: Tomasulo Algorithm
For the IBM 360/91, about 3 years after the CDC 6600 (1966)
Goal: high performance without special compilers
Differences between the IBM 360 and CDC 6600 ISAs:
IBM has only 2 register specifiers per instruction vs. 3 in the CDC 6600
IBM has 4 FP registers vs. 8 in the CDC 6600
Why study it? It led to the Alpha 21264, HP 8000, MIPS 10000, Pentium II, PowerPC 604, ...

Tomasulo Algorithm vs. Scoreboard
Control and buffers are distributed with the functional units (FUs) vs. centralized in the scoreboard
FU buffers are called reservation stations; they hold pending operands
Registers in instructions are replaced by values or pointers to reservation stations (RS); this is called register renaming, and it avoids WAR and WAW hazards
There are more reservation stations than registers, so the hardware can do optimizations compilers can't
Results go to the FUs from the RSs, not through registers, over a Common Data Bus that broadcasts results to all FUs
Loads and stores are treated as FUs with RSs as well
Integer instructions can go past branches, allowing FP ops beyond the basic block in the FP queue

Tomasulo Organization
[Figure: block diagram showing the FP Op Queue, FP Registers, Load Buffer, Store Buffer, FP Add reservation stations, FP Mul reservation stations, and the Common Data Bus]

Reservation Station Components
Op: operation to perform in the unit (e.g., + or -)
Vj, Vk: values of the source operands
  Store buffers have a V field, the result to be stored
Qj, Qk: reservation stations producing the source registers (value to be written)
  Note: no ready flags as in the scoreboard; Qj, Qk = 0 => ready
  Store buffers only have Qi for the RS producing the result
Busy: indicates the reservation station or FU is busy

Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instructions will write that register.

Three Stages of Tomasulo Algorithm

1. Issue: get an instruction from the FP Op Queue
   If a reservation station is free (no structural hazard), control issues the instruction and sends the operands (renames registers).
2. Execution: operate on operands (EX)
   When both operands are ready, execute; if not ready, watch the Common Data Bus for the result.
3. Write result: finish execution (WB)
   Write on the Common Data Bus to all awaiting units; mark the reservation station available.
Normal data bus: data + destination ("go to" bus)
Common data bus: data + source ("come from" bus)
   64 bits of data + 4 bits of functional unit source address
   Write if it matches the expected functional unit (the one that produces the result)
   Does the broadcast
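
The issue-time renaming and the Common Data Bus broadcast described above can be illustrated with a small sketch. It is a toy model under stated assumptions, not the IBM 360/91 design: `Station`, `issue`, and `broadcast` are hypothetical names, `None` plays the role of "Qj, Qk = 0", and execution timing is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Station:
    """One reservation station (hypothetical field layout)."""
    name: str
    busy: bool = False
    op: str = ""
    vj: Optional[float] = None   # value of source j, once known
    vk: Optional[float] = None   # value of source k, once known
    qj: Optional[str] = None     # station that will produce source j
    qk: Optional[str] = None     # station that will produce source k

    def ready(self) -> bool:
        return self.busy and self.qj is None and self.qk is None


def issue(rs, op, dest, src_j, src_k, regs, reg_status):
    """Rename sources: copy the value if available, else record the producer tag."""
    rs.busy, rs.op = True, op
    rs.qj = reg_status.get(src_j)
    rs.vj = regs[src_j] if rs.qj is None else None
    rs.qk = reg_status.get(src_k)
    rs.vk = regs[src_k] if rs.qk is None else None
    reg_status[dest] = rs.name   # the register now waits on this station


def broadcast(tag, value, stations, regs, reg_status):
    """Common Data Bus: every waiter compares the source tag and grabs the value."""
    for rs in stations:
        if rs.qj == tag:
            rs.vj, rs.qj = value, None
        if rs.qk == tag:
            rs.vk, rs.qk = value, None
    for reg in [r for r, producer in reg_status.items() if producer == tag]:
        regs[reg] = value
        del reg_status[reg]
```

Because a waiting station stores the producer's tag rather than a register number, a later instruction that overwrites the same register cannot disturb it, which is how renaming removes the WAR and WAW stalls seen in the scoreboard.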

Tomasulo Example Cycle 0

Instruction status
Instruction  j    k   | Issue | Execution complete | Write result
LD     F6    34+  R2  |
LD     F2    45+  R3  |
MULTD  F0    F2   F4  |
SUBD   F8    F6   F2  |
DIVD   F10   F0   F6  |
ADDD   F6    F8   F2  |

Load buffers
Name  | Busy | Address
Load1 | No   |
Load2 | No   |
Load3 | No   |

Reservation stations (Vj = S1, Vk = S2, Qj = RS for j, Qk = RS for k)
Time | Name  | Busy | Op | Vj | Vk | Qj | Qk
0    | Add1  | No
0    | Add2  | No
0    | Add3  | No
0    | Mult1 | No
0    | Mult2 | No

Register result status
Clock | F0 | F2 | F4 | F6 | F8 | F10 | F12 | ... | F30
0     |

Review: Tomasulo

Prevents the register file from becoming a bottleneck
Avoids the WAR and WAW hazards of the scoreboard
Allows loop unrolling in HW
Not limited to basic blocks (provided there is branch prediction)
Lasting Contributions
Dynamic scheduling
Register renaming
Load/store disambiguation
360/91 descendants are PowerPC 604, 620; MIPS R10000; HP-PA
8000; Intel Pentium Pro

Dynamic Hardware Prediction

Dynamic branch prediction is the ability of the hardware to make an educated guess about which way a branch will go: will the branch be taken or not?

The hardware can look for clues based on the instructions, or it can use past history. We will discuss both of these directions.

Basic Branch Prediction: Branch Prediction Buffers
Dynamic Branch Prediction
Performance = f(accuracy, cost of misprediction)
Branch History Table (BHT): lower bits of the PC address index a table of 1-bit values
Each bit says whether or not the branch was taken last time
Problem: in a loop, a 1-bit BHT will cause two mispredictions:
End-of-loop case, when it exits instead of looping as before
First time through the loop on the next time through the code, when it predicts exit instead of looping
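
The two-mispredictions-per-loop behaviour is easy to reproduce with a short simulation. This is a minimal sketch assuming a single table entry that is not shared with any other branch; the function name `count_mispredictions_1bit` and the 10-iteration loop are made up for illustration.

```python
def count_mispredictions_1bit(outcomes, initial=False):
    """Predict that the branch does whatever it did last time (one history bit)."""
    last, misses = initial, 0
    for taken in outcomes:
        if last != taken:
            misses += 1
        last = taken          # remember only the most recent outcome
    return misses


# A loop-closing branch: taken 9 times, then not taken once on exit.
loop = [True] * 9 + [False]

# Run the loop three times, starting with the bit already set to "taken".
print(count_mispredictions_1bit(loop * 3, initial=True))
```

The first pass mispredicts only the exit; every later pass mispredicts both its first iteration (the bit still says "not taken" from the previous exit) and its exit.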

[Figure: bits 13 - 2 of the branch address (bits 31 to 0) index a table of prediction bits, with entries numbered 0 to 1023]

Solution: a 2-bit scheme that changes the prediction only after two mispredictions (Figure 4.13, p. 264)
[Figure: four-state diagram with two "Predict Taken" states and two "Predict Not Taken" states; arcs are labeled T (taken) and NT (not taken)]


BHT Accuracy

Mispredict because either:
Wrong guess for that branch
Got the branch history of the wrong branch when indexing the table
With a 4096-entry table, programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
4096 entries is about as good as an infinite table, but 4096 is a lot of HW


Correlating Branches
Idea: the taken/not-taken behavior of recently executed branches is related to the behavior of the next branch (as well as to the history of that branch's own behavior)
Then the behavior of recent branches selects between, say, four predictions of the next branch, updating just that prediction
[Figure: the branch address indexes a table of 2-bit per-branch predictors; a 2-bit global branch history selects which prediction is used]
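
The selection of one of four predictions by recent branch behavior can be sketched as follows. This is an illustrative (2,2)-style predictor under assumptions of my own: the class name, the word-aligned indexing with `pc >> 2`, the 1,024-entry default, and the saturating-counter update are all choices made for the sketch rather than details taken from the slides.

```python
class CorrelatingPredictor:
    """2 bits of global history select one of four 2-bit counters per entry."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.history = 0  # outcomes of the last two branches, newest in bit 0
        # Each entry holds four 2-bit counters, all starting weakly not taken.
        self.counters = [[1] * 4 for _ in range(entries)]

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)][self.history] >= 2

    def update(self, pc, taken):
        row = self.counters[self._index(pc)]
        c = row[self.history]
        # Update only the counter that the current history selected.
        row[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & 0b11


def count_misses(predictor, pc, outcomes):
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A branch that strictly alternates taken / not taken.
print(count_misses(CorrelatingPredictor(), 0x400, [True, False] * 50))
```

An alternating branch defeats a plain per-branch counter, but here the history bits differ before each outcome, so after a short warm-up each history pattern has its own trained counter.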



Accuracy of Different Schemes (Figure 4.21, p. 272)
[Figure: bar chart of the frequency of mispredictions (0% to 18%) for the benchmarks doducd, gcc, nasa7, eqntott, espresso, spice, fpppp, tomcatv, li, and matrix300, comparing three schemes: 4,096 entries with 2 bits per entry; unlimited entries with 2 bits per entry; and 1,024 entries with 2 bits of history and 2 bits per entry, i.e. (2,2)]

Basic Branch Prediction: Branch Target Buffers
Branch Target Buffer
Branch Target Buffer (BTB): use the address of the branch as an index to get the prediction AND the branch target address (if taken)
Note: must check for a branch match now, since we can't use the wrong branch address (Figure 4.22, p. 273)
[Figure: each BTB entry holds a predicted PC and a branch prediction (taken or not taken)]
Return instruction addresses are predicted with a stack

Example
Instruction in buffer | Prediction | Actual branch | Penalty cycles
Yes                   | Taken      | Taken         | 0
Yes                   | Taken      | Not taken     | 2
No                    |            | Taken         | 2

Example on page 274.

Determine the total branch penalty for a BTB using the above penalties. Assume also the following:
Prediction accuracy of 90%
Hit rate in the buffer of 90%
60% taken branch frequency
Branch penalty = (percent buffer hit rate × percent incorrect predictions × 2) + ((1 - percent buffer hit rate) × taken branches × 2)
Branch penalty = (90% × 10% × 2) + (10% × 60% × 2)
Branch penalty = 0.18 + 0.12 = 0.30 clock cycles
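
The same arithmetic can be checked in code. A small sketch: the function name and parameter names are hypothetical, and the inputs use the 10% misprediction rate and 2-cycle penalties that appear in the calculation above.

```python
def btb_branch_penalty(hit_rate, accuracy, taken_freq, penalty_cycles=2):
    """Expected penalty per branch, in clock cycles, for the BTB example."""
    # Branch found in the buffer but predicted incorrectly.
    mispredicted_hits = hit_rate * (1 - accuracy) * penalty_cycles
    # Branch not in the buffer and actually taken.
    taken_misses = (1 - hit_rate) * taken_freq * penalty_cycles
    return mispredicted_hits + taken_misses


print(round(btb_branch_penalty(hit_rate=0.90, accuracy=0.90, taken_freq=0.60), 2))
```

Keeping the two terms separate makes it easy to see how the result shifts if, say, the buffer hit rate or the taken frequency changes.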

References
www.csd.uoc.gr/~hy590-25/Chapter04-Pipelining2.ppt
Hennessy & Patterson
