NJIT Computer Science Dept CS650 Computer Architecture

CS650
Computer Architecture

Lecture 7-1
Superscalar and
Hardware-based Speculation
Andrew Sohn
Computer Science Department
New Jersey Institute of Technology

Lecture 7-1: Superscalar and Speculation 7-1-1/21 10/25/2004 A. Sohn


Sample Superscalar
[Figure: two-issue pipeline. Two instructions go through IF and ID together; each then executes in one of four units before MA (memory access) and WB (write back): Integer (EX, 1 clock), FP Adder (A1-A4, 4 clocks), FP/Integer Multiply (M1-M7, 7 clocks), or Integer/FP Divider (25 clocks).]


Assumptions for Superscalar


• Four stage pipeline: issue, execute, memory access, write
• Write to common data bus
• 1 memory unit for load and store
• 1 integer unit for integer operations
• Integer unit used for load/store (effective address computation)
• Address computation at EX stage by integer unit or separate address unit
• 1 float unit
• Both 1 int and 1 float can be issued every clock
• 1 integer and 1 float can be executed at the same time
• Branch condition evaluated at the execution stage
• Instructions following a branch wait until the branch condition is tested
• Perfect branch prediction


Superscalar Example
Consider the loop below:

LOOP: LOAD  F0,0(R1)
      ADD   F4,F0,F2
      STORE 0(R1),F4
      ADDI  R1,R1,-#8
      BNE   R1,R2,LOOP

See how it executes on the superscalar.
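For reference, each pass of the loop adds the scalar in F2 to one array element and steps R1 down by 8 bytes until R1 reaches R2. A minimal Python model of what the loop computes, assuming memory is a dict from byte address to double and R1 starts at the highest element (the name `add_scalar` is hypothetical):

```python
def add_scalar(mem, r1, r2, f2):
    """Model of the FP loop; mem maps byte addresses to doubles."""
    while True:
        f0 = mem[r1]        # LOAD  F0,0(R1)
        f4 = f0 + f2        # ADD   F4,F0,F2
        mem[r1] = f4        # STORE 0(R1),F4
        r1 = r1 - 8         # ADDI  R1,R1,-#8
        if r1 == r2:        # BNE   R1,R2,LOOP falls through when R1 == R2
            break
    return mem


# Elements at addresses 24, 16 and 8; the loop stops once R1 reaches R2 = 0.
print(add_scalar({8: 1.0, 16: 2.0, 24: 3.0}, r1=24, r2=0, f2=0.5))
# {8: 1.5, 16: 2.5, 24: 3.5}
```

Like the assembly, the body runs at least once before the branch is tested.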


Execution Sequence
It. Instruction       Issue  Execute  Memory  Write  Comment
1   LOAD F0,0(R1)     1      2        3       4
1   ADD F4,F0,F2      1      5-7      -       8      wait for load
1   STORE 0(R1),F4    2      3        9       -      int unit; wait for add
1   ADDI R1,R1,-#8    2      4        -       5      wait for int ALU
1   BNE R1,R2,Loop    3      6        -       -      wait for addi
2   LOAD F0,0(R1)     4      7        8       9      wait for bne (test known after bne EX)
2   ADD F4,F0,F2      4      10-12    -       13     wait for load
2   STORE 0(R1),F4    5      8        14      -      int unit; wait for add
2   ADDI R1,R1,-#8    5      9        -       10     wait for int ALU
2   BNE R1,R2,Loop    6      11       -       -      wait for addi
3   LOAD F0,0(R1)     7      12       13      14     wait for bne
3   ADD F4,F0,F2      7      15-17    -       18     wait for load
3   STORE 0(R1),F4    8      13       19      -      wait for add
3   ADDI R1,R1,-#8    8      14       -       15     wait for int ALU
3   BNE R1,R2,Loop    9      16       -       -      wait for addi
4   LOAD F0,0(R1)     ...

Assume: 1 integer unit for both ALU operations and effective memory address computation (load, store); FP add takes 3 clocks.
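The timings in the execution sequence above follow mechanically from the stated rules. The sketch below recomputes them; it is a hypothetical model, not a tool from the lecture, and it assumes: issue clocks 1, 1, 2, 2, 3 in the first iteration and 3 clocks later for each following iteration (as in the table), an operand is usable the clock after it is written, instructions after a BNE wait until that BNE has executed, each unit accepts one operation per clock, and the branch does not occupy the integer unit. `separate_address_unit=True` models a dedicated address unit for loads and stores.

```python
LOOP = [  # (instruction, kind)
    ("LOAD F0,0(R1)", "load"),
    ("ADD F4,F0,F2", "fpadd"),
    ("STORE 0(R1),F4", "store"),
    ("ADDI R1,R1,-#8", "alu"),
    ("BNE R1,R2,Loop", "branch"),
]
ISSUE = [1, 1, 2, 2, 3]  # issue clocks in iteration 1; each later iteration adds 3


def take(busy, clock):
    """First clock >= `clock` at which a one-op-per-clock unit is free; reserves it."""
    while clock in busy:
        clock += 1
    busy.add(clock)
    return clock


def schedule(iterations, separate_address_unit=False, fp_add_clocks=3):
    ready = {"R1": 0, "F0": 0, "F4": 0}  # clock each register's newest value was written (0 = initial)
    int_unit, addr_unit, mem_unit = set(), set(), set()
    if not separate_address_unit:
        addr_unit = int_unit  # loads/stores compute their address on the integer unit
    branch_exec = 0  # clock in which the most recent BNE executed
    rows = []
    for it in range(iterations):
        prev_branch = branch_exec  # this iteration waits for the previous iteration's BNE
        for (text, kind), offset in zip(LOOP, ISSUE):
            issue = offset + 3 * it
            earliest = max(issue, prev_branch) + 1
            mem = write = None
            if kind == "load":
                start = end = take(addr_unit, max(earliest, ready["R1"] + 1))
                mem = take(mem_unit, end + 1)
                write = mem + 1
                ready["F0"] = write
            elif kind == "fpadd":
                start = max(earliest, ready["F0"] + 1)
                end = start + fp_add_clocks - 1
                write = end + 1
                ready["F4"] = write
            elif kind == "store":
                start = end = take(addr_unit, max(earliest, ready["R1"] + 1))
                mem = take(mem_unit, max(end + 1, ready["F4"] + 1))  # needs the ADD result
            elif kind == "alu":
                start = end = take(int_unit, max(earliest, ready["R1"] + 1))
                write = end + 1
                ready["R1"] = write
            else:  # branch: tested as soon as the new R1 is available
                start = end = max(earliest, ready["R1"] + 1)
                branch_exec = end
            rows.append((it + 1, text, issue, start, end, mem, write))
    return rows


for row in schedule(3):
    print(row)  # (iteration, instruction, issue, exec start, exec end, memory, write)
```

With the default arguments the rows reproduce the table, for example the third ADD writes in clock 18. Because every instruction reads the register state left by the instructions before it in program order, the STORE of an iteration uses the old R1 even though the ADDI after it has already been scheduled.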


Execution Sequence
It. Instruction       Issue  Execute  Memory  Write  Comment
1   LOAD F0,0(R1)     1      2        3       4      address unit
1   ADD F4,F0,F2      1      5-7      -       8      wait for load
1   STORE 0(R1),F4    2      3        9       -      address unit; wait for add
1   ADDI R1,R1,-#8    2      3        -       4      int unit executes earlier
1   BNE R1,R2,Loop    3      5        -       -      wait for addi
2   LOAD F0,0(R1)     4      6        7       8      wait for bne
2   ADD F4,F0,F2      4      9-11     -       12     wait for load
2   STORE 0(R1),F4    5      7        13      -      wait for add
2   ADDI R1,R1,-#8    5      6        -       7      executes earlier
2   BNE R1,R2,Loop    6      8        -       -      wait for addi
3   LOAD F0,0(R1)     7      9        10      11     wait for bne
3   ADD F4,F0,F2      7      12-14    -       15     wait for load
3   STORE 0(R1),F4    8      10       16      -      wait for add
3   ADDI R1,R1,-#8    8      9        -       10     executes earlier
3   BNE R1,R2,Loop    9      11       -       -      wait for addi
4   LOAD F0,0(R1)     ...

Assume: 1 integer unit for ALU operations and 1 address unit for effective memory address computation (load, store).


Hardware-based Speculation
Speculative execution of instructions before the branch is resolved
Split instruction completion into write and commit
Instruction commits when it is no longer speculative
Registers/memory not updated until the instruction commits
Dynamic branch prediction
Dynamic scheduling of instructions through reorder buffer

Four stages: Issue, Execute, Write, Commit (Complete)

In-order instruction issue
Out-of-order execution
Out-of-order write
In-order commit (completion) through reorder buffer (ROB)


Four Stages
1 Issue
• Reservation station entry available
• Reorder buffer entry available
• Reserve (lock up) both entries
2 Execute
• Starts once all operand values are available at the reservation station
• Effective memory address computation for load/store
3 Write
• Common data bus is available
• Write the result (or store value) into the instruction's ROB entry and forward it to any waiting reservation stations; the entry is now ready to commit
• Free up reservation station entry
4 Commit
• Only when the instruction is at the head of the ROB and ready to commit
• Update the destination register, or memory for a store
• On a branch misprediction, flush the ROB (clear all entries after the branch)
• Free up the ROB entry
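The four stages can be sketched as operations on a reorder buffer. This is a minimal Python sketch, not the lecture's design: the class name, the dict fields, and the ever-increasing tag (real hardware uses a circular buffer index) are assumptions, and reservation-station operand handling and register status are left out. It shows that results may arrive out of order, but only the head entry updates registers or memory, and a mispredicted branch reaching the head flushes every entry behind it.

```python
from collections import deque


class ReorderBuffer:
    """Minimal ROB sketch: results arrive out of order, commits happen in order."""

    def __init__(self, size):
        self.size = size
        self.entries = deque()  # leftmost entry is the head (oldest instruction)
        self.next_tag = 1       # hypothetical ever-increasing tag instead of a circular index

    def issue(self, instr, kind, dest=None):
        """Issue: allocate an entry (kind is 'reg', 'store' or 'branch'); None means stall."""
        if len(self.entries) == self.size:
            return None
        tag = self.next_tag
        self.next_tag += 1
        self.entries.append({"tag": tag, "instr": instr, "kind": kind, "dest": dest,
                             "ready": False, "value": None, "mispredicted": False})
        return tag

    def write(self, tag, value=None, mispredicted=False):
        """Write result: fill in the value and mark the entry ready to commit."""
        for entry in self.entries:
            if entry["tag"] == tag:
                entry.update(ready=True, value=value, mispredicted=mispredicted)

    def commit(self, regs, mem):
        """Commit: retire the head entry if it is ready; return it, or None."""
        if not self.entries or not self.entries[0]["ready"]:
            return None
        head = self.entries.popleft()
        if head["kind"] == "reg":
            regs[head["dest"]] = head["value"]   # register updated only at commit
        elif head["kind"] == "store":
            mem[head["dest"]] = head["value"]    # memory updated only at commit
        elif head["mispredicted"]:
            self.entries.clear()                 # squash all younger, speculative entries
        return head


rob, regs, mem = ReorderBuffer(size=10), {}, {}
mult = rob.issue("MULT F0,F2,F4", "reg", "F0")
sub = rob.issue("SUB F8,F6,F2", "reg", "F8")
rob.write(sub, 1.5)            # SUB finishes first ...
print(rob.commit(regs, mem))   # None: ... but cannot commit past the unfinished MULT
rob.write(mult, 6.0)
rob.commit(regs, mem)
rob.commit(regs, mem)
print(regs)                    # {'F0': 6.0, 'F8': 1.5}
```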


Register Renaming
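[Figure: Tomasulo-style FP unit. Instructions from the instruction unit enter the instruction queue. Load/store operations go to the address unit and then to the load buffers or store buffers; FP operations go over the operation bus to the reservation stations (three in front of the FP adder, two in front of the FP multiplier), which receive operands from the FP registers over the operand buses. The memory unit, the FP adder (FP + and -), and the FP multiplier (FP * and /) put their results on the common data bus.]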


Hardware-based Speculation
[Figure: the same organization with a reorder buffer (fields Reg # and Data) added between the instruction queue and the FP registers. Results, including load data, go on the common data bus to the reorder buffer and to waiting reservation stations; the FP registers are updated from the reorder buffer, and store addresses and store data are held there until the store commits.]


Example
Reservation Stations
Name Busy Op Vj Vk Qj Qk Dest A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2

Reorder buffer
Entry Busy Instruction State Destination Value Ready
1 no LOAD F6,34(R2) Commit F6 Mem[34+Regs[R2]]
2 no LOAD F2,45(R3) Commit F2 Mem[45+Regs[R3]]
3 yes MULT F0,F2,F4 Write F0 #2 x Regs[F4]
4 yes SUB F8,F6,F2 Write F8 #1 − #2
5 DIV F10,F0,F6
6 ADD F6,F8,F2
7

FP Register Status
Field F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Reorder #
Busy


Issue DIV F10, F0, F6


Reservation Stations
Name Busy Op Vj Vk Qj Qk Dest A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2
Reorder buffer entry for DIV available? Yes
Reservation station entry available? Yes

Reorder buffer
Entry Busy Instruction State Destination Value Ready
1 no LOAD F6,34(R2) Commit F6 Mem[34+Regs[R2]]
2 no LOAD F2,45(R3) Commit F2 Mem[45+Regs[R3]]
3 yes MULT F0,F2,F4 Write F0 #2 x Regs[F4]
4 yes SUB F8,F6,F2 Write F8 #1 − #2
5
6
7

FP Register Status
Field F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Reorder # 3
Busy yes
Qi


Issue DIV F10, F0, F6


Reservation Stations
Name Busy Op Vj Vk Qj Qk Dest A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2 yes DIV #3 #5

Reorder buffer
Entry Busy Instruction State Destination Value
1 no LOAD F6,34(R2) Commit F6 Mem[34+Regs[R2]]
2 no LOAD F2,45(R3) Commit F2 Mem[45+Regs[R3]]   (ready? no)
3 yes MULT F0,F2,F4 Write F0 #2 x Regs[F4]
4 yes SUB F8,F6,F2 Write F8 #1 − #2
5 yes DIV F10,F0,F6 F10
6
7

FP Register Status
Field F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Reorder # 3 4
Busy yes no yes

busy? yes


Issue DIV F10, F0, F6


Reservation Stations
Name Busy Op Vj Vk Qj Qk Dest A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2 yes DIV Regs[F6] #3 #5

Reorder buffer
Entry Busy Instruction State Destination Value
1 no LOAD F6,34(R2) Commit F6 Mem[34+Regs[R2]]
2 no LOAD F2,45(R3) Commit F2 Mem[45+Regs[R3]]
3 yes MULT F0,F2,F4 Write F0 #2 x Regs[F4]
4 yes SUB F8,F6,F2 Write F8 #1 − #2
5 yes DIV F10,F0,F6 F10
6
7

FP Register Status
Field F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Reorder # 3 4 5
Busy yes no no yes yes   (busy? no)
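The issue logic used for DIV above can be sketched in Python. Assumptions: the FP register status is a dict from register to Reorder # (registers not in it are not busy), ROB entries are dicts with a `ready` flag, register-file values are shown symbolically as strings such as `Regs[F6]`, and both function names are hypothetical. As on the slides, entry 3 (MULT) is treated as not yet ready.

```python
def read_operand(reg, reg_status, rob):
    """Return (V, Q) for one source register at issue time."""
    if reg not in reg_status:                 # Busy = no: take the register-file value
        return f"Regs[{reg}]", 0
    entry = reg_status[reg]                   # Busy = yes: ROB entry that will produce it
    if rob[entry]["ready"]:                   # result already in the ROB: copy it
        return rob[entry]["value"], 0
    return None, entry                        # not ready: wait for this tag on the CDB


def issue_to_station(dest, src1, src2, rob_entry, reg_status, rob):
    """Fill the reservation-station fields for dest <- src1 op src2."""
    vj, qj = read_operand(src1, reg_status, rob)  # read sources before renaming dest
    vk, qk = read_operand(src2, reg_status, rob)
    reg_status[dest] = rob_entry                  # later readers of dest wait on this entry
    return {"Vj": vj, "Vk": vk, "Qj": qj, "Qk": qk, "Dest": rob_entry}


# State when DIV F10,F0,F6 issues: F0 <- entry 3 (MULT), F8 <- entry 4 (SUB).
reg_status = {"F0": 3, "F8": 4}
rob = {3: {"ready": False, "value": None}, 4: {"ready": False, "value": None}}
print(issue_to_station("F10", "F0", "F6", 5, reg_status, rob))
# {'Vj': None, 'Vk': 'Regs[F6]', 'Qj': 3, 'Qk': 0, 'Dest': 5}
print(reg_status)  # {'F0': 3, 'F8': 4, 'F10': 5}
```

The result matches the Mult2 entry on the slide: Vk = Regs[F6], Qj = #3, Dest = #5, and F10 now points at entry 5.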


Execute DIV F10, F0, F6


Reservation Stations
Name Busy Op Vj Vk Qj Qk Dest A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2 yes DIV Mult result Regs[F6] 0 0 #5

Reorder buffer
Entry Busy Instruction State Destination Value
1
2
3 yes MULT F0,F2,F4 Write done F0 #2 x Regs[F4]
4 yes SUB F8,F6,F2 Write F8 #1 − #2
5 yes DIV F10,F0,F6 F10
6
7

FP Register Status
Field F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Reorder # 3 4 5
Busy yes no yes yes


Write Result
Reservation Stations
Name Busy Op Vj Vk Qj Qk Dest A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2 no DIV Mult result Regs[F6] 0 0 #5
Stations waiting on #5 receive the result; their Qj/Qk = #5 --> 0

Reorder buffer
Entry Busy Instruction State Destination Value
1
2
3 yes MULT F0,F2,F4 Write done F0 #2 x Regs[F4]
4 yes SUB F8,F6,F2 Write F8 #1 − #2
5 yes DIV F10,F0,F6 Write F10 Result   (Ready = Yes)
6
7

FP Register Status
Field F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Reorder # 3 4 5
Busy yes no yes yes
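The Write Result step can be sketched the same way. Assumptions: reservation stations are dicts keyed by name with the fields of the table above, a Q field of 0 means "not waiting", values are symbolic strings, and the `Add1` entry waiting on #5 is a hypothetical consumer of F10 added only to show the broadcast; `write_result` is a hypothetical name.

```python
def write_result(producer, value, stations, rob):
    """Broadcast `value` on the CDB, tagged with the producer's ROB entry."""
    tag = stations[producer]["Dest"]
    rob[tag].update(ready=True, value=value)      # ROB entry: Value filled in, Ready = yes
    for rs in stations.values():
        if rs["Busy"] and rs["Qj"] == tag:        # waiting for this tag as first operand
            rs["Vj"], rs["Qj"] = value, 0
        if rs["Busy"] and rs["Qk"] == tag:        # waiting for this tag as second operand
            rs["Vk"], rs["Qk"] = value, 0
    stations[producer]["Busy"] = False            # free the producer's reservation station


stations = {
    "Mult2": {"Busy": True, "Vj": "Mult result", "Vk": "Regs[F6]", "Qj": 0, "Qk": 0, "Dest": 5},
    # Hypothetical consumer of F10, added only to show the broadcast:
    "Add1": {"Busy": True, "Vj": None, "Vk": "Regs[F2]", "Qj": 5, "Qk": 0, "Dest": 7},
}
rob = {5: {"ready": False, "value": None}}
write_result("Mult2", "Result", stations, rob)
print(stations["Add1"])  # {'Busy': True, 'Vj': 'Result', 'Vk': 'Regs[F2]', 'Qj': 0, 'Qk': 0, 'Dest': 7}
print(rob[5], stations["Mult2"]["Busy"])  # {'ready': True, 'value': 'Result'} False
```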


Commit
Reservation Stations
Name Busy Op Vj Vk Qj Qk Dest A
Load1
Load2
Add1
Add2
Add3
Mult1
Mult2 no DIV Mult result Regs[F6] 0 0 #5

Reorder buffer
Entry Busy Instruction State Destination Value
1
2
3 yes MULT F0,F2,F4 Write done F0 #2 x Regs[F4]
4 yes SUB F8,F6,F2 Write F8 #1 − #2
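5 Yes-->No DIV F10,F0,F6 Write F10 Result   <-- Head
6
7
Branch? No, so the result is written to Regs[F10]
(DIV can be at the head only after MULT and SUB, entries 3 and 4, have committed.)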
FP Register Status
Field F0 F1 F2 F3 F4 F5 F6 F7 F8 F9 F10
Reorder # 3 4 5
Busy yes yes->no
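At commit, the register status is cleared only if it still names the committing entry; if a younger instruction also writes that register, the register stays busy. A minimal Python sketch with a hypothetical function name, using dicts for the ROB, register file, and register status (branches and stores are left out):

```python
def commit_head(head, rob, regs, reg_status):
    """Commit ROB entry `head` if it is ready; returns True if it committed."""
    entry = rob[head]
    if not entry["ready"]:
        return False                              # head not finished: nothing commits
    regs[entry["dest"]] = entry["value"]          # architectural register updated only now
    if reg_status.get(entry["dest"]) == head:     # no younger writer has renamed dest
        del reg_status[entry["dest"]]             # Busy: yes -> no
    del rob[head]                                 # free the ROB entry
    return True


regs = {}
reg_status = {"F0": 3, "F8": 4, "F10": 5}
rob = {5: {"dest": "F10", "value": "Result", "ready": True}}
commit_head(5, rob, regs, reg_status)
print(regs, reg_status)  # {'F10': 'Result'} {'F0': 3, 'F8': 4}
```

The check matters in this program: when LOAD F6 (entry 1) commits after ADD F6,F8,F2 (entry 6) has issued, F6 must stay busy and keep pointing at entry 6.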


Execution Sequence
Assumptions:
• Two load reservation stations - load takes two clocks
• Three add/sub reservation stations - 3 clocks
• Two mult/div reservation stations - mult 5 clocks, div 10 clocks
• Ten reorder buffer entries
Instruction       Issue  Execute  Memory  Write  Commit  Comment
LOAD F6,34(R2)    1      2        3       4      5       address calc. and mem read
LOAD F2,45(R3)    2      3        4       5      6       address calc. and mem read
MULT F0,F2,F4     3      6-10     -       11     12      wait for load write
SUB F8,F6,F2      4      6-8      -       9      13      wait for load write
DIV F10,F0,F6     5      12-21    -       22     23
ADD F6,F8,F2      6      10-12    -       13     24
...
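The clock numbers in this sequence can be recomputed from the assumptions. The Python sketch below is a hypothetical model that further assumes one instruction issues per clock, an operand can be used the clock after it is written, a load spends one clock on address calculation (E) and one on the memory read (M), and commit is in order, at most one per clock. Reservation-station, ROB, and CDB limits are not modeled because this six-instruction sequence never exceeds them.

```python
EXEC_CLOCKS = {"LOAD": 1, "ADD": 3, "SUB": 3, "MULT": 5, "DIV": 10}  # load adds 1 memory clock
PROGRAM = [  # (instruction, op, destination, sources)
    ("LOAD F6,34(R2)", "LOAD", "F6", []),
    ("LOAD F2,45(R3)", "LOAD", "F2", []),
    ("MULT F0,F2,F4", "MULT", "F0", ["F2", "F4"]),
    ("SUB F8,F6,F2", "SUB", "F8", ["F6", "F2"]),
    ("DIV F10,F0,F6", "DIV", "F10", ["F0", "F6"]),
    ("ADD F6,F8,F2", "ADD", "F6", ["F8", "F2"]),
]


def rob_schedule(program):
    written = {}      # register -> clock its newest value is written on the CDB
    last_commit = 0
    rows = []
    for i, (text, op, dest, srcs) in enumerate(program):
        issue = i + 1                                                # one issue per clock
        start = max([issue + 1] + [written.get(r, 0) + 1 for r in srcs])
        end = start + EXEC_CLOCKS[op] - 1
        mem = end + 1 if op == "LOAD" else None                      # memory read after address calc
        write = (end if mem is None else mem) + 1
        commit = max(write + 1, last_commit + 1)                     # in order, one per clock
        last_commit = commit
        written[dest] = write   # updated after reading sources, so WAR/WAW never stall
        rows.append((text, issue, start, end, mem, write, commit))
    return rows


for row in rob_schedule(PROGRAM):
    print(row)  # (instruction, issue, exec start, exec end, memory, write, commit)
```

ADD finishes executing in clock 12 but, because commit is in order, it cannot commit until clock 24, right after DIV.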


Superscalar with Speculation


Consider the loop below:

LOOP: LOAD  R2,0(R1)
      ADDI  R2,R2,#1
      STORE 0(R1),R2
      ADDI  R1,R1,#4
      BNE   R2,R3,LOOP

See how it executes on the superscalar.
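In plain terms, the loop increments successive words of an array and exits after a word's new value equals R3, so the branch outcome depends on data that has just been loaded. A minimal Python model, assuming memory is a dict from byte address to word (`increment_until` is a hypothetical name; Python integers do not wrap at 32 bits):

```python
def increment_until(mem, r1, r3):
    """Model of the integer loop; returns the updated memory and the final R1."""
    while True:
        r2 = mem[r1]      # LOAD  R2,0(R1)
        r2 = r2 + 1       # ADDI  R2,R2,#1
        mem[r1] = r2      # STORE 0(R1),R2
        r1 = r1 + 4       # ADDI  R1,R1,#4
        if r2 == r3:      # BNE   R2,R3,LOOP falls through when R2 == R3
            return mem, r1


print(increment_until({0: 5, 4: 9, 8: 2}, r1=0, r3=10))
# ({0: 6, 4: 10, 8: 2}, 8)
```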


Without Speculation
It. Instruction       Issue  Execute  Memory  Write  Comment
1   LOAD R2,0(R1)     1      2        3       4
1   ADDI R2,R2,#1     1      5        -       6      wait for load
1   STORE 0(R1),R2    2      3        7       -      wait for addi
1   ADDI R1,R1,#4     2      3        -       4      int/float ok
1   BNE R2,R3,Loop    3      7        -       -      wait for addi
2   LOAD R2,0(R1)     4      8        9       10     wait for bne
2   ADDI R2,R2,#1     4      11       -       12     wait for load
2   STORE 0(R1),R2    5      9        13      -      address calc. and addi
2   ADDI R1,R1,#4     5      8        -       9      int/float exec ok
2   BNE R2,R3,Loop    6      13       -       -      wait for addi
3   LOAD R2,0(R1)     7      14       15      16     wait for bne
3   ADDI R2,R2,#1     7      17       -       18     wait for load
3   STORE 0(R1),R2    8      15       19      -      address calc. and addi
3   ADDI R1,R1,#4     8      14       -       15     int/float exec ok
3   BNE R2,R3,Loop    9      19       -       -      wait for addi
4   LOAD R2,0(R1)     ...

Assume: separate integer units for effective memory address computation, ALU operations, and branch condition testing.


With Speculation
It. Instruction       Issue  Execute  Memory  Write  Commit  Comment
1   LOAD R2,0(R1)     1      2        3       4      5
1   ADDI R2,R2,#1     1      5        -       6      7       wait for load
1   STORE 0(R1),R2    2      3        -       -      7       wait for add
1   ADDI R1,R1,#4     2      3        -       4      8       address calc./int ok; in-order commit
1   BNE R2,R3,Loop    3      7        -       -      8       wait for addi
2   LOAD R2,0(R1)     4      5        6       7      9       no wait for bne
2   ADDI R2,R2,#1     4      8        -       9      10      wait for load
2   STORE 0(R1),R2    5      6        -       -      10      wait for add
2   ADDI R1,R1,#4     5      6        -       7      11      in-order commit
2   BNE R2,R3,Loop    6      10       -       -      11      wait for addi
3   LOAD R2,0(R1)     7      8        9       10     12      no wait for bne
3   ADDI R2,R2,#1     7      11       -       12     13      wait for load
3   STORE 0(R1),R2    8      9        -       -      13      wait for add
3   ADDI R1,R1,#4     8      9        -       10     14      in-order commit
3   BNE R2,R3,Loop    9      13       -       -      14      wait for addi
4   LOAD R2,0(R1)     ...
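Both execution sequences for this loop can be recomputed by one hypothetical model; the sketch below is not the lecture's tool. Assumptions: issue clocks 1, 1, 2, 2, 3 in the first iteration and 3 clocks later for each following iteration; separate one-operation-per-clock units for address calculation and ALU operations, a separate branch unit, and one memory unit; an operand is usable the clock after it is written; every branch is predicted correctly. Without speculation, instructions wait until the previous BNE executes and a store writes memory in its M stage. With speculation they do not wait, a store writes memory at commit, and commit is in order with at most two commits per clock (an assumption). Memory disambiguation is not modeled.

```python
LOOP = [  # (instruction, kind, register read and/or written)
    ("LOAD R2,0(R1)", "load", "R2"),
    ("ADDI R2,R2,#1", "alu", "R2"),
    ("STORE 0(R1),R2", "store", None),
    ("ADDI R1,R1,#4", "alu", "R1"),
    ("BNE R2,R3,Loop", "branch", None),
]
ISSUE = [1, 1, 2, 2, 3]  # issue clocks in iteration 1; each later iteration adds 3


def take(busy, clock):
    """First clock >= `clock` at which a one-op-per-clock unit is free; reserves it."""
    while clock in busy:
        clock += 1
    busy.add(clock)
    return clock


def loop_schedule(iterations, speculate):
    ready = {"R1": 0, "R2": 0}   # clock each register's newest value was written
    units = {"addr": set(), "alu": set(), "mem": set()}
    commits_in = {}              # clock -> commits already scheduled in that clock
    branch_exec = last_commit = 0
    rows = []
    for it in range(iterations):
        gate = 0 if speculate else branch_exec  # no speculation: wait for the previous BNE
        for (text, kind, reg), offset in zip(LOOP, ISSUE):
            issue = offset + 3 * it
            earliest = max(issue, gate) + 1
            mem = write = commit = None
            if kind == "load":
                ex = take(units["addr"], max(earliest, ready["R1"] + 1))
                mem = take(units["mem"], ex + 1)
                write = mem + 1
                done = write + 1
            elif kind == "alu":  # ADDI reads and writes the same register
                ex = take(units["alu"], max(earliest, ready[reg] + 1))
                write = ex + 1
                done = write + 1
            elif kind == "store":
                ex = take(units["addr"], max(earliest, ready["R1"] + 1))
                done = max(ex + 1, ready["R2"] + 1)   # address computed and data available
                if not speculate:
                    mem = take(units["mem"], done)    # without a ROB the store writes memory itself
            else:  # branch, evaluated on its own unit
                ex = max(earliest, ready["R2"] + 1)
                branch_exec = ex
                done = ex + 1
            if write is not None:
                ready[reg] = write
            if speculate:  # in-order commit, assumed at most two per clock
                commit = max(done, last_commit)
                while commits_in.get(commit, 0) == 2:
                    commit += 1
                commits_in[commit] = commits_in.get(commit, 0) + 1
                last_commit = commit
            rows.append((it + 1, text, issue, ex, mem, write, commit))
    return rows


for row in loop_schedule(3, speculate=True):
    print(row)  # (iteration, instruction, issue, execute, memory, write, commit)
```

Under these assumptions the third BNE executes in clock 19 without speculation (`loop_schedule(3, speculate=False)[14]`), but in clock 13 with speculation, committing in clock 14, because later loads no longer wait for the branch.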
