Pipelined Data Paths

Pipelining is now used in even the simplest of processors

Same principles as assembly lines in manufacturing

Unlike items on an assembly line, instructions are not independent of one another

Single-Cycle Data Path of MIPS

Clock rate = 125 MHz
CPI = 1 (125 MIPS)

[Figure: single-cycle data path with next-address logic, PC, instruction cache, register file, ALU, and data cache. Control signals: RegDst, ALUSrc, DataRead, RegInSrc, Br&Jump, RegWrite, ALUFunc, DataWrite]

Key elements of the single-cycle MicroMIPS data path.

Multicycle Data Path of MIPS

Clock rate = 500 MHz
CPI ≈ 4 (≈ 125 MIPS)

[Figure: multicycle data path with PC, cache, instruction register, data register, register file, x, y, and z registers, and ALU. Control signals: InstData, MemWrite, RegInSrc, ALUSrcX, ALUFunc, PCSrc, PCWrite, MemRead, IRWrite, RegDst, RegWrite, ALUSrcY, JumpAddr]

Key elements of the multicycle MicroMIPS data path.
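
The two performance figures quoted above follow from dividing the clock rate by the CPI. A minimal sketch of that arithmetic, assuming the rate is expressed in millions of instructions per second (the function name `native_mips` is made up for this illustration, and the multicycle CPI of 4 is the approximate value given on the slide):

```python
def native_mips(clock_mhz, cpi):
    """Instruction rate in MIPS = clock rate in MHz / cycles per instruction."""
    return clock_mhz / cpi

# Single-cycle design: slow clock, one cycle per instruction.
print("single-cycle:", native_mips(125, 1), "MIPS")

# Multicycle design: 4x faster clock, but about 4 cycles per instruction.
print("multicycle:  ", native_mips(500, 4), "MIPS")
```

Both designs land at about the same instruction rate, which is why overlapping instructions is needed to benefit from the faster clock.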

Pipelining Concepts

Strategies for improving performance

1 Use multiple independent data paths accepting several instructions that are read out at once: multiple-instruction-issue or superscalar

2 Overlap execution of several instructions, starting the next instruction before the previous one has run to completion: (super)pipelined

[Figure: Start here -> Approval (1) -> Cashier (2) -> Registrar (3) -> ID photo (4) -> Pickup (5) -> Exit]

Pipelining in the student registration process.

Pipelined Instruction Execution

[Figure: task-time diagram. Time dimension: Cycle 1 to Cycle 9. Task dimension: Instr 1 to Instr 5. Each instruction passes through Instr cache, Reg file, ALU, Data cache, and Reg file in successive cycles, starting one cycle after the previous instruction]

Pipelining in the MicroMIPS instruction execution process.

Alternate Representations of a Pipeline

Except for start-up and drainage overheads, a pipeline can execute one instruction per clock tick; IPS is dictated by the clock frequency
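
The start-up and drainage overhead can be made concrete with a small sketch. Assuming an ideal 5-stage pipeline with no stalls, n instructions finish in n + 5 - 1 cycles. The stage letters follow the figure legend (f = Fetch, r = Reg read, a = ALU op, d = Data access, w = Writeback); the function names are chosen only for this illustration.

```python
STAGES = ["f", "r", "a", "d", "w"]  # fetch, reg read, ALU op, data access, writeback

def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed by an ideal pipeline: one per instruction plus fill/drain."""
    return num_instructions + num_stages - 1

def task_time_diagram(num_instructions):
    """One text row per instruction; column c shows the stage occupied in cycle c+1."""
    cycles = total_cycles(num_instructions)
    rows = []
    for i in range(num_instructions):
        row = ["."] * cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i enters stage s in cycle i + s
        rows.append("".join(row))
    return rows

for row in task_time_diagram(7):
    print(row)
print("total cycles:", total_cycles(7))
```

For 7 instructions this gives 11 cycles, of which the first 4 are start-up and the last 4 are drainage; only in between does one instruction complete per clock tick with all stages busy.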

[Figure: (a) Task-time diagram: instructions 1 to 7 versus cycles 1 to 11. (b) Space-time diagram: pipeline stages versus cycles 1 to 11, with the start-up region and drainage region marked. Legend: f = Fetch, r = Reg read, a = ALU op, d = Data access, w = Writeback]

Two abstract graphical representations of a 5-stage pipeline executing 7 tasks (instructions).

Dependency

Data Dependency

Read after write

Read after load

Control Dependency

Data Dependency

First type of data dependency

[Figure: task-time diagram over Cycle 1 to Cycle 8 for the instruction sequence
$5 = $6 + $7
$8 = $8 + $6
$9 = $8 + $2
sw $9, 0($3)
with data forwarding paths between dependent instructions]

Read-after-write data dependency and its possible resolution through data forwarding.
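
The read-after-write dependencies in the four-instruction sequence of this figure can be found mechanically. The sketch below is a simplified illustration, not a description of the MicroMIPS hardware: the tuple encoding of instructions and the function name `raw_dependencies` are assumptions made for this example, and it ignores the case where an intermediate instruction overwrites the same register.

```python
# Each entry: (text, destination register or None, source registers).
program = [
    ("$5 = $6 + $7", 5, (6, 7)),
    ("$8 = $8 + $6", 8, (8, 6)),
    ("$9 = $8 + $2", 9, (8, 2)),
    ("sw $9, 0($3)", None, (9, 3)),   # a store writes no register
]

def raw_dependencies(program, window=3):
    """Return (producer index, consumer index, register) triples.

    Only producers at most `window` instructions earlier are reported,
    since in a 5-stage pipeline older results are already written back.
    """
    deps = []
    for j, (_, _, sources) in enumerate(program):
        for i in range(max(0, j - window), j):
            dest = program[i][1]
            if dest is not None and dest in sources:
                deps.append((i, j, dest))
    return deps

for i, j, reg in raw_dependencies(program):
    print(f"'{program[j][0]}' reads ${reg} written by '{program[i][0]}'")
```

Here the third instruction depends on the second through $8, and the store depends on the third through $9; these are the pairs that need forwarding or bubbles.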

Inserting Bubbles in a Pipeline

[Figure: two task-time diagrams (time dimension: Cycle 1 to Cycle 9; task dimension: Instr 1 to Instr 5) in which an instruction that writes into $8 is followed by an instruction that reads from $8, with bubbles inserted between them]

Without data forwarding, three bubbles are needed to resolve a read-after-write data dependency.

Two bubbles, if we assume that a register can be updated and read from in one cycle.
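
The bubble counts stated here can be reproduced with a short calculation. This is a sketch under the assumption of the 5-stage f, r, a, d, w pipeline used throughout, with no data forwarding; `distance` is the number of instruction slots between producer and consumer (1 for adjacent instructions), and the function name is made up for this example.

```python
REG_READ_STAGE = 1   # stage offsets from fetch: f=0, r=1, a=2, d=3, w=4
WRITEBACK_STAGE = 4

def bubbles_needed(distance, same_cycle_write_read=False):
    """Bubbles required before the consumer so its register read sees the write."""
    write_cycle = WRITEBACK_STAGE            # producer fetched in cycle 0
    read_cycle = distance + REG_READ_STAGE   # consumer's read, with no bubbles
    # If the register file cannot be written and read in the same cycle,
    # the read must wait until the cycle after the write.
    earliest_read = write_cycle if same_cycle_write_read else write_cycle + 1
    return max(0, earliest_read - read_cycle)

print(bubbles_needed(1))                              # adjacent instructions
print(bubbles_needed(1, same_cycle_write_read=True))  # write, then read, in one cycle
```

For adjacent instructions the function returns three bubbles, or two when a register can be updated and read in the same cycle, in agreement with the statements above.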

Second Type of Data Dependency

[Figure: task-time diagram over Cycle 1 to Cycle 8 for the instruction sequence
sw $6,
lw $8,
(Insert bubble?)
$9 = $8 + $2
with the annotation "Reorder?" between the store and the load]

Without data forwarding, three (two) bubbles are needed to resolve a read-after-load data dependency

Read-after-load data dependency and its possible resolution through bubble insertion and data forwarding.

Control Dependency in a Pipeline

[Figure: task-time diagram over Cycle 1 to Cycle 8 for the instruction sequence
$6 = $3 + $5
beq $1, $2,
(Insert bubble?)
$9 = $8 + $2
Annotations: "Reorder? (delayed branch)" between the first instruction and the branch; "Assume branch resolved here" on the branch; "Here would need 1-2 more bubbles" on the instruction after the bubble]

Control dependency due to conditional branch.
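
The cost of such bubbles can be estimated with a back-of-the-envelope calculation. The sketch below assumes an ideal pipelined CPI of 1 and a fixed number of bubbles per conditional branch; the branch fraction of 20% is a hypothetical value chosen only for illustration and does not come from these slides.

```python
def effective_cpi(branch_fraction, bubbles_per_branch, base_cpi=1.0):
    """Average cycles per instruction when every branch costs extra bubbles."""
    return base_cpi + branch_fraction * bubbles_per_branch

# Hypothetical workload: 20% of executed instructions are conditional branches.
for bubbles in (1, 2, 3):
    print(f"{bubbles} bubble(s) per branch -> CPI = {effective_cpi(0.20, bubbles):.2f}")
```

Reordering instructions to fill the slot after the branch (delayed branch) aims to bring the per-branch cost back toward zero.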

Pipeline Timing and Performance

[Figure: a function unit of latency t, and its pipelined form with Stage 1, Stage 2, Stage 3, . . . , Stage q - 1, Stage q, each of latency t/q and each followed by latching of results with overhead τ]

Pipelined form of a function unit with latching overhead.

Throughput Increase in a q-Stage Pipeline

[Figure: throughput improvement factor (1 to 8) versus number q of pipeline stages (1 to 8), with curves for the ideal case τ/t = 0 and for τ/t = 0.05 and τ/t = 0.1]

Throughput improvement factor:

$$\frac{t}{t/q + \tau} \quad \text{or} \quad \frac{q}{1 + q\tau/t}$$

Throughput improvement due to pipelining as a function of the number of pipeline stages for different pipelining overheads.
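
The curves in this plot can be regenerated numerically. The sketch below assumes the improvement factor q / (1 + q·τ/t), where t is the latency of the unpipelined function unit, q the number of stages, and τ the latching overhead per stage; the function name is chosen for this example.

```python
def throughput_improvement(q, overhead_ratio):
    """Improvement factor q / (1 + q * tau/t) for a q-stage pipeline.

    overhead_ratio is tau/t, the per-stage latching overhead relative to
    the total unpipelined latency.
    """
    return q / (1 + q * overhead_ratio)

ratios = (0.0, 0.05, 0.1)  # the three curves: ideal, tau/t = 0.05, tau/t = 0.1
print("q  " + "  ".join(f"tau/t={r:<4}" for r in ratios))
for q in range(1, 9):
    print(f"{q}  " + "  ".join(f"{throughput_improvement(q, r):10.2f}" for r in ratios))
```

With zero overhead the improvement equals q; with τ/t = 0.1 an 8-stage pipeline gains only a factor of about 4.4, which shows how latching overhead limits the benefit of deeper pipelines.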

Pipelined Data Path Design

[Figure: five-stage data path. Stage 1: PC and instruction cache; Stage 2: register file; Stage 3: ALU; Stage 4: data cache; Stage 5: register write-back; with next-address logic and PC incrementer. Control signals: SeqInst, Br&Jump, RegDst, DataRead, RetAddr, RegWrite, ALUSrc, ALUFunc, DataWrite, RegInSrc]

Key elements of the pipelined MicroMIPS data path.

Pipelined Control

[Figure: the five-stage pipelined data path (Stage 1 to Stage 5) with its control signals carried along the pipeline to the stages that use them. Control signals: SeqInst, Br&Jump, RegDst, ALUFunc, DataRead, RetAddr, RegWrite, ALUSrc, DataWrite, RegInSrc]

Pipelined control signals.