Pipelining and Retiming 1

Pipelining
Adding registers along a path
split combinational logic into multiple cycles
each cycle smaller than previously
increase throughput

Pipelining (contd)
Delay, d, of slowest combinational stage determines performance
Throughput = 1/d : rate at which outputs are produced
Latency = n·d : number of stages × clock period
Pipelining increases circuit utilization
Registers slow down data, synchronize data paths
Wave-pipelining
no pipeline registers - waves of data flow through circuit
relies on equal-delay circuit paths - no short paths
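
The throughput and latency relations above can be tried on concrete numbers. The sketch below is only illustrative: the function name `pipeline_metrics` and the stage delays are made up for the example, and register overhead (setup time, propagation delay) is ignored.

```python
def pipeline_metrics(stage_delays):
    """Return (clock_period, throughput, latency) for a pipeline.

    stage_delays: combinational delay of each stage, in arbitrary time units.
    Register overhead is ignored in this sketch.
    """
    d = max(stage_delays)      # slowest stage sets the clock period
    n = len(stage_delays)      # number of stages
    throughput = 1 / d         # outputs produced per time unit
    latency = n * d            # time for one input to pass through all stages
    return d, throughput, latency


# One 24-unit block of logic versus the same logic split into three stages.
print(pipeline_metrics([24]))
print(pipeline_metrics([10, 7, 7]))
```

With an unbalanced split the slowest stage still sets the clock, so latency goes up (30 versus 24 here) even though throughput improves.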

When and How to Pipeline?
Where is the best place to add registers?
splitting combinational logic
overhead of registers (propagation delay and setup time requirements)
What about cycles in data path?
Example: 16-bit adder, add 8 bits in each of two cycles
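
The 16-bit adder example can be sketched in software. This is an assumed reading of the slide: one 8-bit adder is reused in two cycles, and the carry out of the low byte is held in a register between them (the feedback that makes this a cycle in the data path). The name `add16_two_cycles` is hypothetical.

```python
def add16_two_cycles(a, b):
    """Add two 16-bit values with one 8-bit adder used in two clock cycles.

    Returns (16-bit sum, carry out).
    """
    mask8 = 0xFF
    # Cycle 1: add the low bytes; the carry out is stored in a register.
    low = (a & mask8) + (b & mask8)
    carry_reg = low >> 8
    low_reg = low & mask8
    # Cycle 2: add the high bytes plus the registered carry.
    high = ((a >> 8) & mask8) + ((b >> 8) & mask8) + carry_reg
    carry_out = high >> 8
    return ((high & mask8) << 8) | low_reg, carry_out


print(add16_two_cycles(0x12FF, 0x0001))
```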

Retiming
Process of optimally distributing registers throughout a circuit
minimize the clock period
minimize the number of registers

Retiming (contd)
Fast optimal algorithm (Leiserson & Saxe 1983)
Retiming rules:
remove one register from each input and add one to each output
remove one register from each output and add one to each input
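
The two retiming rules can be written as a single local move on a graph whose edges carry register counts. This is a minimal sketch: the dictionary layout, the function name `retime_node` and the three-node ring are assumptions made for illustration.

```python
def retime_node(weights, node, forward=True):
    """Apply one retiming rule to `node`.

    weights maps (u, v) edges to register counts. forward=True removes one
    register from each input edge and adds one to each output edge;
    forward=False does the opposite. Raises ValueError if an edge would end
    up with a negative register count.
    """
    step = 1 if forward else -1
    new = dict(weights)
    for (u, v) in weights:
        if v == node:
            new[(u, v)] -= step   # input edge of the node
        if u == node:
            new[(u, v)] += step   # output edge of the node
    if min(new.values()) < 0:
        raise ValueError("not enough registers to move")
    return new


ring = {("a", "b"): 1, ("b", "c"): 0, ("c", "a"): 1}
print(retime_node(ring, "b"))
```

The number of registers around the ring is the same before and after the move, which is why retiming cannot change the register count of any cycle.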

Retiming examples
Shortening critical paths
Create simplification opportunities
[Figure: example circuits with inputs a, b, c, d, output x, and D flip-flops (D Q), shown with the registers in different positions]

Optimal Pipelining
Add registers - use retiming to find optimal location
[Figure: example circuit with blocks labeled 8, 7, 13, 10 and 5, 6, shown twice]
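
Example - Digital Correlator
$y_t = \delta(x_t, a_0) + \delta(x_{t-1}, a_1) + \delta(x_{t-2}, a_2) + \delta(x_{t-3}, a_3)$
$\delta(x_t, a_0) = 0$ if $x \ne a$, 1 otherwise (and passes x along to the right)
[Figure: correlator circuit: four comparators holding a_0 ... a_3, a chain of three adders, and the host, which supplies x_t and receives y_t]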
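
The correlator definition can be checked with a short behavioral model. This is a sketch under stated assumptions: the comparison function is called `delta` here, and samples before t = 0 are assumed to contribute 0, which the slide does not specify.

```python
def delta(x, a):
    """Comparator: 0 if x differs from a, 1 otherwise."""
    return 0 if x != a else 1


def correlate(xs, a):
    """Return [y_0, y_1, ...] with y_t = sum over j of delta(x_{t-j}, a_j)."""
    ys = []
    for t in range(len(xs)):
        y = 0
        for j, a_j in enumerate(a):
            if t - j >= 0:        # assumption: no contribution before t = 0
                y += delta(xs[t - j], a_j)
        ys.append(y)
    return ys


# x_0..x_3 = 1, 1, 0, 1 lines up with a_3..a_0 = 1, 1, 0, 1 at t = 3.
print(correlate([1, 1, 0, 1], [1, 0, 1, 1]))
```

The last output is 4 because all four comparators match at t = 3.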

Example - Digital Correlator (contd)
Delays: adder, 7; comparator, 3; host, 0
[Figure: correlator circuit before retiming] cycle time = 24
[Figure: correlator circuit after retiming] cycle time = 13

Retiming Algorithm
Representation of circuit as directed graph
nodes: combinational logic
edges: connections between logic that may or may not include registers
weights: propagation delay for nodes, number of registers for edges
path delay (D): sum of propagation delays along path nodes
path weight (W): sum of edge weights along path
always ≥ 0, and ≥ 1 around any cycle (no asynchronous feedback)
Problem statement
given: cycle time, T, and a circuit graph
adjust edge weights (number of registers) so that all path delays are ≤ T, unless their path weight is ≥ 1, and the outputs to the host are the same (in both function and delay) as in the original graph
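
The graph representation and the two path measures can be sketched directly. The edge list below is an assumption: it is the correlator graph reconstructed from the W and D matrices in these slides (the drawing itself did not survive extraction), and the helper names are hypothetical.

```python
# Node delays: host 0, comparators v1..v4 = 3, adders v5..v7 = 7.
DELAY = {"h": 0, 1: 3, 2: 3, 3: 3, 4: 3, 5: 7, 6: 7, 7: 7}
# Edge weights = number of registers on each connection (assumed layout).
EDGES = {("h", 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1,
         (4, 5): 0, (3, 5): 0, (2, 6): 0, (1, 7): 0,
         (5, 6): 0, (6, 7): 0, (7, "h"): 0}


def path_delay(path):
    """Sum of propagation delays of the nodes along the path."""
    return sum(DELAY[v] for v in path)


def path_weight(path):
    """Sum of register counts on the edges along the path."""
    return sum(EDGES[(u, v)] for u, v in zip(path, path[1:]))


def violates(path, T):
    """True if the path is longer than T and has no register on it."""
    return path_delay(path) > T and path_weight(path) == 0


critical = [3, 5, 6, 7, "h"]
print(path_delay(critical), path_weight(critical), violates(critical, 13))
```

The register-free path through one comparator and all three adders has delay 24, which is the cycle time of the circuit before retiming.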

Retiming Algorithm Approach
Compute path weights and delays between each pair of nodes
W and D matrices
Choose a cycle time T
Determine if it is possible to assign new weights so that all paths with delays greater than T have a weight that is 1 or greater (use linear programming)
Choose a smaller cycle time and repeat until the smallest T is found

Computing W and D
W matrix: number of registers on path from u → v
D matrix: total delay along path from u → v

W    h   1   2   3   4   5   6   7
h    0   1   2   3   4   3   2   1
1    0   0   1   2   3   2   1   0
2    0   1   0   1   2   1   0   0
3    0   1   2   0   1   0   0   0
4    0   1   2   3   0   0   0   0
5    0   1   2   3   4   0   0   0
6    0   1   2   3   4   3   0   0
7    0   1   2   3   4   3   2   0

D    h   1   2   3   4   5   6   7
h    0   3   6   9  12  16  13  10
1   10   3   6   9  12  16  13  10
2   17  20   3   6   9  13  10  17
3   24  27  30   3   6  10  17  24
4   24  27  30  33   3  10  17  24
5   21  24  27  30  33   7  14  21
6   14  17  20  23  26  30   7  14
7    7  10  13  16  19  23  20   7

[Figure: correlator graph with nodes v_h, v_1 ... v_7; node delays 0 (host), 3 (v_1 ... v_4) and 7 (v_5 ... v_7); edge weights 0 or 1]
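
One way to compute both matrices is an all-pairs shortest-path pass over (register count, negated delay) pairs, so that the fewest registers win first and the largest delay wins among ties. The sketch below uses Floyd-Warshall for this; the edge list is an assumption reconstructed from the matrices above, and `compute_wd` is a hypothetical name.

```python
import itertools

DELAY = {"h": 0, 1: 3, 2: 3, 3: 3, 4: 3, 5: 7, 6: 7, 7: 7}
EDGES = {("h", 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1,
         (4, 5): 0, (3, 5): 0, (2, 6): 0, (1, 7): 0,
         (5, 6): 0, (6, 7): 0, (7, "h"): 0}


def compute_wd(delay, edges):
    """Return dicts W[(u, v)] and D[(u, v)] for every ordered node pair."""
    nodes = list(delay)
    inf = (float("inf"), 0)
    # dist[u][v] = (registers, -(delay of the path excluding node v))
    dist = {u: {v: inf for v in nodes} for u in nodes}
    for u in nodes:
        dist[u][u] = (0, 0)
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], (w, -delay[u]))
    # Floyd-Warshall; tuples compare lexicographically.
    for k, u, v in itertools.product(nodes, repeat=3):
        via = (dist[u][k][0] + dist[k][v][0], dist[u][k][1] + dist[k][v][1])
        if via < dist[u][v]:
            dist[u][v] = via
    W = {(u, v): dist[u][v][0] for u in nodes for v in nodes}
    D = {(u, v): delay[v] - dist[u][v][1] for u in nodes for v in nodes}
    return W, D


W, D = compute_wd(DELAY, EDGES)
print(W[("h", 5)], D[("h", 5)])
```

For this graph the result agrees with the table entries spot-checked by hand, for example W[h,5] = 3 and D[h,5] = 16.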

Computing W and D (contd)
W[u,v] = number of registers on the minimum weight path from u → v
Any retiming changes the weight of all paths from u to v by the same constant (it depends only on u and v)
i.e. retiming cannot change which is the minimum weight path
D[u,v] = maximum delay over all paths from u → v with W[u,v] registers
Retiming does not affect D[u,v]
These matrices contain all the required register and delay information
If retiming removes all registers from the path u → v, then D[u,v] is the largest delay path that results

Retiming: One Step at a Time
[Figure: the correlator graph (node delays 3 and 7, edge weights 0, 1 and 2) redrawn after successive single-node retiming steps]

Retiming: One Step at a Time (contd)
[Figure: further single-node retiming steps on the correlator graph]
and after a few more . . .

Retiming: Problem Formulation
r(v): number of registers pushed through a node in the forward direction
$w_{new}(u, v) = w_{old}(u, v) + r(u) - r(v)$
Problem statement
$r(v_h) = 0$ (host is not retimed)
$w_{new}(u, v) = w_{old}(u, v) + r(u) - r(v) \ge 0$, for all u, v
$r(u) - r(v) \ge -w_{old}(u, v)$ (no negative registers!)
For all $D[u,v] > T_{clk}$,
$w_{new}(u, v) = w_{old}(u, v) + r(u) - r(v) \ge 1$
$r(u) - r(v) \ge -w_{old}(u, v) + 1$ (every long path has at least 1 reg)
Difference constraints like this can be solved by generating a graph that represents the constraints and using a shortest path algorithm like Bellman-Ford to find a set of r(v) values that meets all the constraints
The value of r(v) returned by the algorithm can be used to generate the new positions of the registers in the retimed circuit
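
The formulation above can be sketched as a feasibility test. Each constraint r(u) - r(v) ≥ -w + c is rewritten as r(v) - r(u) ≤ w - c and treated as a constraint-graph edge u → v; Bellman-Ford either converges to a valid r or keeps relaxing, which signals a negative cycle and an infeasible T. The W and D tables are the ones from these slides; the function name and the linear scan over candidate periods are assumptions of this sketch, not the slides' exact procedure.

```python
NODES = ["h", 1, 2, 3, 4, 5, 6, 7]
W = [[0, 1, 2, 3, 4, 3, 2, 1],
     [0, 0, 1, 2, 3, 2, 1, 0],
     [0, 1, 0, 1, 2, 1, 0, 0],
     [0, 1, 2, 0, 1, 0, 0, 0],
     [0, 1, 2, 3, 0, 0, 0, 0],
     [0, 1, 2, 3, 4, 0, 0, 0],
     [0, 1, 2, 3, 4, 3, 0, 0],
     [0, 1, 2, 3, 4, 3, 2, 0]]
D = [[0, 3, 6, 9, 12, 16, 13, 10],
     [10, 3, 6, 9, 12, 16, 13, 10],
     [17, 20, 3, 6, 9, 13, 10, 17],
     [24, 27, 30, 3, 6, 10, 17, 24],
     [24, 27, 30, 33, 3, 10, 17, 24],
     [21, 24, 27, 30, 33, 7, 14, 21],
     [14, 17, 20, 23, 26, 30, 7, 14],
     [7, 10, 13, 16, 19, 23, 20, 7]]


def feasible_retiming(T):
    """Return a dict r meeting all constraints for period T, or None."""
    cons = []                          # (u, v, c) means r(v) - r(u) <= c
    for i, u in enumerate(NODES):
        for j, v in enumerate(NODES):
            c = W[i][j]                # no negative registers
            if D[i][j] > T:
                c -= 1                 # long paths keep at least 1 register
            cons.append((u, v, c))
    r = {v: 0 for v in NODES}          # virtual source at distance 0 to all
    for _ in range(len(NODES)):
        changed = False
        for u, v, c in cons:
            if r[u] + c < r[v]:
                r[v] = r[u] + c
                changed = True
        if not changed:                # converged: shift so that r(host) = 0
            return {v: r[v] - r["h"] for v in NODES}
    return None                        # still relaxing: negative cycle


candidates = sorted({d for row in D for d in row})
best_T = next(T for T in candidates if feasible_retiming(T) is not None)
print(best_T, feasible_retiming(best_T))
```

Only entries of D need to be tried as candidate periods, since the clock period of any retimed circuit is the delay of some path. The r values returned need not match those drawn on the slides, because several retimings can reach the same period.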

Retimed Correlator
[Figure: correlator graph before and after retiming (four edges of weight 1 before, five after); nodes are annotated with r = 2, 2, 2, 1, 1, 1, 0, 0]
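
The r labels can be applied to the graph to check the resulting clock period. Two things are assumed here because the drawing did not survive extraction: the edge list (reconstructed from the W and D matrices) and the assignment of the eight r values to nodes (v3, v4, v5 = 2; v1, v2, v6 = 1; v7 and the host = 0). The helper names are hypothetical.

```python
from functools import lru_cache

DELAY = {"h": 0, 1: 3, 2: 3, 3: 3, 4: 3, 5: 7, 6: 7, 7: 7}
EDGES = {("h", 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1,
         (4, 5): 0, (3, 5): 0, (2, 6): 0, (1, 7): 0,
         (5, 6): 0, (6, 7): 0, (7, "h"): 0}
R = {"h": 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 1, 7: 0}   # assumed mapping


def retime(edges, r):
    """w_new(u, v) = w_old(u, v) + r(u) - r(v) for every edge."""
    new = {(u, v): w + r[u] - r[v] for (u, v), w in edges.items()}
    if min(new.values()) < 0:
        raise ValueError("negative register count")
    return new


def clock_period(delay, edges):
    """Largest total node delay along any register-free path.

    Assumes every cycle carries at least one register.
    """
    succ = {u: [] for u in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return delay[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in delay)


retimed = retime(EDGES, R)
print(clock_period(DELAY, EDGES), clock_period(DELAY, retimed))
```

Under these assumptions the period drops from 24 to 13 and the register count goes from four to five, in line with the figure.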

Extensions to Retiming
Host interface
add latency
multiple hosts
Area considerations
limit number of registers
optimize logic across register boundaries
peripheral retiming
incremental retiming
pre-computation
Generality
different propagation delays for different signals
widths of interconnections

Systolic Arrays
Analogy: data flowing through the system in a rhythmic fashion - from main memory through a series of processing elements and back to main memory
Set of identical processing elements
specialized or programmable
Efficient nearest-neighbor interconnections (in 1-D, 2-D, other)
SIMD-like
Multiple data flows, converging to engage in computation

Example - Convolution
[Figure: linear array holding weights w_4 ... w_1; the x stream (x_1 ... x_5) and the y stream (y_1 ... y_3) are fed in with gaps between successive items]
$y_1 = x_1 w_1 + x_2 w_2 + x_3 w_3 + x_4 w_4$
$y_2 = x_2 w_1 + x_3 w_2 + x_4 w_3 + x_5 w_4$
$y_3 = x_3 w_1 + x_4 w_2 + x_5 w_3 + x_6 w_4$
$y_i = x_i w_1 + x_{i+1} w_2 + \dots + x_{i+n-1} w_n$
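
The general formula can be evaluated directly as a reference for what the array has to produce. This is a plain software sketch with 0-based indices (the slides count from 1); the function name and the sample values are made up.

```python
def convolve(x, w):
    """Return [y_1, y_2, ...] with y_i = sum over k of x_{i+k-1} * w_k."""
    n = len(w)
    return [sum(x[i + k] * w[k] for k in range(n))
            for i in range(len(x) - n + 1)]


# Six samples and four weights give y_1, y_2, y_3 as on the slide.
print(convolve([1, 2, 3, 4, 5, 6], [1, 0, 2, 1]))
```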

Example - Convolution (contd)
[Figure: successive snapshots of the array as the x stream (x_1 ... x_6) and the y stream (y_1 ... y_3) move past the weights w_4 ... w_1]

Convolution - Another Look
Repeated vector product
[Figure: the sample stream x9 x8 ... x0 above the weights w3 w2 w1 w0, with four multipliers forming one output (y3 shown)]

Convolution Example
[Figure sequence: a chain of four multiply-add cells holding w3, w2, w1, w0, with the adder chain seeded with 0; successive frames show the samples x0 ... x7 moving through the cells and the outputs y0, y1, y2 being produced]

Example: Matrix Multiplication
$C = A B$, $c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$
[Figure: 4x4 array of cells, one per element c_11 ... c_44]

Example: Matrix Multiplication (contd)
[Figure: the 4x4 array of cells c_11 ... c_44; each row of A (a_i1 ... a_i4) and each column of B (b_1j ... b_4j) is fed into the array staggered by one step relative to its neighbor]
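
The staggered feeding can be simulated cycle by cycle. The sketch below assumes an output-stationary arrangement: cell (i, j) accumulates c_ij, values of A move left to right, values of B move top to bottom, and row i and column j are delayed by i and j cycles. The flow directions are an assumption, since the figure only shows the staggering, and the names are hypothetical.

```python
def systolic_matmul(A, B):
    """Multiply two n x n matrices on a simulated n x n systolic array."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]   # A value currently held in each cell
    b_reg = [[0] * n for _ in range(n)]   # B value currently held in each cell
    for t in range(3 * n - 2):
        for i in range(n):
            # A values shift one cell to the right; row i starts i cycles late.
            for j in range(n - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < n else 0
        for j in range(n):
            # B values shift one cell down; column j starts j cycles late.
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < n else 0
        for i in range(n):
            for j in range(n):
                # Each cell multiplies what it holds and accumulates locally.
                C[i][j] += a_reg[i][j] * b_reg[i][j]
    return C


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The stagger makes a_ik and b_kj meet in cell (i, j) at step i + j + k, so every cell only ever talks to its nearest neighbors.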

Systolic Algorithms
2D Convolution
Image processing
FFT
String matching
Dynamic programming
DNA comparison
Matrix computations
LU decomposition
QR factorization

Systolic Architectures
Highly parallel
"fine-grained" parallelism
deep pipelining
Local communication
wires are short - no global communication (except CLK)
linear array: no clock skew
increasingly important as wire delays increase (relative to gate delays)
Linear arrays
most systolic algorithms can be done with a linear array
include memory in each cell in the array
linear array a better match to I/O limitations
Contrast to superscalar and vector architectures

Systolic Computers
Custom chips - early 1980's
Warp (CMU) - 1987
linear array of 10 or more processing cells
optimized inter-cell communication for low-latency
pipelined cells and communication
conditional execution
compiler partitions problem into cells and generates microcode
i-Warp (Intel) - 1990
successor to Warp
two-dimensional array
time-multiplexing of physical busses between cells
32x32 array has 20 Gflops peak performance
not a commercial success
Currently confined to ASIC implementations

Digital Correlator Revisited
Optimally retimed circuit (clock cycle 13)
How can we increase the clock frequency?
Work on multiple data sets at the same time
[Figure: optimally retimed correlator circuit with host]

C-slowing a Circuit
Replace every register with C registers
Now retime: (clock cycle now 7)
[Figure: correlator circuit after C-slowing, and again after retiming]
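
The effect of C-slowing followed by retiming can be checked on the correlator graph. Several things are assumed in this sketch: the edge list is reconstructed from the W and D matrices, C = 2 is used because the slide does not state C, and the retiming `R_HAND` was worked out by hand for this example rather than taken from the slides.

```python
from functools import lru_cache

DELAY = {"h": 0, 1: 3, 2: 3, 3: 3, 4: 3, 5: 7, 6: 7, 7: 7}
EDGES = {("h", 1): 1, (1, 2): 1, (2, 3): 1, (3, 4): 1,
         (4, 5): 0, (3, 5): 0, (2, 6): 0, (1, 7): 0,
         (5, 6): 0, (6, 7): 0, (7, "h"): 0}
R_HAND = {"h": 0, 1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 1, 7: 0}  # hand-derived


def c_slow(edges, c):
    """Replace every register with c registers."""
    return {e: c * w for e, w in edges.items()}


def retime(edges, r):
    """w_new(u, v) = w_old(u, v) + r(u) - r(v) for every edge."""
    new = {(u, v): w + r[u] - r[v] for (u, v), w in edges.items()}
    if min(new.values()) < 0:
        raise ValueError("negative register count")
    return new


def clock_period(delay, edges):
    """Largest total node delay along any register-free path."""
    succ = {u: [] for u in delay}
    for (u, v), w in edges.items():
        if w == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        return delay[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(u) for u in delay)


slowed = c_slow(EDGES, 2)
print(clock_period(DELAY, slowed), clock_period(DELAY, retime(slowed, R_HAND)))
```

C-slowing alone leaves the period at 24; it is the extra registers, once retimed, that let every adder sit in its own stage and bring the period down to the adder delay of 7.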

C-slowing/Retiming for Resource Sharing
Correlator circuit
[Figure sequence: a chain of four multiply-add cells, adder chain seeded with 0, shown at each of the steps below]
C-slowed by 4
Insert data every 4 cycles (one data set)
Computation active only every 4 cycles
Retime and remove extra pipelining
Computation spread over time
Only need one multiplier and one adder
We can use this method to schedule for any number of resources
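
The end result of the steps above, one multiplier and one adder reused over four cycles, can be modeled behaviorally. This sketch does not derive the schedule by retiming; it only shows the resulting time-multiplexed computation, with a hypothetical `units` parameter for the number of multiply-add resources available per cycle.

```python
def shared_mac(xs, ws, units=1):
    """Compute sum(w_k * x_k) using `units` multiply-add resources.

    Returns (result, number of clock cycles used).
    """
    acc = 0
    cycles = 0
    for start in range(0, len(ws), units):
        # One clock cycle: each available unit handles one tap.
        for w, x in zip(ws[start:start + units], xs[start:start + units]):
            acc += w * x
        cycles += 1
    return acc, cycles


print(shared_mac([1, 2, 3, 4], [4, 3, 2, 1], units=1))
print(shared_mac([1, 2, 3, 4], [4, 3, 2, 1], units=2))
```

With one unit the four taps take four cycles; with two units, two cycles; the result is the same either way.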
