
VLSI Digital Signal Processing Systems

Folding
Lan-Da Van, Ph.D.
Department of Computer Science, National Chiao Tung University, Taiwan, R.O.C.
Fall 2010
ldvan@cs.nctu.edu.tw
http://www.cs.nctu.tw/~ldvan/

Outline
Introduction
Folding Transformation
Register Minimization Techniques
Register Minimization in Folded Architecture
Conclusions


Introduction (1/2)
The folding transformation systematically determines the control circuits in DSP architectures in which multiple algorithm operations are time-multiplexed onto a single functional unit.
It is used to synthesize DSP architectures that can be operated with single or multiple clocks.
It reduces the number of hardware functional units (FUs) by a factor of N at the expense of increasing the computation time by a factor of N.
Folding can lead to an architecture that uses a large number of registers, which motivates the register minimization techniques presented later.

Introduction (2/2)




Folding Transformation (1/3)


Folding is a systematic technique for designing control circuits for hardware in which several algorithm operations are time-multiplexed on a single functional unit.

Notations
U, V: nodes (operations) of the original DFG
$H_U$, $H_V$: nodes (functional units) of the folded DFG
W(x): the x-th iteration of node W
$U \xrightarrow{e} V$: an edge e from node U to node V
w(e): number of delays on the edge e
Folding factor N: number of operations that share one functional unit
Folding set: an ordered set of operations executed by the same functional unit
  The position of an operation U in its folding set is the folding order u of U, with $0 \le u \le N-1$.
  The folding sets are typically obtained from a scheduling and allocation algorithm (ref. Appendix B).
  The folding sets represent the underlying folding transformation.


Folding Transformation (2/3)


$P_U$: number of pipeline stages of $H_U$. $P_U = 0$ indicates that $H_U$ is not pipelined.
$D_F(U \xrightarrow{e} V)$ (folding equation): number of cycles that the result of $H_U$ must be stored.

$$D_F(U \xrightarrow{e} V) = [N(l + w(e)) + v] - [Nl + P_U + u] = N w(e) - P_U + v - u$$

A negative value of $D_F$ is possible before the DFG has been retimed for folding.
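The folding equation can be checked numerically. The sketch below is only an illustration: the function names and the sample values (N = 4, a 1-stage source unit, folding orders u = 3 and v = 1) are assumptions chosen for the example. It also confirms that the difference between the time the result is used and the time it becomes available does not depend on the iteration index l.

```python
def folding_delay(N, w_e, P_U, u, v):
    """D_F = N*w(e) - P_U + v - u: cycles the result of H_U must be stored."""
    return N * w_e - P_U + v - u


def folding_delay_from_times(N, l, w_e, P_U, u, v):
    """Same quantity, written as (time the result is used) - (time it is ready)."""
    used_at = N * (l + w_e) + v      # H_V executes iteration l + w(e) of V
    ready_at = N * l + u + P_U       # H_U starts at N*l + u and has P_U stages
    return used_at - ready_at


# Assumed sample values: N = 4, P_U = 1, u = 3, v = 1.
print(folding_delay(4, 1, 1, 3, 1))   # edge with one delay -> 1
print(folding_delay(4, 0, 1, 3, 1))   # edge with no delay  -> -3 (not realizable)

# The iteration index l cancels out.
for l in range(5):
    assert folding_delay_from_times(4, l, 1, 1, 3, 1) == folding_delay(4, 1, 1, 3, 1)
```

A negative result, as in the second call, signals that the edge has too few delays for the chosen folding orders.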


Folding Transformation (3/3)


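[Figure: (a) an edge $U \xrightarrow{e} V$ with w(e) delays in the original DFG, connecting U(l) to V(l + w(e)); (b) the corresponding N-folded datapath, in which the output of $H_U$, started at time unit Nl + u, passes through $P_U$ pipeline stages and $D_F$ delays before $H_V$ uses it at time unit N(l + w(e)) + v.]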

Folding Retimed Biquad Filter (1/2)


Folding factor N = 4
Folding sets $S_1 = \{4, 2, 3, 1\}$ and $S_2 = \{5, 8, 6, 7\}$, where $S_1$ contains all addition operations and $S_2$ contains all multiplication operations.
Assume that
  addition and multiplication require 1 and 2 u.t. (units of time), respectively;
  1-stage pipelined adders and 2-stage pipelined multipliers are available.
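Since the folding order of a node is its position in its folding set, the orders and pipeline depths used in the folding equations can be read off mechanically. A minimal sketch follows; the helper name `folding_orders` and the dictionary layout are illustrative assumptions, not part of the slides.

```python
S1 = [4, 2, 3, 1]   # adder folding set (positions 0..3)
S2 = [5, 8, 6, 7]   # multiplier folding set (positions 0..3)


def folding_orders(folding_set):
    """Map each node to its position (folding order) in the folding set."""
    return {node: position for position, node in enumerate(folding_set)}


order = {**folding_orders(S1), **folding_orders(S2)}

# Pipeline stages P_U: 1 for nodes mapped to the adder, 2 for the multiplier.
pipeline_stages = {node: 1 for node in S1}
pipeline_stages.update({node: 2 for node in S2})

for node in sorted(order):
    print(f"node {node}: folding order {order[node]}, P = {pipeline_stages[node]}")
```

For instance, node 1 is the last entry of $S_1$, so it executes on the adder with folding order 3.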


Folding Retimed Biquad Filter (2/2)


folding equations


Retiming (1/3)
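What happens if the folding equation $D_F$ is negative? A negative $D_F$ would require a negative number of delays, which cannot be implemented, so the folded architecture is realizable only if $D_F \ge 0$ on every edge.
Remedy: retime the original DFG (move its delay elements) prior to folding.
Constraint on the retimed DFG, with retimed edge weights $w_r(e)$:

$$D_F'(U \xrightarrow{e} V) = N w_r(e) - P_U + v - u \ge 0 \qquad (1)$$

Substituting $w_r(e) = w(e) + r(V) - r(U)$ into (1) gives

$$r(U) - r(V) \le \frac{D_F(U \xrightarrow{e} V)}{N}$$

Since the retiming values of the nodes are restricted to be integers, the above inequality can be rewritten as

$$r(U) - r(V) \le \left\lfloor \frac{D_F(U \xrightarrow{e} V)}{N} \right\rfloor$$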


Retiming (2/3)
Example: $D_F(1 \to 2) = N w(e) - P_U + v - u = 4(0) - 1 + 1 - 3 = -3$

$r(1) - r(2) \le \lfloor D_F(1 \to 2)/N \rfloor = \lfloor -3/4 \rfloor = -1$
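The floor in this constraint rounds toward minus infinity, which matters for negative $D_F$: truncating toward zero would give 0 instead of -1. The short sketch below reproduces the example; the candidate values r(1) = -1 and r(2) = 0 are used here only as an illustration of a pair that meets the bound.

```python
import math

N = 4
D_F_1_2 = N * 0 - 1 + 1 - 3           # -3, as in the example above

bound = math.floor(D_F_1_2 / N)       # floor(-0.75) = -1
assert bound == D_F_1_2 // N          # Python's // also rounds toward -infinity
assert int(D_F_1_2 / N) == 0          # truncation toward zero would be wrong here

r = {1: -1, 2: 0}                     # illustrative candidate retiming values
print(bound, r[1] - r[2] <= bound)
```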


Retiming (3/3)

One set of retiming values that satisfies the constraints:
r(1) = -1, r(2) = 0, r(3) = -1, r(4) = 0,
r(5) = -1, r(6) = -1, r(7) = -2, r(8) = -1
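The effect of these retiming values can be checked edge by edge. The sketch below is hedged: the edge delays `w` of the unretimed biquad are not listed in this text, so they are an assumption, inferred by inverting $w_r(e) = w(e) + r(V) - r(U)$ from the retimed folding equations that appear later in the deck. The folding orders and pipeline stages come from the folding sets $S_1$, $S_2$ and the 1-stage adder / 2-stage multiplier assumption.

```python
N = 4
S1, S2 = [4, 2, 3, 1], [5, 8, 6, 7]
order = {node: pos for fs in (S1, S2) for pos, node in enumerate(fs)}
P = {**{n: 1 for n in S1}, **{n: 2 for n in S2}}
r = {1: -1, 2: 0, 3: -1, 4: 0, 5: -1, 6: -1, 7: -2, 8: -1}

# ASSUMED delays w(e) of the unretimed DFG (inferred, not given in the text).
w = {(1, 2): 0, (1, 5): 1, (1, 6): 1, (1, 7): 2, (1, 8): 2,
     (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 0, (7, 3): 0, (8, 4): 0}


def D_F(U, V, delays):
    """Folding equation N*w(e) - P_U + v - u for the edge U -> V."""
    return N * delays - P[U] + order[V] - order[U]


before = {e: D_F(e[0], e[1], w[e]) for e in w}
w_r = {(U, V): w[(U, V)] + r[V] - r[U] for (U, V) in w}   # retimed weights
after = {e: D_F(e[0], e[1], w_r[e]) for e in w}

invalid = sorted(e for e, d in before.items() if d < 0)
print("negative before retiming:", invalid)
print("all non-negative after retiming:", all(d >= 0 for d in after.values()))

# Every edge also satisfies r(U) - r(V) <= floor(D_F / N).
assert all(r[U] - r[V] <= before[(U, V)] // N for (U, V) in w)
```

Under these assumed delays, the edges reported as negative are 1→2, 6→4, 7→3 and 8→4, with values -3, -4, -3 and -3.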




Lifetime Analysis
Lifetime analysis is a procedure used to compute the minimum number of registers required to implement a DSP algorithm in hardware.
  Linear lifetime analysis
  Circular lifetime analysis
In lifetime analysis, the number of live variables at each time unit is computed, and the maximum number of live variables at any time unit is determined.
Forward-backward register allocation technique
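Counting live variables per time unit is straightforward to sketch. In the example below the three lifetimes are hypothetical, and the convention that a variable produced at $T_{input}$ and consumed at $T_{output}$ is live during time units $T_{input}+1$ through $T_{output}$ is an assumption stated for this sketch.

```python
def live_counts(lifetimes, horizon):
    """Number of live variables at each time unit 0..horizon.

    lifetimes maps a variable name to (T_input, T_output); the variable is
    counted as live at time units T_input + 1 .. T_output (assumed convention).
    """
    counts = [0] * (horizon + 1)
    for t_in, t_out in lifetimes.values():
        for t in range(t_in + 1, t_out + 1):
            counts[t] += 1
    return counts


# Hypothetical lifetimes for three variables.
lifetimes = {"a": (0, 4), "b": (1, 7), "c": (4, 7)}
counts = live_counts(lifetimes, horizon=7)
print(counts, "-> registers needed (ignoring periodicity):", max(counts))
```

This is the linear form of the analysis; when the schedule repeats every N time units, lifetimes that extend past N have to be wrapped around, which is what circular lifetime analysis does.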


Linear Lifetime Analysis

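[Figure: linear lifetime charts for the variables {a, b, c}. Left, periodicity implicit: the number of live variables per time unit is {0, 1, 2, 2, 2, 2, 2, 2}, so the maximum is 2. Right: three iterations with N = 6.]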



Matrix Transpose Example (1/3)


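Transpose:

    a b c         a d g
    d e f   -->   b e h
    g h i         c f i

    i h g f e d c b a  -->  [Matrix Transpose]  -->  i f c h e b g d a

(Each stream is drawn with its first sample, a, on the right.)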


Matrix Transpose Example (2/3)


$T_{zlout}$ = zero-latency output time
$T_{diff} = T_{zlout} - T_{input}$
$T_{output} = T_{zlout} + \max\{-T_{diff}\}$
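These three definitions are enough to build the lifetime table for the 3x3 transposer. The sketch below assumes that the samples a to i arrive one per time unit starting at time 0, and that the zero-latency output order is the transposed order a, d, g, b, e, h, c, f, i; both assumptions are read from the transpose example rather than stated explicitly in the text.

```python
inputs = "abcdefghi"     # assumed arrival order, one sample per time unit from t = 0
outputs = "adgbehcfi"    # assumed zero-latency departure order (transposed)

T_input = {s: t for t, s in enumerate(inputs)}
T_zlout = {s: t for t, s in enumerate(outputs)}
T_diff = {s: T_zlout[s] - T_input[s] for s in inputs}

# The smallest latency that makes every output occur no earlier than its input.
latency = max(-d for d in T_diff.values())
T_output = {s: T_zlout[s] + latency for s in inputs}

print("sample  T_input  T_zlout  T_diff  T_output")
for s in inputs:
    print(f"{s:>6}  {T_input[s]:>7}  {T_zlout[s]:>7}  {T_diff[s]:>6}  {T_output[s]:>8}")
```

Sample g has the most negative $T_{diff}$, so it fixes the latency and ends up with a zero-length lifetime.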


Matrix Transpose Example (3/3)


[Figure: linear lifetime chart and circular lifetime chart for the matrix transposer]

The minimum number of registers is 4.
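The count of 4 can be reproduced with a circular lifetime analysis. In this sketch the lifetimes are the ($T_{input}$, $T_{output}$) pairs that follow from the definitions on the previous slide when the nine samples arrive at times 0 to 8 (an assumption), the period is taken as N = 9, and a variable is again assumed to be live during time units $T_{input}+1$ through $T_{output}$.

```python
N = 9  # one 3x3 matrix per period (assumed)

# (T_input, T_output) per sample, derived under the assumptions stated above.
lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}

# Fold every live time unit onto the period: time t shares a slot with t mod N.
live = [0] * N
for t_in, t_out in lifetimes.values():
    for t in range(t_in + 1, t_out + 1):
        live[t % N] += 1

print(live, "-> minimum number of registers:", max(live))
```

Every slot of the period holds four live variables in this example, so no allocation with fewer than four registers is possible.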



Procedures of Forward-Backward Register Allocation


Steps:
Step 1: Determine the minimum number of registers using lifetime analysis.
Step 2: Input each variable at the time step corresponding to the beginning of its lifetime.
Step 3: Allocate each variable in a forward manner until it is dead or it reaches the last register.
Step 4: Since the allocation is periodic, the allocation of the current iteration repeats itself in subsequent iterations. Thus, if register $R_j$ is occupied at cycle l, hash out (mark as occupied) $R_j$ at cycle l + N.
Step 5: If a variable reaches the last register and is still alive, allocate it to a register in a backward manner.
Step 6: Repeat Steps 4 and 5 as required until the allocation is complete.
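The forward part of this procedure (Steps 2 to 4) can be sketched for the matrix transposer. This is only a partial illustration: the backward step is not implemented, and the lifetimes, the period N = 9 and the register count of 4 are assumptions carried over from the transpose example. The sketch reports which variables are still alive after leaving the last register and would therefore need backward allocation.

```python
N, NUM_REGS = 9, 4   # period and register count (assumed from the transpose example)

lifetimes = {"a": (0, 4), "b": (1, 7), "c": (2, 10), "d": (3, 5), "e": (4, 8),
             "f": (5, 11), "g": (6, 6), "h": (7, 9), "i": (8, 12)}

occupied = {}          # (cycle mod N, register index) -> variable name
needs_backward = []

for name, (t_in, t_out) in lifetimes.items():
    t, reg = t_in + 1, 1                 # the variable enters R1 one cycle after input
    while t <= t_out and reg <= NUM_REGS:
        slot = (t % N, reg)              # periodicity: cycle l and l + N share a slot
        assert slot not in occupied, f"conflict at {slot}"
        occupied[slot] = name
        t, reg = t + 1, reg + 1          # shift forward to the next register
    if t <= t_out:                       # left the last register while still alive
        needs_backward.append(name)

print("variables needing backward allocation:", needs_backward)
```

A complete allocator would next move each of the reported variables back to a register that is free for the remainder of its lifetime, repeating until all variables are placed.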

Register Allocation for Matrix Transpose Example




Procedures of Register Minimization in Folded Architectures


Steps:
Step 1: Perform retiming for folding.
Step 2: Write the folding equations.
Step 3: Use the folding equations to construct a lifetime table.
Step 4: Draw the lifetime chart and determine the required number of registers.
Step 5: Perform forward-backward register allocation.
Step 6: Draw the folded architecture that uses the minimum number of registers.


Folding Architecture Example


Folded Architecture for Matrix Transpose Example


Biquad Filter Example (1/4)


Step 1: Retiming

Invalid folding before retiming: $D_F(1 \to 2) = -3$, $D_F(6 \to 4) = -4$, $D_F(8 \to 4) = -3$, $D_F(7 \to 3) = -3$



Biquad Filter Example (2/4)


Step 2: Folding Equations

$D_F(U \xrightarrow{e} V) = N w(e) - P_U + v - u$
$D_F(1 \to 2) = 4(1) - 1 + 1 - 3 = 1$
$D_F(1 \to 5) = 4(1) - 1 + 0 - 3 = 0$
$D_F(1 \to 6) = 4(1) - 1 + 2 - 3 = 2$
$D_F(1 \to 7) = 4(1) - 1 + 3 - 3 = 3$
$D_F(1 \to 8) = 4(2) - 1 + 1 - 3 = 5$
$D_F(3 \to 1) = 4(0) - 1 + 3 - 2 = 0$
$D_F(4 \to 2) = 4(0) - 1 + 1 - 0 = 0$
$D_F(5 \to 3) = 4(0) - 2 + 2 - 0 = 0$
$D_F(6 \to 4) = 4(1) - 2 + 0 - 2 = 0$
$D_F(7 \to 3) = 4(1) - 2 + 2 - 3 = 1$
$D_F(8 \to 4) = 4(1) - 2 + 0 - 1 = 1$

Step 3: Construct the lifetime table

$T_{input} = u + P_U$
$T_{output} = u + P_U + \max_V \{D_F(U \to V)\}$
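The lifetime table of Step 3 follows mechanically from the folding equations. The sketch below recomputes each $D_F$ from N = 4, the retimed edge delays (read off the $N w(e)$ terms above), the folding orders given by $S_1 = \{4, 2, 3, 1\}$ and $S_2 = \{5, 8, 6, 7\}$, and the pipeline depths (1 for adders, 2 for multipliers). The variable names and the live-time convention ($T_{input}+1$ through $T_{output}$) are assumptions of the sketch.

```python
N = 4
order = {4: 0, 2: 1, 3: 2, 1: 3, 5: 0, 8: 1, 6: 2, 7: 3}   # folding orders u
P = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2, 7: 2, 8: 2}       # pipeline stages P_U

# Delays w(e) of the retimed biquad, one entry per folding equation.
w_r = {(1, 2): 1, (1, 5): 1, (1, 6): 1, (1, 7): 1, (1, 8): 2,
       (3, 1): 0, (4, 2): 0, (5, 3): 0, (6, 4): 1, (7, 3): 1, (8, 4): 1}

D_F = {(U, V): N * d - P[U] + order[V] - order[U] for (U, V), d in w_r.items()}

lifetime = {}
for U in sorted(order):
    fanout = [D_F[e] for e in D_F if e[0] == U]
    if fanout:                          # node 2 has no outgoing edge in the list
        t_in = order[U] + P[U]          # T_input  = u + P_U
        lifetime[U] = (t_in, t_in + max(fanout))   # T_output = T_input + max D_F

# Circular lifetime analysis with period N.
live = [0] * N
for t_in, t_out in lifetime.values():
    for t in range(t_in + 1, t_out + 1):
        live[t % N] += 1

print("lifetime table:", lifetime)
print("registers needed:", max(live))
```

Only nodes 1, 7 and 8 have non-zero lifetimes, and the resulting register count agrees with the minimum of 2 stated for this example.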


Biquad Filter Example (3/4)


Step 4: Draw the Lifetime Chart

Step 5: Register Allocation

Folding Factor = 4

The minimum number of registers is 2.



Biquad Filter Example (4/4)


Step 6: Folded Architecture


IIR Filter Example (1/4)


Step 1: Retiming

Invalid folding before retiming: $D_F(3 \to 1) = -3$, $D_F(4 \to 1) = -2$



IIR Filter Example (2/4)


Step 2: Folding Equations

$D_F(U \xrightarrow{e} V) = N w(e) - P_U + v - u$, which for the retimed filter gives
$D_F(1 \to 2) = 0$
$D_F(2 \to 3) = 5$
$D_F(2 \to 4) = 2$
$D_F(3 \to 1) = 1$
$D_F(4 \to 1) = 0$

Step 3: Construct the lifetime table

$T_{input} = u + P_U$
$T_{output} = u + P_U + \max_V \{D_F(U \to V)\}$


IIR Filter Example (3/4)


Step 4: Draw the Lifetime Chart

Step 5: Register Allocation

Folding Factor = 2

The minimum number of registers is 3.



IIR Filter Example (4/4)


Step 6: Folded Architecture


Conclusions
Presented a systematic transformation for designing time-multiplexed architectures.
Explored folding techniques to reduce the number of functional units.
Explored register minimization techniques to reduce the number of registers.


References
K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, Wiley, 1999.
S. Y. Huang, Handout of text book, 2004.
