
IMPLEMENTATION OF DSP

FFT BASICS + FFT PIPELINES April 6, 2018

PIPELINE IMPLEMENTATIONS OF THE FAST FOURIER TRANSFORM (FFT)

• Consulted work:
Chiueh, T.D. and P.Y. Tsai, OFDM Baseband Receiver Design for Wireless Communications, John Wiley and Sons Asia, (2007).
Second edition of 2012 available as e-book via UT Library.

TOPICS

• Discrete Fourier Transform
• Fast Fourier Transform
– Decimation in time
– Decimation in frequency
• FFT pipelines:
– Radix-2 multi-path delay commutator
– Radix-2 single-path delay feedback
– Delay buffer implementation
– Radix-4 algorithms

© Sabih H. Gerez, University of Twente, The Netherlands


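DISCRETE FOURIER TRANSFORM (DFT)

• Consider a block of N samples of a (possibly complex-valued) data stream:
  $x(0), x(1), \ldots, x(N-1)$
• The discrete Fourier transform of this block is given by:
  $X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad k = 0, 1, \ldots, N-1$

INVERSE DFT (IDFT)

• The inverse DFT is almost the same computation as the DFT (opposite sign in the exponent and a scaling by 1/N):
  $x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k)\, e^{j 2\pi n k / N}, \quad n = 0, 1, \ldots, N-1$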


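TWIDDLE FACTORS

• Define:
  $W_N = e^{-j 2\pi / N}$
• Then DFT becomes:
  $X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}$
• $W_N^{nk}$ is called a twiddle factor (it is a number on the unit circle in the complex plane).
• Multiplying with a twiddle factor is a vector rotation. How to implement?

MATRIX REPRESENTATION OF DFT

• The DFT can be expressed in matrix form:
  $\begin{bmatrix} X(0) \\ X(1) \\ \vdots \\ X(N-1) \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ 1 & W_N & \cdots & W_N^{N-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & W_N^{N-1} & \cdots & W_N^{(N-1)^2} \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ \vdots \\ x(N-1) \end{bmatrix}$
• Number of complex multiplications involved (including trivial ones with 1, j, etc.): $N^2$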
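As a minimal sketch of the direct DFT, the following Python fragment evaluates the sum over twiddle factors term by term, which costs N·N complex multiplications. The function names `twiddle`, `dft` and `idft` are chosen for illustration only, and the 1/N scaling of the inverse is an assumed (though common) convention.

```python
import cmath

def twiddle(N, exponent):
    """Return W_N**exponent, assuming W_N = exp(-j*2*pi/N)."""
    return cmath.exp(-2j * cmath.pi * exponent / N)

def dft(x):
    """Direct DFT: one multiplication per matrix entry, N*N in total."""
    N = len(x)
    return [sum(x[n] * twiddle(N, n * k) for n in range(N)) for k in range(N)]

def idft(X):
    """Inverse DFT reusing the forward DFT: conjugate, transform, conjugate, scale by 1/N."""
    N = len(X)
    y = dft([v.conjugate() for v in X])
    return [v.conjugate() / N for v in y]

x = [1 + 0j, 2 - 1j, 0 + 3j, -1 + 0j]
X = dft(x)          # X[0] is simply the sum of all samples
x_back = idft(X)    # recovers x up to rounding errors
```

The `idft` helper illustrates why the inverse is "almost the same computation": the same forward routine can be reused with conjugated data.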

DFT VISUALIZATION

• The DFT basically matches frequencies.
• The following picture originates from Wikipedia (entry: “DFT matrix”):
  [Figure: DFT matrix illustration]

FAST FOURIER TRANSFORM

• Reduces the number of calculations in a DFT from the order of $N^2$ to the order of $N \log_2 N$.
• First published by Cooley and Tukey in 1965.
• Check e.g. Wikipedia page for historical background dating back to Gauss in 1805 (earlier than Fourier!).
• Two variants:
– Decimation in time
– Decimation in frequency.


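DECIMATION-IN-TIME FFT (1)

• Split odd and even terms in DFT definition:
  $X(k) = \sum_{m=0}^{N/2-1} x(2m)\, W_N^{2mk} + \sum_{m=0}^{N/2-1} x(2m+1)\, W_N^{(2m+1)k}$
• Rewrite, using $W_N^2 = W_{N/2}$:
  $X(k) = \sum_{m=0}^{N/2-1} x(2m)\, W_{N/2}^{mk} + W_N^k \sum_{m=0}^{N/2-1} x(2m+1)\, W_{N/2}^{mk}$

DECIMATION-IN-TIME FFT (2)

• Consider first half of outputs:
  $X(k) = \sum_{m=0}^{N/2-1} x(2m)\, W_{N/2}^{mk} + W_N^k \sum_{m=0}^{N/2-1} x(2m+1)\, W_{N/2}^{mk}, \quad 0 \le k \le N/2 - 1$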


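DECIMATION-IN-TIME FFT (3)

• DFT has been expressed in terms of half-size DFTs:
  $X(k) = \underbrace{\sum_{m=0}^{N/2-1} x(2m)\, W_{N/2}^{mk}}_{\text{DFT of even samples}} + W_N^k \underbrace{\sum_{m=0}^{N/2-1} x(2m+1)\, W_{N/2}^{mk}}_{\text{DFT of odd samples}}$

DECIMATION-IN-TIME FFT (4)

• Now consider second half of outputs:
  $X(k + N/2) = \sum_{m=0}^{N/2-1} x(2m)\, W_{N/2}^{m(k+N/2)} + W_N^{k+N/2} \sum_{m=0}^{N/2-1} x(2m+1)\, W_{N/2}^{m(k+N/2)}, \quad 0 \le k \le N/2 - 1$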

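DECIMATION-IN-TIME FFT (5)

• Make use of the following identities:
  $W_{N/2}^{m(k+N/2)} = W_{N/2}^{mk}, \qquad W_N^{k+N/2} = -W_N^k$

DECIMATION-IN-TIME FFT (6)

• The expression for the second half becomes the same as for the first except for a minus sign:
  $X(k + N/2) = \sum_{m=0}^{N/2-1} x(2m)\, W_{N/2}^{mk} - W_N^k \sum_{m=0}^{N/2-1} x(2m+1)\, W_{N/2}^{mk}$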


DECIMATION-IN-TIME FFT (7)

[Figure: N-point DFT built from two N/2-point DFT blocks]

DECIMATION-IN-TIME BUTTERFLY

• Elementary FFT operation with two inputs and two outputs,
• consisting of one complex multiplication, one complex addition and one complex subtraction:
  [Figure: decimation-in-time butterfly]


DECIMATION-IN-TIME FFT (8)

• One decimation-in-time step defines an N-point DFT in terms of two N/2-point DFTs. They are combined by means of butterflies.
• Applying the principle recursively results in a computation that consists of butterflies only.

8-POINT DECIMATION-IN-TIME FFT

[Figure: signal-flow graph of the 8-point decimation-in-time FFT]
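The recursive decimation-in-time principle can be sketched in a few lines of Python. This is an illustration only, assuming that N is a power of 2 and that $W_N = e^{-j 2\pi / N}$; the names `fft_dit` and `dft_reference` are hypothetical.

```python
import cmath

def fft_dit(x):
    """Recursive radix-2 decimation-in-time FFT (len(x) must be a power of 2)."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft_dit(x[0::2])   # N/2-point DFT of the even samples
    odd = fft_dit(x[1::2])    # N/2-point DFT of the odd samples
    X = [0j] * N
    for k in range(N // 2):
        # DIT butterfly: multiply first, then add and subtract.
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        X[k] = even[k] + t
        X[k + N // 2] = even[k] - t
    return X

def dft_reference(x):
    """Direct DFT, used here only to check the result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [complex(n, (n * n) % 3) for n in range(8)]
X = fft_dit(x)
max_error = max(abs(a - b) for a, b in zip(X, dft_reference(x)))
```

Each pass through the loop body is one butterfly, so an 8-point transform executes 3 levels of 4 butterflies each.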


COMPLEXITY REDUCTION

• Number of computations has been reduced:
– From the order of $N^2$ for the DFT
– To the order of $N \log_2 N$ for the FFT

IMPLEMENTATION ISSUES

• Buffering of input is needed because of block-based nature.
• One could e.g. use a ping-pong memory:
– While the input is filling one memory, the FFT could consume samples of the other memory.
– After processing one block of samples, the memories change roles.
• Note the bit reversal of the addresses:
– Address order can be found from increasing binary addresses read in reverse order; 4 = 100₂ e.g. becomes 1 = 001₂.
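The bit-reversed address order can be generated as in the following sketch. The helper name `bit_reverse` is an assumption made for this example; it mirrors the bits of an address within a fixed word length.

```python
def bit_reverse(index, num_bits):
    """Read the num_bits-wide binary representation of index in reverse order."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)   # append the current LSB of index
        index >>= 1
    return result

# Address order for an 8-point FFT (3 address bits).
order = [bit_reverse(i, 3) for i in range(8)]
print(order)   # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the permutation twice returns the original order, so the same routine serves for reordering inputs as well as outputs.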


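DECIMATION-IN-FREQUENCY FFT (1)

• Split input block in first and second half and consider the outputs with even index:
  $X(2r) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{2rn} + \sum_{n=N/2}^{N-1} x(n)\, W_N^{2rn}$

DECIMATION-IN-FREQUENCY FFT (2)

• Shift index in second sum:
  $X(2r) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{2rn} + \sum_{n=0}^{N/2-1} x(n + N/2)\, W_N^{2r(n+N/2)}$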


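DECIMATION-IN-FREQUENCY FFT (3)

• Using simplification rules mentioned earlier ($W_N^{2rn} = W_{N/2}^{rn}$ and $W_N^{rN} = 1$):
  $X(2r) = \sum_{n=0}^{N/2-1} x(n)\, W_{N/2}^{rn} + \sum_{n=0}^{N/2-1} x(n + N/2)\, W_{N/2}^{rn}$

DECIMATION-IN-FREQUENCY FFT (4)

• This is a half-size DFT applied to an input stream consisting of sums of pairs of samples of the original input stream:
  $X(2r) = \sum_{n=0}^{N/2-1} \left[ x(n) + x(n + N/2) \right] W_{N/2}^{rn}$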


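DECIMATION-IN-FREQUENCY FFT (5)

• Now consider the outputs with odd index:
  $X(2r+1) = \sum_{n=0}^{N/2-1} x(n)\, W_N^{(2r+1)n} + \sum_{n=0}^{N/2-1} x(n + N/2)\, W_N^{(2r+1)(n+N/2)}$

DECIMATION-IN-FREQUENCY FFT (6)

• With the usual type of simplifications:
  $X(2r+1) = \sum_{n=0}^{N/2-1} \left[ x(n) - x(n + N/2) \right] W_N^{n}\, W_{N/2}^{rn}$
• This is a half-size DFT applied on a sequence obtained by taking the difference of pairs of the input and multiplying them with factor $W_N^n$.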

DECIMATION-IN-FREQUENCY FFT (7)

[Figure: N-point DFT built from two N/2-point DFT blocks]

DECIMATION-IN-FREQ. BUTTERFLY

• Similar to the decimation-in-time butterfly, but location of multiplication is now at the output side.
  [Figure: butterflies side by side, panels "Decimation in time" and "Decimation in frequency"]


8-POINT DECIMATION-IN-FREQ. FFT

[Figure: signal-flow graph of the 8-point decimation-in-frequency FFT]

SUMMARY FFT BASICS

• Decimation-in-time (DIT) FFT:
– “Divide and conquer” approach to DFT, based on grouping even and odd inputs in DFT definition
– Butterfly: multiply before add/subtract
• Decimation-in-frequency (DIF) FFT:
– “Divide and conquer” approach based on grouping even and odd outputs
– Butterfly: multiply after add/subtract
• Both are of radix-2 type: problem size is reduced by a factor of 2 at each stage
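For comparison with the decimation-in-time variant, the following sketch shows a recursive decimation-in-frequency FFT in which the butterfly multiplies after the add/subtract. It assumes N is a power of 2 and $W_N = e^{-j 2\pi / N}$; the names `fft_dif` and `dft_reference` are illustrative.

```python
import cmath

def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT, outputs in natural order."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # DIF butterfly: add/subtract first, then multiply the difference by W_N^n.
    sums = [x[n] + x[n + half] for n in range(half)]
    diffs = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
             for n in range(half)]
    X = [0j] * N
    X[0::2] = fft_dif(sums)    # half-size DFT gives the even-indexed outputs
    X[1::2] = fft_dif(diffs)   # half-size DFT gives the odd-indexed outputs
    return X

def dft_reference(x):
    """Direct DFT, used here only to check the result."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [complex(n % 3, -n) for n in range(8)]
max_error = max(abs(a - b) for a, b in zip(fft_dif(x), dft_reference(x)))
```

The interleaving of the two half-size results into even and odd positions is what a hardware implementation leaves to the bit-reversed output order.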


FFT IMPLEMENTATIONS

• FFTs can be realized on all kinds of platforms:
– Programmable processors
– Dedicated hardware
• Here attention is paid to one type of implementation, viz. the FFT pipelines. An FFT pipeline transforms a computation with a block processing nature into one of a streaming nature.
• Issues of concern:
– Area
– Speed
– Power
– Memory access

RADIX-2 MULTI-PATH DELAY COMMUTATOR (R2MDC)

• Pipeline solution:
– “Stream-like” processing of block-based algorithm.
• Examples based on 8-point FFT:
– solutions scale for higher powers of 2.
• Originally proposed in:
Rabiner, L.R. and B. Gold, Theory and Application of Digital Signal Processing, Prentice-Hall, Englewood Cliffs, New Jersey, (1975).


R2MDC FOR 8-POINT FFT

• Consists of
– Commutators, switches that send the data either straight ahead or crisscross
– Butterfly blocks (either for DIT or DIF)
– Delay buffers

R2MDC STEP-BY-STEP

[Figure, panels A and B: three butterflies in cascade with delay buffers 4D, 2D, 1D on one path and 2D, 1D on the other]

RADIX-2 SINGLE-PATH DELAY FEEDBACK (R2SDF)

• Alternative pipeline solution, with optimized memory size
• Originally proposed in:
Groginsky, H.L. and G.A. Works, A Pipeline Fast Fourier Transform, IEEE Transactions on Computers, Vol. C-19(11), pp. 1015-1019, (November 1970).

R2SDF ELEMENT

[Figure: butterfly with a delay line]

• Element:
– Either shifts first half of input into delay buffer and second half of output out of delay buffer (blue);
– Or shifts second half of input into butterfly together with first half from delay buffer, first half of output to next stage and second half of output into delay buffer (red).
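The two modes of the R2SDF element can be illustrated with a small behavioural model in Python. This is a sketch under several assumptions that are not taken from the slides: a decimation-in-frequency butterfly is used, the twiddle multiplication is applied to the difference before it enters the delay line, all stage counters start at zero together with the first sample, and the class and function names are hypothetical.

```python
import cmath

class R2SDFStage:
    """Behavioural model of one R2SDF element with a feedback delay line."""

    def __init__(self, delay):
        self.delay = delay
        self.line = [0j] * delay   # delay line, oldest sample first
        self.count = 0             # position within a block of 2*delay samples

    def step(self, sample):
        oldest = self.line.pop(0)
        if self.count < self.delay:
            # First half of the block: input goes into the delay line,
            # the stored second half of the previous output comes out.
            self.line.append(sample)
            out = oldest
        else:
            # Second half: butterfly on (delayed first half, current input).
            n = self.count - self.delay
            w = cmath.exp(-2j * cmath.pi * n / (2 * self.delay))
            out = oldest + sample                      # to the next stage
            self.line.append((oldest - sample) * w)    # back into the delay line
        self.count = (self.count + 1) % (2 * self.delay)
        return out

def r2sdf_fft(x):
    """Stream one block through delays N/2, ..., 2, 1; output is bit-reversed."""
    N = len(x)
    stages, delay = [], N // 2
    while delay >= 1:
        stages.append(R2SDFStage(delay))
        delay //= 2
    outputs = []
    for sample in list(x) + [0j] * (N - 1):   # trailing zeros flush the pipeline
        for stage in stages:
            sample = stage.step(sample)
        outputs.append(sample)
    return outputs[N - 1:]                    # latency is N-1 clock cycles

def bit_reverse(index, num_bits):
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result

def to_natural_order(stream):
    N = len(stream)
    bits = N.bit_length() - 1
    natural = [0j] * N
    for i, value in enumerate(stream):
        natural[bit_reverse(i, bits)] = value
    return natural

def dft_reference(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [complex(n, 1) for n in range(8)]
X = to_natural_order(r2sdf_fft(x))
max_error = max(abs(a - b) for a, b in zip(X, dft_reference(x)))
```

In this model the butterfly branch is taken in only half of the clock cycles of each stage, and the delay lines together hold 4 + 2 + 1 = 7 samples for the 8-point case.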


SYMBOLIC REPRESENTATION OF R2SDF ELEMENT

[Figure: butterfly with a delay line in the feedback path, between a single-path input and a single-path output]

R2SDF FOR 8-POINT FFT

[Figure: three butterflies in cascade with feedback delay lines 4D, 2D and 1D]


R2SDF STEP-BY-STEP

[Figure: three butterflies with delay lines 4D, 2D and 1D; signals A to H marked around the first two stages]

Sample positions per signal over time:
A: . . . . 0 1 2 3 4 5 6 7
B: 0 1 2 3 4 5 6 7
C: 0 1 2 3 4 5 6 7
D: . . . . 0 1 2 3 4 5 6 7
E: . . . . . . 0 1 2 3 4 5 6 7
F: . . . . 0 1 2 3 4 5 6 7
G: . . . . 0 1 2 3 4 5 6 7
H: . . . . . . 0 1 2 3 4 5 6 7

R2SDF EFFICIENCY

• The R2SDF pipeline needs to operate at the same speed as the input sample rate.
• Hardware utilization is 50%.
• Number of delay elements for N-point FFT:
– R2SDF: $N - 1$
– R2MDC: $3N/2 - 2$
• R2SDF and R2MDC have the same number of (complex) adders and multipliers.

DELAY-BUFFER IMPLEMENTATION

• The straightforward implementation of a delay buffer in hardware would be a shift register.
• Such an implementation is not convenient (think of e.g. a 1024-point FFT):
– Memory elements implemented by D-flipflops are large!
– Shifting all data at every clock cycle consumes a lot of energy!
• Much better idea to keep the data in RAM and shift the address pointer(s) instead.

DUAL-PORT RAM DELAY BUFFER

[Figure: dual-port RAM with write address, write data, read address and read data; a +1 incrementer links the addresses]

• Dual-port RAM:
– One read port with its own data and address
– One write port with its own data and address
• Cyclic buffer: L+1 locations to store L items.
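A cyclic delay buffer that moves address pointers instead of data could look like the sketch below. It models the dual-port RAM as a Python list of L+1 locations; the choice that the read address equals the write address plus one (modulo L+1) is an assumption made for this example, as is the class name `RamDelayBuffer`.

```python
class RamDelayBuffer:
    """Delay of `length` samples using a cyclic buffer of length + 1 locations."""

    def __init__(self, length):
        self.size = length + 1          # L+1 locations to store L items
        self.ram = [0] * self.size
        self.write_addr = 0

    def step(self, value):
        # Read and write use different addresses in the same cycle,
        # as a dual-port RAM allows.
        read_addr = (self.write_addr + 1) % self.size
        out = self.ram[read_addr]       # item written `length` cycles ago
        self.ram[self.write_addr] = value
        self.write_addr = read_addr     # only the pointer moves, not the data
        return out

buf = RamDelayBuffer(4)
outputs = [buf.step(v) for v in range(1, 11)]
print(outputs)   # [0, 0, 0, 0, 1, 2, 3, 4, 5, 6]
```

Only one location is written and one address register changes per clock cycle, in contrast to a shift register where every stored word moves.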


DELAY BUFFER WITH TWO SINGLE-PORT RAMS

[Figure: input and output connected to RAM1 and RAM2; a +1 address counter drives both address inputs and the write-enable (wen) inputs; bus widths labelled WL-1 and WL]

• Idea is to alternately read from and write to the two RAMs.
• Two single-port RAMs are cheaper than one dual-port RAM.
• Use LSB of address to connect to “write enable” (wen).

SPECIAL-PURPOSE RAMS

• Replace address decoder of RAM by single-bit shift register.
• Only one D-flipflop has output 1.
• Not much switching activity in shift-register (no waste of power).

RADIX-4 FFT

• Has both “decimation in time” and “decimation in frequency” variants.
• Idea is to express DFT as the combination of 4 DFTs whose sizes are one fourth of the original DFT.
• Takes advantage of the following symmetries:
  $W_N^{N/4} = -j, \qquad W_N^{N/2} = -1, \qquad W_N^{3N/4} = j$
• This leads to a reduction of the number of multipliers (multiplication by j can be performed without multiplier).

RADIX-4 DIF (1)

[Equations: radix-4 decimation-in-frequency derivation]

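RADIX-4 DIF (2)

[Equations: radix-4 decimation-in-frequency derivation, continued]

RADIX-4 DIF BUTTERFLY

[Figure: radix-4 DIF butterfly; scaling with +1, -1, +j, -j]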
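A radix-4 decimation-in-frequency step can be sketched as follows. The sketch assumes N is a power of 4 and $W_N = e^{-j 2\pi / N}$; the function names are hypothetical. Scaling with +j and -j is done by swapping real and imaginary parts, so the butterfly itself needs no multiplier.

```python
import cmath

def mul_neg_j(z):
    """z * (-j): swap real and imaginary parts, negate the new imaginary part."""
    return complex(z.imag, -z.real)

def mul_j(z):
    """z * (+j): swap real and imaginary parts, negate the new real part."""
    return complex(-z.imag, z.real)

def radix4_butterfly(a, b, c, d):
    """4-point DFT of (a, b, c, d) using only scaling with +1, -1, +j, -j."""
    y0 = a + b + c + d
    y1 = a + mul_neg_j(b) - c + mul_j(d)
    y2 = a - b + c - d
    y3 = a + mul_j(b) - c + mul_neg_j(d)
    return y0, y1, y2, y3

def fft_radix4_dif(x):
    """Recursive radix-4 DIF FFT (len(x) must be a power of 4)."""
    N = len(x)
    if N == 1:
        return list(x)
    q = N // 4
    branches = [[], [], [], []]
    for n in range(q):
        y = radix4_butterfly(x[n], x[n + q], x[n + 2 * q], x[n + 3 * q])
        for p in range(4):
            # Twiddle multiplication after the butterfly (trivial for p = 0).
            branches[p].append(y[p] * cmath.exp(-2j * cmath.pi * p * n / N))
    X = [0j] * N
    for p in range(4):
        X[p::4] = fft_radix4_dif(branches[p])   # outputs with index 4r + p
    return X

def dft_reference(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [complex(n % 5, n % 3) for n in range(16)]
max_error = max(abs(a - b) for a, b in zip(fft_radix4_dif(x), dft_reference(x)))
```

Per butterfly, three of the four outputs need a twiddle multiplication, which is the origin of the 3N/4 non-trivial multiplier positions per stage.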



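MULTIPLIER & ADDER COMPLEXITY

• Radix-2 decomposition:
– Number of stages: $\log_2 N$
– Complex multiplications per stage: $N/2$; in total: $(N/2) \log_2 N$
– Complex additions per stage: $N$; in total: $N \log_2 N$

• Radix-4 decomposition:
– Number of stages: $\log_4 N$
– Complex multiplications per stage: $3N/4$; in total: $(3N/4) \log_4 N = (3N/8) \log_2 N$
– Complex additions per stage: $3N$; in total: $3N \log_4 N = (3N/2) \log_2 N$

RADIX-2² BUTTERFLY

[Figure: radix-2² butterfly]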
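The saving in additions of a radix-2² butterfly over a direct radix-4 butterfly can be made concrete with the sketch below. Both functions compute the same 4-point DFT; the names are hypothetical and the operation counts in the comments refer to complex additions and subtractions in this particular formulation.

```python
def mul_neg_j(z):
    """z * (-j) without a multiplier: swap real and imaginary parts."""
    return complex(z.imag, -z.real)

def radix4_butterfly_direct(a, b, c, d):
    """Direct form: each output is a sum of four terms, 12 additions in total."""
    j = 1j
    return (a + b + c + d,
            a - j * b - c + j * d,
            a - b + c - d,
            a + j * b - c - j * d)

def radix22_butterfly(a, b, c, d):
    """Two cascaded radix-2 stages: 4 + 4 = 8 additions in total."""
    # First radix-2 stage.
    t0 = a + c
    t1 = a - c
    t2 = b + d
    t3 = mul_neg_j(b - d)   # trivial multiplication by -j between the stages
    # Second radix-2 stage.
    return (t0 + t2, t1 + t3, t0 - t2, t1 - t3)

inputs = (1 + 2j, -3 + 0.5j, 0.25 - 1j, 2 + 2j)
direct = radix4_butterfly_direct(*inputs)
cascaded = radix22_butterfly(*inputs)
max_error = max(abs(p - q) for p, q in zip(direct, cascaded))
```

The intermediate sums `t0` to `t3` are each shared by two outputs, which is where the four saved additions come from.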


RADIX-2² BUTTERFLY EVALUATION

• Radix-2² has the same number of multipliers as radix-4.
• It has fewer additions: 8 complex additions per butterfly instead of 12.
• In a similar way, a radix-8 butterfly can be decomposed into a radix-2³ butterfly.

RADIX-4 MULTI-PATH DELAY COMMUTATOR (R4MDC) PIPELINE FOR 16-POINT FFT

[Figure: two butterflies with commutators and delay buffers labelled 12D, 8D, 4D, 3D, 2D and 1D]


RADIX-4 SINGLE-PATH DELAY FEEDBACK (R4SDF) PIPELINE FOR 16-POINT FFT

[Figure: two butterflies, the first with three feedback delay lines of 4D, the second with three feedback delay lines of 1D]

EVALUATION RADIX-4 PIPELINES

• The multi-path structure:
– Has a high memory overhead;
– Has 100% multiplier utilization;
– Can operate at one quarter of sample frequency.
• The single-path structure:
– Has minimal memory;
– Has a 75% multiplier utilization (multiplier drawn outside butterfly!)
– Operates at sample frequency.
