Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
1, January 2009
1. Introduction
(2)
j
WN = e N is twiddle factor [10]. Since 16 and
64 is a power of four, radix-4 decimation-in-frequency
algorithm is used to break the DFT formula into four
smaller DFTs. The FFT is the speed-up algorithm of
DFT [7]. The final sets of transforms look like
where
N/4-1
0 kn
N
3N
X(4k)= [x(n)+x(n+ N
4 )+x(n+ 2 )+x(n+ 4 )]WN WN/4
n=0
(3)
N/4-1
n kn
N
N
3N
X(4k+1)= [x(n)-jx(n+ 4 )-x(n+ 2 )+jx(n+ 4 )]WN WN/4
n=0
(4)
N/4-1
2n kn
N
N
3N
X(4k+2)= [x(n)-x(n+ 4 )+x(n+ 2 )-x(n+ 4 )]WN WN/4
n=0
(5)
N/4-1
3n kn
X(4k+3)= [x(n)+jx(n+ N )-x(n+ N )-jx(n+ 3N )]W W
N N/4
4
2
4
n=0
(6)
For 16-point FFT for k=0, 1, 2, 3 we get 16
equations and 64-point FFT k varies from 0 to 15. The
flow graph of 16-point FFT is seen in Figure 1. In this
2
2009
Stage 2
Com
muta
tor
Butte
rfly
C1 C2 C3
Butte
rfly
Stage v
Com
muta
tor
Butte
rfly
out
C4 C5 C6
Coefficient
2. Implementation
2.1. Commutator
In realtime applications, input data is a sequential
stream. Therefore, it does not match the FFT algorithm
since the FFT requires temporal re-ordering of data.
For this reason, the commutator is needed to reorder
the input data. Among several pipelined FFT
architectures, Radix-4 Single-path Delay Commutator
Nt
Nt
Nt
O1
Nt
O2
Nt
O3
Nt
m1
0
1
2
3
O4
c1
1
0
0
0
c2
1
1
0
0
c3
1
1
1
0
ar
ai
add/sub
add/sub
br
bi
add/sub
cr
ci
add/sub
dr
di
add/sub
Re
Exor
add/sub
0->addition
1->subtraction
Im
m1
0
1
2
3
c1 c2 c3
0 0 0
1 0 1
0 1 1
1 1 0
Original
quantized
coefficient
7fff, 0000
7fff, 0000
7fff, 0000
7fff, 0000
7fff, 0000
7641, cf04
5a82, a57d
30fb, 89be
Coefficient
sequence
m1 = 2,3
Wo
W2
W4
W6
Wo
W3
W6
W9
Original
quantized
coefficient
7fff, 0000
5a82, a57d
0000, 000
a57d, a57d
7fff, 0000
30fb, 89be
a57d, a57d
89be, 30fb
Constant
5a82
Inverter
Mux
Comm
on sub
exp
block
Input
Switch
Unit
Inverter
S5
X
S1
Constant
7641
Constant
30fb
Inverter
S4
S2
Swap
R
Output
Switch
Unit
I
S1
4
2009
5a82
inverter
mux
7f62
inverter
S2
S3
swap
S5
S4
7d8a
inverter
18fd
inverter
S3
swap
S5
S4
7a7d
inverter
2528
inverter
S3
swap
S5
S4
inverter
7641
S3
swap
O
U
T
S
W
I
T
inverter
30fb
S5
S4
70c2
inverter
3c56
inverter
S3
swap
3.3. Reports
S5
S4
inverter
6a6d
S3
swap
inverter
471c
S5
S4
u
n
i
t
Input: 1, 1 ,1 ,1, 2 ,2 ,2 ,2 ,1 ,1 ,1 ,1 , 0 , 0 , 0 , 0,
1, 1 ,1 ,1, 2 ,2 ,2 ,2 ,1 ,1 ,1 ,1 , 0 , 0 , 0 , 0,
1, 1 ,1 ,1, 2 ,2 ,2 ,2 ,1 ,1 ,1 ,1 , 0 , 0 , 0 , 0,
1, 1 ,1 ,1, 2 ,2 ,2 ,2 ,1 ,1 ,1 ,1 , 0 , 0 , 0 , 0
Output: 64, 0, 0, 0, -16.109-24.109i, 0, 0, 0, 0, 0, 0,
0, 9.9864-1.9864i,0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1.3273-6.6727i , 0, 0, 0, 0, 0, 0, 0,
4.7956+3.2044i ,0, 0, 0, 0, 0, 0, 0,
4.7956-3.2044i, 0, 0, 0, 0, 0, 0, 0,
1.3273+6.6727i, 0, 0, 0, 0, 0, 0, 0,
9.9864+1.9864i, 0, 0, 0, 0, 0, 0, 0,
16.109+24.109i, 0, 0, 0.
inverter
0c8b
I
n
s
w
i
t
c
h
62f2
inverter
5133
inverter
S3
swap
S5
S4
S5
S1
FFT Size
16-point
64-point
256-point
Frequency(MHz)
200
166.66
125
D
E
M
U
X
Shift-Add
Module
Shift-Add
Module
M
U
X
o/p
Attributes
Leakage(mw)
Internal (mw)
Net ( mw)
Switching (mw)
Total (m w)
Area(mm2)
64-point
0.0015
21.372
2.555
23.927
47.856
0.3216
256-point
0.0038
69.153
7.590
76.744
153.49
0.8417
3. Results
16-point
0.0012
11.106
1.779
12.885
25.772
0.2112
Leakage
(w)
Internal
(mw)
Net
( mw)
Switching
(mw)
Total
(m w)
Cell Area
(mm2)
Comm
(I )
Comm
(II)
Multiplier
Butter
fly
0.31
0.09
0.3
0.25
3.978
0.993
3.5793
3.4008
0.430
0.128
0.9848
0.9624
4.408
1.122
4.5641
4.3632
8.817
2.444
9.128
8.726
0.069
0.018
0.0442
0.0420
Leakage
(w)
Internal
(mw)
Net
(mw)
Switching
(mw)
Total
(mw)
Cell Area
(mm2)
Comm
(I )
Comm
(II )
Comm
(III )
Multiplier
Butter
fly
1.14
0.31
0.09
0.97
0.25
References
15.303
3.9781
0.9932
6.676
3.40
1.572
0.4303
0.1287
1.889
0.96
16.875
4.4083
1.122
8.566
4.36
33.751
8.817
2.444
17.13
8.72
0.2606
0.0690
0.0185
0.159
0.04
Comm
(I)
Comm
(II)
Comm
(III)
[2]
[3]
Comm
(IV )
4.4
1.14
0.31
0.09
60.601
15.303
3.9781
0.9932
6.140
1.572
0.4303
0.1287
66.742
16.875
4.4083
1.122
133.487
33.751
8.817
2.444
1.0270
0.2606
0.0690
0.0185
[4]
[5]
Cell Area
(mm2)
Multiplier
Butterfly
1.5
0.25
21.859
3.4008
5.276
0.9624
27.135
4.3632
54.271
8.726
0.044224
0.0420
4. Conclusion
In this paper a pipelined architecture for 16 point, 64point and 256-point radix-4 DIF FFT in fixed point
representation is implemented. Low power FFT
processor is implemented by using multiplier less (shift
add) approach for multiplying twiddle coefficient.
This paper presents a multiplier-less pipelined FFT
processor architecture suitable for shorter FFTs. This
design approach can also be applied to the longer
FFTs. The multiplier-less architecture employs the
minimum number of shift and addition operations to
realize the complex multiplications. By combining a
commutator architecture and low power butterfly
architecture with this approach, the resulting power
savings are around 43% and 59% for 64-point and 16point radix-4 FFTs, respectively, as compared to a
conventional FFT architecture based on non-booth
coded wallace tree multiplier. The parameterization
[6]
[7]
[8]
6
2009