
Design of a High Speed Reconfigurable Fast Fourier Transform Architecture for High Rate WPAN Applications
Author 1

Abstract: In this paper, we propose the design of a high-speed reconfigurable FFT architecture, a key component of the Orthogonal Frequency Division Multiplexing (OFDM) modulation scheme. OFDM has been adopted in the physical layer design of high-rate WPANs. The proposed architecture consists of hardware components for the radix-2, radix-2^2, radix-2^3, radix-2^4 and radix-2^5 algorithms, so that the FFT can be computed according to the user requirements. In particular, this generalized radix-2^k FFT architecture uses a complex constant multiplier instead of a complex Booth multiplier to reduce the number of complex multiplications. In addition, the size of the twiddle factor memory is reduced by using a canonic signed digit (CSD) multiplication technique. Thus, the proposed architecture is built with reduced hardware requirements and fewer complex multiplications in computing the FFT, and the smaller twiddle factor memory increases the speed of the computation.

Index Terms: reconfigurable FFT architecture, radix-2^k algorithm, complex constant multiplier, CSD multiplication.

I. INTRODUCTION

Present-day real-time applications demand that the FFT be calculated at very high throughput rates, sometimes in the range of giga-samples per second. These high performance requirements appear in applications such as millimeter-wave wireless local area network (WLAN) systems, wireless personal area network (WPAN) systems and real-time video streaming services in short-range indoor environments. The FFT/IFFT processor has a high hardware complexity in the OFDM modulation of high-rate WPAN systems. One OFDM symbol in the IEEE 802.11ad standard consists of 512 subcarriers. Therefore, the FFT processor conducts the FFT computation with 512-point arithmetic and should provide a high throughput rate of at least 2.115 GS/s.

The radix of the algorithm greatly influences the architecture of the FFT processor and the complexity of the implementation. A small radix is desirable because it results in a simple butterfly. Nevertheless, a high radix reduces the number of twiddle factor multiplications. The radix-2^k algorithms simultaneously achieve a simple butterfly and a reduced number of twiddle factor multiplications. The radix-2 algorithm is a well-known simple algorithm for FFT processors, but it requires many complex multipliers. Recently, various radix-2^k FFT algorithms and architectures have been studied in order to reduce the number of complex multipliers.

In this brief, a reconfigurable FFT architecture that computes a 512-point FFT at high speed using the radix-2^k algorithm is proposed. The key ideas for achieving high data throughput and reduced hardware complexity are described. Power consumption and hardware cost are saved in our processor by using the higher-radix FFT algorithm, less memory and fewer complex multipliers.

The organization of this brief is as follows. Section II describes the radix-2^k FFT algorithm, and Section III describes the proposed 512-point radix-2^k FFT architecture. In Section IV, the implementation and comparison are presented. Finally, conclusions are provided in Section V.

II. RADIX-2^k ALGORITHM

The N-point DFT is formulated as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where the twiddle factor is defined as $W_N^{nk} = e^{-j 2\pi nk/N}$, $n$ denotes the time index and $k$ denotes the frequency index. The radix-2^k algorithm can be derived by integrating twiddle factor decomposition through a divide and conquer approach.

A. Radix-2^2 Algorithm

Consider the first two steps of decomposition in the radix-2 DIF FFT together. Applying a 3-dimensional linear index map as follows,

$$\begin{aligned}
n &= \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3 \right\rangle_N, && \left\{ n_1, n_2 = 0, 1;\; n_3 = 0 \sim \tfrac{N}{4} - 1 \right\} \\
k &= \left\langle k_1 + 2k_2 + 4k_3 \right\rangle_N, && \left\{ k_1, k_2 = 0, 1;\; k_3 = 0 \sim \tfrac{N}{4} - 1 \right\}
\end{aligned} \tag{2}$$

the DFT has the form of

$$\begin{aligned}
X(k_1 + 2k_2 + 4k_3) &= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3\right) W_N^{nk} \\
&= \sum_{n_3=0}^{\frac{N}{4}-1} \sum_{n_2=0}^{1} \left\{ B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right) \right\} W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)}
\end{aligned} \tag{3}$$

where the first butterfly structure has the form of

$$B_{N/2}^{k_1}\!\left(\tfrac{N}{4} n_2 + n_3\right) = x\!\left(\tfrac{N}{4} n_2 + n_3\right) + (-1)^{k_1}\, x\!\left(\tfrac{N}{4} n_2 + n_3 + \tfrac{N}{2}\right) \tag{4}$$

Decomposing the composite twiddle factor, it can be expressed as in Eq. (5).

$$W_N^{\left(\frac{N}{4} n_2 + n_3\right)(k_1 + 2k_2 + 4k_3)} = (-j)^{n_2 (k_1 + 2k_2)}\, W_N^{n_3 (k_1 + 2k_2)}\, W_{N/4}^{n_3 k_3} \tag{5}$$

Substituting Eq. (5) into Eq. (3) and expanding the summation with regard to index $n_2$, we have a set of 4 DFTs of length $N/4$:

$$X(k_1 + 2k_2 + 4k_3) = \sum_{n_3=0}^{\frac{N}{4}-1} \left[ H_{N/4}^{k_1 k_2}(n_3)\, W_N^{n_3 (k_1 + 2k_2)} \right] W_{N/4}^{n_3 k_3} \tag{6}$$

where a secondary butterfly structure $H_{N/4}^{k_1 k_2}(n_3)$ is expressed as

$$H_{N/4}^{k_1 k_2}(n_3) = B_{N/2}^{k_1}(n_3) + (-1)^{k_2} (-j)^{k_1}\, B_{N/2}^{k_1}\!\left(n_3 + \tfrac{N}{4}\right) \tag{7}$$

After these two columns, full multiplications are used to apply the decomposed twiddle factor $W_N^{n_3 (k_1 + 2k_2)}$ in Eq. (6). Applying this cascade decomposition recursively to the remaining DFTs of length $N/4$ in Eq. (6), the complete radix-2^2 FFT algorithm is obtained. Equation (7) represents the first two columns of butterflies with only the trivial multiplication by $(-j)$, which can be implemented using only real-imaginary swapping and sign inversion.

The merit of the radix-2^2 algorithm is that it has the same multiplicative complexity as the radix-4 algorithm, but still retains the simple structure of the radix-2 butterfly.

B. Radix-2^3 Algorithm

To derive the radix-2^3 algorithm, the first three steps in the cascade decomposition are considered. The linear index mapping transforms into 4-dimensional linear index maps,

$$\begin{aligned}
n &= \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + n_4 \right\rangle_N \\
k &= \left\langle k_1 + 2k_2 + 4k_3 + 8k_4 \right\rangle_N
\end{aligned} \tag{8}$$

Applying the 4-dimensional linear index map to Eq. (1),

$$X(k_1 + 2k_2 + 4k_3 + 8k_4) = \sum_{n_4=0}^{\frac{N}{8}-1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + n_4\right) W_N^{nk} \tag{9}$$

With the cascade decomposition, the twiddle factor can be expressed in the form of

$$\begin{aligned}
W_N^{nk} &= W_N^{\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + \frac{N}{8} n_3 + n_4\right)(k_1 + 2k_2 + 4k_3 + 8k_4)} \\
&= (-1)^{n_1 k_1} (-j)^{n_2 (k_1 + 2k_2)}\, W_8^{n_3 (k_1 + 2k_2 + 4k_3)}\, W_N^{n_4 (k_1 + 2k_2 + 4k_3)}\, W_{N/8}^{n_4 k_4}
\end{aligned} \tag{10}$$

Substituting Eq. (10) into Eq. (9) and expanding the summation with regard to indices $n_1$, $n_2$ and $n_3$, a set of 8 DFTs of length $N/8$ is identified:

$$X(k_1 + 2k_2 + 4k_3 + 8k_4) = \sum_{n_4=0}^{\frac{N}{8}-1} \left[ T_{N/8}^{k_1 k_2 k_3}(n_4)\, W_N^{n_4 (k_1 + 2k_2 + 4k_3)} \right] W_{N/8}^{n_4 k_4} \tag{11}$$

where the third butterfly has the expression of

$$T_{N/8}^{k_1 k_2 k_3}(n_4) = H_{N/4}^{k_1 k_2}(n_4) + (-1)^{k_3}\, W_8^{(k_1 + 2k_2)}\, H_{N/4}^{k_1 k_2}\!\left(n_4 + \tfrac{N}{8}\right) \tag{12}$$

Equation (12) reveals that the butterfly contains twiddle factors $W_8^{(k_1 + 2k_2)}$. Since they are the constant scalars $(-j)^{k_2} \left( \tfrac{\sqrt{2}}{2} (1 - j) \right)^{k_1}$, a constant multiplier can be used instead of a programmable multiplier such as the Booth multiplier. Full complex multiplications are used to apply the decomposed twiddle factor, $W_N^{n_4 (k_1 + 2k_2 + 4k_3)}$, after the third column. The complete algorithm can be obtained by repeating the procedure.

C. Radix-2^4 Algorithm

First, the four steps in the cascade decomposition must be considered. The linear index mapping is transformed into a 5-dimensional linear index map,

$$\begin{aligned}
n &= \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + \tfrac{N}{16} n_4 + n_5 \right\rangle_N \\
k &= \left\langle k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 \right\rangle_N
\end{aligned} \tag{13}$$

Applying the 5-dimensional linear index map to Eq. (1),

$$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5) = \sum_{n_5=0}^{\frac{N}{16}-1} \sum_{n_4=0}^{1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + \tfrac{N}{16} n_4 + n_5\right) W_N^{nk} \tag{14}$$

With the cascade decomposition, the twiddle factor can be expressed in the form of

$$W_N^{nk} = \left\{ (-1)^{n_1 k_1} (-j)^{n_2 (k_1 + 2k_2)}\, W_8^{n_3 (k_1 + 2k_2 + 4k_3)} \right\} W_{16}^{n_4 (k_1 + 2k_2 + 4k_3 + 8k_4)}\, W_N^{n_5 (k_1 + 2k_2 + 4k_3 + 8k_4)}\, W_{N/16}^{n_5 k_5} \tag{15}$$

Substituting Eq. (15) into Eq. (14) and expanding the summation with regard to indices $n_1$, $n_2$, $n_3$ and $n_4$, we have a set of 16 DFTs of length $N/16$:

$$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5) = \sum_{n_5=0}^{\frac{N}{16}-1} \left[ G_{N/16}^{k_1 k_2 k_3 k_4}(n_5)\, W_N^{n_5 (k_1 + 2k_2 + 4k_3 + 8k_4)} \right] W_{N/16}^{n_5 k_5} \tag{16}$$

where the fourth butterfly structure has the expression of

$$G_{N/16}^{k_1 k_2 k_3 k_4}(n_5) = T_{N/8}^{k_1 k_2 k_3}(n_5) + (-1)^{k_4}\, W_{16}^{(k_1 + 2k_2 + 4k_3)}\, T_{N/8}^{k_1 k_2 k_3}\!\left(n_5 + \tfrac{N}{16}\right) \tag{17}$$

where $T_{N/8}^{k_1 k_2 k_3}(n_5)$ denotes the third butterfly unit,

$$T_{N/8}^{k_1 k_2 k_3}(n_5) = H_{N/4}^{k_1 k_2}(n_5) + (-1)^{k_3}\, W_8^{(k_1 + 2k_2)}\, H_{N/4}^{k_1 k_2}\!\left(n_5 + \tfrac{N}{8}\right) \tag{18}$$

D. Radix-2^5 Algorithm

The radix-2^5 algorithm can be expressed as various formulas using a common factor algorithm. The radix-2^5 algorithm is given as follows. Applying a 6-dimensional linear index map,

$$\begin{aligned}
n &= \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + \tfrac{N}{16} n_4 + \tfrac{N}{32} n_5 + n_6 \right\rangle_N \\
k &= \left\langle k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6 \right\rangle_N
\end{aligned} \tag{19}$$

The common factor algorithm takes the form of

$$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6) = \sum_{n_6=0}^{\frac{N}{32}-1} \sum_{n_5=0}^{1} \sum_{n_4=0}^{1} \sum_{n_3=0}^{1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{1} x\!\left(\tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + \tfrac{N}{8} n_3 + \tfrac{N}{16} n_4 + \tfrac{N}{32} n_5 + n_6\right) W_N^{nk} \tag{20}$$

$$X(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6) = \sum_{n_6=0}^{\frac{N}{32}-1} \left[ J_{N/32}(n_6, k_1, k_2, k_3, k_4, k_5)\, W_N^{n_6 (k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5)} \right] W_{N/32}^{n_6 k_6} \tag{21}$$

The radix-2^5 algorithm is expressed as follows:

$$\begin{aligned}
& W_N^{\left(\frac{N}{2} n_1 + \frac{N}{4} n_2 + \frac{N}{8} n_3 + \frac{N}{16} n_4 + \frac{N}{32} n_5 + n_6\right)(k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5 + 32k_6)} \\
&\quad = (-1)^{n_1 k_1} (-j)^{n_2 k_1} (-1)^{n_2 k_2}\, W_8^{n_3 (k_1 + 2k_2)} (-1)^{n_3 k_3}\, W_{32}^{(2 n_4 + n_5)(k_1 + 2k_2 + 4k_3)} \\
&\qquad \times (-1)^{n_4 k_4} (-j)^{n_5 k_4} (-1)^{n_5 k_5}\, W_N^{n_6 (k_1 + 2k_2 + 4k_3 + 8k_4 + 16k_5)}\, W_{N/32}^{n_6 k_6}
\end{aligned} \tag{22}$$

Generally, a programmable complex multiplier is used for complex multiplications; however, if the twiddle factor has a small number of coefficients, then a complex constant multiplier can be used for the twiddle factor multiplications. The complex multiplication by the twiddle factors $W_{32}^{n}$, $W_{16}^{n}$ and $W_8^{n}$ can be implemented with a canonic signed digit (CSD) constant multiplier, which contains the fewest number of non-zero digits [10]. Hence, the area and power consumption of the complex multipliers can be reduced.

III. PROPOSED ARCHITECTURE

In this brief, a reconfigurable radix-2^k FFT architecture is proposed. The proposed architecture consists of butterfly units, complex Booth multipliers, complex constant multipliers, single-path delay feedback buffers and a control unit.
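
The constant multipliers listed above depend on the twiddle factor decomposition of Eq. (22), so it can be useful to check that identity numerically. The following Python sketch is only an illustration under stated assumptions: the helper names (`W`, `direct_twiddle`, `decomposed_twiddle`, `max_decomposition_error`) are introduced here and do not come from the proposed design. It compares W_N^{nk} under the 6-dimensional index map of Eq. (19) against the product of trivial factors (-1, -j), constant factors (W_8, W_32) and the remaining full twiddle factors.

```python
import cmath
from itertools import product


def W(M, e):
    """Twiddle factor W_M^e = exp(-j*2*pi*e/M)."""
    return cmath.exp(-2j * cmath.pi * e / M)


def direct_twiddle(N, n, k):
    """W_N^{nk} with n and k built from the 6-D index map of Eq. (19)."""
    n1, n2, n3, n4, n5, n6 = n
    k1, k2, k3, k4, k5, k6 = k
    n_lin = (N // 2) * n1 + (N // 4) * n2 + (N // 8) * n3 \
        + (N // 16) * n4 + (N // 32) * n5 + n6
    k_lin = k1 + 2 * k2 + 4 * k3 + 8 * k4 + 16 * k5 + 32 * k6
    return W(N, n_lin * k_lin)


def decomposed_twiddle(N, n, k):
    """Right-hand side of Eq. (22): trivial, constant and full twiddle factors."""
    n1, n2, n3, n4, n5, n6 = n
    k1, k2, k3, k4, k5, k6 = k
    ks = k1 + 2 * k2 + 4 * k3
    return ((-1) ** (n1 * k1) * (-1j) ** (n2 * k1) * (-1) ** (n2 * k2)
            * W(8, n3 * (k1 + 2 * k2)) * (-1) ** (n3 * k3)
            * W(32, (2 * n4 + n5) * ks)
            * (-1) ** (n4 * k4) * (-1j) ** (n5 * k4) * (-1) ** (n5 * k5)
            * W(N, n6 * (ks + 8 * k4 + 16 * k5))
            * W(N // 32, n6 * k6))


def max_decomposition_error(N):
    """Largest |direct - decomposed| over all index combinations (N a multiple of 32)."""
    bits = (0, 1)
    tail = range(N // 32)
    worst = 0.0
    for n in product(bits, bits, bits, bits, bits, tail):
        for k in product(bits, bits, bits, bits, bits, tail):
            err = abs(direct_twiddle(N, n, k) - decomposed_twiddle(N, n, k))
            worst = max(worst, err)
    return worst


if __name__ == "__main__":
    # A small N keeps the exhaustive check fast; the error should be at the
    # level of floating-point rounding if the decomposition is an identity.
    print(max_decomposition_error(128))
```

Calling `max_decomposition_error(512)` runs the same exhaustive check for the 512-point length targeted in this brief, at the cost of visiting all 512 × 512 index pairs.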

A. Butterfly Units

The butterfly units perform complex additions and subtractions of two input data: $x(n)$ and $x(n + \tfrac{N}{2})$. The behavior of the butterfly units is as follows. All input values are saved into the delay buffers until the $\tfrac{N}{2}$th input is entered. Then, after the $(\tfrac{N}{2} + 1)$st input is entered, the butterfly units conduct calculations between the input values and the delay buffer outputs. During the last $\tfrac{N}{2}$ clock cycles, all butterfly calculations are performed at each stage. Butterfly unit 1 (BU1) conducts complex additions and subtractions only. However, butterfly unit 2 (BU2) also includes the twiddle factor $W_4$ multiplication, utilizing the multiplexers and control signals.
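
The buffering behavior described above can be sketched as a cycle-by-cycle software model. This is a minimal behavioral illustration, not the authors' RTL: the function names `sdf_butterfly_stage` and `mul_minus_j` are hypothetical, the buffer is assumed to start filled with zeros, and word lengths, pipelining and control signals are not modeled. The second helper shows the trivial W_4 = -j multiplication that BU2 needs, done by swapping the real and imaginary parts and inverting one sign.

```python
from collections import deque


def sdf_butterfly_stage(samples, n_points):
    """Behavioral model of one single-path delay feedback butterfly stage.

    For each block of n_points inputs, the first n_points/2 samples are stored
    in the delay buffer. During the last n_points/2 cycles the stage outputs
    x(n) + x(n + N/2) and feeds x(n) - x(n + N/2) back into the buffer, so the
    differences leave the stage during the following n_points/2 cycles.
    """
    half = n_points // 2
    fifo = deque([0j] * half)  # delay feedback buffer (assumed zero-initialized)
    outputs = []
    for t, x in enumerate(samples):
        delayed = fifo.popleft()
        if t % n_points < half:
            # First half of the block: store the input, shift out buffer content.
            fifo.append(x)
            outputs.append(delayed)
        else:
            # Second half: butterfly between buffered x(n) and current x(n + N/2).
            outputs.append(delayed + x)
            fifo.append(delayed - x)
    return outputs


def mul_minus_j(z):
    """Multiply by -j using only a real/imaginary swap and a sign inversion."""
    return complex(z.imag, -z.real)


if __name__ == "__main__":
    n_points = 8
    block = [complex(i, 0) for i in range(n_points)]
    # Append N/2 zeros so that the differences held in the buffer are flushed.
    stream = sdf_butterfly_stage(block + [0j] * (n_points // 2), n_points)
    print("sums       :", stream[n_points // 2:n_points])
    print("differences:", stream[n_points:])
    print("(-j)(3+4j) :", mul_minus_j(3 + 4j))
```

In this model the sums appear during the last N/2 cycles of a block and the differences during the first N/2 cycles of the next block, which matches the timing described in the text.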
Fig. 5. ModelSim output

B. Complex Multiplier and Constant Multiplier

The twiddle factor multiplication is conducted using fixed-width complex multipliers. The twiddle factor values stored in the read-only memory (ROM) are used as the multiplicand in the complex Booth multiplier, which is widely used for high-speed operation.

The proposed FFT processor uses constant multipliers based on the canonic signed digit (CSD) representation for the complex multiplication arithmetic in the appropriate stages. Most existing designs use complex Booth multipliers for the twiddle factor multiplication. In this design, however, the complex CSD constant multiplier is used instead of the complex Booth multiplier at several stages. The constant multiplier is implemented with the common subexpression sharing (CSS) technique, using the common calculation patterns X1, X2, and X3. Thus, the hardware complexity of the complex multipliers is decreased by at least 54% in comparison with using complex Booth multipliers. In addition, the twiddle factor LUT size is reduced to 50% of that of designs using complex Booth multipliers.
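
To make the CSD idea concrete, the sketch below converts a fixed-point constant into signed digits in {-1, 0, +1} with no two adjacent non-zero digits, and then multiplies by that constant using only shifts, additions and subtractions. It is an illustration under assumptions, not the multiplier of the proposed processor: the function names `to_csd` and `csd_multiply`, the 8-bit fractional word length, and the choice of the coefficient sqrt(2)/2 (the real and imaginary magnitude of W_8) are all chosen here for the example, and the subexpression sharing patterns X1, X2 and X3 are not modeled.

```python
import math


def to_csd(value):
    """Return the CSD digits (each -1, 0 or +1) of a non-negative integer, LSB first."""
    if value < 0:
        raise ValueError("this sketch handles non-negative constants only")
    digits = []
    while value != 0:
        if value & 1:
            # +1 if value % 4 == 1, -1 if value % 4 == 3; this choice makes the
            # next digit zero, so non-zero digits are never adjacent.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply integer x by the constant encoded in digits using shift-add/sub only."""
    acc = 0
    for shift, digit in enumerate(digits):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    FRACTION_BITS = 8  # assumed word length, for illustration only
    coeff = round(math.sqrt(0.5) * (1 << FRACTION_BITS))  # quantized sqrt(2)/2
    digits = to_csd(coeff)
    print("coefficient:", coeff, "CSD digits (LSB first):", digits)
    sample = 100
    print("shift-add product:", csd_multiply(sample, digits))
    print("reference product:", sample * coeff)
```

Each non-zero digit corresponds to one adder or subtractor in hardware, which is why a representation with few non-zero digits reduces the area of a constant multiplier.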

Fig. 6. Hardware implementation of shift-add module

IV. SIMULATION AND IMPLEMENTATION

Verilog is frequently used for two different goals: simulation of electronic designs and synthesis of such designs. Synthesis is a process in which a Verilog description is compiled and mapped onto an implementation technology such as an FPGA or an ASIC. Many FPGA vendors provide free tools to synthesize Verilog for use with their chips, whereas ASIC tools are often very expensive.

Fig. 7. Hardware implementation of add/sub unit

V. CONCLUSION

In this paper, a radix-2^k algorithm and a 512-point reconfigurable radix-2^k FFT architecture have been proposed for OFDM-based WPAN applications. The number of complex Booth multipliers and the size of the twiddle factor LUTs are reduced by using the radix-2^k algorithm. The proposed radix-2^k FFT processor is the most area-efficient architecture for the 512-point SDF FFT processors. The proposed architecture has potential applications in high-rate OFDM-based WPAN systems.

REFERENCES
[1] T. Cho and H. Lee, "A high-speed low-complexity modified radix-2^5 FFT processor for high rate WPAN applications," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 21, no. 1, pp. 187–191, Jan. 2013.
[2] Y. Lin, H. Liu, and C. Lee, "A 1-GS/s FFT/IFFT processor for UWB applications," IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1726–1735, Aug. 2005.
[3] J. Lee and H. Lee, "A high-speed two-parallel radix-2^4 FFT/IFFT processor for MB-OFDM UWB systems," IEICE Trans. Fundam., vol. E91-A, no. 4, pp. 1206–1211, Apr. 2008.
[4] Y. Chen, Y. Tsao, Y. Wei, C. Lin, and C. Lee, "An indexed-scaling pipelined FFT processor for OFDM-based WPAN applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 55, no. 2, pp. 146–150, Feb. 2008.
[5] M. Shin and H. Lee, "A high-speed four-parallel radix-2^4 FFT processor for UWB applications," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2008, pp. 960–963.
[6] S. Tang, J. Tsai, and T. Chang, "A 2.4-GS/s FFT processor for OFDM based WPAN applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 57, no. 6, pp. 451–455, Jun. 2010.
[7] S. Huang and S. Chen, "A green FFT processor with 2.5-GS/s for IEEE 802.15.3c (WPANs)," in Proc. Int. Conf. Green Circuits Syst. (ICGCS), 2010, pp. 9–13.
[8] T. Cho, H. Lee, J. Park, and C. Park, "A high-speed low-complexity modified radix-2^5 FFT processor for gigabit WPAN applications," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), 2011, pp. 1259–1262.
[9] A. Cortes, I. Velez, and J. F. Sevillano, "Radix r^k FFTs: Matricial representation and SDC/SDF pipeline implementation," IEEE Trans. Signal Process., vol. 57, no. 7, pp. 2824–2839, Jul. 2009.
[10] K. Cho, K. Lee, J. Chung, and K. Parhi, "Design of low-error fixed-width modified Booth multiplier," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522–531, May 2004.
[11] R. I. Hartley, "Subexpression sharing in filters using canonic signed digit multipliers," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 43, no. 10, pp. 677–688, Oct. 1996.
