©2003 IEEE
0-7803-7761-3/03/$17.00
datapath widths are either 8 or 16 bits. The details of the VLSI realization are described in the following subsections.
Figure 3. Block diagram of the FFT processor (blocks: Control Unit, Coeff. Unit, Butterfly, ROM).
... stage number r according to the following equations. Let the binary representation of the current butterfly index be given by

$$B_1 = (b_{m-2}, b_{m-3}, \ldots, b_2, b_1, b_0)_2 \qquad (6)$$

where $m = \log_2 N$ is the number of stages for the radix-2 implementation. From the data flow graph, the elements $b_i$ of $B_1$ can be used as variables in conjunction with the value r to generate proper ROM addresses. Specifically, we first generate a vector from the present r value based on Eq. (7), and then the desired ROM address $p(B_1, r)$ can be computed by using the vector as a mask to filter out unwanted $b_i$'s according to Eq. (8).

$$Q_r = [q_{m-2}, q_{m-3}, \ldots, q_0], \quad q_i = \begin{cases} 1, & i < r-1 \\ 0, & i \ge r-1 \end{cases} \quad \text{for } r = 1, 2, \ldots, m \qquad (7)$$

Equation (7) can be easily implemented by resetting a shift register and then shifting in a "one" from the least significant bit each time the stage advances. Eq. (8) represents the masked output of the bit reversal of the current butterfly index. In both cases, the implementation cost is almost negligible.

Figure 4. The block diagram of the address-generate unit

3.2 Butterfly Unit

The butterfly unit is the core of an FFT processor and determines the achievable clock speed and the resulting throughput. In this work, the butterfly unit was designed with the simple radix-2 DIT-FFT algorithm. As shown in Figure 5, the arithmetic operations consist of calculating a pair of complex values, A' = A + BW and B' = A - BW, from a pair of complex inputs, A and B, and the twiddle coefficient W.

... memory write operation. To reduce the critical path delay, we divide the whole operation of the butterfly unit into (s+2) different steps (the first step for the memory read operation, the following s steps for the arithmetic operation, and the last step for the memory write operation), as indicated in Figure 6. Due to the in-place computation, we have to schedule the tasks assigned to the pipelined butterfly unit such that no control hazard occurs during memory accesses. A control hazard (see Figure 7(a)) results from the conflict when the butterfly unit intends to access more than two data in the same memory bank. Figure 7(b) shows the schedule that eliminates the control hazard, provided that only single-port memory is available in the implementation. The arrangement of Figure 7(b) results in only 50% hardware utilization of the pipelined butterfly unit. In contrast, 100% hardware utilization can be achieved if dual-port memory is employed in the design. Note that the area occupied by the memory module is proportional not only to the number of stored data but also to the number of ports. Obviously, the chip area of a dual-port memory is much higher than that of a single-port memory.

Since we use a 24.576 MHz clock in our FFT processor, the arithmetic operation can be finished within one clock cycle (s = 1). Each butterfly operation thus takes only three clock cycles, one each for the memory read operation, the arithmetic operation, and the memory write operation. In addition, only 50% hardware utilization is achieved because single-port memory is employed in our design to reduce the hardware cost.

Figure 6. Radix-2 DIT pipelined butterfly unit (stages: Read, Computation, Write)
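The address generation and the butterfly arithmetic described above can be illustrated with a small software model. The Python sketch below is not the paper's hardware. It assumes natural-order input with bit-reversed output, a ROM holding the N/2 twiddle factors W_N^0 ... W_N^(N/2-1), a mask with r-1 ones at stage r, and a final left shift by (m - r) applied to the masked bit-reversed butterfly index. The function names are hypothetical, and since Eq. (8) is not reproduced here, the exact address formula used in the paper may differ.

```python
import cmath


def bit_reverse(value, width):
    """Reverse the lowest `width` bits of `value`."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def stage_mask(r):
    """Mask register model: cleared at stage 1, one more '1' shifted in
    from the least significant bit on every stage advance (r-1 ones)."""
    return (1 << (r - 1)) - 1


def rom_address(butterfly_index, r, m):
    """Twiddle ROM address as the masked bit reversal of the butterfly index.
    The left shift by (m - r) is an assumption of this sketch."""
    masked = bit_reverse(butterfly_index, m - 1) & stage_mask(r)
    return masked << (m - r)


def fft_in_place(x):
    """Radix-2 in-place FFT: natural-order input, bit-reversed output."""
    n = len(x)
    m = n.bit_length() - 1
    if n < 2 or n != 1 << m:
        raise ValueError("length must be a power of two, at least 2")
    # Coefficient ROM model: W_N^k for k = 0 .. N/2 - 1.
    rom = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    for r in range(1, m + 1):
        half = n >> r  # distance between the two butterfly operands
        for index in range(n // 2):  # butterfly index within the stage
            group, offset = divmod(index, half)
            top = group * 2 * half + offset
            a, b = x[top], x[top + half]
            bw = b * rom[rom_address(index, r, m)]
            x[top], x[top + half] = a + bw, a - bw  # A' = A + BW, B' = A - BW
    return x


def dft(x):
    """Direct O(N^2) DFT used only as a reference."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]
```

Under these assumptions, element `i` of `fft_in_place(x)` corresponds to element `bit_reverse(i, m)` of `dft(x)`, with `m` the number of stages.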
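The effect of the memory port count on throughput can be estimated with a simple cycle count. The sketch below assumes that pipeline fill and drain cycles are negligible, that single-port memory lets a new butterfly enter the pipeline every two cycles (50% utilization), and that dual-port memory lets one enter every cycle (100% utilization). The function names are hypothetical. For N = 2048 with single-port memory this gives 22528 cycles, which agrees with the value listed for the proposed design in Table 1.

```python
CLOCK_HZ = 24.576e6  # clock frequency stated in the text


def butterflies_per_fft(n):
    """A radix-2 FFT of n points has log2(n) stages of n/2 butterflies."""
    stages = n.bit_length() - 1
    if n < 2 or n != 1 << stages:
        raise ValueError("n must be a power of two, at least 2")
    return (n // 2) * stages


def total_cycles(n, dual_port=False):
    """Approximate cycle count, ignoring pipeline fill and drain (assumption).

    Single-port memory: one butterfly issued every 2 cycles.
    Dual-port memory: one butterfly issued every cycle.
    """
    issue_interval = 1 if dual_port else 2
    return issue_interval * butterflies_per_fft(n)


def fft_time_us(n, dual_port=False, clock_hz=CLOCK_HZ):
    """Transform time in microseconds at the given clock frequency."""
    return total_cycles(n, dual_port) / clock_hz * 1e6


if __name__ == "__main__":
    # The four DAB modes and their FFT sizes as listed in Section 4.
    for mode, n in (("I", 2048), ("II", 512), ("III", 256), ("IV", 1024)):
        print(f"mode {mode}: N={n}, cycles={total_cycles(n)}, "
              f"time={fft_time_us(n):.1f} us")
```

Halving the issue interval with dual-port memory halves the cycle count, at the cost of the larger memory area noted above.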
4. CHIP REALIZATION AND COMPARISON

All the modules in our design have been successfully implemented based on the TSMC (Taiwan Semiconductor Manufacturing Company) 0.35 µm 1P4M CMOS process and simulated using Synopsys and Cadence tools. Based on the specifications of the DAB channel demodulator, the resulting FFT processor is capable of completing the four operational modes (mode I: 2048 points, mode II: 512 points, mode III: 256 points, and mode IV: 1024 points) with a clock frequency of 24.576 MHz. The corresponding physical layout is shown in Figure 8; it includes 2×1024×16 SRAMs (two banks, each containing 1024×16 bits) and 2×1024×8 ROMs (one for the real part and another for the imaginary part). In terms of 2-input NAND gates, the total gate count is 4351, excluding the memories. The resulting core size of the chip implementation is about 2086×1806 µm² and the overall chip size including I/O pads is 2856×2594 µm².

Figure 8. The layout of the developed FFT processor

We compare the performance of our implementation with the following FFT implementations: the pipelined architecture [7] and the single butterfly architecture [5]. The circuit complexities of these designs are compiled in Table 1. The pipelined architecture in [7] might be the preferred choice for high-speed applications, but it is not suitable for the DAB system. The memory bandwidth problem of [5] is solved by introducing a more complicated structure (the radix-4 butterfly unit) and utilizing more memory resources. Note that the operation frequency of [5] is 12.288 MHz. By taking advantage of efficient memory partitioning and employing the pipelined butterfly unit, our design reduces the required area complexity and still meets the DAB specifications. For DAB applications, it is clear that our design outperforms Delaruelle's work.

Table 1. Comparisons of different implementations

Item | E. Bidet [7] | A. Delaruelle [5] | Proposed
No. of butterfly units | $\log_r N$, radix-r | 1, radix-4 | 1, radix-2
Arithmetic components | $3(\log_r N - 1)$ CM(1), $4\log_r N$ Adder(2), $4\log_r N$ Sub(3) | 1 CM, 4 Adder, 4 Sub, 4 Register | 1 CM, 1 Adder, 1 Sub
Gate counts of arithmetic components | $8160(\log_r N - 1) + 896\log_r N$ | 9156 | 2954
Memory size | 2048 (dual-port); 2×2048 [column alignment not recoverable]
No. of clock cycles | | $4 \times A_1$(4) |
No. of clock cycles, N = 2048 | 2458 | 11264 | 22528

Note: (1) CM: 8-bit complex-number multiplier, (2) Add: 16-bit adder, (3) Sub: 16-bit subtractor, (4) $A_1 = \frac{N}{4}\log_4 N$, and (5) $A_2 =$ ...

5. CONCLUSION

To date, considerable effort has been devoted to the development of low-cost DAB products. Among the key techniques needed to build a DAB receiver, the FFT is one of the key components and is very suitable for ASIC implementation. This paper explores efficient solutions for hardware implementations of the FFT processor such that they fit the specification of the Eureka-147 standard under limited hardware resources. All the functional blocks are designed, simulated, and verified using the Synopsys and Cadence software, and the final layout is ready for VLSI fabrication based on the 0.35 µm TSMC process and Compass cell library. Results show that our implementation has the potential to consume less silicon area and to facilitate extension to high transmission rate requirements.

REFERENCES

[1] ETS 300 401, "Radio broadcasting system: Digital audio broadcasting (DAB) to mobile, portable and fixed receivers," ETSI, 2nd edition, May 1997.
[2] J. A. Huisken, F. V. Laar, A. Delaruelle, and N. J. L. Philips, "Specification, partitioning and design of a DAB channel decoder," in Proc. VLSI Signal Processing Workshop, pp. 21-29, 1993.
[3] M. Bolle, D. Clawin, K. Gieske, F. Hofmann, T. Mlasko, M. J. Ruf, and G. Spreitz, "The receiver engine chipset for digital audio broadcasting," in Proc. URSI Int. Symp. Signals, Systems, and Electronics, pp. 338-342, 1998.
[4] A. Delaruelle, J. Huisken, J. V. Loon, and F. Welten, "A chip set for a digital audio broadcasting channel decoder," in Proc. IEEE Custom Integrated Circuits Conf., pp. 13.4.1-13.4.4, 1995.
[5] A. Delaruelle, J. Huisken, J. van Loon, and F. Welten, "A channel demodulator IC for digital audio broadcasting," in Proc. IEEE Custom Integrated Circuits Conf., pp. 47-50, 1994.
[6] S. He and M. Torkelson, "Design and implementation of a 1024-point pipeline FFT processor," in Proc. IEEE Custom Integrated Circuits Conf., pp. 131-134, 1998.
[7] E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," IEEE J. Solid-State Circuits, vol. 30, no. 3, pp. 300-305, March 1995.
[8] E. Cetin, R. C. S. Morling, and I. Kale, "An extensible complex fast Fourier transform processor chip for real-time spectrum analysis and measurement," IEEE Trans. Instrumentation and Measurement, vol. 47, no. 1, pp. 95-99, Feb. 1998.
[9] H. F. Lo, M. D. Shieh, and C. M. Wu, "Design of an efficient FFT processor for DAB system," in Proc. IEEE Int. Symp. Circuits and Systems, pp. 654-657, 2001.
[10] E. O. Brigham, The Fast Fourier Transform and Its Applications, Prentice-Hall Inc., 1988.
[11] M. Biver, H. Kaeslin, and C. Tommasini, "In-place updating of path metrics in Viterbi decoders," IEEE J. Solid-State Circuits, vol. 24, pp. 1158-1159, Aug. 1989.