Outline
o Objective
o Introduction
o Digital Signal Processing (DSP)
o Field Programmable Gate Array (FPGA)
o Distributed Arithmetic (DA) Architecture
o New approach to LUT design of FIR digital filter
o Simulation Results
o Advantages
o Applications
o Conclusion
o Future scope
o References
Objective
Digital filters are a very important part of DSP. Advances in implementing digital signal processing functions on FPGAs have driven considerable effort toward designing efficient architectures for DSP functions. The conventional design of a digital FIR filter, based on the direct implementation of a K-tap FIR filter, requires K multiply-and-accumulate (MAC) blocks. We first present distributed arithmetic (DA), which is a multiplier-less architecture. However, the size of the DA memory grows exponentially with the order of the filter. The proposed architecture is designed with a new approach to the LUT-based multiplier, whose memory is reduced to half.
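For reference, the conventional direct-form computation that the DA and LUT approaches replace can be sketched in Python as below. It is a behavioral model, not a hardware description: each output sample performs K multiply-and-accumulate steps, one per tap, which is what K MAC blocks do in parallel in a direct hardware implementation. The function name `fir_direct` is hypothetical.

```python
def fir_direct(h, x):
    """Direct-form K-tap FIR filter: y[n] = sum_{k=0}^{K-1} h[k] * x[n-k].

    Each output sample costs K multiply-and-accumulate (MAC) operations.
    Input samples before x[0] are assumed to be zero.
    """
    K = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(K):          # one MAC per tap
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


# Feeding a unit impulse returns the coefficients (the impulse response).
print(fir_direct([1, 2, 3], [1, 0, 0, 0]))  # [1, 2, 3, 0]
```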
Motivation
o The size of each LUT is fixed.
o No multiplier units are needed, so complexity is reduced.
o Power consumption is low.
o Performance increases, and hence speed increases.
o The multipliers are fast and efficient.
o The multipliers can be cascaded with each other or with CLB (configurable logic block) logic to build larger or more complex functions.
o The memory used to implement the LUT multiplier is exactly half of that used by the DA architecture.
o Fewer decoders, adders, registers and latches are needed to implement the multiplier.
Introduction
Digital Signal Processing (DSP) is one of the most active areas in VLSI applications. Traditionally, DSP algorithms are implemented either on general-purpose DSP processors (low speed, less expensive, flexible) or with ASICs (high speed, expensive, less flexible). FPGAs provide solutions that retain the advantages of both approaches.
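DSP applications make extensive use of multiply-and-accumulate (MAC) blocks, which are expensive to implement in FPGAs because of their logic complexity and resource usage.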
FIR Filters
Figure: FPGA logic block showing I/O, external connections, a LUT with inputs A, B, C, D and output Out, and a clock
Look-Up Table (LUT)
It is possible to store binary data within solid-state memory devices. The storage "cells" within such devices are easily addressed by driving the "address" lines of the device with the proper binary value(s). A look-up table with N inputs can be used to implement any combinational function of N inputs.
The basic features of a LUT are:
o It is a complete "times table" of all possible input combinations.
o There is one address bit for each input bit.
o The table size grows exponentially with the number of inputs.
o It has very limited use.
o It is fast: the result is just one memory access away.
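The "complete table, one address bit per input bit" idea can be illustrated with a small Python model. The helper names `build_lut` and `lut_eval` are hypothetical, and the example function for the four inputs A, B, C, D is arbitrary.

```python
def build_lut(func, n_inputs):
    """Store the complete truth table of an n-input Boolean function.

    Entry `addr` holds func applied to the bits of addr (bit j drives input j),
    so the table has 2**n_inputs entries: one address bit per input bit.
    """
    table = []
    for addr in range(2 ** n_inputs):
        bits = [(addr >> j) & 1 for j in range(n_inputs)]
        table.append(func(*bits))
    return table


def lut_eval(table, *inputs):
    """Evaluate the stored function with a single table read."""
    addr = 0
    for j, bit in enumerate(inputs):
        addr |= (bit & 1) << j      # input j drives address line j
    return table[addr]


# Example: a 4-input LUT (inputs A, B, C, D) implementing (A AND B) XOR (C OR D).
table = build_lut(lambda a, b, c, d: (a & b) ^ (c | d), 4)
print(len(table))                   # 16 entries for 4 inputs
print(lut_eval(table, 1, 1, 0, 0))  # 1
```

Adding a fifth input would double the table to 32 entries, which is the exponential growth noted in the list above.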
Figure: multiplier with input x[n] and output y[n]
For the above multiplier, y[n] depends only on x[n]. Thus, a look-up table (LUT) can be used to implement the multiplier.
For example, a 256 × 16-bit memory can be used to implement an 8-bit multiplier if one of its inputs is always constant.
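A sketch of this 256 × 16-bit constant-coefficient multiplier in Python, assuming an unsigned 8-bit input and a hypothetical 8-bit constant A = 200: every product is pre-computed once, and each multiplication then becomes a single table read. The function name is hypothetical.

```python
def make_const_mult_lut(a, width=8):
    """Pre-compute a * x for every unsigned width-bit input x.

    With width=8 this gives 2**8 = 256 words; for an 8-bit constant a,
    every product fits in 16 bits, i.e. a 256 x 16-bit memory.
    """
    return [a * x for x in range(2 ** width)]


A = 200                          # hypothetical 8-bit constant coefficient
lut = make_const_mult_lut(A)
assert len(lut) == 256
assert max(lut) < 2 ** 16        # 200 * 255 = 51000 fits in 16 bits
assert lut[123] == A * 123       # "multiplication" is one table read
```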
Consider the inner product

$$y = \sum_{k=1}^{K} h_k x_k$$

In this equation, the $h_k$ are the fixed coefficients, $K$ is the number of filter taps and the $x_k$ are the input data words, each in a standard fixed-point format. Digital filters using this arithmetic are implemented with registers, memory resources and a scaling accumulator.
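The DA computation of this inner product can be sketched in Python as follows, assuming unsigned input words of a fixed bit width (signed two's-complement inputs would need the partial sum for the sign bit to be subtracted instead of added). Bit b of every input word forms a K-bit address into a table of pre-computed coefficient sums, and a scaling accumulator weights each table output by 2^b. The function names are hypothetical.

```python
def build_da_lut(h):
    """DA table: entry a holds the sum of h[k] for every bit k that is set in a.

    The table has 2**K words for K coefficients, so its size grows
    exponentially with the number of taps.
    """
    K = len(h)
    return [sum(h[k] for k in range(K) if (a >> k) & 1) for a in range(2 ** K)]


def da_inner_product(h, xs, bits):
    """Compute sum(h[k] * xs[k]) bit-serially, using only table reads and adds.

    Assumes every xs[k] is an unsigned integer of `bits` bits.
    """
    lut = build_da_lut(h)
    acc = 0
    for b in range(bits):               # one table access per bit position
        addr = 0
        for k, x in enumerate(xs):      # bit b of each input forms the address
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b           # scaling accumulator: weight by 2**b
    return acc


print(da_inner_product([1, 2, 3, 4], [5, 6, 7, 8], 4))  # 70
```

With four taps the table already has 16 words; each additional tap doubles it.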
o Let A be a fixed coefficient and X be an input word to be multiplied by A.
o If X is an unsigned binary number of word length L, there are $2^L$ possible values of X and, accordingly, $2^L$ possible values of the product $C = A \cdot X$.
o Therefore, a conventional memory-based multiplier requires a memory unit of $2^L$ words, used as a look-up table of pre-computed products for all possible values of X.
Table: LUT words and product values for input word length L=4
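Assuming that only the eight odd multiples A, 3A, ..., 15A are stored, as the half-size memory, the 3-to-8 decoder and the shift rules described for the barrel shifter suggest, the rows of such a table for L = 4 can be generated with a short Python sketch. Each nonzero X is written as odd × 2^s, so A·X is the stored odd multiple shifted left s places. The address ordering (odd multiple 2m + 1 at address m) and the function name are assumptions.

```python
def odd_multiple_table(word_length=4):
    """List (X, odd part, LUT address, left shifts) for every nonzero X.

    Assumption: only odd multiples of A are stored, so A*X = (odd*A) << shifts.
    X = 0 is excluded; its product is forced to zero separately.
    """
    rows = []
    for x in range(1, 2 ** word_length):
        odd, shifts = x, 0
        while odd % 2 == 0:         # strip the trailing zeros of X
            odd //= 2
            shifts += 1
        address = (odd - 1) // 2    # 1 -> 0, 3 -> 1, ..., 15 -> 7
        rows.append((x, odd, address, shifts))
    return rows


for x, odd, addr, s in odd_multiple_table():
    print(f"X={x:04b}  A*X = ({odd}A) << {s}  LUT address {addr:03b}")
```

Only 2^(L-1) = 8 words are addressed, half of the 2^L = 16 words of the conventional table.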
Proposed LUT-Based Multiplier for 4-Bit Input
The proposed LUT-based multiplier for input word size L = 4 is shown in the following figure:
The various modules included in the above block diagram are:
o 4-to-3-Bit Address Encoder:
The 4-to-3-bit input encoder is shown in Fig. 3(b). It receives a four-bit input word (x3x2x1x0) and maps it onto a three-bit address word according to a set of logical relations.
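A behavioral Python sketch of such an encoder, assuming the LUT holds the odd multiples A, 3A, ..., 15A in order: the address is the odd part of X with its always-1 LSB dropped. The function name and the address ordering are hypothetical; the original logical relations may assign addresses differently.

```python
def address_encoder(x3, x2, x1, x0):
    """Hypothetical 4-to-3-bit encoder: map X = x3x2x1x0 to a LUT address.

    X is written as odd * 2**s; the stored word for `odd` sits at address
    (odd - 1) / 2. X = 0000 maps to 000 (its product is forced to zero elsewhere).
    Returns the address bits as (d2, d1, d0).
    """
    x = (x3 << 3) | (x2 << 2) | (x1 << 1) | x0
    if x == 0:
        return (0, 0, 0)
    while x % 2 == 0:       # remove trailing zeros: X = odd * 2**s
        x >>= 1
    addr = x >> 1           # odd = 2 * addr + 1
    return ((addr >> 2) & 1, (addr >> 1) & 1, addr & 1)


print(address_encoder(1, 1, 0, 0))  # X = 12 = 3 * 4 -> address (0, 0, 1)
```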
o 3-to-8-Line Address Decoder: The decoder takes the 3-bit address from the input encoder and generates 8 word-select signals to select the referenced word from the memory array.
Fig: 3-to-8 decoder
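The decoder and the word selection can be modeled in Python as follows. The function names are hypothetical, and the example memory contents (odd multiples of a hypothetical A = 5) are for illustration only.

```python
def decoder_3_to_8(d2, d1, d0):
    """3-to-8 line decoder: exactly one of the 8 word-select lines is high."""
    index = (d2 << 2) | (d1 << 1) | d0
    return [1 if i == index else 0 for i in range(8)]


def read_word(memory, select_lines):
    """Return the memory word whose word-select line is asserted."""
    if sum(select_lines) != 1:
        raise ValueError("expected exactly one active word-select line")
    return memory[select_lines.index(1)]


memory = [5 * (2 * i + 1) for i in range(8)]        # 5, 15, 25, ..., 75
print(read_word(memory, decoder_3_to_8(0, 1, 1)))   # word 3 -> 35
```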
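o Control Circuit:
The number of shifts to be performed on the LUT output, and the corresponding control bits $s_0$ and $s_1$ for each value of X, are shown in the Table. The control circuit accordingly generates the control bits which, with X = (x3x2x1x0), + denoting logical OR and an overbar denoting complement, are given by

$$s_0 = \overline{x_0 + \overline{(x_1 + \overline{x_2})}}, \qquad s_1 = \overline{x_0 + x_1}$$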
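A minimal Python sketch of the control circuit, assuming two control bits: s1 requests a 2-bit left shift and s0 a 1-bit left shift, so that 2·s1 + s0 equals the shift count required by the barrel-shifter rules (the number of trailing zeros of a nonzero X). The Boolean expressions are derived from those rules and are an assumption rather than the original circuit; the function name is hypothetical.

```python
def control_bits(x3, x2, x1, x0):
    """Return (s1, s0) such that 2*s1 + s0 = number of trailing zeros of X.

    Derived from the shift rules: s1 = NOT(x0 OR x1),
    s0 = NOT(x0 OR NOT(x1 OR NOT x2)). x3 does not affect the result;
    the value for X = 0000 is a don't-care because the output is reset.
    """
    s0 = 1 - (x0 | (1 - (x1 | (1 - x2))))
    s1 = 1 - (x0 | x1)
    return s1, s0


for x in (1, 2, 4, 8, 12):
    bits = ((x >> 3) & 1, (x >> 2) & 1, (x >> 1) & 1, x & 1)
    s1, s0 = control_bits(*bits)
    print(f"X={x:04b}  shifts={2 * s1 + s0}")
```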
o Barrel Shifter: The LUT output must be shifted left by one position when the input operand X is one of (0 0 1 0), (0 1 1 0), (1 0 1 0) or (1 1 1 0). Two left shifts are required if X is either (0 1 0 0) or (1 1 0 0). Only when the input word is (1 0 0 0) are three shifts required. For all other possible input operands, no shifts are required. Since the maximum number of left shifts required on the stored word is three, a two-stage logarithmic barrel shifter is adequate to perform the necessary left-shift operations.
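A two-stage logarithmic barrel shifter can be sketched in Python as below, assuming hypothetical control bits s0 (stage 1, shift by 1) and s1 (stage 2, shift by 2) and an 8-bit datapath. Cascading a 1-place stage and a 2-place stage covers every shift from 0 to 3, which is why two stages suffice.

```python
def barrel_shift_left(word, s1, s0, width=8):
    """Two-stage logarithmic left shifter covering shifts of 0 to 3 places.

    Stage 1 shifts by 1 when s0 = 1; stage 2 shifts by 2 when s1 = 1.
    Results are truncated to `width` bits, as in a fixed-width datapath.
    """
    mask = (1 << width) - 1
    stage1 = (word << 1) & mask if s0 else word
    stage2 = (stage1 << 2) & mask if s1 else stage1
    return stage2


print(barrel_shift_left(0b00000101, 1, 1))  # 5 shifted 3 places -> 40
```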
o NOR Cell:
The RESET bit is fed to one input of each of the 8 NOR gates in the NOR cell, and the other inputs of the 8 NOR gates are fed in parallel with the 8 bits of the LUT output. When RESET = 1, the outputs are 0. When RESET = 0, the outputs of the NOR gates are simply the complements of the LUT output bits.
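The NOR cell can be modeled bit by bit in Python as follows, assuming an 8-bit LUT word; the function name is hypothetical.

```python
def nor_cell(lut_word, reset, width=8):
    """Bank of `width` 2-input NOR gates sharing the RESET input.

    Bit i of the result is NOR(reset, bit i of lut_word): all zeros when
    reset = 1, and the bitwise complement of lut_word when reset = 0.
    """
    out = 0
    for i in range(width):
        bit = (lut_word >> i) & 1
        out |= (1 - (reset | bit)) << i     # NOR of the two gate inputs
    return out


print(f"{nor_cell(0b10110010, 0):08b}")  # 01001101
print(nor_cell(0b10110010, 1))           # 0
```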
Conclusion
Traditionally, direct implementation of a K-tap FIR filter requires K multiply-and-accumulate (MAC) blocks, which are expensive to implement in FPGAs because of their logic complexity and resource usage. An alternative to computing the multiplication is to decompose the MAC operations into a series of look-up table (LUT) accesses and summations.
The advantage of this method is that the LUTs readily available in FPGAs can be utilized efficiently. This work presents DA architectures for FIR filters, i.e., multiplier-less architectures. As a result, complexity is reduced, power consumption is low, and performance, and hence speed, increases.
Future Scope
The future scope of this project is to improve the architecture of the distributed arithmetic FIR filter so that it uses the hardware resources of the latest FPGA families.
Virtex-5 and Virtex-6 family FPGAs provide 6-input LUTs. Future work includes changing the architecture to use 6-input LUTs for storing coefficient sums and SRL (shift register LUT) macros for implementing shift operations, so that the total number of slices used is reduced.
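The memory saving that motivates this change can be estimated with a short Python sketch. It assumes, as one common approach rather than the author's stated design, that the K taps are partitioned into groups of at most six so that each group's coefficient sums fit in a 64-word (6-input) table, with the group outputs added together. The function name is hypothetical.

```python
import math


def da_memory_words(num_taps, lut_inputs=None):
    """Number of table words needed to store DA coefficient sums.

    Without partitioning, a single table of 2**K words is needed. If the K
    taps are split into groups of at most `lut_inputs` taps, each group
    uses its own table of at most 2**lut_inputs words.
    """
    if lut_inputs is None:
        return 2 ** num_taps
    groups = math.ceil(num_taps / lut_inputs)
    return groups * 2 ** lut_inputs


print(da_memory_words(24))      # one table for 24 taps
print(da_memory_words(24, 6))   # four 6-input tables
```

For a 24-tap filter, a single table would need 2^24 = 16,777,216 words, whereas four 6-input tables need 256, at the cost of an extra adder tree to combine the group outputs.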