FIR FILTER ARCHITECTURE ENHANCEMENT USING 16-BIT

APPROXIMATE MULTIPLIER

Abstract—FIR filters are valuable in numerous applications, for example modern signal
processing and communication systems. In this paper an enhanced FIR filter designed using a
16X16 approximate multiplier based on parallel prefix adders is proposed. The proposed 16X16
approximate multiplier is built from four 8X8 approximate multipliers, three parallel prefix
adders (PPA) and one OR gate. The parallel prefix adders give a lower propagation delay, which
raises performance by completing the computation in less time. The 8X8 multiplier is designed
using an approximate tree compressor (ATC) and a carry-maskable adder (CMA). Compared with a
conventional Wallace tree multiplier, the proposed multiplier reduces the critical path delay
by 10%. The proposed multiplier improves the performance of the FIR filter. It is implemented
in Xilinx ISE version 14.7.

Keywords-Approximate multiplier, Parallel prefix adder, Brent-Kung adder

I. INTRODUCTION
Multipliers are among the fundamental components of many digital systems and, hence, their
power dissipation and speed are important. For portable applications where power consumption
is the most important parameter, one should reduce the power dissipation as much as possible.
One of the best ways to reduce the dynamic power dissipation, referred to simply as power
dissipation in this paper, is to minimize the total switching activity, i.e., the total number
of signal transitions in the system.

VLSI integer adders find applications in Arithmetic and Logic Units (ALUs), microprocessors
and memory addressing units. The speed of the adder often decides the minimum clock cycle time
in a microprocessor. The case for a Parallel Prefix Adder (PPA) is that it is significantly
faster than a ripple carry adder. PPAs are a family of adders derived from the commonly known
carry look-ahead adders. These adders are suited to additions with wider word lengths. PPA
circuits use a tree network to reduce the latency to O(log2(n)), where 'n' is the number of
bits. This part deals with the design proposal and implementation of new prefix adder
architectures for 8-bit, 16-bit, 32-bit and 64-bit addition. The proposed structures have the
fewest computation nodes when compared with existing ones. This reduction in hardware pays off
in the form of reduced power and power-delay product.

Arithmetic circuits are the ones which perform basic operations like addition, subtraction,
multiplication, division and parity calculation. Most of the time, designing these circuits is
the same as designing multiplexers, encoders and decoders. In electronics, an adder or summer
is a digital circuit that performs addition of numbers. In many computers and other kinds of
processors, adders are used in various parts of the processor, where they calculate addresses,
table indices and the like. The binary adder is one type of element found in most digital
circuit designs, including digital signal processors (DSP) and microprocessor data path units.

Thus the fast and accurate operation of a digital system depends on the performance of its
adders. Hence improving the performance of adders is a main area of research in VLSI system
design.

II. PREVIOUS WORK


2.1 8X8 APPROXIMATE MULTIPLIER
A multiplier consists of three parts: (i) partial product generation with AND gates; (ii)
partial product reduction (PPR) with an adder tree; and (iii) addition to give the final result
using a carry-propagate adder (CPA). Both power consumption and circuit complexity are
dominated by the PPR, and the multiplier's critical path is dominated by the propagated carry
chain in the CPA.
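
The three parts listed above can be illustrated with a minimal Python sketch. The function names `partial_products` and `multiply_exact` are hypothetical and chosen only for this illustration, and Python's integer addition stands in for both the adder tree and the CPA, so this is an exact reference model rather than the approximate design.

```python
def partial_products(a, b, n=8):
    """Return n partial-product rows for an n-bit multiplication.

    Row j is every bit of a ANDed with bit j of b, shifted to column weight 2**j.
    """
    rows = []
    for j in range(n):
        b_j = (b >> j) & 1
        row = (a if b_j else 0) << j  # bitwise AND of a with the single bit b_j
        rows.append(row)
    return rows


def multiply_exact(a, b, n=8):
    """Sum the rows; Python's + stands in for the adder tree (PPR) and the CPA."""
    return sum(partial_products(a, b, n))
```

For 8-bit operands this yields eight rows, which is the starting point that the approximate compressor described next reduces.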

2.2 APPROXIMATE TREE COMPRESSOR


Figure 2.1(a) shows the half adder, for which the equation can be obtained: {c,s}=a +
b=2c+s=(c+s)+c, where {,} and + denote concatenation and addition, respectively.

Fig.2.1(a) Accurate half adder and (b) incomplete adder cell.

The value c is generated by a AND b and s is generated by a XOR b, so (c+s) can be generated by
a OR b. Based on the above, consider the basic logic cell shown in Fig. 2.1(b), the incomplete
adder cell (iCAC), for which the following equations can be obtained: p=c+s, q=c,
{c,s}=a+b=p+q. By extending the row of iCACs from two to n inputs, n/2 Ps and n/2 Qs are
obtained. If the sum of the n/2 Qs is used instead of the n/2 Qs themselves, the number of Qs
is reduced to one. Note that P is always greater than or equal to S, and Q is equal to C. By
exploiting these facts, OR gates can be used to generate the approximate sum of the n/2 Qs
without significant loss of accuracy. This approximate sum is called the accuracy compensation
vector and is denoted by V. This method is named the approximate tree compressor (ATC). An ATC
with n inputs is called an ATC-n, and the structure of an ATC with eight inputs (ATC-8) is
shown in Fig. 2.2. The rectangles represent rows of iCACs, and the number of iCACs in each row
(rectangle) depends on the bit width of the inputs. For example, if there are eight inputs of
equal bit width (D1, D2, ..., D8), four rows of iCACs are required to build an ATC-8 of that
width. This construction generates four approximate sums, P1, P2, P3, and P4, and four error
recovery vectors, Q1, Q2, Q3, and Q4. OR gates generate the accuracy compensation vector V. As
a result, the eight inputs have been reduced to five.

Fig.2.2 Structure of an approximate tree compressor with eight inputs.
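
The iCAC identity a+b=p+q and the OR-based compensation vector can be checked with a small Python sketch. The helper names `icac_row` and `atc` are hypothetical, and plain integers are used in place of the shifted partial-product rows, which is a simplifying assumption.

```python
def icac_row(x, y):
    """Row of iCACs on two words: P = x OR y, Q = x AND y, so x + y == P + Q."""
    return x | y, x & y


def atc(inputs):
    """Compress n inputs into n/2 approximate sums (Ps) plus one vector V.

    The Qs are returned as well so the approximation error can be inspected.
    """
    ps, qs = [], []
    for x, y in zip(inputs[0::2], inputs[1::2]):
        p, q = icac_row(x, y)
        ps.append(p)
        qs.append(q)
    v = 0
    for q in qs:
        v |= q  # OR gates approximate the sum of the Qs
    return ps, qs, v


d = [3, 5, 6, 1, 7, 2, 4, 8]      # eight example inputs D1..D8
ps, qs, v = atc(d)
exact = sum(ps) + sum(qs)         # always equals sum(d)
approx = sum(ps) + v              # never exceeds the exact sum
```

Because an OR of non-negative words can never exceed their sum, the approximate result in this sketch only ever underestimates the exact one.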

2.3 CARRY-MASKABLE ADDER


A carry-maskable adder (CMA) is proposed to control the accuracy flexibly and dynamically. A
k-bit CMA comprises (k-1) carry-maskable full adders and one carry-maskable half adder, and its
structure is similar to that of a k-bit CPA. The structures of the proposed carry-maskable half
and full adders are shown in Fig. 2.3. In the proposed half adder, when mask_x is 0, S is equal
to x OR y and Cout is equal to 0. Otherwise, when mask_x is 1, S is equal to x XOR y and Cout
is equal to x AND y. In other words, the operation of the proposed half adder is controlled by
the active-low signal mask_x. When mask_x is disabled (=1), it functions as an accurate half
adder, and when mask_x is enabled (=0), Cout is masked to 0 and it functions as an OR gate with
output S. The operation of the proposed full adder is similar to that of the half adder: when
mask_x is disabled (=1), it functions as an accurate full adder, and when mask_x is enabled
(=0), Cout is equal to Cin and S is the output of an OR gate.

Fig.2.3 (a) Carry-mask able half adder, (b) Carry-maskable full adder
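
The masking behaviour described above can be modelled bit by bit in Python. This is a sketch under stated assumptions: the function names are hypothetical, the masked full adder is assumed to output S = x OR y (the text only says S is the output of an OR gate), and the per-bit `masks` list is an illustrative way to drive mask_x.

```python
def cm_half_adder(x, y, mask_x):
    """Carry-maskable half adder; mask_x is active low."""
    if mask_x:                      # mask disabled (=1): accurate half adder
        return x ^ y, x & y
    return x | y, 0                 # mask enabled (=0): OR gate, carry masked to 0


def cm_full_adder(x, y, cin, mask_x):
    """Carry-maskable full adder; mask_x is active low."""
    if mask_x:                      # accurate full adder
        return x ^ y ^ cin, (x & y) | (cin & (x ^ y))
    return x | y, cin               # assumption: S = x OR y, Cout passes Cin through


def cma(a, b, masks):
    """k-bit CMA: one half adder at bit 0, full adders at bits 1..k-1."""
    k = len(masks)
    s, carry = cm_half_adder(a & 1, b & 1, masks[0])
    result = s
    for i in range(1, k):
        s, carry = cm_full_adder((a >> i) & 1, (b >> i) & 1, carry, masks[i])
        result |= s << i
    return result | (carry << k)    # carry-out becomes bit k of the result
```

With every mask bit set to 1 the sketch reduces to ordinary addition, and with every mask bit set to 0 it reduces to a bitwise OR of the operands.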

2.4 OVERALL STRUCTURE


An n-bit multiplier consists of n rows, each of which has n partial products (PPs), so there
are n x n PPs in total. Using the ATC introduced in the previous section, the n rows can be
replaced by n/2+1 rows. Figure 2.4 shows an example of an 8-bit multiplier with 8 x 8 PPs. The
PPR is performed in three stages (Stage 1, Stage 2, and Stage 3) and the CPA is performed in
Stage 4. The PP generation step is not shown. Each dot represents a PP. The least significant
bit (right side) is bit 0, and the most significant bit (left side) is bit 14. The solid
rectangles in Stage 1 represent ATCs and the dashed rectangles represent rows of seven iCACs.
Every row of iCACs includes PPs that are not processed: for example, the PP at position 0 in
the first row and the one at position 8 in the second row of the first iCAC block in ATC-8 are
not processed.

Fig.2.4 Structure of an 8-bit multiplier with 8 x 8 partial products

In Stage 1, eight rows of PPs are reduced to four rows (P1, P2, P3, and P4) and one accuracy
compensation vector (V1) by an ATC-8. The four rows are further reduced to two rows (P5 and
P6) and another accuracy compensation vector (V2) by an ATC-4. A final row of iCACs then
processes P5 and P6 and generates P7 and Q7. In summary, Stage 1 uses an ATC-8, an ATC-4, and
a row of seven iCACs to compress the 8 x 8 PPs to four rows (P7, V1, V2, and Q7). In Stage 2,
there are four PPs for each of bits 4 to 10. In order to achieve a lower path delay, OR gates
are used to sum V1 and V2 approximately. The empty circles for V1 and V2 represent the bits
which are summed using OR gates. Seven OR gates are required in total and the four rows are
compressed to three. In Stage 3, full adders and half adders are used to compress the three
rows to two. Two half adders are required for bits 1 and 13, and eleven full adders are
required for bits 2 to 12. Addition using a CPA is required after PPR to produce the final
result.

III. PROPOSED WORK


3.1 PARALLEL PREFIX ADDERS
The PPA is similar to a carry look-ahead adder. The carry-generation part of a prefix adder can
be designed in many different ways, depending on the requirements. We use a tree structure to
increase the speed of the arithmetic operation. Parallel prefix adders are fast adders used for
high-performance arithmetic structures in industry. Parallel prefix addition is done in three
steps:
1. Pre-processing stage
2. Carry generation network
3. Post-processing stage
Pre-processing stage:
In this stage the generate and propagate signals, which are used to produce the carry input of
each adder, are computed. A and B are the inputs. These signals are given by equations 1 and 2:
Pi = Ai xor Bi (1)
Gi = Ai and Bi (2)
Carry generation network:
In this stage the carries corresponding to each bit are computed. The computation is divided
into smaller pieces that are executed in parallel [4]. The carry operator contains two AND
gates and one OR gate. It uses the propagate and generate signals as intermediate signals,
which are given by equations 3 and 4.

Fig.3.1 Black cell, Gray cell

P(i:k)=P(i:j) . P(j-1:k) (3)


G(i:k)=G(i:j)+(G(j-1:k) . P(i:j)) (4)
Post processing Stage:
This is the final step, which computes the sum of the input bits. It is common to all the
adders, and the carry and sum bits are computed by logic equations 5 and 6:
Ci = (P(i:0) and Cin) or G(i:0) (5)
Si = Pi xor Ci-1 (6)
where the carry into bit 0 is Cin.
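
The three stages can be traced in a short Python sketch. The names `prefix_op` and `prefix_add` are hypothetical, and the carry network here is a plain serial scan chosen for readability; it applies the same operator as equations 3 and 4 but is an assumption of this sketch, not the tree network used in the proposed design.

```python
def prefix_op(hi, lo):
    """Combine (G, P) of an upper group with the adjacent lower group (Eqs. 3 and 4)."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def prefix_add(a, b, n=16, cin=0):
    """Add two n-bit numbers; returns (n-bit sum, carry-out)."""
    # Pre-processing: per-bit propagate and generate (Eqs. 1 and 2).
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]

    # Carry generation: group signals for bits i..0, built with a serial scan.
    groups = []
    acc = None
    for i in range(n):
        acc = (g[i], p[i]) if acc is None else prefix_op((g[i], p[i]), acc)
        groups.append(acc)
    carries = [G | (P & cin) for G, P in groups]  # carry out of bit i

    # Post-processing: each sum bit is Pi xor the carry into bit i.
    total = 0
    for i in range(n):
        carry_in = cin if i == 0 else carries[i - 1]
        total |= (p[i] ^ carry_in) << i
    return total, carries[n - 1]
```

Swapping the serial scan for a tree of `prefix_op` cells changes only the depth of the carry network, not the result.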

3.2 BRENT-KUNG ADDER


The Brent-Kung adder is a very well-known logarithmic adder architecture that gives an optimal
number of stages from input to all outputs, but with asymmetric loading on each intermediate
stage. It is one of the parallel prefix adders. Parallel prefix adders are a special class of
adders that are based on the use of generate and propagate signals. The cost and wiring
complexity are lower in Brent-Kung adders. However, the gate-level depth of Brent-Kung adders
is O(log2(n)), so the speed is lower. The block diagram of the Brent-Kung adder is shown in
Fig. 3.2.

Fig.3.2 Block Diagram of 16-Bit Brent Kung Adder
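
As an illustration of the Brent-Kung prefix tree, the following Python sketch computes the group generate and propagate signals with an up-sweep followed by a down-sweep. The function names are hypothetical, the word length is assumed to be a power of two, and the node counter is included only to show the hardware cost of the tree.

```python
def brent_kung_prefix(g, p):
    """Return (G, P, nodes) where G[i], P[i] span bits i..0; len(g) must be a power of two."""
    n = len(g)
    G, P = list(g), list(p)
    nodes = 0
    d = 1
    while d < n:                                  # up-sweep: build power-of-two groups
        for i in range(2 * d - 1, n, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
            nodes += 1
        d *= 2
    d = n // 4
    while d >= 1:                                 # down-sweep: fill in the remaining bits
        for i in range(3 * d - 1, n, 2 * d):
            G[i] |= P[i] & G[i - d]
            P[i] &= P[i - d]
            nodes += 1
        d //= 2
    return G, P, nodes


def brent_kung_add(a, b, n=16, cin=0):
    """Add two n-bit numbers with a Brent-Kung carry network; returns (sum, carry-out)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    G, P, _ = brent_kung_prefix(g, p)
    carries = [G[i] | (P[i] & cin) for i in range(n)]
    total = 0
    for i in range(n):
        carry_in = cin if i == 0 else carries[i - 1]
        total |= (p[i] ^ carry_in) << i
    return total, carries[n - 1]
```

For n = 16 this sketch uses 26 prefix cells, which reflects the low cell count that makes the Brent-Kung structure attractive in terms of area and wiring.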

A conventional Carry Select Adder (CSA) consists of dual Ripple Carry Adders (RCAs) and a
multiplexer. The Brent-Kung adder has a reduced delay compared with the Ripple Carry Adder.
Therefore, the Regular Linear BK CSA is designed using Brent-Kung adders. The Regular Linear BK
CSA consists of a single Brent-Kung adder for Cin=0 and a Ripple Carry Adder for Cin=1. It has
four groups of the same size. Each group consists of a single Brent-Kung adder, a single RCA
and a multiplexer. We use a tree structure in the Brent-Kung adder to increase the speed of the
arithmetic operation.

3.3 16X16 APPROXIMATE MULTIPLIER USING PARALLEL PREFIX ADDER
The 16×16-bit approximate multiplier consists of four 8×8-bit approximate multipliers, three
parallel prefix adders and one OR gate. The parallel prefix adder used in this multiplier is
the Brent-Kung adder. Using parallel prefix adders increases the computational speed of the
multiplier, so results are produced in less time.

Fig.3.3 16x16 Approximate multiplier with PPR
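
One way to combine four 8×8 products into a 16×16 product with three additions is sketched below in Python. This is an assumption made for illustration: the exact wiring of the three adders and the role of the OR gate in Fig. 3.3 are not modelled, the name `mul16_from_mul8` is hypothetical, and the default `mul8` is an exact multiplier standing in for the approximate 8×8 block.

```python
def mul16_from_mul8(a, b, mul8=lambda x, y: x * y):
    """Build a 16x16 product from four 8x8 products and three additions."""
    al, ah = a & 0xFF, a >> 8        # low and high bytes of a
    bl, bh = b & 0xFF, b >> 8        # low and high bytes of b

    ll = mul8(al, bl)                # weight 2**0
    lh = mul8(al, bh)                # weight 2**8
    hl = mul8(ah, bl)                # weight 2**8
    hh = mul8(ah, bh)                # weight 2**16

    add1 = lh + hl                   # first adder: the two middle products
    add2 = add1 + (ll >> 8)          # second adder: fold in the upper byte of ll
    add3 = (hh << 8) + add2          # third adder: fold in the high product
    return (add3 << 8) | (ll & 0xFF) # low byte of ll passes through unchanged
```

Passing an approximate 8×8 function as `mul8` would show how errors in the four sub-products propagate to the 16×16 result.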

3.4 FIR FILTER


The output of an FIR filter of length N can be computed using the relation
$y(n) = \sum_{i=0}^{N-1} h(i)\, x(n-i)$ (1)
The computation of (1) can be expressed by a recurrence relation.

Fig.3.4 FIR filter with 16x16 Approximate multiplier
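
The FIR relation (1) can be written directly as a Python sketch with a pluggable multiplier. The name `fir_filter` and the `mul` parameter are hypothetical, samples before n = 0 are assumed to be zero, and the default exact multiplier is a stand-in for the 16×16 approximate multiplier.

```python
def fir_filter(x, h, mul=lambda a, b: a * b):
    """Direct-form FIR: y(n) = sum over i of h(i) * x(n - i), with x(n) = 0 for n < 0."""
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0
        for i in range(n_taps):
            if n - i >= 0:           # skip samples before the start of the input
                acc += mul(h[i], x[n - i])
        y.append(acc)
    return y


impulse_response = fir_filter([1, 0, 0, 0], [3, 2, 1])
```

Feeding a unit impulse returns the coefficients themselves, which is a quick sanity check when the exact multiplier is replaced by an approximate one.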

IV. EXPERIMENTAL RESULTS

Fig.4.1 Simulation output of FIR filter with 16x16 Approximate multiplier

Device utilization summary:
----------------------------------
Selected Device: 3s1200efg320-4
Number of Slices: 458 out of 8672 (5%)
Number of Slice Flip Flops: 6 out of 17344 (0%)
Number of 4 input LUTs: 819 out of 17344 (4%)
Number of IOs: 307
Number of bonded IOBs: 307 out of 250 (122%)
Number of GCLKs: 1 out of 24 (4%)

Timing Summary:
----------------------
Speed Grade: -4
Minimum period: 4.333ns (Maximum Frequency: 230.784MHz)
Minimum input arrival time before clock: 4.932ns
Maximum output required time after clock: 45.904ns
Maximum combinational path delay: 45.619ns

Critical path delay:
----------------------
FIR filter with Wallace tree multiplier: 54.675 ns
FIR filter with Approximate multiplier [Serial adder]: 49.040 ns
FIR filter with Approximate multiplier [PPA]:

V. CONCLUSION
A 16x16 approximate multiplier based FIR filter has been proposed in this paper that has a
shorter critical path delay than the conventional design. The multiplier was evaluated at both
the circuit and application levels. The experimental results show that the proposed multiplier
was able to deliver speedups while maintaining a significantly smaller circuit area than that
of the conventional Wallace tree multiplier. The proposed multiplier delivered greater
improvements in critical path delay than other previously studied approximate multipliers.
Finally, the ability of our proposed multiplier to improve the speed of the FIR filter was
demonstrated.
