
A Project Report

on

IMPLEMENTATION OF LUT BASED MULTIPLIER FOR
SHORT WORD LENGTH DSP SYSTEMS
Submitted in partial fulfilment of the requirements for the award of the degree of

BACHELOR OF TECHNOLOGY
In
ELECTRONICS AND COMMUNICATION ENGINEERING

Submitted by

A. DURGA BHAVANI - 15FE1A0405


D. ANJANEYULU - 15FE1A0437
A. SWETHA - 15FE1A0404
P. PRADEEP - 15FE5A0428

Under the esteemed guidance of


Mr.J.VEERAYYA.,M.Tech
Assistant professor

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING


VIGNAN’S LARA INSTITUTE OF TECHNOLOGY AND SCIENCE
(An ISO 9001:2008 Certified, Approved by AICTE, Affiliated to JNTUK, Kakinada)
VADLAMUDI-522213, GUNTUR Dist., ANDHRAPRADESH.

CERTIFICATE

This is to certify that the project work entitled “IMPLEMENTATION OF LUT
BASED MULTIPLIER FOR SHORT WORD LENGTH DSP SYSTEMS” is a bona fide
work done by A. DURGA BHAVANI (15FE1A0405), D. ANJANEYULU (15FE1A0437),
A. SWETHA (15FE1A0404) and P. PRADEEP (15FE5A0428) under my guidance and
submitted in partial fulfilment of the requirements for the award of the degree of Bachelor of
Technology in Electronics & Communication Engineering by Jawaharlal Nehru
Technological University, Kakinada.

Project Guide Head of the Department


Mr. J.VEERAYYA.,M.Tech Mr. M. SUMAN.,M.Tech
Assistant Professor Assistant Professor

EXTERNAL EXAMINER
ACKNOWLEDGEMENT

We are grateful to the Department of Electronics and Communication Engineering,
VIGNAN’S LARA INSTITUTE OF TECHNOLOGY & SCIENCE, which gave us an
opportunity to gain profound technical knowledge, thereby enabling us to complete the
project.

We would like to thank our beloved principal Dr. K. PHANEENDRA KUMAR, for
providing great support to us in completing our project and for giving us the opportunity of
doing the project.

We feel elated to thank Mr. M. SUMAN Associate Professor and our Head of the
Department, for inspiring us all the way and arranging all the facilities and resources needed
for our project.

We are very thankful to our beloved coordinators Mrs. P. V. N. LAKSHMI, Mr.
B. HARISH and Mr. S. NAGARAJU for inspiring us all the way and arranging all the
facilities and resources needed for the project. Their efforts in this respect are beyond the purview
of the acknowledgement.

It is with immense pleasure that we would like to express our deep gratitude to
our guide Mr. J. VEERAYYA, Assistant Professor, who guided and encouraged us at
every step of our project work. His invaluable moral support and guidance throughout the
project helped us to a great extent. We are thankful to him for his valuable suggestions and
discussions during this project.

We express our hearty thanks to all the staff members and non-teaching staff for all
their help and co-operation extended in bringing out this project successfully in time.

Project Associates:
A. DURGA BHAVANI (15FE1A0405)
D. ANJANEYULU (15FE1A0437)
A. SWETHA (15FE1A0404)
P. PRADEEP (15FE5A0428)

DECLARATION

We hereby declare that the work described in this project work, entitled
“IMPLEMENTATION OF LUT BASED MULTIPLIER FOR SHORT WORD
LENGTH DSP SYSTEMS” which is submitted by us in partial fulfilment for the award of
Bachelor of Technology (B.Tech) in the Department of Electronics and Communication
Engineering to the Vignan’s Lara Institute of Technology and Science, Vadlamudi affiliated
to Jawaharlal Nehru Technological University Kakinada, Andhra Pradesh, is the result of
work done by us under the guidance of Mr. J. VEERAYYA, Assistant Professor.

The work is original and has not been submitted for any Degree/Diploma of this or
any other university.

Project Associates:

A. DURGA BHAVANI (15FE1A0405)


D. ANJANEYULU (15FE1A0437)
A. SWETHA (15FE1A0404)
P. PRADEEP (15FE5A0428)

Place: Vadlamudi.
Date:
ABSTRACT

Short word length (SWL) DSP systems offer good performance because they process small
data words, typically up to three bits wide. SWL systems can also be designed using FPGAs,
which come with many built-in primitives such as look-up tables, flip-flops, additional carry
logic, memories and DSP blocks.
This project illustrates a way to use a LUT to design a three-bit (3*3) constant coefficient
unsigned integer multiplier for SWL DSP systems. The major difference between the
conventional constant coefficient memory based multiplier and the proposed one is the amount of
memory consumed. In the conventional design, for example, the product values for each constant
multiplier (0-7) would be pre-calculated and stored in eight block memories, whereas the
proposed design consumes only one LUT based memory module. This LUT based
memory holds only the product values of the fixed coefficient multiplier 2; for the other
coefficients, the same product values are modified at the output as per the steps of the proposed
algorithm.

INDEX
CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS

CHAPTER 1: INTRODUCTION
1.1 Objective
1.2 Introduction to VLSI
1.2.1 History of VLSI
1.2.2 VLSI design process
1.2.3 Front end steps
1.2.4 Back end steps
1.3 VHDL
1.4 Reason for implementing the project
1.5 Project organization

CHAPTER 2: LITERATURE SURVEY
2.1 Array multipliers
2.2 High performance Vedic BCD multiplier using modified binary to BCD converter
2.3 Parallel decimal multipliers using binary multipliers
2.4 Existing system based on LUT

CHAPTER 3: LUT BASED MULTIPLIER
3.1 Introduction to LUT
3.2 Implementation of LUT based multiplier for short word length DSP systems

CHAPTER 4: SOFTWARE DEVELOPMENT
4.1 Xilinx
4.2 FPGA families
4.2.1 Old families
4.2.2 High-performance families
4.2.3 Low Cost Family
4.3 Virtex family
4.4 Kintex
4.5 Artix
4.6 Zynq
4.7 Spartan family
4.8 User interface
4.9 Steps to implement the design

CHAPTER 5: RESULT
5.1 Output waveform
5.2 Performance parameters

CHAPTER 6: CONCLUSION AND FUTURE SCOPE
6.1 Conclusion
6.2 Future Scope

REFERENCES

APPENDIX
LIST OF FIGURES

FIG. NO.   FIGURE NAME
1.1    Front end & Back end flow
2.1    Array Multiplier
2.2    Architecture of 2x2 Vedic BCD multiplier using modified binary to BCD converter
2.3    Decimal multiplier without decimal partial products
2.4    Architecture of 2x2 parallel decimal multipliers using binary multipliers
2.5    Block diagram of existing system based on LUT
3.1    2-input LUT
3.2    3-input LUT
3.3    Block diagram of LUT based memory to store the product values
3.4    Flow chart of proposed system
3.5    FPGA based system design
4.1    Project Navigator desktop icon
4.2    Creating new project
4.3    Specifying the project properties
4.4    Selecting the source type
4.5    Specifying the ports
4.6    Code for the required design
4.7    Showing how to save the file
4.8    Showing how to check syntax
4.9    Showing how to do simulation
4.10   Showing how to synthesize and observe the performance parameters
5.1    Simulated waveform of existing method
5.2    Simulated waveform of proposed method
LIST OF TABLES

TABLE NO.   TABLE NAME
2.1    Performance parameters of “High Performance Vedic BCD Multiplier using Modified Binary to BCD Converter”
2.2    Performance parameters of “Parallel Decimal Multipliers Using Binary Multipliers”
3.1    Product values of multiplier 2 and multiplicands 0-7
5.1    Performance parameters of proposed system
ABBREVIATIONS

VLSI : Very Large Scale Integration


IC : Integrated Circuit
CPU : Central Processing Unit
SSI : Small Scale Integration
MSI : Medium Scale Integration
LSI : Large Scale Integration
RTL : Register Transfer Level
DC : Design Compiler
CTS : Clock Tree Synthesis
SPEF : Standard Parasitic Exchange Format
SBPF : Synopsys binary Parasitic Format
STA : Static Timing Analysis
EM : Electromigration
VHDL : Very High Speed Integrated Circuit Hardware Description Language (VHSIC HDL)
FPGA : Field Programmable Gate Array
LUT : Look Up Table
BCD : Binary Coded Decimal
BRAM : Block Random Access Memory
ISE : Integrated Synthesis Environment
CAD : Computer Aided Design
HKMG : High-K Metal Gate
TSMC : Taiwan Semiconductor Manufacturing Company

IMPLEMENTATION OF LUT BASED MULTIPLIER FOR SHORT WORD LENGTH DSP SYSTEMS

CHAPTER 1
INTRODUCTION

1.1 Objective
The objective is to improve the performance parameters of the multiplier: to increase its
speed by reducing the propagation delay, and to reduce its area by using
the concepts of Vedic mathematics. Consequently, several digital signal processing
algorithms can be improved in terms of performance parameters such as speed and area, as they
involve many multiplication operations, thereby enabling compact VLSI designs.

1.2 Introduction to VLSI


Very-large-scale integration (VLSI) is the process of creating an integrated circuit
(IC) by combining billions of transistors into a single chip. VLSI began in the 1970s when
complex semiconductor and communication technologies were being developed. Before the
introduction of VLSI technology most ICs had a limited set of functions they could perform.
An electronic circuit might consist of a CPU, ROM, RAM and other glue logic. VLSI lets
IC designers add all of these into one chip.

1.2.1 History of VLSI


The history of the transistor dates to the mid-1920s, when several inventors attempted
devices that were intended to control current in solid-state diodes and convert them into
triodes. With the invention of the transistor at Bell Labs in 1947, the field of electronics shifted
from vacuum tubes to solid-state devices.
With the small transistor in their hands, electrical engineers of the 1950s saw the
possibility of constructing far more advanced circuits. However, as the complexity of
circuits grew, problems arose.
One problem was the size of the circuit. A complex circuit such as a computer
depended on speed. If the components were large, the wires interconnecting them had to be
long. The electric signals took time to travel through the circuit, thus slowing the computer.

DEPARTMENT OF EC E, VLITS PAGE 1



The invention of the integrated circuit by Jack Kilby and Robert Noyce solved this
problem by making all the components and the chip out of the same block (monolith) of
semiconductor material. The circuits could be made smaller, and the manufacturing process
could be automated. This led to the idea of integrating all components on a single silicon
wafer, which led to small-scale integration (SSI) in the early 1960s, medium-scale integration
(MSI) in the late 1960s, and then large-scale integration (LSI) as well as VLSI in the 1970s
and 1980s, with tens of thousands of transistors on a single chip (later hundreds of thousands,
then millions, and now billions (10^9)).


1.2.2 VLSI Design Process


[Flowchart: specification, behavioural representation, RTL coding (VHDL/Verilog),
functional verification, synthesis, synthesis verification, place & route, parasitic
extraction, static timing analysis and post verification, with yes/no checks between stages]

Fig 1.1 Front end & Back end flow


The VLSI design flow can be divided into two parts: the front end design flow and
the back end design flow. Together, they allow the creation of a functional chip from scratch to
production.
1.2.3 Front end steps:
i. System specifications:
This indicates the requirements of the circuit that has to be designed. A clear
specification makes it simpler for a designer with an idea for a particular application to turn
that idea into a working system on a very large scale integrated chip.
ii. Behavioural representation:
This involves the description of how the circuit should communicate with the outside
world. Typical issues at this representation level include the number of I/O terminals and
their relation. This helps in optimizing the circuit.
iii. RTL Coding:
In digital circuit design, register transfer level is a design abstraction which models a
synchronous digital circuit in terms of the flow of digital signals between hardware registers
and the logical operations performed on those signals. RTL abstraction is used in hardware
description languages like VHDL or Verilog to create high level representations of a circuit,
from which lower level representations and ultimately actual wiring can be derived.
iv. Functional verification:
It is defined as the process of verifying that an RTL design meets its specification
from a functional perspective.


1.2.4 Back end steps:
i. Synthesis:
Synthesis is responsible for converting the RTL description into a structural gate
level netlist. This netlist instantiates every element (standard cells and macros) that
composes the circuit, together with its connections. Synthesis can be described as follows:
Synthesis = Translation + Optimization + Mapping

ii. Synthesis verification:
This step verifies a set of reports, which contain information about timing, area and
fanout and show the violations of the defined constraints. These reports must be interpreted
to check whether there are violations (setup time, hold time, area, max transition, etc.).
In case of violations, Design Compiler (DC) can try to fix them by running optimization
algorithms. If DC cannot fix the violations, one must go back to RTL coding. With these reports
it is possible to check whether the design is synthesizable and therefore whether it is possible
to proceed.
The final verification before proceeding to Place & Route is to run Formality, which
is a logical verification tool. It takes the final netlist generated by DC and checks its logical
equivalence with the RTL description.
iii. Place & Route:
Place & Route is the back end stage that converts the gate level netlist produced during
synthesis into a physical design. Although the name suggests two phases, the Place &
Route stage can be divided into three steps: Placement, Clock Tree Synthesis (CTS) and
Routing.
Placement involves placing all macros and cells into a certain predefined space. It
is done in two phases. The first one, called Coarse Placement, places the standard cells so as
to optimize timing and/or congestion, without taking overlap prevention into account.
The second phase, named Legalize, eliminates overlap problems by placing the
overlapping cells in the closest available space.
iv. Parasitic extraction:
Parasitic extraction has the objective of creating an accurate RC model of the circuit so
that subsequent simulations and timing, power and IR drop analyses can emulate the real circuit
response. Only with this information can the analyses and simulations report results close
to the real functioning of the circuit. For this reason, this stage needs to precede all signoff
analyses.


Star RCXT is the Synopsys tool capable of performing parasitic extraction. It takes
the post-layout Milkyway database and the NXTGRD files provided by the foundry (cell
parasitic information) and produces SPEF (Standard Parasitic Exchange Format) and SBPF
(Synopsys Binary Parasitic Format) files.
v. Static Timing Analysis (STA):
STA is a method of obtaining accurate timing information without the need to simulate
the circuit. It allows detecting setup and hold time violations, as well as skew and slow paths
that limit the operating frequency.
Synopsys PrimeTime allows running STA over a physical design, for each corner.
Taking as inputs the post-layout netlist and the parasitic and standard cell information, it outputs
a series of reports which make it possible to detect timing violations.
vi. Post-layout Verification:
Once again, Formality should be run to check the logical equivalence of the post-layout
netlist with the RTL description.
The huge number of transistors in a circuit can make the voltage level drop below a defined
margin that ensures that the circuit works properly. IR drop analysis allows checking the
power grid to ensure that it is strong enough to hold that minimum voltage level. Synopsys
PrimeRail is the tool that outputs the IR drop and EM analysis reports.


1.3 VHDL:
VHDL is a versatile and powerful hardware description language which is useful for
modelling electronic systems at various levels of design abstraction. VHDL (VHSIC HDL, or
Very High Speed Integrated Circuit Hardware Description Language) and Verilog are
languages that help us automate the design of ICs. These languages enable us
to simulate our designs before actually sending them out for fabrication. They help us to design ICs
at the RTL level and also enable us to translate this RTL design into a discrete netlist of logic
gates, which is then sent to layout and routing for proper placement of these logic
gates in the chip. These technologies not only allow us to simulate, but also allow us to
emulate our design on an FPGA, which then enables us to check for possible faults in our
design. These languages are used not only to design ICs but also to verify the designs
using test benches (a set of programs intended to test the design). These test benches
can also be written in VHDL and Verilog. These technologies are therefore used in both the
design and the verification process flow. To be specific, they correspond to the RTL level of design.

1.4 Reason for Implementing the Project:
Multipliers play a prominent role in today’s digital world. Many researchers are
working to design multipliers that offer high speed, low power
consumption and regularity of layout, and hence less area.
Statistics show that more than 70% of the instructions in a microprocessor perform addition
and multiplication operations, so these operations dominate the execution time. The demand
for high speed processing has been increasing as a result of expanding computer and signal
processing applications.
Low power consumption is also an important issue in multiplier design. To reduce
power consumption significantly, it is good to reduce the number of operations, thereby reducing
dynamic power, which is a major part of total power consumption. So, the need for high speed
and low power multipliers has increased.
Designers mainly concentrate on high speed and low power circuit design.
The objective of a good multiplier is to provide a compact, high speed and
low power unit. Low power, high speed, low delay multipliers in VLSI can be
designed with unique logical approaches.
Many logical approaches have been proposed for reducing area and delay, and each
approach has its specific advantages in terms of speed and power.


An efficient multiplier should have the following characteristics:
Accuracy: A good multiplier should give the correct result.
Speed: The multiplier should perform the operation at high speed.
Area: The multiplier should occupy fewer slices and LUTs.
Power: The multiplier should consume less power.

1.5 Project organization:
The outline of the document is as follows. Chapter 2 discusses the existing methods
and chapter 3 the proposed method. Chapter 4 presents the software used.
Chapter 5 discusses the simulation and synthesis results and compares the proposed
method with the existing methods. Chapter 6 gives the conclusion and future scope.
The code is given in the appendix.


CHAPTER-2
LITERATURE SURVEY

2.1 Array multiplier:
An array multiplier is a digital combinational circuit that is used for the multiplication
of two binary numbers by employing an array of full adders and half adders. This array is
used for the nearly simultaneous addition of the various product terms involved. To form the
various product terms, an array of AND gates is used before the adder array.
An array multiplier is a vast improvement in speed over the traditional bit serial multipliers, in
which only one full adder along with a storage memory is used to carry out all the bit
additions involved, and also over the row serial multipliers, in which the product rows (also
known as the partial products) are added sequentially, one by one, using only one
multi-bit adder. The tradeoff for this extra speed is the extra hardware required to lay down
the adder array. But with the much-decreased cost of these adders, this extra hardware has
become quite affordable to a designer.
Every full adder in the system has two outputs:
1. The sum bit is passed down.
2. The carry bit moves to the lower left full adder.
In spite of the vast improvement in speed, there is still a level of delay involved in an
array multiplier before the final product is obtained. This is due to carry propagation.

Fig 2.1 Array Multiplier
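
The structure described above (AND gates forming the product terms, full adders summing them row by row) can be illustrated with a small behavioural sketch in Python. It is not the hardware of Fig 2.1; the names `full_adder` and `array_multiply` and the default 3-bit width are assumptions made only for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def array_multiply(x, y, n=3):
    """Multiply two n-bit unsigned integers with AND gates and rows of full adders."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    acc = [0] * (2 * n)  # running sum, least significant bit first
    for j in range(n):
        # Row j of product terms: x AND y[j], shifted left by j positions.
        row = [0] * (2 * n)
        for i in range(n):
            row[i + j] = xb[i] & yb[j]
        # Add the row to the running sum; the carry ripples to the next bit.
        carry = 0
        for k in range(2 * n):
            acc[k], carry = full_adder(acc[k], row[k], carry)
    return sum(bit << k for k, bit in enumerate(acc))
```

For example, `array_multiply(5, 6)` adds the rows generated by the multiplier bits 0, 1 and 1. The inner loop over `k` is where the carry propagation delay mentioned above arises.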


2.2 High Performance Vedic BCD Multiplier using Modified Binary To BCD Converter:
The basic array multipliers have relatively good performance, but the modified Booth
algorithm halves the number of partial products with respect to array
multiplication. Therefore, the delay of the multiplier is reduced. Its power dissipation is
comparable to that of the array multiplier due to the circuitry overhead in the Booth algorithm.
However, circuit techniques can give this multiplier low-power characteristics. The
faster, lower delay multipliers adopt the Wallace tree with modified Booth encoding, but
this leads to larger power dissipation and area due to the interconnecting wires. Hence, it is
not recommended for low-power applications.
The Vedic urdhva-tiryagbhyam multiplication process is used to reduce circuitry overhead
in hardware design, and a “divide and conquer” approach is introduced with it, which reduces
delay. The urdhva-tiryagbhyam method of implementation is shown in chapter-3. There are
several ways to implement a decimal multiplier.
One way is to perform the multiplication directly in decimal. Another approach is to
convert the operands into binary, perform the binary multiplication and lastly convert the
result back to decimal. A third approach to decimal multiplication involves performing
decimal digit-by-digit multiplication in binary and then converting the resulting binary partial
products to decimal. These partial products are added as appropriate to form the final decimal
product. To reduce the circuitry overhead and ease implementation, we use a modified
third approach, in which N decimal digits (N is even) are divided into equal parts and
Vedic urdhva-tiryagbhyam multiplication is then applied. After this, decimal adders are used to get
the final output digits. The BCD adders used for implementation are 4x4, 4x2, 4x3, 2x2, etc.,
which reduce the slice count and speed up the overall multiplication process. The architecture of
2x2 BCD digit multiplication is shown in Fig 2.2. This architecture shows that a
2x2 digit multiplier can be designed from 1x1 multipliers; the Vedic multiplication process is the
same as mentioned in chapter-3. A 4x4 digit multiplier can be implemented easily by using
2x2 digit multipliers, and in this way NxN multipliers can also be implemented. Table 2.1 shows the
performance parameters of the High Performance Vedic BCD Multiplier using Modified Binary
to BCD Converter.
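
A minimal Python sketch of the third approach for two-digit operands follows: each 1x1 digit product is formed in binary, converted to BCD digits, aligned, and summed with a decimal adder. The function names, and the use of repeated division by 10 as a stand-in for the binary to BCD converter, are assumptions for illustration; this is not the circuit of Fig 2.2.

```python
def to_bcd_digits(value, ndigits):
    """Model of a binary to BCD converter: decimal digits, least significant first."""
    digits = []
    for _ in range(ndigits):
        digits.append(value % 10)
        value //= 10
    return digits


def bcd_add(x, y):
    """Decimal adder on two equal-length digit lists (least significant first)."""
    out, carry = [], 0
    for dx, dy in zip(x, y):
        s = dx + dy + carry
        out.append(s % 10)
        carry = s // 10
    return out


def multiply_2x2_digits(a, b):
    """Multiply two 2-digit decimal numbers from four 1x1 digit products."""
    a1, a0 = divmod(a, 10)
    b1, b0 = divmod(b, 10)
    total = [0, 0, 0, 0]
    # (binary digit product, decimal position it is aligned to)
    for product, shift in ((a0 * b0, 0), (a1 * b0, 1), (a0 * b1, 1), (a1 * b1, 2)):
        digits = [0] * shift + to_bcd_digits(product, 2)
        digits += [0] * (4 - len(digits))
        total = bcd_add(total, digits)
    return total


def digits_to_int(digits):
    """Helper for checking results: digit list back to an integer."""
    return sum(d * 10 ** i for i, d in enumerate(digits))
```

Calling `multiply_2x2_digits(12, 34)` returns the digit list `[8, 0, 4, 0]`, that is, 0408 read from the most significant digit.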


Fig 2.2 Architecture Of 2x2 Vedic BCD Multiplier using Modified Binary To BCD Converter

Method                                          Slices    Delay
High Performance Vedic BCD Multiplier using
Modified Binary to BCD Converter                139       24.29 ns

Table 2.1 Performance parameters of High Performance Vedic BCD Multiplier using
Modified Binary to BCD Converter


2.3 PARALLEL DECIMAL MULTIPLIERS USING BINARY MULTIPLIERS

To implement decimal multiplication with binary multipliers, we convert the numbers
to binary, do the multiplication and convert the result to decimal. Two design approaches can
be considered:
1. Use a complete binary multiplier and converters of the size of the operands and of the
result.
2. Subdivide the operands and consider partial products to be added.

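[Block diagram: the 8-digit operands A7-0 and B7-0 (32 bits each) pass through BCD to binary
converters (27 bits each) into an AxB binary multiplier (54-bit product), which is followed by
a binary to decimal converter giving the 16-digit result D15-0 = A7-0 X B7-0]

Fig 2.3 Decimal Multiplier Without Decimal Partial Products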

The block diagram of the decimal multiplier without decimal partial products is shown
in Fig 2.3. The first approach uses only the converters besides the multiplier, while the
second approach needs extra adders, to add the partial products, and several converters.
However, the converters used in the first approach are larger and occupy much more
area than the converters used in the second approach. In the first case, the architecture uses
two 8-digit to 27-bit decimal to binary converters, one 27x27 multiplier, and one 54-bit to
16-digit binary to decimal converter.
In the second case, the 8-digit operands are divided into two groups of 4 digits each.
In this case, there are four multiplications implemented with binary multipliers, that is, each
4-digit number is converted to binary and then multiplied. The inner partial products are
added in binary before being converted to decimal and added to the other decimal partial
products (after binary to decimal conversion).


As expected, the larger binary to decimal converters are very expensive in terms of
area, and so the second approach, using partial products and smaller converters, is better both
in terms of area and performance.
After converting the sub-groups of digits of the operands to binary and performing
the cross multiplications, the aligned partial products are added in binary and then converted
to decimal. The three partial products indicate the operations performed and the number of
digits. After this alignment, the three final partial products are added in decimal.
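
The second approach can be sketched in Python as follows. The function name `split_multiply` and its parameter are hypothetical, and Python integers stand in for both the binary multipliers and the decimal adders, so the sketch only illustrates how the four products are grouped into three aligned partial products.

```python
def split_multiply(a, b, half_digits=4):
    """Multiply two 8-digit decimal numbers via four 4-digit x 4-digit products."""
    base = 10 ** half_digits
    a_hi, a_lo = divmod(a, base)   # upper and lower 4-digit groups of A
    b_hi, b_lo = divmod(b, base)   # upper and lower 4-digit groups of B

    low = a_lo * b_lo                    # up to 8 digits
    inner = a_hi * b_lo + a_lo * b_hi    # the two inner products, added in binary
    high = a_hi * b_hi                   # up to 8 digits

    # Align the three partial products by powers of ten and add them.
    return high * base * base + inner * base + low
```

For instance, `split_multiply(12345678, 87654321)` splits the operands into the groups 1234, 5678, 8765 and 4321 before multiplying.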

[Diagram: each operand is split into two 4-digit groups; the resulting 8-digit partial
products are aligned and added to give the 16-digit result]

Fig 2.4: Architecture Of 2x2 Parallel Decimal Multipliers Using Binary Multipliers

Method                                          Slices    Delay in ns
Parallel Decimal Multipliers Using Binary
Multipliers                                     612       58

Table 2.2 Performance Parameters of Parallel Decimal Multipliers Using Binary Multipliers


2.4 Existing system based on LUT:

• The existing system is a basic 3*3 LUT based multiplier, shown in Fig 2.5.
• For 3*3 multiplication, we need a memory block 6 bits wide and 64 (2^6) words
deep.
• It is simply a look-up table with the addresses arranged so that the first 3 bits of the
address are configured as the multiplier and the second 3 bits of the address are
configured as the multiplicand.

DRAWBACKS:
• As the bit widths of the multiplier and multiplicand increase, the total memory required increases.
• Even for small bit-width multipliers (such as 2 or 3 bits), many memory elements are needed.

Fig: 2.5 Block diagram of existing system based on LUT
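
The existing scheme can be modelled in a few lines of Python. In this sketch the names `ROM`, `lut_multiply` and `rom_bits` are hypothetical, and placing the multiplier in the upper three address bits is an assumption about what "first 3 bits" means.

```python
WIDTH = 3  # operand width in bits

# 64-word memory: the word at each address holds the 6-bit product of the
# upper 3 address bits and the lower 3 address bits.
ROM = [(addr >> WIDTH) * (addr & ((1 << WIDTH) - 1))
       for addr in range(1 << (2 * WIDTH))]


def lut_multiply(multiplier, multiplicand):
    """Look up a 3x3 product; multiplier assumed to occupy the upper address bits."""
    address = (multiplier << WIDTH) | multiplicand
    return ROM[address]


def rom_bits(n):
    """Storage in bits of a full n x n product table: 2^(2n) words of 2n bits."""
    return (1 << (2 * n)) * (2 * n)
```

Evaluating `rom_bits` for n = 2, 3 and 4 gives 64, 384 and 2048 bits, which illustrates the first drawback listed above.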


CHAPTER-3
LUT BASED MULTIPLIER

3.1 Introduction to LUT:
A LUT, which stands for Look Up Table, is in general terms a table that
determines what the output is for any given input(s). In the context of combinational logic, it
is the truth table. This truth table effectively defines how the combinational logic behaves.
In other words, whatever behaviour is obtained by interconnecting any number of gates (like
AND, NOR, etc.) without feedback paths (to ensure it is state-less) can be implemented by a
LUT.
FPGAs typically implement combinational logic with LUTs. When the
FPGA gets configured, it just fills in the table output values, which are called the "LUT-
Mask" and are physically composed of SRAM bits. So the same physical LUT can implement
Y=AB or Y=AB', but the LUT-Mask is different, since the truth table is different.
A two-input LUT (lookup table) can be represented generically like this:

Fig 3.1: 2-input LUT

• A LUT consists of a block of SRAM that is indexed by the LUT's inputs. The output of the
LUT is whatever value is in the indexed location in its SRAM.
• Although we normally think of RAM as being organized into 8, 16, 32 or 64-bit words, the
SRAM in an FPGA LUT is 1 bit wide. So, for example, a 3-input LUT uses an 8x1 SRAM (2³=8).
• Because RAM is volatile, the contents have to be initialized when the chip is powered up.
This is done by transferring the contents of the configuration memory into the SRAM.
• The output of a LUT is whatever the designer wants it to be.


For a two-input AND gate (Y=AB):

Address in ([1:0])    Output
00                    0
01                    0
10                    0
11                    1

For the second example (Y=AB'), only the truth table changes:

Address in ([1:0])    Output
00                    0
01                    1
10                    0
11                    0

Finally, A xor B:

Address in ([1:0])    Output
00                    0
01                    1
10                    1
11                    0
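
The three truth tables can be reproduced with one small Python model in which only the stored mask differs. The helper name `make_lut` is hypothetical, and the inputs are taken as address bits, most significant bit first.

```python
def make_lut(mask_bits):
    """Build a LUT from its mask: mask_bits[i] is the output stored at address i."""
    sram = list(mask_bits)

    def lut(*inputs):
        # The inputs form the address (most significant bit first); the
        # selection below plays the role of the MUX that picks one SRAM bit.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return sram[address]

    return lut


and_lut = make_lut([0, 0, 0, 1])      # Y = AB
and_not_lut = make_lut([0, 1, 0, 0])  # Y = AB', high only at address 01
xor_lut = make_lut([0, 1, 1, 0])      # Y = A xor B
```

Here `and_not_lut(0, 1)` reads address 01 and returns 1, matching the second table. The same `make_lut` function serves all three cases, which mirrors the point that one physical LUT implements different logic depending only on its mask.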

• So the LUT contents are not the same in each case, since the contents define the output. The
number of inputs to a LUT can also be more than two.

• The LUT is actually implemented using a combination of the SRAM bits and a MUX.

• Here the bits across the top (0 1 0 0 0 1 1 1) represent the outputs of the truth table for this
LUT. The three inputs to the MUX on the left (a, b and c) select the appropriate output value.

Fig 3.2: 3-input LUT

3.2 Implementation of LUT based multiplier for short word length DSP systems:

In binary representation, new data may be easily obtained by just shifting the bits either
left or right. Another advantage of base-2 systems is that the double of any value is easy to get
by appending a zero as the new LSB, as shown in Table 3.1: moving from 2 to 4 in decimal is
possible in binary by appending a zero at the end of the base-2 representation of 2; similar is the
case with moving from 3 to 6, 6 to 12, 5 to 10, and so on.
This aspect gives us the opportunity to design memory based, area optimized systems,
especially multipliers, since storing the pre-calculated product values of a constant multiplier
like 2 or 3 gives us the opportunity to derive the product values for higher multiples of that
constant multiplier. This scheme works very well for some data, but the issue is with the numbers
for which no such common factor is available, for example 5, 7, 9 and 11.
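
The doubling property can be checked with a two-line Python helper; the function name `double_by_append` is chosen here purely for illustration.

```python
def double_by_append(value):
    """Double an unsigned integer by appending a 0 as the new LSB."""
    bits = format(value, "b")   # e.g. 3 -> "11"
    return int(bits + "0", 2)   # "110" -> 6
```

Appending the zero is the same operation as a left shift by one position, so `double_by_append(v)` and `v << 1` agree for every non-negative integer `v`.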


Proposed Algorithm:
In the proposed algorithm, decimal 2 is taken as the common base multiplier for all the data.

Multiplier    Multiplicand    Product value
2             0               0
2             1               2
2             2               4
2             3               6
2             4               8
2             5               10
2             6               12
2             7               14

Table 3.1: Product values for multiplier 2 and multiplicands 0-7

Hence, the pre-calculated product values for the constant multiplier 2 are stored in memory,
and the products for all other multipliers (0, 1, 3, 4, 5, 6, and 7) are obtained by
modifying the memory output using two combinational functions: NOT and concatenation.

Fig.3.3 Block diagram of LUT based memory to store the product values

Fig. 3.3 shows the LUT based memory that stores the pre-calculated product values for the
constant multiplier 2. The memory address, represented by W, is 3 bits wide, giving at most
2^W = 8 address locations. Each stored product value is four bits wide, represented by L,
making a total of 32 bits for the 8 addresses.


A 3x3 multiplication requires a 6-bit output (twice the 3-bit operand width), but the memory
in this approach stores only 4 bits per location, as shown in Fig. 3.3. Therefore, up to 2
bits can be appended at the end of the memory output to obtain the required result. The
proposed algorithm for this type of multiplier is given below.
Algorithm:
The algorithm starts by storing in the LUT memory the product values for multiplier 2
and multiplicands 0, 1, 2, 3, 4, 5, 6, and 7. The address is the value of the multiplicand.
Step 1: If the multiplier is 2 and the multiplicand is any value in the range 0-7, the
product value stored at the memory location defined by the multiplicand is the final
output. If the multiplier is other than 2 and the multiplicand is in the range 0-7, look in
memory to see whether the required product is already stored. If it is, take the output from
that address.
Example: let the multiplier be 2 and the multiplicand be 5; then 2x5=10, and 10 (1010)2 is
already stored at memory location 5. If the multiplier is not 2 but the product value is
still available in memory (for example, 3x4=12 (1100)2 is stored at location 6), simply read
the product value from that location.
Step 2: If the multiplier is other than 2 and the value is not available in memory, look for
a nearby value (one less or one greater than the expected product) and flip the last bit.
Example: 3x5=15 is not available in memory, so take the nearby value 14 (1110)2 and flip
the last bit to turn 14 (1110)2 into 15 (1111)2.
Step 3: If no nearby value is present, look for a stored value from which the product can be
formed by appending one bit at the end.
Example: 4x5=20. Here 10 (1010)2 is half of 20 (10100)2, so take the output 10 from memory
location 5 and append a zero at the end to double it to 20.
Similarly, suppose the multiplier is 3 and the multiplicand is 7. The product is
21 (10101)2, which is not available in memory. It can be obtained by taking 10 (1010)2 at
the output and appending 1 at the end, which transforms 10 (1010)2 into 21 (10101)2.
Step 4: If the above steps do not give the required output, append two bits at the end to
get the required data.
Example: the product 7x5 is 35 (100011)2; appending (11)2 at the end of 8 (1000)2 gives the
output 35 (100011)2.
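
The worked examples in Steps 1 to 4 can be checked with a small self-checking simulation. This is a hedged sketch, not the project's implementation: the entity name `algorithm_examples`, the type `lut_t`, and the variable names are assumptions, and the process only reproduces the bit manipulations quoted in the examples.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity algorithm_examples is
end entity algorithm_examples;

architecture sim of algorithm_examples is
    type lut_t is array (0 to 7) of unsigned(3 downto 0);
    -- Stored products 2 x 0 ... 2 x 7, addressed by the multiplicand.
    constant LUT : lut_t :=
        ("0000", "0010", "0100", "0110", "1000", "1010", "1100", "1110");
begin
    check : process
        variable m  : unsigned(3 downto 0);
        variable r4 : unsigned(3 downto 0);
        variable r5 : unsigned(4 downto 0);
        variable r6 : unsigned(5 downto 0);
    begin
        -- Step 1: 2 x 5 = 10 is read directly from location 5.
        assert LUT(5) = 10 report "step 1: 2x5" severity failure;
        -- Step 1: 3 x 4 = 12 is already stored at location 6.
        assert LUT(6) = 12 report "step 1: 3x4" severity failure;

        -- Step 2: 3 x 5 = 15 is 14 (location 7) with the last bit flipped.
        m  := LUT(7);
        r4 := m(3 downto 1) & not m(0);
        assert r4 = 15 report "step 2: 3x5" severity failure;

        -- Step 3: 4 x 5 = 20 is 10 (location 5) with '0' appended.
        r5 := LUT(5) & '0';
        assert r5 = 20 report "step 3: 4x5" severity failure;
        -- Step 3: 3 x 7 = 21 is 10 (location 5) with '1' appended.
        r5 := LUT(5) & '1';
        assert r5 = 21 report "step 3: 3x7" severity failure;

        -- Step 4: 7 x 5 = 35 is 8 (location 4) with "11" appended.
        r6 := LUT(4) & "11";
        assert r6 = 35 report "step 4: 7x5" severity failure;

        report "all worked examples reproduced" severity note;
        wait;  -- stop the process
    end process check;
end architecture sim;
```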


Flow chart:
The flow diagram for the addressing scheme and combinational logic is given as under:

Fig.3.4. Flow chart of proposed algorithm


Proposed LUT Based Design:

Fig.3.5: FPGA based system design


Fig. 3.5 shows the FPGA based system design. The circuit consists of 4 functional
elements: memory module, multiplexer, NOT gate, and concatenation operator.
Explanation of each block:
Memory: The look-up-table based memory module has a capacity of 32 bits. The 8 product
values of the constant multiplier 2 are stored at 8 memory locations of the look-up table.
The memory module is addressed by a 3-bit address, which is the multiplicand selected from
the range 0 to 7 in the case of the 3×3 multiplier.
Multiplexer: Two multiplexers are used in the design. The first multiplexer selects between
the pre-calculated product and the bit-flipped value (used when a value adjacent to the
product is present). The second multiplexer is used when the algorithm reaches its third
step, i.e., when bit flipping does not work and bit concatenation is required in order to
double the data.
NOT gate: The NOT gate is one of the two combinational operations. Its function in this
design is to flip the last bit of the data taken from the LUT based memory. Flipping the
last bit has the effect of adding one to the memory output data.
Concatenation: The second combinational operation is concatenation. As discussed above, the
final output of a 3x3 multiplier is 6 bits, whereas the word length of the data stored in
the memory is 4. To bring the final width up to 6 bits, at most 2 bits can be appended at
the end of the data taken from the memory, which doubles or quadruples the stored value
(plus the value of the appended bits) as required.
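
The four blocks can be connected as in the sketch below. It is only an illustration under stated assumptions: the report does not specify how the multiplexer select signals are generated, so `flip`, `mode`, and `app` are hypothetical control inputs, and the entity name `lut_datapath` is likewise invented for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity lut_datapath is
    port (
        addr : in  std_logic_vector(2 downto 0);  -- memory address (0..7)
        flip : in  std_logic;                     -- hypothetical: '1' flips the last bit
        mode : in  std_logic_vector(1 downto 0);  -- hypothetical: "00" none, "01" one bit, else two bits
        app  : in  std_logic_vector(1 downto 0);  -- hypothetical: bits to append
        y    : out std_logic_vector(5 downto 0)   -- 6-bit product
    );
end entity lut_datapath;

architecture rtl of lut_datapath is
    type rom_t is array (0 to 7) of std_logic_vector(3 downto 0);
    -- Memory block: products of the constant multiplier 2.
    constant ROM : rom_t :=
        ("0000", "0010", "0100", "0110", "1000", "1010", "1100", "1110");
    signal word, sel1 : std_logic_vector(3 downto 0);
begin
    word <= ROM(to_integer(unsigned(addr)));

    -- NOT gate and first multiplexer: stored value or value with flipped LSB.
    sel1 <= word(3 downto 1) & not word(0) when flip = '1' else word;

    -- Concatenation and second multiplexer: append 0, 1, or 2 bits.
    with mode select
        y <= "00" & sel1         when "00",
             '0' & sel1 & app(0) when "01",
             sel1 & app          when others;
end architecture rtl;
```

For instance, addr = "100", flip = '0', mode = "10", app = "11" yields "100011" (35), matching the 7x5 example, and addr = "111", flip = '1', mode = "00" yields "001111" (15), matching the 3x5 example.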


CHAPTER-4
SOFTWARE DEVELOPMENT
4.1 Xilinx:
Xilinx ISE (Integrated Synthesis Environment) is a software tool produced by Xilinx for
synthesis and analysis of HDL designs. It enables the developer to synthesize designs,
perform timing analysis, examine RTL diagrams, simulate a design's reaction to different
stimuli, and configure the target device with the programmer.
Xilinx is an American technology company that is primarily a supplier of programmable
logic devices. It is known for inventing the field-programmable gate array (FPGA) and for
pioneering the fabless manufacturing model, in which a semiconductor company designs and
sells chips while outsourcing their fabrication.

4.2 FPGA families:


The FPGA family is a group of chips, each of which is based on a somewhat different
FPGA architecture, and each individual chip in this family is called a sibling. All siblings
have equivalent maximum logic capacities. Instead of attempting to implement all application
circuits in one very flexible FPGA chip, we use the most suitable sibling for each application
circuit. Each sibling is tailored to a certain class of application circuits in some way -- say by
replacing many programmable switches with hard-wired links between logic blocks and by
using longer routing segments. This sibling implements certain circuits very efficiently, but
its reduced flexibility means that some circuits may no longer fit into it at all. We overcome
this reduced flexibility by choosing the architecture of the remaining siblings so that they can
efficiently implement any circuit which will not fit into this chip well.
With good choices for the architecture of each chip, a small number of siblings will
be able to implement any application circuit more efficiently than a single highly flexible
FPGA.
Since a single FPGA is sufficient for prototyping but even a small production run
may require 50 chips, FPGA revenues come primarily from sales of chips intended for use in
production hardware. This is precisely where the siblings concept would be of the greatest
use.


As part of the synthesis procedure, the CAD software would determine not only the
technology mapping, placement and routing of the circuit, but also the best sibling to use.
Since each sibling should be smaller (and hence cheaper) than a general purpose FPGA, the
production volume at which it is cost effective to switch to mask programmed logic will be
increased. As well, the higher speed of siblings will allow the FPGA implementation of
circuits which previously could not meet performance specifications without using custom or
semi-custom logic.
In April 2012, the company introduced the Vivado Design Suite - a next-generation
SoC-strength design environment for advanced electronic system designs.
Prior to 2010, Xilinx offered two main FPGA families: the high performance Virtex
series and the high-volume Spartan series. With the introduction of 28 nm FPGAs in June
2010, Xilinx replaced the high-volume Spartan family with the Kintex family and the low-
cost Artix family.
In newer FPGA products, Xilinx minimizes total power consumption by the adoption
of a high-k metal gate (HKMG) process, which allows for low static power consumption.
Through the use of a HKMG process, Xilinx has reduced power use while increasing logic
capacity. Virtex-6 and Spartan-6 FPGA families are said to consume 50 percent less power,
and have up to twice the logic capacity compared to the previous generation of Xilinx
FPGAs.
In June 2010, Xilinx introduced the Xilinx 7 series: the Virtex-7, Kintex-7, and Artix-7
families, promising improvements in system power, performance, capacity, and price.
These new FPGA families are manufactured using TSMC's 28 nm HKMG process. Xilinx
shipped the world's first 28 nm FPGA device, the Kintex-7, making this the programmable
industry's fastest product rollout. In March 2011, Xilinx introduced the Zynq-7000 family,
which integrates a complete ARM Cortex-A9 MPCore processor-based system on a 28 nm
FPGA for system architects and embedded software developers.
In December 2013, Xilinx introduced the UltraScale series: the Virtex UltraScale and
Kintex UltraScale families. These new FPGA families are manufactured by TSMC in its 20 nm
planar process. At the same time it announced an UltraScale SoC architecture, called Zynq
UltraScale+ MPSoC, in TSMC's 16 nm FinFET process.


4.2.1 Old families:

• XC3000, XC4000, XC5200
• Old 0.5 µm, 0.35 µm and 0.25 µm technology. Not recommended for modern designs.
4.2.2 High-performance families:
• Virtex (220 nm)
• Virtex-E, Virtex-EM (180 nm)
• Virtex-II (130 nm)
• Virtex-II Pro (130 nm)
• Virtex-4 (90 nm)
• Virtex-5 (65 nm)
• Virtex-6 (40 nm)
4.2.3 Low-cost families:
• Spartan/XL – derived from XC4000
• Spartan-II – derived from Virtex
• Spartan-IIE – derived from Virtex-E
• Spartan-3 (90 nm)
• Spartan-3E (90 nm) – logic optimized
• Spartan-3A (90 nm) – I/O optimized
• Spartan-3AN (90 nm) – non-volatile
• Spartan-3A DSP (90 nm)
• Spartan-6 (45 nm)

4.3 Virtex family


The Virtex series of FPGAs have integrated features that include FIFO and ECC
logic, DSP blocks, PCI-Express controllers, Ethernet MAC blocks, and high-speed
transceivers. In addition to FPGA logic, the Virtex series includes embedded fixed function
hardware for commonly used functions such as multipliers, memories, serial transceivers and
microprocessor cores. These capabilities are used in applications such as wired and wireless
infrastructure equipment, advanced medical equipment, test and measurement, and defense
systems.

Xilinx's most recently announced Virtex, the Virtex-7 family, is based on a 28 nm
design and is reported to deliver a two-fold system performance improvement at 50 percent
lower power compared to previous generation Virtex-6 devices. In addition, Virtex-7 doubles
the memory bandwidth compared to previous generation Virtex FPGAs with 1866 Mbit/s memory
interfacing performance and over two million logic cells.
The Virtex-6 family is built on a 40 nm process for compute-intensive electronic
systems, and the company claims it consumes 15 percent less power and has 15 percent
improved performance over competing 40 nm FPGAs.

The Virtex-5 LX and the LXT are intended for logic-intensive applications, and the
Virtex-5 SXT is for DSP applications. With the Virtex-5, Xilinx changed the logic fabric
from four-input LUTs to six-input LUTs.

Legacy Virtex devices (Virtex, Virtex-II, Virtex-II Pro, Virtex 4) are still available,
but are not recommended for use in new designs.

4.4 Kintex

The Kintex-7 family is the first Xilinx mid-range FPGA family that the company
claims delivers Virtex-6 family performance at less than half the price while consuming 50
percent less power. The Kintex family includes high-performance 12.5 Gbit/s or lower-cost
optimized 6.5 Gbit/s serial connectivity, memory, and logic performance required for
applications such as high volume 10G optical wired communication equipment, and provides
a balance of signal processing performance, power consumption and cost to support the
deployment of Long Term Evolution (LTE) wireless networks.

4.5 Artix
The Artix-7 family delivers 50 percent lower power and 35 percent lower cost
compared to the Spartan-6 family and is based on the unified Virtex-series architecture.
Xilinx claims that Artix-7 FPGAs deliver the performance required to address cost-sensitive,
high-volume markets previously served by ASSPs, ASICs, and low-cost FPGAs.
The Artix family is designed to address the small form factor and low-power
performance requirements of battery-powered portable ultrasound equipment, commercial
digital camera lens control, and military avionics and communications equipment. With the
introduction of the Spartan-7 family in 2017, which lacks high-bandwidth transceivers, the
Artix-7's position in the Xilinx cost-optimized portfolio was clarified as being the
"transceiver optimized" member.


4.6 Zynq:

The Zynq-7000 family of SoCs addresses high-end embedded-system applications,
such as video surveillance, automotive driver assistance, next-generation wireless, and
factory automation. Zynq-7000 integrates a complete ARM Cortex-A9 MPCore processor-based
28 nm system. For software developers, Zynq-7000 appears the same as a standard,
fully featured ARM processor-based system-on-chip (SoC), booting immediately at power-up
and capable of running a variety of operating systems independently of the programmable
logic.

4.7 Spartan family:

The Spartan series targets low cost, high-volume applications with a low-power
footprint e.g. displays, set-top boxes, wireless routers and other applications.

The Spartan-6 family is built on a 45-nanometer (nm), 9-metal layer, dual-oxide
process technology. The Spartan-6 was marketed in 2009 as a low-cost option for
automotive, wireless communications, flat-panel display and video surveillance applications.

The Spartan-7 family, built on the same 28nm process used in the other 7-Series
FPGAs, was announced in 2015, and became available in 2017. Unlike the Artix-7 family
and the "LXT" members of the Spartan-6 family, the Spartan-7 FPGAs lack high-bandwidth
transceivers.
4.8 User interface:
The primary user interface of the ISE is the Project Navigator, which includes the
design hierarchy (Sources), a source code editor (Workplace), an output console (Transcript),
and a process tree (Processes).
The design hierarchy consists of design files (modules), whose dependencies are
interpreted by the ISE and displayed as a tree structure. For single-chip designs there may
be one main module, with other modules included by the main module, similar to the way the
main() subroutine calls other subroutines in C. Design constraints are specified in modules,
which include pin configuration and mapping.
The processes hierarchy describes the operations that the ISE will perform on the
currently active module. The hierarchy includes compilation functions, their dependency
functions and other utilities. This window also denotes issues or errors that arise with each
function.


The transcript window provides status of currently running operations, and informs
engineers on design issues. Such issues may be filtered to show Warnings, Errors, or both.

4.9 Steps to implement the design:


Step 1: Start Xilinx ISE.
To start the ISE software, double-click the ISE Project Navigator icon on your desktop, or
select Start > All Programs > Xilinx ISE Design Suite > ISE Design Tools > Project
Navigator.

Fig 4.1: Project Navigator Desktop Icon


Step 2: Creating a New Project


To create a new project using the New Project Wizard, do the following:
1. From Project Navigator, select File > New Project.
The New Project Wizard appears.
2. In the Location field, browse to F:\xilinx_tutorial or to the directory in which
the project is to be stored.
3. Verify that HDL is selected as the Top-Level Source Type, and click Next.


Fig 4.2: Creating new project


Step 3: Select the following values in the New Project Wizard (Device Properties page):
• Product Category: All
• Family: Spartan3E
• Device: XC3S500E
• Package: VQ100
• Speed: -5
• Synthesis Tool: XST (VHDL/Verilog)
• Simulator: ISim (VHDL/Verilog)
• Preferred Language: VHDL
This will determine the default language for all processes that generate HDL files.
Other properties can be left at their default values.
Click Next, then Finish to complete the project creation.

Fig 4.3 Specifying the project properties


Step 4: Select the source type. Here, VHDL Module is selected.

• Give the file a name.
• Write the VHDL code for the desired application in that file.

Fig 4.4: Selecting the source type


Step 5: Define the module, i.e., specify its ports.
Each port may be an input, an output, or an in-out.
For the proposed system, the input and output ports are:
a - input
b - input
c - output

Fig 4.5: Specifying the ports
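
The port list of Step 5 corresponds to a VHDL entity declaration. The following is a minimal sketch: the 3-bit input and 6-bit output widths follow the 3x3 multiplier described in Chapter 3, while the entity name `mult3x3_ref` and the behavioural architecture, which simply uses the `*` operator from `numeric_std`, are assumptions for illustration. It is a reference model, not the proposed LUT based design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult3x3_ref is
    port (
        a : in  std_logic_vector(2 downto 0);  -- input
        b : in  std_logic_vector(2 downto 0);  -- input
        c : out std_logic_vector(5 downto 0)   -- output
    );
end entity mult3x3_ref;

architecture behavioral of mult3x3_ref is
begin
    -- unsigned * unsigned yields a result of 3 + 3 = 6 bits.
    c <= std_logic_vector(unsigned(a) * unsigned(b));
end architecture behavioral;
```

A model of this kind may serve as a golden reference when the outputs of a LUT based design are compared in simulation.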


Step 6: Write the VHDL code for the required design.

Fig 4.6 Code for the required design

Step 7: Save the file.

Click File > Save.

Fig 4.7 Showing how to save the file


Step 8: Check the syntax.

• In the Project Navigator window, click Simulation > Behavioral Check Syntax.
• This checks the code for syntax errors.
• The code can be simulated only after the syntax check confirms that there are no syntax
errors.

Fig 4.8: Showing how to do check syntax


Step 9: In the Project Navigator window, click Simulation > Simulate Behavioral Model.
• The model is simulated only if the code is free of errors; the resulting waveform then
shows whether the logic is correct.

Fig 4.9: Showing how to do simulation


Step 10: Synthesize the design and observe the performance parameters.

In the Project Navigator window, click Implementation > Synthesize - XST.

Fig 4.10: Synthesizing the design and observing the performance parameters


CHAPTER-5
RESULTS

The proposed optimized architecture is successfully implemented in VHDL and
synthesized using Xilinx ISE 14.7.

5.1 Output Waveform:


Existing system output:
The waveform for the multiplication of the two inputs for example 7 and 4 is given below:

Fig 5.1: Simulated waveform of existing method


Proposed system output:


The waveform for the multiplication of the two inputs for example 5 and 6 is given below:

Fig 5.2: Simulated waveform of proposed method


5.2 Performance Parameters:

The proposed system reduces both the resource usage and the delay compared to the existing
system. The parameters are compared in Table 5.1.

Method             Slices    Time delay
Existing system    11        8.40 ns
Proposed method    7         7.129 ns

Table 5.1: Performance parameters of the proposed method, "LUT based multiplier for short
word length DSP systems"

These results are obtained by synthesizing the existing and proposed systems
individually. As Table 5.1 shows, the existing system consumes 11 slices with a time delay
of 8.40 ns, whereas the proposed system consumes 7 slices with a time delay of 7.129 ns.
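
Taking the figures in Table 5.1 at face value, the relative savings can be worked out as follows. This is simple arithmetic on the tabulated values, not an additional measurement.

```latex
\[
\text{slice reduction} = \frac{11 - 7}{11} \approx 36.4\%,
\qquad
\text{delay reduction} = \frac{8.40 - 7.129}{8.40} \approx 15.1\%
\]
```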


CHAPTER 6
CONCLUSION AND FUTURE SCOPE

6.1 CONCLUSION:
In short word length systems, the concern with memory based design is the minimum
amount of memory that can be configured in a given FPGA. For example, the 32-bit memory
needed to store the product values of the constant multiplier 2 (considering eight
multiplicands, i.e., 0-7) would occupy no less than a 9K block RAM (the minimum
configurable block memory in the Spartan-6 FPGA), leaving most of that memory unused.
Using block memory in a short word length system would therefore result in inefficient
resource utilization. So, using memory based multiplication or the DSP48 for short word
length systems is considered less feasible. Consequently, the choice is to use customized
(LUT based) implementations, such as the one proposed here. The proposed multiplication
algorithm may be used in any area of DSP, besides short word length processing.
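
As a rough illustration of the wastage, assume that a 9K block holds 9 × 1024 = 9216 bits (an assumption about the block size, not a figure taken from this report). The fraction of the block actually used by the 32-bit product table would then be:

```latex
\[
\frac{32}{9 \times 1024} = \frac{32}{9216} \approx 0.35\%
\]
```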

6.2 Future Scope:


• Future work is to evaluate this multiplier by incorporating it in FIR filter and
adaptive filter designs.
• The designed architecture can be extended to larger operand widths, and the performance
can be improved by using advanced methodologies.


REFERENCES

[1] C. Shi, J. Hwang, S. McMillan, A. Root, and V. Singh, "A system level resource
estimation tool for FPGAs," Field Programmable Logic and Application, pp. 424-433,
2004.

[2] M. R. Singh and A. Rajawat, "A Review of FPGA-based design methodologies for
efficient hardware Area estimation," IOSR Journals (IOSR Journal of Computer
Engineering), vol. 1, pp. 1-6.

[3] C. Lavin, M. Padilla, S. Ghosh, B. Nelson, B. Hutchings, and M. Wirthlin, "Using hard
macros to reduce FPGA compilation time," 2010, pp. 438-441.

[4] A. Palchaudhuri and R. S. Chakraborty, "A Fabric Component Based Approach to the
Architecture and Design Automation of High-Performance Integer Arithmetic Circuits on
FPGA," in Computational Intelligence in Digital and Network Designs and Applications,
ed: Springer, 2015, pp. 33-68.

[5] A. Corporation, "AN 584: Timing Closure Methodology for Advanced FPGA Designs,"
2014. 12.19.

[6] N. Benvenuto, L. Franks, and F. Hill Jr, "Dynamic programming methods for designing
FIR filters using coefficients-1, 0 and+ 1," Acoustics, Speech and Signal Processing, IEEE
Transactions on, vol. 34, pp. 785-792, 1986.

[7] A. Z. Sadik and Z. M. Hussain, "Short word-length LMS filtering," 2007, pp. 1-4.

[8] T. D. Memon, P. Beckett, and A. Z. Sadik, "Power-area-performance characteristics of
FPGA-based sigma-delta FIR filters," Journal of Signal Processing Systems, vol. 70, pp. 275-
288, 2013.

[9] A. C. Thompson, Techniques in Single-Bit Digital Filtering: RMIT University, 2004.

[10] M. K. Jaiswal and H. K.-H. So, "DSP48E efficient floating point multiplier
architectures on FPGA," in VLSI Design and 2017 16th International Conference on
Embedded Systems (VLSID), 2017 30th International Conference on, 2017, pp. 1-6.


APPENDIX
EXISTING SYSTEM CODE:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
-- Existing method: a full 64 x 6-bit product table addressed by a and b.
entity existing_method is
Port ( a : in STD_LOGIC_VECTOR (2 downto 0);
b : in STD_LOGIC_VECTOR (2 downto 0);
data : out STD_LOGIC_VECTOR (5 downto 0));
end existing_method;
architecture Behavioral of existing_method is
type lut is array(0 to 2**6-1) of std_logic_vector(5 downto 0);
-- Entry (8*a + b) holds the product a*b.
constant my_lut : lut := (
0=>"000000",
1=>"000000",
2=>"000000",
3=>"000000",
4=>"000000",
5=>"000000",
6=>"000000",
7=>"000000",
8=>"000000",
9=>"000001",
10=>"000010",
11=>"000011",
12=>"000100",
13=>"000101",
14=>"000110",
15=>"000111",
16=>"000000",
17=>"000010",
18=>"000100",
19=>"000110",
20=>"001000",

21=>"001010",
22=>"001100",
23=>"001110",
24=>"000000",
25=>"000011",
26=>"000110",
27=>"001001",
28=>"001100",
29=>"001111",
30=>"010010",
31=>"010101",
32=>"000000",
33=>"000100",
34=>"001000",
35=>"001100",
36=>"010000",
37=>"010100",
38=>"011000",
39=>"011100",
40=>"000000",
41=>"000101",
42=>"001010",
43=>"001111",
44=>"010100",
45=>"011001",
46=>"011110",
47=>"100011",
48=>"000000",
49=>"000110",
50=>"001100",
51=>"010010",
52=>"011000",
53=>"011110",
54=>"100100",
55=>"101010",

56=>"000000",
57=>"000111",
58=>"001110",
59=>"010101",
60=>"011100", -- 7 x 4 = 28
61=>"100011",
62=>"101010",
63=>"110001");
begin
-- The 6-bit table index is a (upper three bits) followed by b (lower three bits).
data <= my_lut(to_integer(unsigned(a) & unsigned(b)));
end Behavioral;


PROPOSED SYSTEM CODE:

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity proposed_method is
Port ( a : in STD_LOGIC_VECTOR (2 downto 0);
b : in STD_LOGIC_VECTOR (2 downto 0);
c : out STD_LOGIC_VECTOR (5 downto 0));
end proposed_method;
architecture Behavioral of proposed_method is
-- 8 x 4-bit LUT holding the products 2 x 0 to 2 x 7.
type lut is array(0 to 2**3-1) of std_logic_vector(3 downto 0);
constant my_lut : lut := (
0=>"0000",
1=>"0010",
2=>"0100",
3=>"0110",
4=>"1000",
5=>"1010",
6=>"1100",
7=>"1110");
signal p : std_logic_vector(5 downto 0); -- target product as a 6-bit vector
signal n, w, q : integer range 0 to 63 := 0;
begin
w <= to_integer(unsigned(a));
q <= to_integer(unsigned(b));
-- The printed listing never drives n. Assigning the target product w*q here
-- is an assumption made so that p is defined.
n <= w * q;
p <= std_logic_vector(to_unsigned(n, 6));
process (p)
    variable x, j : natural range 0 to 63;
    variable u : std_logic_vector(5 downto 0);
    variable g : std_logic_vector(3 downto 0);
begin
    c <= (others => '0'); -- default value, so that no latch is inferred
    j := to_integer(unsigned(p));
    for k in 0 to 7 loop
        g := my_lut(k);
        u := "00" & g; -- stored value, zero-extended to 6 bits
        x := to_integer(unsigned(u));
        if p = u then -- product is stored as is
            c <= u;
        elsif j - 1 = x then -- stored value + 1: flip the last bit
            c <= u(5 downto 1) & '1';
        elsif p = u(4 downto 0) & '0' then -- 2u: append '0'
            c <= u(4 downto 0) & '0';
        elsif p = u(4 downto 0) & '1' then -- 2u + 1: append '1'
            c <= u(4 downto 0) & '1';
        elsif p = u(4 downto 1) & not u(0) & '0' then -- 2(u + 1)
            c <= u(4 downto 1) & not u(0) & '0';
        elsif p = u(4 downto 1) & not u(0) & '1' then -- 2(u + 1) + 1
            c <= u(4 downto 1) & not u(0) & '1';
        elsif p = u(3 downto 1) & not u(0) & "00" then -- 4(u + 1)
            c <= u(3 downto 1) & not u(0) & "00";
        elsif p = u(3 downto 0) & "01" then -- 4u + 1
            c <= u(3 downto 0) & "01";
        elsif p = u(3 downto 0) & "10" then -- 4u + 2
            c <= u(3 downto 0) & "10";
        elsif p = u(3 downto 0) & "11" then -- 4u + 3
            c <= u(3 downto 0) & "11";
        end if;
    end loop;
end process;
end Behavioral;
