on
BACHELOR OF TECHNOLOGY
In
ELECTRONICS AND COMMUNICATION ENGINEERING
Submitted by
CERTIFICATE
EXTERNAL EXAMINER
ACKNOWLEDGEMENT
We would like to thank our beloved principal Dr. K. PHANEENDRA KUMAR, for
providing a great support for us in completing our project and for giving us the opportunity of
doing the project.
We feel elated to thank Mr. M. SUMAN Associate Professor and our Head of the
Department, for inspiring us all the way and arranging all the facilities and resources needed
for our project.
We are very thankful to our beloved coordinators Mrs. P. V. N. LAKSHMI, Mr.
B. HARISH and Mr. S. NAGARAJU for inspiring us all the way and arranging all the
facilities and resources needed for the project. Their efforts in this aspect are beyond the
purview of this acknowledgement.
It is with immense pleasure that we would like to express our indebted gratitude to
our guide Mr. J.VEERAYYA Assistant Professor who guided us a lot and encouraged us in
every step of our project work. His invaluable moral support and guidance through the
project helped us to a great extent. We are thankful to him for his valuable suggestions and
discussions during this project.
We express our hearty thanks to all the staff members and non-teaching staff for all
their help and co-operation extended in bringing out this project successfully in time.
Project Associates:
A. DURGA BHAVANI (15FE1A0405)
D. ANJANEYULU (15FE1A0437)
A. SWETHA (15FE1A0404)
P. PRADEEP (15FE5A0428)
DECLARATION
We hereby declare that the work described in this project work, entitled
“IMPLEMENTATION OF LUT BASED MULTIPLIER FOR SHORT WORD
LENGTH DSP SYSTEMS” which is submitted by us in partial fulfilment for the award of
Bachelor of Technology (B.Tech) in the Department of Electronics and Communication
Engineering to the Vignan’s Lara Institute of Technology and Science, Vadlamudi affiliated
to Jawaharlal Nehru Technological University Kakinada, Andhra Pradesh, is the result of
work done by us under the guidance of Mr. J. VEERAYYA, Assistant Professor.
The work is original and has not been submitted for any Degree/Diploma of this or
any other university.
Project Associates:
Place: Vadlamudi.
Date:
ABSTRACT
Short word length (SWL) DSP systems offer good performance because they process less
data, typically up to three bits. SWL systems can also be designed using FPGAs. FPGAs
come with many built-in primitives such as look-up tables (LUTs), flip-flops, additional
carry logic, memories and DSP blocks.
This project illustrates a way to use a LUT to design a three-bit (3x3) constant coefficient
unsigned integer multiplier for SWL DSP systems. The major difference between the
conventional constant coefficient memory based multiplier and the proposed one is the amount
of memory consumed. In the conventional design, the product values for each constant
multiplier (0-7) are pre-calculated and stored in eight block memories. In the proposed
design, only one LUT based memory module is consumed. This LUT based memory holds only
the product values of the fixed coefficient multiplier 2; for the other coefficients, the
same product values are modified at the output as per the proposed algorithm steps.
INDEX
CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
ABBREVIATIONS
CHAPTER 4: SOFTWARE DEVELOPMENT
4.1 Xilinx
4.4 Kintex
4.5 Artix
4.6 Zynq
4.7 Spartan family
4.8 User interface
4.9 Steps to implement the design
6.1 Conclusion
REFERENCES
APPENDIX
IMPLEMENTATION OF LUT BASED MULTIPLIER FOR SHORT WORD LENGTH DSP SYSTEMS
CHAPTER 1
INTRODUCTION
1.1 Objective
To improve the performance parameters of the multiplier, such as its speed, by reducing the
propagation delay, and to reduce the area of the multiplier by using the concepts of Vedic
mathematics. Consequently, several digital signal processing algorithms can be improved in
terms of performance parameters such as speed and area, as they involve many multiplication
operations, thereby creating compact VLSI designs.
The Invention of the Integrated Circuit by Jack Kilby and Robert Noyce solved this
problem by making all the components and the chip out of the same block (monolith) of
semiconductor material. The circuits could be made smaller, and the manufacturing process
could be automated. This led to the idea of integrating all components on a single silicon
wafer, which led to small-scale integration (SSI) in the early 1960s, medium-scale integration
(MSI) in the late 1960s, and then large-scale integration (LSI) as well as VLSI in the 1970s
and 1980s, with tens of thousands of transistors on a single chip (later hundreds of thousands,
[Figure: VLSI design flow chart. Stages: specification, behavioural representation, RTL coding (VHDL/Verilog), verification, synthesis, static timing analysis, back end, post-synthesis verification, each followed by a yes/no check.]
The VLSI design flow can be divided into two parts: the front-end design flow and the
back-end design flow. Together, they allow the creation of a functional chip from scratch
to production.
1.2.3 Front end steps:
i. System specifications:
This indicates the requirements of the circuit that has to be designed. It makes it
simpler for a designer with an idea for a particular application to turn that idea into a
working system on a very large scale integrated chip.
ii. Behavioural representation:
This involves the description of how the circuit should communicate with the outside
world. Typical issues at this representation level include the number of I/O terminals and
their relation. This helps in optimizing the circuit.
iii. RTL Coding:
In digital circuit design, register transfer level (RTL) is a design abstraction which
models a synchronous digital circuit in terms of the flow of digital signals between hardware
registers and the logical operations performed on those signals. RTL abstraction is used in
hardware description languages like VHDL or Verilog to create high level representations of
a circuit, from which lower level representations and ultimately the actual wiring can be derived.
iv. Functional verification:
It is defined as the process of verifying that an RTL design meets its specification
from a functional perspective.
Star RCXT is the Synopsys tool capable of performing parasitic extraction. It takes
the post-layout Milkyway database and the NXTGRD files provided by the foundry (cells
parasitic information) and produces SPEF (Standard Parasitic Exchange Format) and SBPF
(Synopsys Binary Parasitic Format) files.
v. Static Timing Analysis (STA):
STA is a method to obtain accurate timing information without the need to simulate
the circuit. It allows detecting setup and hold time violations, as well as skew and slow
paths that limit the operating frequency.
Synopsys PrimeTime allows running STA over a physical design, for each corner.
Taking as inputs the post-layout netlist and the parasitic and standard cell information, it
outputs a series of reports which make it possible to detect timing violations.
vi. Post-layout Verification:
Once again, Formality should be run to check the logical equivalence of the post-
layout netlist with the RTL description.
The huge number of transistors in a circuit can make the voltage level drop below the defined
margin that ensures that the circuit works properly. IR drop analysis allows checking the
power grid to ensure that it is strong enough to hold that minimum voltage level. Synopsys
PrimeRail is the tool that outputs the IR-drop and EM analysis reports.
1.3 VHDL:
VHDL is a versatile and powerful hardware description language which is useful for
modeling electronic systems at various levels of design abstraction. VHDL (VHSIC HDL, or
very high speed integrated circuit hardware description language) and Verilog are
languages that help us automate the design of ICs. These languages enable us
to simulate our designs before actually sending them out for fabrication. They help us design
ICs at the RTL level and also enable us to translate this RTL design into a discrete netlist
of logic gates, which is then sent to layout and routing for proper placement of these logic
gates in the chip. These technologies not only allow us to simulate our design but also to
emulate it on an FPGA, which enables us to check for possible faults in the
design. These languages are used not only to design ICs but also to verify the designs
using test benches (sets of programs intended to test the design). These test benches
can also be written in VHDL and Verilog. These technologies are used in both the
design and verification process flows. To be specific, they correspond to the RTL level of design.
CHAPTER-2
LITERATURE SURVEY
array multiplier before the final product is achieved. This is due to the carry propagation.
Fig 2.2 Architecture Of 2x2 Vedic BCD Multiplier using Modified Binary To BCD Converter
Table.No 2.1 Performance parameters of High Performance Vedic BCD Multiplier using
Modified Binary to BCD Converter
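[Fig 2.3: Block diagram of the decimal multiplier without decimal partial products. Operands A7-0 and B7-0 (32 bits each) are converted from BCD to 27-bit binary, multiplied (A x B, 54 bits) and converted back to a 16-digit decimal result.]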
The block diagram of the Decimal Multiplier without Decimal Partial Products is shown
in fig 2.3. The first approach uses only the converters besides the multiplier, while the
second approach needs extra adders, to add the partial products, and several converters.
However, the converters utilized in the first approach are larger and will utilize much more
area than the converters used in the second approach. In the first case, the architecture uses
two 8-digit to 27-bit decimal to binary converters, one 27x27 multiplier, and one 54-bit to
16-digit binary to decimal converter.
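The operand and product widths quoted above can be checked with a little arithmetic. The following is a minimal sketch in Python (the project itself is written in VHDL); the variable names are illustrative assumptions and do not come from the original design.

```python
# Largest 8-digit decimal operand and the number of bits needed to hold it in binary.
max_operand = 10**8 - 1
operand_bits = max_operand.bit_length()

# Largest product of two such operands: its binary width and its decimal digit count.
max_product = max_operand * max_operand
product_bits = max_product.bit_length()
product_digits = len(str(max_product))

print(operand_bits, product_bits, product_digits)
```

The three values correspond to the converter input width, the multiplier output width and the size of the final binary to decimal converter.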
In the second case, the 8-digit operands are divided into two groups of 4-digits each.
In this case, there are four multiplications implemented with binary multipliers, that is, each
4-digit number is converted to binary and then multiplied. The inner partial products are
added in binary before being converted to decimal to be added to the other partial decimal
products (after binary to decimal conversion).
As expected, the larger binary to decimal converters are very expensive in terms of
area and so the second approach using partial products and smaller converters, is better both
in terms of area and performance.
After converting the sub-groups of digits of the operands to binary and performing
the cross multiplications, the aligned partial products are added in binary and then converted
to decimal. The three partial products indicate the operations performed and the number of
digits. After this alignment, the three final partial products are added in decimal.
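The second approach can be illustrated with ordinary integer arithmetic. The sketch below, in Python, is only a numerical model of the idea (four 4-digit by 4-digit multiplications, inner products added first, then three aligned partial products); the function name `split_multiply` and the constant `BASE` are assumptions made for illustration, and no binary/decimal converter hardware is modelled.

```python
BASE = 10**4  # one group of 4 decimal digits

def split_multiply(a, b):
    """Multiply two 8-digit decimal numbers using four 4-digit x 4-digit products."""
    a_hi, a_lo = divmod(a, BASE)
    b_hi, b_lo = divmod(b, BASE)
    high = a_hi * b_hi                    # upper partial product
    inner = a_hi * b_lo + a_lo * b_hi     # the two inner products, added together first
    low = a_lo * b_lo                     # lower partial product
    # Align the three partial products by their decimal weight and add them.
    return high * BASE**2 + inner * BASE + low

print(split_multiply(12345678, 87654321))
```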
Fig 2.4: Architecture Of 2x2 Parallel Decimal Multipliers Using Binary Multipliers (4-digit operand groups, 8-digit partial products, 16-digit result)
Table. No 2.2 Performance Parameters of Parallel Decimal Multipliers Using Binary Multipliers
Parallel Decimal Multipliers Using Binary Multipliers: 612, 58
DRAWBACKS:
As the widths of the multiplier and multiplicand increase, the total memory increases.
Even for small bit-width multipliers (such as 2 or 3 bits), many memory elements are needed.
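The growth in memory can be made concrete with a rough estimate. The Python sketch below assumes a single table indexed by both operands with one full-width product per entry (this matches the 64-entry, 6-bit table of the existing design in the appendix for 3-bit operands); the function name `full_table_bits` is hypothetical.

```python
def full_table_bits(n):
    """Bits needed to store every product of two unsigned n-bit operands."""
    entries = 2 ** (2 * n)   # one entry per (multiplier, multiplicand) pair
    width = 2 * n            # an n-bit x n-bit product needs 2n bits
    return entries * width

for n in (2, 3, 4, 8):
    print(n, full_table_bits(n))
```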
CHAPTER-3
LUT BASED MULTIPLIER
A LUT consists of a block of SRAM that is indexed by the LUT's inputs. The output of the
LUT is whatever value is stored in the indexed location of its SRAM.
Although RAM is normally thought of as organized into 8, 16, 32 or 64-bit words, the SRAM
in an FPGA LUT is 1 bit wide. For example, a 3-input LUT uses an 8x1 SRAM (2³ = 8).
Because SRAM is volatile, the contents have to be initialized when the chip is powered up.
This is done by transferring the contents of the configuration memory into the SRAM.
The output of a LUT is whatever you want it to be.
Address in ([1:0]) Output
00 0
01 0
10 0
11 1

Address in ([1:0]) Output
00 0
01 1
10 0
11 0

Finally, A xor B:
Address in ([1:0]) Output
00 0
01 1
10 1
11 0
So it is not the same LUT in each case, since the LUT defines the output. Obviously, the
number of inputs to an LUT can be far more than two.
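The point that the stored contents, not the hardware, define the function can be modelled in a few lines. This is a minimal Python sketch, not FPGA code; the helper `make_lut` and the choice of the first input as the most significant address bit are assumptions made for illustration.

```python
def make_lut(contents):
    """Return a function whose output is read from `contents`, indexed by the input bits."""
    def lut(*inputs):
        address = 0
        for bit in inputs:               # first input is the most significant address bit
            address = (address << 1) | bit
        return contents[address]
    return lut

# Same structure, different stored bits, different logic function.
and_lut = make_lut([0, 0, 0, 1])
xor_lut = make_lut([0, 1, 1, 0])

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_lut(a, b), xor_lut(a, b))
```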
The LUT is actually implemented using a combination of the SRAM bits and a MUX.
Here the bits across the top 0 1 0 0 0 1 1 1 represents the output of the truth table for this
LUT. The three inputs to the MUX on the left a, b, and c select the appropriate output value.
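The SRAM-plus-MUX structure can be sketched as a tree of 2:1 multiplexers. The Python model below uses the stored bits quoted above; it assumes that the bits are listed from address 0 to address 7 and that `a` is the most significant select input, neither of which is stated in the text. The names `mux2` and `lut3` are illustrative.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0

def lut3(bits, a, b, c):
    """3-input LUT: 8 stored bits reduced by three levels of 2:1 multiplexers."""
    level1 = [mux2(c, bits[i], bits[i + 1]) for i in range(0, 8, 2)]      # c picks within pairs
    level2 = [mux2(b, level1[i], level1[i + 1]) for i in range(0, 4, 2)]  # b picks between pairs
    return mux2(a, level2[0], level2[1])                                  # a picks the half

BITS = [0, 1, 0, 0, 0, 1, 1, 1]   # truth-table outputs, assumed order: address 0 to 7

for address in range(8):
    a, b, c = (address >> 2) & 1, (address >> 1) & 1, address & 1
    print(a, b, c, lut3(BITS, a, b, c))
```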
Fig 3.2: 3-input LUT
3.2 Implementation of LUT based multiplier for short word length DSP systems:
In binary representation, new data may be easily obtained by just shifting the bits either
left or right. Another advantage of base-2 systems is that the double of any value is easy to
get by appending a zero as the new LSB, as shown in table 1: moving from 2 to 4 in decimal is
possible in binary by appending a zero at the end of the base-2 representation of 2; the same
holds for moving from 3 to 6, 6 to 12, 5 to 10, and so on.
This aspect gives us the opportunity to design memory based area optimized systems,
especially multipliers: storing the pre-calculated product values of a constant multiplier
like 2 or 3 makes it possible to obtain the product values of higher multiples of that constant
multiplier. This scheme works well for some values, but not for numbers for which no stored
factor is available, for example 5, 7, 9 and 11.
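The doubling property described above is easy to verify on the quoted examples. The following Python sketch works on the binary text of a number to make the "append a zero" step explicit; the function name `double_by_append` is an illustrative assumption.

```python
def double_by_append(value):
    """Double a non-negative integer by appending a 0 as the new least significant bit."""
    return int(format(value, "b") + "0", 2)

for v in (2, 3, 6, 5):
    doubled = double_by_append(v)
    print(v, format(v, "b"), "->", doubled, format(doubled, "b"))
```

In hardware the same effect is a concatenation (or a left shift by one position), which costs no logic.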
Proposed Algorithm:
In the proposed algorithm decimal 2 is taken as the least multiple of all the data.
Multiplier Multiplicand Product value
2 0 0
2 1 2
2 2 4
2 3 6
2 4 8
2 5 10
2 6 12
2 7 14
Hence, the pre-calculated product values for the constant multiplier 2 are stored in memory,
and the products for all other multipliers (0, 1, 3, 4, 5, 6, and 7) are obtained by modifying
the output using two combinational functions: NOT and concatenation.
Fig.3.3 Block diagram of LUT based memory to store the product values
In Fig.3.3, the LUT based memory storing the pre-calculated product values for the constant
multiplier 2 is shown. The memory address is 3 bits wide, represented by W, giving a maximum
of 2^W = 8 address locations. Each stored product value is four bits wide, represented
by L, making a total of 32 bits for the 8 addresses.
For 3x3 multiplication the output should be 6 bits wide (2W). In our approach, however,
only 4 bits are stored, as shown in fig.3.3. Therefore, up to 2 bits are appended at the end
to get the required output. The proposed algorithm for this type of multiplier is given below.
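The contents and size of this memory follow directly from W and L. The Python sketch below rebuilds the table for the constant multiplier 2; the names `lut2` and `total_bits` are illustrative and do not appear in the original design.

```python
W = 3   # address width in bits (the multiplicand)
L = 4   # width of each stored product in bits

# Address = multiplicand, stored word = 2 x multiplicand.
lut2 = [2 * m for m in range(2 ** W)]
assert all(value < 2 ** L for value in lut2)   # every product fits in L bits

total_bits = len(lut2) * L

for address, value in enumerate(lut2):
    print(format(address, "03b"), format(value, "04b"))
print(total_bits)
```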
Algorithm:
This algorithm starts by storing the product values in LUT memory where the multiplier is 2
and the multiplicand is 0, 1, 2, 3, 4, 5, 6, and 7. The address is the value of the multiplicand.
Step 1: If the multiplier is 2 and the multiplicand is any value in the range 0-7, then the
product value stored at that particular memory location (defined by the multiplicand) is the
output. If the multiplier is other than 2 and the multiplicand is in the range 0-7, then look
in memory to find whether the required output is already stored. If yes, take the output from
that particular address.
Example: suppose the multiplier is 2 and the multiplicand is 5; then 2x5=10 and 10
(1010)2 is already stored at memory location 5. If the multiplier is not 2 but the product
value is still available in memory (like 3x4=12 (1100)2, already stored at location 6), then
simply output the product value stored at that particular location.
Step 2: If the multiplier is other than 2 and the value is not available in memory, look for
a nearby value (one less or one greater than the expected product) and flip the last bit.
Example: 3x5=15 and this value is not available in memory, so take the nearby value 14
(1110)2 and flip the last bit to turn 14 (1110)2 into 15 (1111)2.
Step 3: If the nearby value is not present, look for a stored factor of the product value and
append a bit at the end.
Example: 4x5=20. Here 10 (1010)2 is a stored factor of 20 (10100)2. So take the output 10
available at memory location 5 and append a zero at the end to get the double of 10, that is 20.
Similarly, suppose the multiplier is 3 and the multiplicand is 7. Here the product would be
21 (10101)2, but 21 is not available in memory. It can be obtained by taking 10
(1010)2 at the output and then appending 1 at the end. This transforms 10 (1010)2 into
21 (10101)2.
Step 4: If the above steps do not give the required output, append two bits at the end to get
the required data.
Example: 7x5 gives the product value 35 (100011)2; when we append (11)2 at the end
of 8 (1000)2, we get the output 35 (100011)2.
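The steps above can be checked with a small behavioural model. The Python sketch below is not the hardware implementation: it takes the expected product as given and only searches for a stored value and an output modification that produce it. The list of modifications follows the cases that appear in the proposed_method listing in the appendix (including the combined "flip, then append" cases); the names `MODIFICATIONS` and `derive` are assumptions made for illustration.

```python
LUT = [2 * m for m in range(8)]   # stored products of the constant multiplier 2

# Output modifications, tried in order; each maps a stored 4-bit value v to an output.
MODIFICATIONS = [
    ("read as stored",           lambda v: v),
    ("flip last bit",            lambda v: v ^ 1),
    ("append 0",                 lambda v: v << 1),
    ("append 1",                 lambda v: (v << 1) | 1),
    ("flip last bit, append 0",  lambda v: (v ^ 1) << 1),
    ("flip last bit, append 1",  lambda v: ((v ^ 1) << 1) | 1),
    ("flip last bit, append 00", lambda v: (v ^ 1) << 2),
    ("append 01",                lambda v: (v << 2) | 1),
    ("append 10",                lambda v: (v << 2) | 2),
    ("append 11",                lambda v: (v << 2) | 3),
]

def derive(product):
    """Return (modification name, LUT address) that yields `product`, or None."""
    for name, modify in MODIFICATIONS:
        for address, stored in enumerate(LUT):
            if modify(stored) == product:
                return name, address
    return None

# Worked examples from the text: 2x5, 3x4, 3x5, 4x5, 3x7, 7x5.
for a, b in [(2, 5), (3, 4), (3, 5), (4, 5), (3, 7), (7, 5)]:
    print(a, "x", b, "=", a * b, derive(a * b))
```

Under these assumptions every product of two 3-bit operands is reachable from one of the eight stored values, which is the property the proposed design relies on.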
Flow chart:
The flow diagram for the addressing scheme and combinational logic is given as under:
The word length of the data stored in the memory is 4 bits. To bring the final width up to
6 bits, at most 2 bits can be appended at the end of the data taken from the memory:
one appended bit doubles the stored value and two appended bits quadruple it, plus the
value of the appended bits.
CHAPTER-4
SOFTWARE DEVELOPMENT
4.1 Xilinx:
Xilinx ISE (Integrated Synthesis Environment) is a software tool
produced by Xilinx for synthesis and analysis of HDL designs, enabling the developer to
synthesize their designs, perform timing analysis, examine RTL diagrams, simulate a
design’s reaction to different stimuli, and configure the target device with the programmer.
Xilinx is an American technology company which is primarily a supplier of
programmable logic devices. It is known for inventing the field programmable gate
array (FPGA) and for pioneering the fabless manufacturing model in the semiconductor
industry.
As part of the synthesis procedure, the CAD software would determine not only the
technology mapping, placement and routing of the circuit, but also the best sibling to use.
Since each sibling should be smaller (and hence cheaper) than a general purpose FPGA, the
production volume at which it is cost effective to switch to mask programmed logic will be
increased. As well, the higher speed of siblings will allow the FPGA implementation of
circuits which previously could not meet performance specifications without using custom or
semi-custom logic.
In April 2012, the company introduced the Vivado Design Suite - a next-generation
SoC-strength design environment for advanced electronic system designs.
Prior to 2010, Xilinx offered two main FPGA families: the high performance Virtex
series and the high-volume Spartan series. With the introduction of 28 nm FPGAs in June
2010, Xilinx replaced the high-volume Spartan family with the Kintex family and the low-
cost Artix family.
In newer FPGA products, Xilinx minimizes total power consumption by the adoption
of a high-k metal gate (HKMG) process, which allows for low static power consumption.
Through the use of a HKMG process, Xilinx has reduced power use while increasing logic
capacity. Virtex-6 and Spartan-6 FPGA families are said to consume 50 percent less power,
and have up to twice the logic capacity compared to the previous generation of Xilinx
FPGAs.
In June, 2010 Xilinx introduced the Xilinx 7 series: the Virtex-7, Kintex-7, and Artix-
7 families, promising improvements in system power, performance, capacity, and price.
These new FPGA families are manufactured using TSMC's 28 nm HKMG process. Xilinx
shipped the world’s first 28 nm FPGA device, the Kintex-7, making this the programmable
industry’s fastest product rollout. In March 2011, Xilinx introduced the Zynq-7000 family,
which integrates a complete ARM Cortex-A9 MPCore processor-based system on a 28
nm FPGA for system architects and embedded software developers.
In December 2013, Xilinx introduced the UltraScale series: the Virtex UltraScale and Kintex
UltraScale families. These new FPGA families are manufactured by TSMC in its 20 nm
planar process. At the same time it announced an UltraScale SoC architecture, called Zynq
UltraScale+ MPSoC, in the TSMC 16 nm FinFET process.
The Virtex-5 LX and the LXT are intended for logic-intensive applications, and the
Virtex-5 SXT is for DSP applications. With the Virtex-5, Xilinx changed the logic fabric
from four-input LUTs to six-input LUTs.
Legacy Virtex devices (Virtex, Virtex-II, Virtex-II Pro, Virtex 4) are still available,
but are not recommended for use in new designs.
4.4 Kintex
The Kintex-7 family is the first Xilinx mid-range FPGA family that the company
claims delivers Virtex-6 family performance at less than half the price while consuming 50
percent less power. The Kintex family includes high-performance 12.5 Gbit/s or lower-cost
optimized 6.5 Gbit/s serial connectivity, memory, and logic performance required for
applications such as high volume 10G optical wired communication equipment, and provides
a balance of signal processing performance, power consumption and cost to support the
deployment of Long Term Evolution (LTE) wireless networks.
4.5 Artix
The Artix-7 family delivers 50 percent lower power and 35 percent lower cost
compared to the Spartan-6 family and is based on the unified Virtex-series architecture.
Xilinx claims that Artix-7 FPGAs deliver the performance required to address cost-sensitive,
high-volume markets previously served by ASSPs, ASICs, and low-cost FPGAs.
The Artix family is designed to address the small form factor and low-power
performance requirements of battery-powered portable ultrasound equipment, commercial
digital camera lens control, and military avionics and communications equipment. With the
introduction of the Spartan-7 family in 2017, which lack high-bandwidth transceivers, the
Artix-7's position in the Xilinx cost-optimized portfolio was clarified as being the
"transceiver optimized" member.
4.6 Zynq:
4.7 Spartan family:
The Spartan series targets low cost, high-volume applications with a low-power
footprint e.g. displays, set-top boxes, wireless routers and other applications.
The Spartan-7 family, built on the same 28nm process used in the other 7-Series
FPGAs, was announced in 2015, and became available in 2017. Unlike the Artix-7 family
and the "LXT" members of the Spartan-6 family, the Spartan-7 FPGAs lack high-bandwidth
transceivers.
4.8 User interface:
The primary user interface of the ISE is the project navigator which includes the
design hierarchy(sources), a source code editor (workplace), an output console(transcript),
and a process tree(processes).
The design hierarchy consists of design files (modules), whose dependencies are
interpreted by the ISE and displayed as a tree structure. For single chip designs there may be
one main module, with other modules included by the main module, similar to the main()
subroutine in C. Design constraints, which include pin configuration and mapping, are
specified in modules.
The processes hierarchy describes the operations that the ISE will perform on the
currently active module. The hierarchy includes compilation functions, their dependency
functions and other utilities. This window also denotes issues or errors that arise with each
function.
The transcript window provides status of currently running operations, and informs
engineers on design issues. Such issues may be filtered to show Warnings, Errors, or both.
Step 3: Select the following values in the New Project Wizard—Device Properties page:
Product Category: All
Family: Spartan3E
Device: XC3S500E
Package: VQ100
Speed: -5
Synthesis Tool: XST (VHDL/Verilog)
Simulator: ISim (VHDL/Verilog)
Preferred Language: VHDL.
This will determine the default language for all processes that generate HDL files.
Other properties can be left at their default values.
Click Next, then Finish to complete the project creation.
Step 5: Defining the module, i.e., specifying the ports for the module.
The ports may be input or output or in-out.
For the proposed system the input and output ports taken are
a - input
b - input
c - output
Fig 4.10: Synthesizing the design and observing the performance parameters
CHAPTER-5
RESULTS
Table. No 5.1 Performance parameters of proposed method “LUT based multiplier for short word length DSP
systems”
These results were obtained by synthesizing the existing and proposed systems
individually. The above table shows that the existing system consumes 9 slices with a
delay of 9.35 ns, while the proposed system consumes 8 slices with a delay of
7.756 ns.
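The relative improvement implied by these figures can be computed directly. The Python sketch below uses only the slice counts and delays quoted above; the dictionary layout and the helper `reduction_percent` are illustrative assumptions.

```python
existing = {"slices": 9, "delay_ns": 9.35}
proposed = {"slices": 8, "delay_ns": 7.756}

def reduction_percent(old, new):
    """Relative reduction from old to new, in percent."""
    return 100.0 * (old - new) / old

slice_saving = reduction_percent(existing["slices"], proposed["slices"])
delay_saving = reduction_percent(existing["delay_ns"], proposed["delay_ns"])

print(f"{slice_saving:.1f}% fewer slices, {delay_saving:.1f}% lower delay")
```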
CHAPTER 6
CONCLUSION AND FUTURE SCOPE
6.1 CONCLUSION:
In short word length systems, the concern with memory based design is the
minimum amount of memory that can be configured in a given FPGA. For example, the 32 bits
of memory needed to store the product values of the constant multiplier 2 (considering eight
multiplicands, i.e., 0-7) would occupy no less than a 9K block RAM (the
minimum configurable block memory in a Spartan-6 FPGA), resulting in wastage of
unused memory. Using block memory in a short word length system would therefore result in
inefficient resource utilization. So, using memory based multiplication or the DSP48 for
short word length systems is considered less feasible. Consequently, the choice is to use
customized (LUT based) implementations, such as the one proposed here. The proposed
multiplication algorithm may be used in any area of DSP, besides short word length
processing.
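The wastage mentioned above can be quantified with a rough calculation. The Python sketch below assumes that "9K" means 9 x 1024 bits per block; this reading, and the names `BLOCK_RAM_BITS` and `utilisation`, are assumptions made for illustration.

```python
BLOCK_RAM_BITS = 9 * 1024   # assumed size of the smallest configurable block RAM, in bits
needed_bits = 8 * 4         # 8 stored products of 4 bits each

utilisation = 100.0 * needed_bits / BLOCK_RAM_BITS
unused_bits = BLOCK_RAM_BITS - needed_bits

print(f"{utilisation:.2f}% of the block used, {unused_bits} bits unused")
```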
APPENDIX
EXISTING SYSTEM CODE:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity existing_method is
Port ( a : in STD_LOGIC_vector(2 downto 0);
b : in STD_LOGIC_vector(2 downto 0);
data : out STD_LOGIC_VECTOR (05 downto 0));
end existing_method;
architecture Behavioral of existing_method is
type lut is array(0 to 2**6-1)of std_logic_vector(5 downto 0);
signal my_lut:lut:=(
0=>"000000",
1=>"000000",
2=>"000000",
3=>"000000",
4=>"000000",
5=>"000000",
6=>"000000",
7=>"000000",
8=>"000000",
9=>"000001",
10=>"000010",
11=>"000011",
12=>"000100",
13=>"000101",
14=>"000110",
15=>"000111",
16=>"000000",
17=>"000010",
18=>"000100",
19=>"000110",
20=>"001000",
21=>"001010",
22=>"001100",
23=>"001110",
24=>"000000",
25=>"000011",
26=>"000110",
27=>"001001",
28=>"001100",
29=>"001111",
30=>"010010",
31=>"010101",
32=>"000000",
33=>"000100",
34=>"001000",
35=>"001100",
36=>"010000",
37=>"010100",
38=>"011000",
39=>"011100",
40=>"000000",
41=>"000101",
42=>"001010",
43=>"001111",
44=>"010100",
45=>"011001",
46=>"011110",
47=>"100011",
48=>"000000",
49=>"000110",
50=>"001100",
51=>"010010",
52=>"011000",
53=>"011110",
54=>"100100",
55=>"101010",
56=>"000000",
57=>"000111",
58=>"001110",
59=>"010101",
60=>"011100",
61=>"100011",
62=>"101010",
63=>"110001");
signal k : std_logic_vector(5 downto 0);
signal n : integer range 0 to 63;
begin
-- LUT address: multiplier a in the upper three bits, multiplicand b in the lower three.
k <= a & b;
n <= to_integer(unsigned(k));
-- Read the pre-calculated product from the table.
data <= my_lut(n);
end Behavioral;
PROPOSED SYSTEM CODE:
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
entity proposed_method is
Port ( a : in STD_LOGIC_VECTOR (02 downto 0);
b : in STD_LOGIC_VECTOR (02 downto 0);
c:out STD_LOGIC_VECTOR (05 downto 0));
end proposed_method;
architecture Behavioral of proposed_method is
type lut is array(0 to 2**3-1)of std_logic_vector(3 downto 0);
constant my_lut:lut:=(
0=>"0000",
1=>"0010",
2=>"0100",
3=>"0110",
4=>"1000",
5=>"1010",
6=>"1100",
7=>"1110");
signal p : std_logic_vector(5 downto 0);
signal n, w, q : integer range 0 to 63;
begin
-- Operands as integers.
w <= to_integer(unsigned(a));
q <= to_integer(unsigned(b));
-- Expected product. This assignment is missing from the original listing and is
-- assumed here, because n is otherwise never driven.
n <= w * q;
p <= std_logic_vector(to_unsigned(n, 6));
process(p)
variable x, j : natural range 0 to 63;
variable u : std_logic_vector(5 downto 0);
variable g : std_logic_vector(3 downto 0);
begin
c <= (others => '0');  -- default value, avoids inferring a latch
for k in 0 to 7 loop
g := my_lut(k);                 -- stored product 2 x k
u := "00" & g;                  -- zero-extended to 6 bits
x := to_integer(unsigned(u));
j := to_integer(unsigned(p));
if (p = u) then                                        -- stored value as is
c <= u;
elsif (j = x + 1) then                                 -- last bit flipped
c <= u(5 downto 1) & '1';
elsif (p = u(4 downto 0) & '0') then                   -- append 0
c <= u(4 downto 0) & '0';
elsif (p = u(4 downto 0) & '1') then                   -- append 1
c <= u(4 downto 0) & '1';
elsif (p = u(4 downto 1) & not u(0) & '0') then        -- flip last bit, append 0
c <= u(4 downto 1) & not u(0) & '0';
elsif (p = u(4 downto 1) & not u(0) & '1') then        -- flip last bit, append 1
c <= u(4 downto 1) & not u(0) & '1';
elsif (p = u(3 downto 1) & not u(0) & "00") then       -- flip last bit, append 00
c <= u(3 downto 1) & not u(0) & "00";
elsif (p = u(3 downto 0) & "01") then                  -- append 01
c <= u(3 downto 0) & "01";
elsif (p = u(3 downto 0) & "10") then                  -- append 10
c <= u(3 downto 0) & "10";
elsif (p = u(3 downto 0) & "11") then                  -- append 11
c <= u(3 downto 0) & "11";
end if;
end loop;
end process;
end Behavioral;