F6 - Introduction To RTL Synthesis and Technology Mapping PDF

F6: Introduction to RTL Synthesis and Technology Mapping

Outline
• Introduction to Synthesis & Technology Mapping
– ASIC/FPGA Synthesis
– FPGA Architectures
– Memory Synthesis
Advantages of Hardware Description Languages as Design Entry
• The functionality is described at a higher level
– An HDL description can be simulated and validated at an early stage in the design process
– HDLs such as VHDL help prevent errors, since they provide strong type checking
– A design written in an HDL is more easily understood than a gate-level netlist or a schematic description
• The HDL description can be synthesised into a gate-level description of a chosen technology
Simulation/Verification Flow
Quartus
Hardware Synthesis from VHDL
FPGA/ASIC Design Flow
A Crash Course in Technology Mapping
• Two-level logic mapping
• Multi-level logic mapping
• Mapping to ASIC & FPGAs
Two-level logic Mapping
• Standard Karnaugh Minimization
• Using muxes to implement logic functions
You should know this by now...

          xy
wz      00  01  11  10
00       1   1   0   0
01       1   1   1   1
11       0   1   1   0
10       0   1   1   0

f = w’x’ + w’z + wy
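The map can be checked mechanically. The following Python sketch is an illustration and not part of the course material. It walks the cells in Karnaugh (Gray-code) order, with rows wz and columns xy as on the slide, and compares each cell with f = w’x’ + w’z + wy:

```python
GRAY = [(0, 0), (0, 1), (1, 1), (1, 0)]  # Karnaugh row/column order 00, 01, 11, 10

# Karnaugh map from the slide: rows are wz, columns are xy
KMAP = [
    [1, 1, 0, 0],  # wz = 00
    [1, 1, 1, 1],  # wz = 01
    [0, 1, 1, 0],  # wz = 11
    [0, 1, 1, 0],  # wz = 10
]


def f(w, x, y, z):
    """Minimized sum of products: f = w'x' + w'z + wy."""
    return int((not w and not x) or (not w and z) or (w and y))


def kmap_matches(kmap, func):
    """True if every cell of the map equals func at that input combination."""
    for r, (w, z) in enumerate(GRAY):
        for c, (x, y) in enumerate(GRAY):
            if kmap[r][c] != func(w, x, y, z):
                return False
    return True


print(kmap_matches(KMAP, f))  # True
```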
XOR & XNOR
XOR functions are hard to find in the Karnaugh diagram (they appear as diagonals)...

          xy
wz      00  01  11  10
00       0   1   0   1
01       1   0   1   0
11       0   1   0   1
10       1   0   1   0

f = w⊕z⊕x⊕y
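Implementing functions using Muxes

          xy
z       00  01  11  10
0        1   1   0   0
1        0   1   1   0

f = z’x’ + x’y + zy
4:1 mux with select inputs xy: input 00 = z’, input 01 = 1, input 10 = 0, input 11 = z, output f

An (n+1)-input function can always be implemented on a mux with n select inputs, if each data input may be connected to 0, 1, the remaining variable or its complement!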
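The construction can be automated. Drive the n select lines with n of the variables. Then, for each select value, compare the two cofactors of the leftover variable v: the data input is 0, 1, v or v’. This minimal Python sketch assumes the function is given as a Python callable with the select variables first (the helper name mux_inputs is hypothetical):

```python
from itertools import product


def mux_inputs(func, n):
    """Data inputs of a 2**n:1 mux implementing func(*select_bits, v).

    The first n arguments of func drive the select lines; the last argument v
    is the leftover variable. Returns a dict: select bits -> "0", "1", "v" or "v'".
    """
    inputs = {}
    for sel in product((0, 1), repeat=n):
        low, high = func(*sel, 0), func(*sel, 1)  # cofactors for v = 0 and v = 1
        inputs[sel] = {(0, 0): "0", (1, 1): "1", (0, 1): "v", (1, 0): "v'"}[(low, high)]
    return inputs


def f(x, y, z):
    """Function from the slide, f = z'x' + x'y + zy (select lines x, y; leftover z)."""
    return int((not z and not x) or (not x and y) or (z and y))


print(mux_inputs(f, 2))
```

For the slide’s function, it returns z’ for xy=00, 1 for 01, 0 for 10 and z for 11 (v stands for z), which matches the mux above.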

Exercise
How can the following functions be implemented on a 2:1 multiplexer?
• Z=B’ (INV)
• Z=AB (AND)
• Z=A+B (OR)
[Figure: 2:1 mux with data input X on input 1, data input Y on input 0, select S and output Z]
Note that Z=SX+S’Y

Multi-level Mapping
• ASIC Technology
– DeMorgan & Factorization
• FPGAs
– Shannon Decomposition
Basic Mapping Algorithm
• Traverse the circuit graph
• Find and apply suitable transformations.
• Replace logic with equivalent logic if the transformation leads to (depending on the constraints)
– Shorter delay (until the constraints on the critical path are fulfilled)
– Smaller area
– Lower power consumption (human-guided)
Reminder: Critical Path
• The critical path is the ”longest” or, more correctly, the slowest path in the circuit, i.e., the path through the circuit that takes the longest time to finish execution of its functionality.
• The critical path is calculated from input to output, from input to a DFF register input, from DFF register output to output, or from DFF register output to DFF register input
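Finding the critical path is a longest-path search over the acyclic timing graph between these start and end points. This minimal Python sketch uses a small made-up netlist in which each node carries a fixed delay in ns. The node names and delays are hypothetical, and the load-dependent delays of the next slides are ignored:

```python
from functools import lru_cache

# Hypothetical timing graph: node -> (delay in ns, list of fanout nodes).
# Start points are inputs/register outputs; end points are outputs/register inputs.
GRAPH = {
    "in_a": (0.0, ["and1"]),
    "in_b": (0.0, ["and1", "xor1"]),
    "and1": (0.3, ["or1"]),
    "xor1": (0.5, ["or1", "out_q"]),
    "or1": (0.4, ["dff_d"]),
    "dff_d": (0.0, []),  # register input (end point)
    "out_q": (0.0, []),  # primary output (end point)
}


def critical_path(graph, starts):
    """Return (delay, path) of the slowest path from any start node to an end point."""

    @lru_cache(maxsize=None)
    def longest_from(node):
        delay, fanout = graph[node]
        if not fanout:
            return delay, (node,)
        best_delay, best_path = max(longest_from(nxt) for nxt in fanout)
        return delay + best_delay, (node,) + best_path

    return max(longest_from(s) for s in starts)


delay, path = critical_path(GRAPH, ["in_a", "in_b"])
print(round(delay, 2), " -> ".join(path))  # 0.9 in_b -> xor1 -> or1 -> dff_d
```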
The Critical Path decides the Delay
[Figure: input x enters a gate with intrinsic delay 0.05 ns whose output fans out to 64 gates (Fanout=64, Load=0.038 ns per fanout); these gates produce z0 ... z63 from y´0 ... y´63, each with intrinsic delay 0.07 ns, and each drives 5 gates (Fanout=5, Load=0.016 ns per fanout)]
$T_{x \to z_{63}} = (0.05 + 0.038 \cdot 64) + (0.07 + 0.016 \cdot 5) = 2.63\ \text{ns}$
• A circuit’s intrinsic delay represents the RC delay of the unloaded circuit.
• Fanout is the number of circuits connected to the driving output.
• A circuit loads its predecessor with capacitance! The load is expressed as an RC delay that loads its driver.
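Buffering (ctd.)
[Figure: the same circuit, but the first gate (intrinsic delay 0.05 ns) now drives 2 buffers (Load=0.076 ns per fanout, assuming one buffer has 2 inverters’ input loads); each buffer (intrinsic delay 0.15 ns) drives 32 of the gates z0 ... z63 (Load=0.006 ns per fanout); the z gates keep intrinsic delay 0.07 ns and Fanout=5 (Load=0.016 ns)]
$T_{x \to z_{63}} = (0.05 + 0.076 \cdot 2) + (0.15 + 0.006 \cdot 32) + (0.07 + 0.016 \cdot 5) = 0.69\ \text{ns}$
• A buffer has a higher driving strength (due to its lower resistance). Thus, the load is reduced!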
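Both results follow from the linear delay model used on these two slides. Each stage contributes its intrinsic delay plus the load per fanout times the fanout. This short Python sketch reproduces the two path delays from the slide values (in ns):

```python
def stage_delay(intrinsic, load_per_fanout, fanout):
    """Linear delay model: intrinsic (unloaded) delay plus one load per fanout."""
    return intrinsic + load_per_fanout * fanout


def path_delay(stages):
    """Sum of the stage delays along a path; each stage is (intrinsic, load, fanout)."""
    return sum(stage_delay(*stage) for stage in stages)


# Unbuffered: the first gate drives 64 gates directly; each z gate drives 5 loads.
unbuffered = [(0.05, 0.038, 64), (0.07, 0.016, 5)]

# Buffered: the first gate drives 2 buffers (2 inverter loads each = 0.076),
# and each buffer drives 32 gates at 0.006 per fanout.
buffered = [(0.05, 0.076, 2), (0.15, 0.006, 32), (0.07, 0.016, 5)]

print(f"unbuffered: {path_delay(unbuffered):.2f} ns")  # unbuffered: 2.63 ns
print(f"buffered:   {path_delay(buffered):.2f} ns")    # buffered:   0.69 ns
```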
Mapping to an ASIC Technology
DeMorgan
[Figure: a gate network with inputs x, y, w, complemented inputs x´, y´, internal signals t, c0, c1, c2 and outputs f1, f2, z0, z1, z2, before and after applying DeMorgan’s law: Area=19 gates before, Area=11 gates after]

Factorisation
f1 = (xy’)’ = x’ + y = w (DeMorgan)
f2 = (x’y)’ = x + y’ = t (DeMorgan)
[Figure: the network before and after factorisation using w and t: Area=11 gates before, Area=9 gates after]

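Mapping to FPGAs: Shannon Decomposition
Any Boolean function f(xn, …, x1, x0) can be split (recursively) according to
$f(x_n, \dots, x_1, x_0) = x_0 \cdot f(x_n, \dots, x_1, 1) + \overline{x_0} \cdot f(x_n, \dots, x_1, 0)$
[Figure: f drawn as a 2:1 mux with select x0 choosing f1 (x0=1) or f0 (x0=0); applied recursively with select x1, f1 chooses between f11 and f10, and f0 between f01 and f00]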
Proof
• RHS:
– If x0=1, then x0’=0 and the right term is zero. Then the RHS equals the left term, f(xn,...,x1,1).
– If x0=0, then the left term is zero. Then the RHS equals the right term, f(xn,...,x1,0).
• LHS:
– If x0=1, then f equals f(xn,...,x1,1) (= left term on RHS)
– If x0=0, then f equals f(xn,...,x1,0) (= right term on RHS)
• LHS=RHS
QED
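Applied recursively, the decomposition turns any function into a tree of 2:1 multiplexers with constant leaves. An FPGA mapper ultimately fits such a tree into LUTs. This Python sketch assumes the function is a callable over a dict of variable values; the helper names and the majority example are hypothetical. It splits on x0 first, as above, and skips a mux whenever both cofactors are equal:

```python
from itertools import product


def shannon_tree(func, order, assignment=None):
    """Split func on order[0], then order[1], ... (x0 first, as on the slide).

    func maps a dict {var: bit} to 0/1. Returns a constant 0/1 (leaf) or a
    tuple (var, f_var1, f_var0), i.e. a 2:1 mux: var ? f_var1 : f_var0.
    """
    assignment = {} if assignment is None else assignment
    if not order:
        return func(assignment)  # all variables fixed: constant leaf
    var, rest = order[0], order[1:]
    high = shannon_tree(func, rest, {**assignment, var: 1})  # cofactor f(..., var = 1)
    low = shannon_tree(func, rest, {**assignment, var: 0})   # cofactor f(..., var = 0)
    return high if high == low else (var, high, low)         # equal cofactors: no mux


def eval_tree(tree, assignment):
    """Evaluate the mux tree: each mux passes f_var1 if var = 1, else f_var0."""
    while isinstance(tree, tuple):
        var, high, low = tree
        tree = high if assignment[var] else low
    return tree


def majority(a):
    """Hypothetical example function: 1 if at least two of x0, x1, x2 are 1."""
    return int(a["x0"] + a["x1"] + a["x2"] >= 2)


order = ["x0", "x1", "x2"]
tree = shannon_tree(majority, order)
print(tree)
print(all(eval_tree(tree, dict(zip(order, bits))) == majority(dict(zip(order, bits)))
          for bits in product((0, 1), repeat=3)))  # True
```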
(R)(O)BDDs – (Reduced)(Ordered) Binary Decision Diagrams
• The order of the variables may influence the size and synthesis time of the circuit…
[Figure: example decision diagrams over x0 and x1 with terminal nodes 0 and 1, labelled OBDD, BDD and ROBDD]
Read more in a logic synthesis course…
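You can measure the effect of the variable order by building the reduced ordered BDD of one function under two orders and counting the internal nodes. The Python sketch below uses the textbook function f = x1x2 + x3x4 + x5x6, which is not taken from the slides. It expands by Shannon decomposition, shares identical sub-diagrams through a unique table, and drops nodes whose two branches are equal. It enumerates all 2^n assignments, so it only suits small examples.

```python
def robdd_size(func, order):
    """Number of internal nodes in the ROBDD of func for the given variable order.

    func maps a dict {var: bit} to 0/1. Terminals are the ints 0 and 1; each
    internal node is identified by its (var, low, high) triple.
    """
    unique = {}  # (var, low child, high child) -> node id; sharing makes it "reduced"

    def node(var, low, high):
        if low == high:  # both branches equal: the test on var is redundant
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2  # ids 0 and 1 are the terminals
        return unique[key]

    def build(i, assignment):
        if i == len(order):
            return func(assignment)
        var = order[i]
        low = build(i + 1, {**assignment, var: 0})
        high = build(i + 1, {**assignment, var: 1})
        return node(var, low, high)

    build(0, {})
    return len(unique)


def f(a):
    """f = x1x2 + x3x4 + x5x6."""
    return int((a["x1"] and a["x2"]) or (a["x3"] and a["x4"]) or (a["x5"] and a["x6"]))


good = ["x1", "x2", "x3", "x4", "x5", "x6"]
bad = ["x1", "x3", "x5", "x2", "x4", "x6"]
print(robdd_size(f, good), robdd_size(f, bad))  # 6 14
```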

Mux-networks
xyz  Value
111  1
011  0
101  0
001  1
110  0
010  1
100  1
000  0

          yz
x       00  01  11  10
0        0   1   0   1
1        1   0   1   0

[Figure: a tree of 2:1 muxes whose data inputs hold the Value column and whose select lines (address pins) are x, y and z]
This is actually an SRAM...
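The mux network is simply a memory read: store the Value column in 2^n one-bit cells and use the inputs as address pins. This Python sketch models such a look-up table (the class name LUT and its methods are hypothetical), programmed with the table above, which is the 3-input XOR:

```python
class LUT:
    """A k-input look-up table: 2**k one-bit memory cells addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # "Program" the SRAM: the cell at address (x, y, z, ...) holds func there
        self.cells = [func(*self._bits(addr)) for addr in range(2 ** k)]

    def _bits(self, addr):
        """Split an address into k input bits, most significant bit first."""
        return tuple((addr >> (self.k - 1 - i)) & 1 for i in range(self.k))

    def __call__(self, *inputs):
        addr = 0
        for b in inputs:  # the inputs act as address pins
            addr = (addr << 1) | b
        return self.cells[addr]


# Truth table from the slide (address xyz -> value)
TABLE = {
    (1, 1, 1): 1, (0, 1, 1): 0, (1, 0, 1): 0, (0, 0, 1): 1,
    (1, 1, 0): 0, (0, 1, 0): 1, (1, 0, 0): 1, (0, 0, 0): 0,
}

lut = LUT(3, lambda x, y, z: TABLE[(x, y, z)])
print(lut.cells)     # memory contents for addresses 000 ... 111
print(lut(1, 0, 0))  # 1
print(2 ** 2 ** 4)   # 65536 different contents (functions) for a 4-input LUT
```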
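A Simple FPGA cell
• The simplest possible FPGA cell is composed of a single Look-Up Table (LUT), a D-flipflop and a bypass Mux.
• A LUT is a programmable RAM (or ROM) that can model any combinational function of its inputs.
[Figure: a 4-input LUT (inputs A, B, C, D) feeding a D flip-flop (CLK, RESET); a bypass mux M with select S chooses between the LUT output and the flip-flop output Q]
A 4-input LUT can hold $2^{(2^4)} = 2^{16} = 65536$ combinations/functions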
Mapping to an FPGA - Where should I cut?
[Figure: the factorised network (inputs x, y, signals w, t, c0, c1, c2, outputs z0, z1, z2) partitioned into LUTs: Area=5 LUTs]
Mapping to an FPGA - Where should I cut?
[Figure: the same network partitioned differently: Area=3 LUTs]
(x & y need to be connected to all LUTs)
Example: Bad Variable ordering
F(x)=x4(x0+x1+x2+x3)+x4’(x5+x6+x7+x8)
x0 first:
f1= x4+x4’(x5+x6+x7+x8) (x0=1)
f0= x4(x1+x2+x3)+x4’(x5+x6+x7+x8) (x0=0)
x1 next:
f10=f11= x4+x4’(x5+x6+x7+x8) (x0=1, x1=-)
f01= x4+x4’(x5+x6+x7+x8) (x0=0, x1=1)
f00= x4(x2+x3)+x4’(x5+x6+x7+x8) (x0=0, x1=0)
etc... (until we get functions with at most 4 inputs)

Example: Good Variable ordering
F(x)=x4(x0+x1+x2+x3)+x4’(x5+x6+x7+x8)
x4 first:
f1= (x0+x1+x2+x3) (x4=1)
f0= (x5+x6+x7+x8) (x4=0)
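A 4-input LUT can absorb a cofactor once it depends on at most four variables. The Python sketch below reads F as x4(x0+x1+x2+x3) + x4’(x5+x6+x7+x8). For both choices of the first splitting variable, it computes the support of the two cofactors, that is, the set of variables each cofactor really depends on. The helper names are illustrative.

```python
from itertools import product

VARS = [f"x{i}" for i in range(9)]


def F(a):
    """F = x4 (x0 + x1 + x2 + x3) + x4' (x5 + x6 + x7 + x8)."""
    if a["x4"]:
        return int(a["x0"] or a["x1"] or a["x2"] or a["x3"])
    return int(a["x5"] or a["x6"] or a["x7"] or a["x8"])


def support(func, fixed):
    """Variables outside `fixed` whose value can change func (exhaustive check)."""
    free = [v for v in VARS if v not in fixed]
    deps = set()
    for bits in product((0, 1), repeat=len(free)):
        a = {**fixed, **dict(zip(free, bits))}
        for v in free:
            if v not in deps and func({**a, v: 1 - a[v]}) != func(a):
                deps.add(v)
    return deps


for split in ("x0", "x4"):
    sizes = [len(support(F, {split: value})) for value in (1, 0)]
    print(split, sizes)
# x0 [5, 8]: the x0 = 0 cofactor still has 8 inputs
# x4 [4, 4]: each cofactor fits one 4-input LUT
```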
Ideal delay vs. area curve
[Figure: delay plotted against area, with the ideal design marked]
Not so ideal delay vs. area curve...
[Figure: delay plotted against area, marking a bad and a good starting position for the synthesis algorithm]
Code Transformations
• Associativity - (a+b)+c=a+(b+c)
• Distributivity - c*(a+b)=ca+cb
• Commutativity - a+b=b+a
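Parentheses are important!
Y <= A + B + C + D;
[Figure: a chain of three adders, ((A + B) + C) + D = Y: three adder delays on the critical path]
Parentheses are important!
Y <= (A + B) + (C + D);
[Figure: a tree of adders, (A + B) and (C + D) in parallel followed by a final adder: two adder delays]
Give the tool a good starting position!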
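The effect of the parentheses can be counted. A left-to-right chain of N operands puts N-1 adders in series, while a balanced tree needs only about log2(N) levels. This Python sketch computes the number of adders on the longest path for both shapes (encoding an expression as nested pairs is an assumption of this example):

```python
def chain(operands):
    """((A + B) + C) + D ...: the grouping of Y <= A + B + C + D (left-associative)."""
    expr = operands[0]
    for op in operands[1:]:
        expr = (expr, op)
    return expr


def tree(operands):
    """(A + B) + (C + D): split the operand list in halves recursively."""
    if len(operands) == 1:
        return operands[0]
    mid = len(operands) // 2
    return (tree(operands[:mid]), tree(operands[mid:]))


def depth(expr):
    """Number of adders on the longest path through the expression."""
    if isinstance(expr, str):
        return 0
    left, right = expr
    return 1 + max(depth(left), depth(right))


ops = ["A", "B", "C", "D"]
print(depth(chain(ops)), depth(tree(ops)))  # 3 2
```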

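Exercise
[Figure: A + 1 and B + 1 computed by two adders; a MUX with select S chooses one of them, and the result is stored in a D-Latch enabled by EN]
• Write a VHDL description that generates this implementation.
• Can you improve the design?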
Timing Constraints
Set Proper Timing Constraints…
• The design MUST fulfill the timing constraints
– Otherwise it will most likely not work
• Optimization criteria
– Timing always first
– (Throughput), area and power second
• Make an *.sdc (Synopsys Design Constraints) file for your design to be sure that it works, and check the TimeQuest Analyzer report
Clock Constraint
• Declaring a clock forces all timing paths through the logic (the four possibilities) to fulfill the timing constraint (if possible).
[Figure: DFF → combinational logic → DFF, both flip-flops clocked by CLK and reset by RESET; the propagation delay must satisfy Tp < Tclk]
create_clock -name "Clock" -period 20.000ns [get_ports {sys_clk}] -waveform {0.000 10.000}
derive_pll_clocks -create_base_clocks
derive_clock_uncertainty
Declaring Multi-cycle paths
• Some logic can be slow and need several clock cycles to complete its execution…
[Figure: DFF → DIV_UNIT (slow logic) → DFF, both flip-flops clocked by CLK and reset by RESET; the propagation delay is Tp = N*Tclk (N=20 below)]
set_multicycle_path -from [get_registers {div_unit*regA*}] -to [get_registers {div_unit*regRes*}] -setup -start 21
set_multicycle_path -from [get_registers {div_unit*regA*}] -to [get_registers {div_unit*regRes*}] -hold -start 20
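The multipliers can be derived from the path delay. This sketch assumes a common convention (check your tool’s documentation): a setup multiplier N = ceil(Tp/Tclk) and a hold multiplier N-1. The N-1 hold value moves the hold check back to where it is for a single-cycle path. This Python sketch uses a hypothetical divider delay:

```python
import math


def multicycle(t_path_ns, t_clk_ns):
    """Return (setup, hold) multicycle multipliers for a slow register-to-register path."""
    setup = max(1, math.ceil(t_path_ns / t_clk_ns))  # clock cycles the data may take
    hold = setup - 1                                 # restore the single-cycle hold check
    return setup, hold


# Hypothetical 415 ns divider path with the 20 ns clock of the create_clock example
print(multicycle(415.0, 20.0))  # (21, 20)
```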
TimeQuest Timing Analyzer
If any of these report titles is marked in red, your design will most likely not work
Synthesizing Arithmetic Operators
Synthesis of Arithmetic Operators
• Only a subset of the VHDL operators is supported by synthesis tools
– Supported operations: usually +, -, =, >, <, /=, abs, *
– The designer must write a VHDL description (behavioral or structural) of all unsupported operators: /, N**x
• Specifying a range for an integer is important for efficient synthesis
– Suppose integer x takes only the two values 100 and 101
• It will be assigned 7 bits
• It may be better to declare x as a constrained integer
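You can estimate the bit cost of a constrained integer with the usual encoding rule: unsigned if the range is non-negative, two’s complement otherwise. This Python sketch implements that rule. It is an assumption about typical tool behaviour, and tools may differ:

```python
def integer_bits(low, high):
    """Bits needed to encode every value of the VHDL range `low to high`."""
    if low >= 0:
        return max(1, high.bit_length())  # unsigned encoding
    # two's complement: need -2**(n-1) <= low and high <= 2**(n-1) - 1
    return max((-low - 1).bit_length(), high.bit_length()) + 1


print(integer_bits(100, 101))       # 7: the two values 100 and 101 still need 7 bits
print(integer_bits(0, 1))           # 1: two states, e.g. if x - 100 were stored instead
print(integer_bits(-128, 127))      # 8
print(integer_bits(-2**31, 2**31 - 1))  # 32: what an unconstrained integer costs
```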
Synthesis of ”+”
ENTITY adder IS
  PORT(
    x, y : IN  integer RANGE -128 TO 127;
    z    : OUT integer RANGE -128 TO 127);
END ENTITY;

ARCHITECTURE adder_8bit OF adder IS
BEGIN
  z <= x + y;  -- in simulation, x + y must stay within -128 TO 127
END;

• The synthesis tool will use an 8-bit standard adder based on the design constraints.
• If the designer wants to use a fast carry look-ahead adder, which is not part of the library, then he must give its architecture.
• The user-defined implementation will override the standard implementation.
How are the ”+” and ”-” operators implemented on FPGAs?
• Arithmetic operators are very common in FPGA designs
– Thus it is very beneficial to have special structures that implement them
Let’s have a look at a typical FPGA structure…

Altera APEX20K (Logic Element)
APEX 20K (Carry Chain + Cascade Chain)
Used for tri-state assignments & buses
Used for ADDERs

APEX20K (LE Operating Modes)
Altera APEX20K
(MegaLAB structure)
Altera APEX20K
(LAB structure)
APEX 20K Interconnect Structure
Synthesizing Multiplications
signal B: INTEGER range 0 to 15;
signal A: INTEGER range 0 to 45;    -- B * 3 can be up to 45
signal Y, Z: INTEGER range 0 to 31;
signal X: INTEGER range 0 to 1023;  -- Y * Z can be up to 961
...
A <= B * 3;
X <= Y * Z;
-- the following example needs std_logic_unsigned (or std_logic_signed), which
-- define "*" directly on std_logic_vector; with numeric_std, convert first:
-- result <= std_logic_vector(unsigned(C) * unsigned(D));
Signal C,D:std_logic_vector(9 downto 0);
Signal result:std_logic_vector(19 downto 0);
...
Result <= C*D;
Multiplication and division with powers of 2
signal B, C, D, E, F, G, H: INTEGER range 0 to 15;
signal A: INTEGER range 0 to 60;  -- B * 4 can be up to 60
A <= B * 4;
C <= D / 4;
E <= F mod 4;
G <= H rem 4;
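For non-negative operands such as these 0 to 15 ranges, all four operations reduce to wiring. Multiplying by 4 appends two zero bits, / 4 drops the two least significant bits, and mod 4 and rem 4 keep them, so no arithmetic hardware is needed. This Python check verifies these equivalences over the declared range. For negative values, VHDL’s / truncates toward zero, so it is not a plain shift.

```python
for v in range(16):           # every value of INTEGER range 0 to 15
    assert v * 4 == v << 2    # B * 4   -> shift left, two zero LSBs appended
    assert v // 4 == v >> 2   # D / 4   -> drop the two LSBs
    assert v % 4 == v & 0b11  # F mod 4 (and rem 4 for v >= 0) -> keep the two LSBs
print("all equivalences hold for 0..15")
```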
Multiplications in FPGAs
• Multiplications are mapped to predefined, highly configurable hardwired multipliers or DSP blocks
– Altera (Stratix II): a DSP block can be configured as
• 8 single 9x9 multipliers (including add and subtract features)
– The Arria 10 FPGA from Altera also has
• Single-precision floating point
Pipelined arithmetic
Process(clk,reset)
Begin
  if (reset='1') then
    mul_reg<=0;
    mul<=0;
  elsif rising_edge(clk) then
    mul_reg<=a*b;   -- first register stage after the multiplier
    mul<=mul_reg;   -- second register stage
  end if;
End process;
[Figure: the MUL unit followed by the two registers mul_reg and mul]
A good synthesis tool uses a transformation called retiming to distribute the registers inside the mul-unit...
Packages supported by Synopsys
(SOLD HDL Compiler for VHDL, Reference Manual, App. B)
Package std_logic_1164
library ieee;
use ieee.std_logic_1164.all;
– Defines data type std_logic
Package std_logic_arith
library ieee;
use ieee.std_logic_arith.all;
– Defines the data types signed and unsigned and arithmetic operations on these data types
Package numeric_std
library ieee;
use ieee.numeric_std.all;
– Defines the data types signed and unsigned and arithmetic operations on them
– Be careful, numeric_std and std_logic_arith have
OVERLAPPING definitions!
Package ATTRIBUTES
library SYNOPSYS;
use SYNOPSYS.ATTRIBUTES.ALL;
– Defines attributes which are later used by Design Compiler, e.g.
state_vector, which is used for FSMs
Floating Point Additions in VHDL-2008
New subtypes
• VHDL-2008 introduces the packages fixed_pkg and float_pkg, which define std_logic-based types for synthesizable IEEE floating-point and fixed-point implementations:
– New subtypes: float32, float64, and float128
– New conversion function: to_real
– Plus a lot of configuration options...

Floating-Point
• A floating point number is represented by a sign bit, exponent bits, and a mantissa (also called fraction bits)
[ S | Exponent | Mantissa ]  (sign bit s, exponent exp, mantissa m)
• The value is calculated as
$\mathrm{FlP}(B) = (-1)^{s} \cdot (1.m) \cdot 2^{exp}$
• The exp often has a bias (offset) value, i.e.,
$\mathrm{FlP}(B) = (-1)^{s} \cdot (1.m) \cdot 2^{exp - bias}$
IEEE-754
• IEEE-754 defines a 32-bit floating point number as
[ S | Exponent | Mantissa ]
sign bit 31; exponent bits 30 to 23 (weights $2^{7} \dots 2^{0}$); mantissa bits 22 to 0 (weights $2^{-1} \dots 2^{-23}$)
• The value is calculated with the 8-bit exponent according to
$\mathrm{FlP}(B) = (-1)^{s} \cdot (1.m) \cdot 2^{exp - 127}$
• Special bit patterns have been reserved
Floating-Point Numbers (IEEE)
• Exponent values 1 to 254: normalized non-zero floating-point numbers; true (unbiased) exponent -126 ... +127
• Exponent of zero and fraction of zero: positive or negative zero
• Exponent of all ones and fraction of zero: positive or negative infinity
• Exponent of all ones with a non-zero fraction: Not a Number (NaN, exception condition)
• Exponent of zero and non-zero fraction: denormalized number (true exponent is -126), representing magnitudes from 0 up to (but not including) $2^{-126}$
$\mathrm{FlP}(B) = (-1)^{s} \cdot (0.m) \cdot 2^{-126}$
• There are also standard 64-bit and 128-bit formats (binary64 and binary128, defined in IEEE 754)
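The rules above can be written as a decoder. This Python sketch splits a 32-bit pattern into s, exp and m and applies the normalized, denormalized and special cases. Each result is cross-checked against Python’s struct module, which reads the same bytes as a native float32:

```python
import math
import struct


def decode_float32(bits):
    """Value of a 32-bit IEEE-754 pattern: sign bit 31, exponent 30..23, mantissa 22..0."""
    s = (bits >> 31) & 0x1
    exp = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if exp == 0xFF:  # all ones: infinity (m = 0) or NaN (m != 0)
        return sign * math.inf if m == 0 else math.nan
    if exp == 0:     # zero or denormalized: (-1)^s * 0.m * 2^-126
        return sign * (m / 2**23) * 2.0**-126
    # normalized: (-1)^s * 1.m * 2^(exp - 127)
    return sign * (1 + m / 2**23) * 2.0**(exp - 127)


for pattern in (0x3F800000, 0xC0000000, 0x00000001, 0x7F800000):
    reference = struct.unpack(">f", pattern.to_bytes(4, "big"))[0]
    print(hex(pattern), decode_float32(pattern), reference)
```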
VHDL-2008: Overloaded operators
• Implementation of
– Relational operators
– Logical operators
– Arithmetic operators
• Yes, it is now possible to synthesize mod, rem and /...
Operator sizes (float32)
Operator   Area: # ALUTs   # 18-bit DSPs
+                    885               0
-                    882               0
*                    727               4
/                   1995               0

Operator   Area: # ALUTs   # 18-bit DSPs
+                   1818               0
-                   1837               0
*                   1663              10
/                   7505               0

Operator   Area: # ALUTs   # 18-bit DSPs
+                   4025               0
-                   4029               0
*                   4572              61
/                  29587               0
Restrictions
• The std_logic-based floating-point operators are synthesizable, but they are
– excellent for modelling
– not so good for synthesis
• Pipelining does not seem to work properly
• Use the inbuilt generators until synthesis tools improve
Memory Synthesis
Using Templates
• Synthesizing memories is a hard task
– The semantics of the model must match the embedded memories of the target FPGA
– The semantics vary from FPGA to FPGA
• Using a modeling template from the target FPGA provider is the simplest way to make it work…
Example: Xilinx Template for Virtex6
--
-- Dual-Port Block RAM with Two Write Ports
-- Correct Modelization with a Shared Variable
--
-- Download: ftp://ftp.xilinx.com/pub/documentation/misc/xstug_examples.zip
-- File: HDL_Coding_Techniques/rams/rams_16b.vhd
--
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity rams_16b is
generic(width:integer);
port(clka : in std_logic;
clkb : in std_logic;
ena : in std_logic;
enb : in std_logic;
wea : in std_logic;
web : in std_logic;
addra : in std_logic_vector(width-1 downto 0);
addrb : in std_logic_vector(width-1 downto 0);
dia : in std_logic_vector(15 downto 0);
dib : in std_logic_vector(15 downto 0);
doa : out std_logic_vector(15 downto 0);
dob : out std_logic_vector(15 downto 0));
end rams_16b;
Xilinx Synthesizable Template
architecture syn of rams_16b is
  type ram_type is array (2**width-1 downto 0) of std_logic_vector(15 downto 0);
  -- both ports access the same storage, hence a shared variable
  shared variable RAM : ram_type;
begin
  -- Port A: synchronous read of the old contents, then optional write
  process (CLKA)
  begin
    if CLKA'event and CLKA = '1' then
      if ENA = '1' then
        DOA <= RAM(conv_integer(ADDRA));
        if WEA = '1' then
          RAM(conv_integer(ADDRA)) := DIA;
        end if;
      end if;
    end if;
  end process;

  -- Port B: same behaviour on its own clock
  process (CLKB)
  begin
    if CLKB'event and CLKB = '1' then
      if ENB = '1' then
        DOB <= RAM(conv_integer(ADDRB));
        if WEB = '1' then
          RAM(conv_integer(ADDRB)) := DIB;
        end if;
      end if;
    end if;
  end process;
end syn;
[Figure: read and write timing diagrams for port A (ADDRA, DIA, DOA) with addresses A0, A1 and data D0, D1]
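In each process the template reads RAM into DOA (or DOB) before the shared variable is overwritten. A port that writes therefore returns the old contents of that address (read-first behaviour). This cycle-level Python sketch models one port under that reading. The class and method names are hypothetical, and simultaneous writes to the same address from both ports are not modelled:

```python
class ReadFirstPort:
    """One clocked RAM port in read-first mode, modelled one rising edge at a time."""

    def __init__(self, memory):
        self.mem = memory  # shared list, like the shared variable RAM
        self.dout = 0      # output register (DOA or DOB)

    def clock(self, en, we, addr, din):
        """Apply one rising clock edge and return the output register."""
        if en:
            self.dout = self.mem[addr]  # read the old contents first ...
            if we:
                self.mem[addr] = din    # ... then write the new data
        return self.dout


ram = [0] * 16  # 2**width words with width = 4 (example value)
port_a, port_b = ReadFirstPort(ram), ReadFirstPort(ram)
print(port_a.clock(en=1, we=1, addr=3, din=0xABCD))  # 0: old contents
print(port_a.clock(en=1, we=0, addr=3, din=0))       # 43981 (0xABCD)
print(port_b.clock(en=1, we=0, addr=3, din=0))       # 43981: both ports share RAM
```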
Generating Memories in Altera...
Select what to generate...
Specify memory type and file name...
Choose data and address width...
Turn on or off input and output registers...
Set block type and initialization file (not mandatory)...
(But cannot have DFFs on address input!)
Good choice for ROMs
And generate the VHDL file...
Generated Entity
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY lpm;
USE lpm.lpm_components.all;
ENTITY single_port_memory IS
PORT
(
address : IN STD_LOGIC_VECTOR (11 DOWNTO 0);
we : IN STD_LOGIC := '1';
data : IN STD_LOGIC_VECTOR (7 DOWNTO 0);
q : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
);
END single_port_memory;
Use a Fake Architecture for Testing Purposes
ARCHITECTURE fake OF single_port_memory IS
BEGIN
  -- For debugging controllers (CTRLs) with RAMs & ROMs inside FPGAs:
  -- the "memory" simply returns the low address byte, a known sequence
  q <= address(7 downto 0);
END fake;
• Debugging inside an FPGA is difficult
– Having a RAM/ROM makes it even more difficult
– Using a known sequence helps
• Use the SignalTap Logic Analyzer to debug!
Signal Tap Logic Analyzer
Signal Tap Logic Analyzer View…
Signal Tap Logic Analyzer –
Select nodes to display
Signal Tap Logic Analyzer…
• Record signals during run-time
1. Add signal nodes to view
2. Select the sample clock signal
3. Select the trigger condition (how/when to start recording)
4. Recompile the design (incremental compilation)
Summary
VHDL - Hardware Correspondence
VHDL Construct                                 Hardware
Variables and Signals                          Flip-flops, latches, wires
Arithmetic operators (+,-,*)                   Adder, ALU, multiplier, ...
Logic operators                                Gates
Relational operators                           Comparator, ALU
Control Constructs (for, if-then, case, ...)   Decoders, multiplexers, priority encoders, ...
Hierarchy Description                          Hierarchical Hardware
Resolution Functions                           Tri-state logic, wire-and, wire-or

Register/Latch Inference Rules
Clocked body:
– Signals: flip-flops
– Variables, RBW (read before write): flip-flops
– Variables, WBR (write before read): wires
Non-clocked body (signals & variables):
– Driven in all branches: wires
– Not driven in all branches: latch

Watch out for Latches in your design!!!
• Latches have an unpredictable timing behaviour that makes it very hard to create a functioning design
– Try to avoid them at all costs
– Use DFFs instead
– Look for them in the synthesis reports; if there is one left, there is a high chance that your design will not work.
Synchronous Behavior: Restrictions
process (clk)
begin
  if (clk'event and clk = '1') then
    p <= a + b;
  else
    q <= a - b;  -- illegal: an else branch after a clock-edge test cannot be synthesized
  end if;
end process;

process (clk_a, clk_b)
begin
  if (clk_a'event and clk_a = '1') then
    p <= a + b;
  end if;
  if (clk_b'event and clk_b = '1') then
    q <= a + b;
  end if;
end process;
Illegal! Only one clock allowed!

Sensitivity List and Synthesis
• The sensitivity list is very important for making the simulation efficient, but it is not used by the synthesizer.
[Figure: an AND gate with inputs x and y and output z]
process (x)
begin
  z <= x and y;  -- z is only evaluated when there is a change in value of x
end process;
Be careful when you use a sensitivity list that does not include all signals read in the process: simulation and synthesized hardware will behave differently!
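The mismatch can be reproduced with a toy event-driven model. A process woken only by x misses changes on y, while the synthesized AND gate reacts to both inputs. This Python sketch is hypothetical and is not a VHDL simulator:

```python
def simulate(events, sensitivity):
    """Apply input changes in order; re-evaluate z = x and y only when a
    signal in the sensitivity list changed. Returns z after each event."""
    signals = {"x": 0, "y": 0}
    z = 0
    trace = []
    for name, value in events:
        changed = signals[name] != value
        signals[name] = value
        if changed and name in sensitivity:
            z = signals["x"] & signals["y"]  # the process body
        trace.append(z)
    return trace


events = [("x", 1), ("y", 1), ("y", 0), ("x", 0)]
print(simulate(events, {"x"}))       # [0, 0, 0, 0]  process (x): z never sees y
print(simulate(events, {"x", "y"}))  # [0, 1, 0, 0]  matches the synthesized AND gate
```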
Un-Synthesizable VHDL Subset
• Not supported: Ignored
– The construct is ignored by the synthesis tool
– Examples:
• Initialization of variables or signals
• Assert statements
• Physical type: Time
– The synthesizer ignores the AFTER clause in expressions
Example: x <= y AFTER 100 ns
• Not supported: Illegal
– Data types
• Files, real, and generics that are not integers (Altera allows strings)
• Multi-dimensional arrays (=> rewrite into 1-D equivalents)
– Arrays implemented as RAM/ROM (or databuses)
– Supported from Synopsys 2004.06
Have in mind...
• Delay in Models:
– after clauses are ignored.
– Delay is technology dependent!
• Data Types
– Use std_logic, std_logic_vector, signed, unsigned and constrained
integers.
– Unconstrained integers result in 32 bits!
• Initial Values are ignored!
– All sequential circuits should have a reset!
Have in mind… (ctd.)
• Attributes
– Use attributes to specify the state vector of an FSM.
– You may also use an attribute for state encoding, to force the synthesis tool to choose an encoding style
• RAMs
– Use a predefined RAM block from the FPGA provider; otherwise the RAM is likely to be synthesized with flip-flops or latches
• Latches
– Avoid unnecessary latches and flip-flops
Give the tool a good starting point…
[Figure: delay plotted against area, marking a bad and a good starting position for the synthesis algorithm]
...and don’t forget!
THINK
HARDWARE
Appendix
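Synthesis of a D-Latch
library ieee;
use ieee.std_logic_1164.all;

entity d_latch is
  port (GATE, DATA : in std_logic;
        Q          : out std_logic);
end d_latch;

architecture rtl of d_latch is
begin
  -- Q is not assigned when GATE = '0', so it must keep its value: a latch is inferred
  infer : process (GATE, DATA) begin
    if (GATE = '1') then
      Q <= DATA;
    end if;
  end process infer;
end rtl;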
Synthesis of a Latch
Synthesis of Latches and Flip-Flops
• Y in the SR and SS columns indicates that the flip-flop has
a synchronous reset and a synchronous set
• N in the AR, AS, and ST columns indicates that the flip-
flop does not have an asynchronous reset, asynchronous
set, or synchronous toggle
• A dash (–) in the Bus and MB columns indicates that
these columns are not relevant to this design
Synthesis of a D-Flip-Flop
library ieee;
use ieee.std_logic_1164.all;

entity dff_pos is
  port (DATA, CLK : in std_logic;
        Q         : out std_logic);
end dff_pos;

architecture rtl of dff_pos is
begin
  -- positive-edge-triggered D flip-flop
  infer : process (CLK) begin
    if (CLK'event and CLK = '1') then
      Q <= DATA;
    end if;
  end process infer;
end rtl;
Synthesis of a D-Flip-Flop
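Synthesis of a D-Flip-Flop with asynchronous reset
library ieee;
use ieee.std_logic_1164.all;

entity dff_async_reset is
  port (DATA, CLK, RESET : in std_logic;
        Q                : out std_logic);
end dff_async_reset;

architecture rtl of dff_async_reset is
begin
  -- RESET is in the sensitivity list and tested before the clock edge,
  -- so it clears Q immediately (asynchronous reset)
  infer : process (CLK, RESET) begin
    if (RESET = '1') then
      Q <= '0';
    elsif (CLK'event and CLK = '1') then
      Q <= DATA;
    end if;
  end process infer;
end rtl;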
asynchronous reset
synchronous reset
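library ieee;
use ieee.std_logic_1164.all;
library synopsys;
use synopsys.attributes.all;

entity dff_sync_reset is
  port (DATA, CLK, SET : in std_logic;
        Q              : out std_logic);
  -- marks SET as a synchronous set/reset signal for the synthesis tool
  attribute sync_set_reset of SET : signal is "true";
end dff_sync_reset;

architecture rtl of dff_sync_reset is
begin
  infer : process (CLK) begin
    if (CLK'event and CLK = '1') then
      -- SET = '0' clears Q on the clock edge (active-low synchronous reset)
      if (SET = '0') then
        Q <= '0';
      else
        Q <= DATA;
      end if;
    end if;
  end process infer;
end rtl;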
synchronous reset
Synthesizing Three-State Drivers
library ieee;
use ieee.std_logic_1164.all;

entity three_state is
  port (IN1, ENABLE : in std_logic;
        OUT1        : out std_logic);
end;

architecture rtl of three_state is
begin
  process (IN1, ENABLE) begin
    if (ENABLE = '1') then
      OUT1 <= IN1;
    else
      OUT1 <= 'Z';  -- assigns high-impedance state
    end if;
  end process;
end rtl;
Synthesizing Three-State Drivers
FPGA Architectures
• PLAs, PLDs and CPLDs
• FPGAs
Programmable Logic Device (PLD)
[Figure: PLD with inputs A0 and A1 (and their complements) feeding a programmable AND-matrix (not fixed!) followed by a fixed OR-matrix]
Half-Adder Implementation (PLD)
S0 = A·B’ + A’·B
S1 = A·B
[Figure: programmed AND-matrix for the half adder feeding the fixed OR-matrix with outputs S0 and S1; only the AND-terms that result in a ’1’ are implemented]
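The programmable-AND / fixed-OR structure can be modelled by listing the programmed product terms and the fixed OR groups. This Python sketch models the half adder above. Encoding a product term as a dict of required input values is an assumption of this example:

```python
# A product term is a dict {input: required value}.
# Programmed AND matrix: only the terms that produce a '1' are implemented.
AND_TERMS = {
    "A.B'": {"A": 1, "B": 0},
    "A'.B": {"A": 0, "B": 1},
    "A.B": {"A": 1, "B": 1},
}

# Fixed OR matrix: each output ORs its own group of product terms
OR_MATRIX = {
    "S0": ["A.B'", "A'.B"],  # sum   = A xor B
    "S1": ["A.B"],           # carry = A and B
}


def pal(inputs):
    """Evaluate every output of the PLD for the given input values."""
    products = {name: int(all(inputs[v] == val for v, val in term.items()))
                for name, term in AND_TERMS.items()}
    return {out: int(any(products[t] for t in terms)) for out, terms in OR_MATRIX.items()}


for a in (0, 1):
    for b in (0, 1):
        print(a, b, pal({"A": a, "B": b}))
```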
Early PLD (also called PLAs or PALs)
(PAL 16R6)
[Figure: PAL 16R6 with a clock input, 8 inputs, 6 sequential (registered) outputs, combinational outputs and an output enable]
Programmable Logic Device
• No customized mask
layers or logic cells
• A single large block of
interconnects
• Macrocells consist of
programmable array
logic followed by a flip-
flop or latch
Complex PLD
Altera MAX 7000 Family
Altera MAX 7000 Family
(Macrocell)
A macrocell can be used as a combinational or sequential output
Altera MAX 7000
(Devices)
The size of PLDs is limited.

Field Programmable Gate Arrays
• None of the layers is
customized
• Basic logic cells and
interconnect can be
programmed
• Basic cells can be SRAM based, Flash memory based or antifuse-based (one-time programmable)
What is an efficient
basic cell?
The FPGA cell
• The most efficient FPGA cell consists of a single Look-Up Table (LUT) with four or five inputs, a D-flipflop and a bypass Mux.
• A LUT is a programmable SRAM that can model any combinational function of its inputs.
[Figure: a LUT (inputs A, B, C, D) feeding a D flip-flop (CLK, RESET), with a bypass mux M (select S) choosing between the LUT output and Q]
Xilinx XC4000
• SRAM based
• Coarse-grain
architecture
Altera FLEX10K
(Logic Element)
Altera FLEX10K
(Block Diagram)
FLEX10K Logic Element
(Normal Mode)
Altera FLEX10K
(Devices)
Altera FLEX10K
(Devices)
Altera APEX20K
(Block Diagram)
Altera APEX20K
(MegaLAB structure)
Altera APEX20K
(LAB structure)
Altera APEX20K
(LAB control signal generation)
Altera APEX20K
(Logic Element)
APEX 20K (Carry Chain + Cascade Chain)
APEX20K (LE Operating Modes)
APEX 20K Interconnect Structure
Altera APEX20K
(Devices)
Xilinx Virtex II-Pro
• SRAM based FPGA

• May include up to 4 PowerPC processors
Xilinx Virtex II-Pro
