Hardware Synthesis from VHDL
FPGA/ASIC Design Flow
A Crash Course in Technology
Mapping
• Two-level logic mapping
• Multi-level logic mapping
• Mapping to ASIC & FPGAs
Two-level logic Mapping
• Standard Karnaugh Minimization
• Using muxes to implement logic functions
You should know this by now...
Karnaugh map (rows: wz, columns: xy):
wz\xy  00  01  11  10
00      1   1   0   0
01      1   1   1   1
11      0   1   1   0
10      0   1   1   0
f = w’x’+w’z+wy
[Figure: gate-level circuit implementing f]
XOR & XNOR
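Karnaugh map (rows: wz, columns: xy):
wz\xy  00  01  11  10
00      0   1   0   1
01      1   0   1   0
11      0   1   0   1
10      1   0   1   0
(checkerboard pattern: XOR of w, z, x and y; no adjacent cells can be merged)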
Implementing functions using Muxes
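Karnaugh map (rows: z, columns: xy):
z\xy  00  01  11  10
0      1   1   0   0
1      0   1   1   0
f = z’x’+x’y+zy
4:1 mux with select inputs xy and output f:
  input 00: z’
  input 01: 1
  input 10: 0
  input 11: z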
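The mux mapping above can be written down directly in VHDL. The following is a minimal sketch, not the lecture's own code; the entity name f_mux and the signal name sel are made up for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: implements f = z'x' + x'y + zy
-- as a 4:1 mux whose select lines are x and y.
entity f_mux is
  port (x, y, z : in  std_logic;
        f       : out std_logic);
end entity f_mux;

architecture rtl of f_mux is
  signal sel : std_logic_vector(1 downto 0);
begin
  sel <= x & y;  -- xy forms the mux select

  with sel select
    f <= not z when "00",    -- xy = 00: f = z'
         '1'   when "01",    -- xy = 01: f = 1
         z     when "11",    -- xy = 11: f = z
         '0'   when others;  -- xy = 10 (and non-binary values): f = 0
end architecture rtl;
```

Each mux data input is one column of the Karnaugh map, reduced to a function of z only.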
• FPGAs
– Shannon Decomposition
Basic Mapping Algorithm
• Traverse the circuit graph
• Find and apply suitable transformations.
Buffering
[Figure: input x drives gate w (intrinsic delay 0.05), whose output feeds 64 inverters (input load 0.038 each) with outputs z0 ... z63; each inverter has intrinsic delay 0.07 and a fanout of 5 (load 0.016 each)]
• A circuit’s intrinsic delay represents the RC delay of the unloaded circuit.
• Fanout is the number of circuits connected to the driving output.
• A circuit loads its predecessor with capacitance! The load is expressed as an RC delay that loads its driver.
Tx->z63 = (0.05+0.038*64) + (0.07+0.016*5) = 2.63 ns
Buffering (ctd.)
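[Figure: gate w (intrinsic delay 0.05) now drives two buffers (intrinsic delay 0.15, input load 0.076 each); each buffer drives 32 of the inverters (load 0.006 each as seen by the buffer); each inverter has intrinsic delay 0.07 and a fanout of 5 (load 0.016 each)]
• Assume one buffer has 2 inverters’ input loads
Tx->z63 = (0.05+0.076*2) + (0.15+0.006*32) + (0.07+0.016*5) = 0.69 ns
• A buffer has higher driving strength (due to lower resistance). Thus, load is reduced!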
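Both path delays follow the same pattern: every stage contributes its intrinsic delay plus the load per driven input times the number of driven inputs. The symbols t_int, l and n are notation introduced here for illustration; the numbers are the ones from the two buffering slides.

```latex
\begin{align*}
T &= \sum_{i} \left( t_{\mathrm{int},i} + l_i \, n_i \right) \\
T_{\text{unbuffered}} &= (0.05 + 0.038 \cdot 64) + (0.07 + 0.016 \cdot 5)
   = 2.482 + 0.150 = 2.632 \approx 2.63\ \text{ns} \\
T_{\text{buffered}} &= (0.05 + 0.076 \cdot 2) + (0.15 + 0.006 \cdot 32) + (0.07 + 0.016 \cdot 5)
   = 0.202 + 0.342 + 0.150 = 0.694 \approx 0.69\ \text{ns}
\end{align*}
```

The extra buffer stage adds 0.342 ns of its own but removes most of the 2.432 ns load term on gate w.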
Mapping to an ASIC Technology
DeMorgan
[Figure: the same circuit (inputs w, x, y; internal nodes t, c0, c1, c2; outputs z0, z1, z2, f1, f2) before and after applying DeMorgan transformations]
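[Figure: mux tree; x1 selects between the cofactors f11 and f01 to form f1, and between f10 and f00 to form f0; x0 selects between f1 and f0 to form f]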
Proof
• RHS:
– If x0=1 then the right term will be zero. Then f equals the left term.
– If x0=0 then the left term will be zero. Then f equals the right term
• LHS:
– if x0=1 then f equals f(xn,...,x1,1) (=left term on RHS)
– if x0=0 then f equals f(xn,...,x1,0) (=right term RHS)
• LHS=RHS
QED
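The statement being proved is not preserved on the slide. Judging from the "left term" and "right term" in the proof, it is presumably the standard Shannon expansion with respect to x0, which reads:

```latex
\[
f(x_n,\dots,x_1,x_0) \;=\;
  x_0 \cdot f(x_n,\dots,x_1,1) \;+\; \overline{x_0} \cdot f(x_n,\dots,x_1,0)
\]
```

Here the left term is the one multiplied by x0 and the right term the one multiplied by its complement, which matches the case analysis in the proof.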
(R)(O)BDDs –
(Reduced)(Ordered) Binary Decision Diagrams
• The order of the variables may influence the size
and synthesis time of the circuit…
[Figure: decision diagrams over x0 and x1 drawn with different variable orders]
Karnaugh map (rows: x, columns: yz):
x\yz  00  01  11  10
0      0   1   0   1
1      1   0   1   0
The same function stored as a look-up table (address pins: x, y, z):
Address  Data
000      0
001      1
010      1
011      0
100      1
101      0
110      0
111      1
This is actually an SRAM...
A Simple FPGA cell
• The simplest possible FPGA-cell is composed of a
single Look-Up-Table (LUT), a D-flipflop and a
bypass Mux.
• A LUT is a programmable RAM (or ROM), that can
model any combinational function of its input.
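[Figure: inputs A, B, C, D feed the LUT; a mux with select S chooses between the LUT output and the output Q of the D-flipflop (CLK, RESET) to drive the cell output M]
2^(2^4) = 2^16 = 65536 combinations/functions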
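A behavioural model can make the cell concrete. The sketch below is an illustration under assumptions, not vendor code: the entity name, the generic names LUT_INIT and BYPASS, and the use of generics for the configuration bits are all invented here. The example LUT content x"6996" is the truth table of a 4-input XOR.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of the simple cell: 4-input LUT, D-flipflop, bypass mux.
entity simple_fpga_cell is
  generic (
    LUT_INIT : std_logic_vector(15 downto 0) := x"6996"; -- 16 configuration bits
    BYPASS   : std_logic := '0'  -- '1': combinational output, '0': registered
  );
  port (
    clk, reset : in  std_logic;
    a, b, c, d : in  std_logic;
    m          : out std_logic
  );
end entity simple_fpga_cell;

architecture rtl of simple_fpga_cell is
  signal addr    : unsigned(3 downto 0);
  signal lut_out : std_logic;
  signal q       : std_logic;
begin
  -- The four inputs address one of the 16 stored bits.
  addr    <= d & c & b & a;
  lut_out <= LUT_INIT(to_integer(addr));

  -- D-flipflop with asynchronous reset.
  process (clk, reset)
  begin
    if reset = '1' then
      q <= '0';
    elsif rising_edge(clk) then
      q <= lut_out;
    end if;
  end process;

  -- Bypass mux: pick the LUT output directly or the registered value.
  m <= lut_out when BYPASS = '1' else q;
end architecture rtl;
```

Changing LUT_INIT changes the function without changing the structure, which is why one cell covers all 65536 functions of four inputs.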
Mapping to an FPGA - Where should I cut?
[Figure: the circuit (inputs w, x, y; internal nodes t, c0, c1, c2; outputs z0, z1, z2) cut into five LUTs]
Area=5 LUTs
Mapping to an FPGA - Where should I cut?
[Figure: the same circuit cut into three LUTs]
Area=3 LUTs
(x & y need to be connected to all LUTs)
Example: Bad Variable ordering
F(x)=x4(x0+x1+x2+x3)+x4(x5+x6+x7+x8)
Ideal Design
[Plot: delay vs. area]
Not so ideal delay vs. area curve...
[Plot: delay vs. area; bad start position for the synthesis algorithm]
Code Transformations
• Associativity - (a+b)+c=a+(b+c)
• Distributivity - c*(a+b)=ca+cb
• Commutativity - a+b=b+a
Parentheses are important!
Y <= A + B + C + D;
[Figure: three adders in series, ((A + B) + C) + D = Y]
Parentheses are important!
Y <= (A + B) + (C + D);
[Figure: balanced adder tree; A + B and C + D are computed in parallel and then added to form Y]
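The two groupings can be put side by side in one design to compare the resulting structures. This is a minimal sketch; the entity name add4, the output names and the 8-bit unsigned operands are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity comparing the two ways of grouping a four-input sum.
entity add4 is
  port (a, b, c, d : in  unsigned(7 downto 0);
        y_chain    : out unsigned(7 downto 0);
        y_tree     : out unsigned(7 downto 0));
end entity add4;

architecture rtl of add4 is
begin
  -- Evaluated left to right: ((a + b) + c) + d, three adders in series.
  y_chain <= a + b + c + d;

  -- Parentheses describe a balanced tree: two adders in parallel, then one.
  y_tree  <= (a + b) + (c + d);
end architecture rtl;
```

Both outputs carry the same value (modulo 2^8); only the described structure differs, with an adder depth of three versus two. A synthesis tool may restructure either expression on its own, so the actual result should be checked in the synthesis report.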
[Figure: two "+ 1" adders on A and B feeding a MUX (select S), followed by a D-latch (enable EN)]
• Optimization criteria
– Timing always first
– (Throughput), Area and Power second
[Figure: combinational logic between two D-flipflops sharing CLK and RESET]
Tp<Tclk
create_clock -name "Clock" -period 20.000ns [get_ports {sys_clk}] -waveform {0.000 10.000}
derive_pll_clocks -create_base_clocks
derive_clock_uncertainty
Declaring Multi-cycle paths
• Some logic is slow and needs several clock cycles to complete its execution…
[Figure: DIV_UNIT between two D-flipflops (CLK, RESET), with a load enable LE]
Tp=N*Tclk
(N=20 below)
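On the RTL side, a multi-cycle path needs a load enable that only fires when the slow result is valid. The sketch below shows one way to generate such an LE pulse every N clock cycles; the entity name, the generic N and the counter signal are assumptions for illustration, and the matching timing exception for the timing analyzer (for example an SDC set_multicycle_path constraint) is not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical enable generator: asserts le for one cycle out of every N.
entity multicycle_enable is
  generic (N : positive := 20);
  port (clk, reset : in  std_logic;
        le         : out std_logic);
end entity multicycle_enable;

architecture rtl of multicycle_enable is
  signal count : natural range 0 to N - 1;
begin
  process (clk, reset)
  begin
    if reset = '1' then
      count <= 0;
      le    <= '0';
    elsif rising_edge(clk) then
      if count = N - 1 then
        count <= 0;
        le    <= '1';  -- result of the slow unit may be captured now
      else
        count <= count + 1;
        le    <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

The inputs of the slow unit must be held stable for the whole N-cycle window, otherwise the relaxed timing constraint no longer describes the real circuit.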
If one of these titles is marked in red, your design will most likely not work.
Synthesizing Arithmetic Operators
Synthesis of Arithmetic Operators
• Only a subset of VHDL operators is supported by synthesis tools
– Supported operations: usually +, -, =, >, <, /=, abs, *
– The designer must write a VHDL description (behavioral or structural) of all unsupported operators: /, N**x
A <= B * 4;
C <= D / 4;
E <= F mod 4;
G <= H rem 4;
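Operations by a constant power of two, like the four statements above, reduce to wiring, which is why tools commonly accept them even where general division is unsupported. The following sketch spells that out with numeric_std; the entity name, port names and widths are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: multiply, divide and modulo by 4 without arithmetic logic.
entity pow2_ops is
  port (b, d, f : in  unsigned(7 downto 0);
        a       : out unsigned(9 downto 0);  -- b * 4 needs two extra bits
        c       : out unsigned(7 downto 0);
        e       : out unsigned(7 downto 0));
end entity pow2_ops;

architecture rtl of pow2_ops is
begin
  a <= b & "00";                  -- b * 4: append two zero bits
  c <= shift_right(d, 2);         -- d / 4: drop the two least significant bits
  e <= "000000" & f(1 downto 0);  -- f mod 4: keep the two least significant bits
end architecture rtl;
```

For unsigned operands rem gives the same result as mod; for signed operands the two differ in the sign of the result.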
Multiplications in FPGAs
• Multiplications are mapped to predefined highly
configurable hardwired multipliers or DSP-blocks
– Altera (Stratix II) – a DSP block can be configured as
• 8 single 9x9 multipliers
• 4 single 18x18 multipliers
• 1 single 36x36 multiplier
(including add and subtract features)
mul <= mul_reg;
Package std_logic_1164
library ieee;
use ieee.std_logic_1164.all;
– Defines data type std_logic
Package std_logic_arith
library ieee;
use ieee.std_logic_arith.all;
– Defines data types signed and unsigned and arithmetic operations on these data types
Packages supported by Synopsys
(SOLD HDL Compiler for VHDL, Reference Manual, App. B)
Package numeric_std
library ieee;
use ieee.numeric_std.all;
– Defines data types signed and unsigned
– Be careful, numeric_std and std_logic_arith have OVERLAPPING definitions!
Package ATTRIBUTES
library SYNOPSYS;
use SYNOPSYS.ATTRIBUTES.ALL;
– Defines attributes which are later used by Design Compiler, e.g.
state_vector, which is used for FSMs
Floating Point Additions in VHDL-2008
New subtypes
• VHDL-2008 introduces overloads of the
std_logic_vector to support synthesis of IEEE
floating and fixed point implementations:
Operator   Area: # ALUTs   Area: # 18-bit DSPs
+          885             0
-          882             0
*          727             4
/          1995            0
Operator sizes (float64)
Operator   Area: # ALUTs   Area: # 18-bit DSPs
+          1818            0
-          1837            0
*          1663            10
/          7505            0
Operator sizes (float128)
Operator   Area: # ALUTs   Area: # 18-bit DSPs
+          4025            0
-          4029            0
*          4572            61
/          29587           0
Restrictions
• The std_logic additions for supporting floating
point operators are synthesizable, but
– Excellent for modelling
– Not so good for synthesis
• Pipelining does not seem to work properly
LIBRARY ieee;
USE ieee.std_logic_1164.all;
LIBRARY lpm;
USE lpm.lpm_components.all;
ENTITY single_port_memory IS
PORT
(
address : IN STD_LOGIC_VECTOR (11 DOWNTO 0);
we : IN STD_LOGIC := '1';
data : IN STD_LOGIC_VECTOR (7 DOWNTO 0);
q : OUT STD_LOGIC_VECTOR (7 DOWNTO 0)
);
END single_port_memory;
Use a Fake Architecture for Testing Purposes
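ARCHITECTURE fake OF single_port_memory IS
BEGIN
-- For debugging CTRLs with RAMs & ROMs inside FPGAs:
-- echo the low address byte on the output q
-- ("data" is an input port and cannot be driven from inside the architecture)
q <= address (7 downto 0);
END fake;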
Type of Body
[Plot: delay vs. area; bad start position for the synthesis algorithm]
...and don’t forget!
THINK HARDWARE
Appendix
Synthesis of a D-Latch
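library ieee;
use ieee.std_logic_1164.all;

entity d_latch is
  port (GATE, DATA: in std_logic;
        Q : out std_logic );
end d_latch;

architecture rtl of d_latch is
begin
  -- No else branch: Q keeps its value while GATE = '0', so a latch is inferred
  infer: process (GATE, DATA) begin
    if (GATE = '1') then
      Q <= DATA;
    end if;
  end process infer;
end rtl;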
Synthesis of a Latch
Synthesis of Latches and Flip-Flops
• Y in the SR and SS columns indicates that the flip-flop has a synchronous reset and a synchronous set
• N in the AR, AS, and ST columns indicates that the flip-flop does not have an asynchronous reset, asynchronous set, or synchronous toggle
• A dash (–) in the Bus and MB columns indicates that these columns are not relevant to this design
Synthesis of a D-Flip-Flop
library ieee;
use ieee.std_logic_1164.all;

entity dff_pos is
  port (DATA, CLK : in std_logic;
        Q : out std_logic );
end dff_pos;
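The architecture for dff_pos is not preserved on the slide. A minimal sketch of a positive-edge-triggered body, in the same style as the d_latch example, could look as follows; the entity is repeated so the block compiles on its own, and the architecture name rtl and process label infer are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff_pos is
  port (DATA, CLK : in  std_logic;
        Q         : out std_logic);
end dff_pos;

architecture rtl of dff_pos is
begin
  -- Only CLK is in the sensitivity list: Q changes on the rising clock edge only,
  -- so a flip-flop (not a latch) is inferred.
  infer : process (CLK)
  begin
    if rising_edge(CLK) then
      Q <= DATA;
    end if;
  end process infer;
end rtl;
```

Compared with the latch, the only difference is that the assignment is guarded by a clock edge instead of a level.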
[Figure: PLD structure with inputs A0 and A1 (true and inverted), a programmable AND-matrix (not fixed!) and an OR-matrix (fixed!)]
Half-Adder Implementation (PLD)
[Figure: half adder mapped onto the PLD with inputs A and B; S0 = A·B’ + A’·B (sum), S1 = A·B (carry)]
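The half adder can be written in the same sum-of-products form that the AND/OR matrices of a PLD implement. This is a minimal sketch; the entity and architecture names are invented for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical half adder in two-level AND/OR form.
entity half_adder is
  port (a, b : in  std_logic;
        s0   : out std_logic;   -- sum
        s1   : out std_logic);  -- carry
end entity half_adder;

architecture sop of half_adder is
begin
  -- Two product terms OR-ed together: a xor b.
  s0 <= (a and not b) or (not a and b);
  -- A single product term.
  s1 <= a and b;
end architecture sop;
```

Each parenthesised AND corresponds to one product line in the AND-matrix, and the OR combines the lines feeding one output.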
Early PLD (also called PLAs or PALs)
(PAL 16R6)
[Figure: PAL 16R6 with clock and output-enable inputs, 8 inputs, 6 sequential outputs and 2 combinational outputs]
Programmable Logic Device
• No customized mask layers or logic cells
• A single large block of interconnects
• Macrocells consist of programmable array logic followed by a flip-flop or latch
Complex PLD
Altera MAX 7000 Family
Altera MAX 7000 Family (Macrocell)
What is an efficient basic cell?
The FPGA cell
• SRAM based
• Coarse-grain architecture
Altera FLEX10K (Logic Element)
Altera FLEX10K (Block Diagram)
FLEX10K Logic Element (Normal Mode)
Altera FLEX10K (Devices)
Altera APEX20K (Block Diagram)
Altera APEX20K (MegaLAB structure)
Altera APEX20K (LAB structure)
Altera APEX20K (LAB control signal generation)
Altera APEX20K (Logic Element)
APEX 20K (Carry Chain + Cascade Chain)
APEX20K (LE Operating Modes)
APEX 20K Interconnect Structure
Altera APEX20K (Devices)
Xilinx Virtex II-Pro