
Overview

 Why VLSI?
 Moore’s Law.
 Why FPGAs?
 The VLSI and system design process.

FPGA-Based System Design: Chapter 1 Copyright © 2004 Prentice Hall PTR


Why VLSI?

 Integration improves the design:
– lower parasitics = higher speed;
– lower power;
– physically smaller.
 Integration reduces manufacturing cost: (almost) no manual assembly.



VLSI and you

 Microprocessors:
– personal computers;
– microcontrollers.
 DRAM/SRAM/flash.
 Audio/video and other consumer systems.
 Telecommunications.



Moore’s Law

 Gordon Moore: co-founder of Intel.


 Predicted that the number of transistors per chip would grow exponentially (commonly quoted as doubling every 18 months).
 Exponential improvement in technology is a
natural trend: steam engines, dynamos,
automobiles.



Moore’s Law plot



The cost of fabrication

 Current cost: $2-3 billion.
 Typical fab line occupies about 1 city block and employs a few hundred people.
 New fabrication processes require 6-8 month turnaround.
 Most profitable period is the first 18 months to 2 years.



Cost factors in ICs

 For large-volume ICs:
– packaging is largest cost;
– testing is second-largest cost.
 For low-volume ICs, design costs may swamp all manufacturing costs:
– $10 million to $20 million.



Mask cost vs. line width

(Figure: mask cost ($) vs. line width, for .25, .18, .13 and .09 micron processes.)



Field-programmable gate arrays

 FPGAs are programmable logic devices:
– Logic elements + interconnect.
– Provide multi-level logic.
(Figure: array of logic elements (LE) connected by an interconnect network.)



FPGAs and VLSI

 FPGAs are standard parts:


– Pre-manufactured.
– Don’t worry (much) about physical design.
 Custom silicon:
– Tailored to your application.
– Generally lower power consumption.



Standard parts vs. custom

 Do you build your system with an FPGA or with custom silicon?
– FPGAs have shorter design cycle.
– FPGAs have no manufacturing delay.
– FPGAs reduce inventory.
– FPGAs are slower, larger, more power-hungry.
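One way to make the cost side of this trade-off concrete is a break-even volume. The sketch below assumes a simple linear cost model (a one-time engineering cost plus a per-unit cost); the function names and all dollar figures are hypothetical, chosen only for illustration.

```python
import math


def total_cost(nre: float, unit_cost: float, volume: int) -> float:
    """Linear cost model: one-time (NRE) cost plus per-unit cost."""
    return nre + unit_cost * volume


def break_even_volume(fpga_nre: float, fpga_unit: float,
                      asic_nre: float, asic_unit: float):
    """Smallest volume at which custom silicon costs no more than the FPGA.

    Returns None if the custom part is never cheaper per unit.
    """
    if asic_unit >= fpga_unit:
        return None
    volume = (asic_nre - fpga_nre) / (fpga_unit - asic_unit)
    return max(0, math.ceil(volume))


if __name__ == "__main__":
    # Hypothetical figures: FPGA with little NRE but a high unit price,
    # custom chip with a large design cost but a low unit price.
    v = break_even_volume(0.5e6, 50.0, 15e6, 5.0)
    print(f"break-even volume: {v} units")
    print(total_cost(0.5e6, 50.0, v), total_cost(15e6, 5.0, v))
```

Below the break-even volume the FPGA is cheaper; above it, custom silicon is, before weighing the other factors in the list (design cycle, inventory, speed, power).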



Challenges in system design

 Multiple levels of abstraction: logic to CPUs.
 Multiple and conflicting constraints: low cost and high performance are often at odds.
 Short design time: late products are often irrelevant.



The system design process

 May be part of larger product design.
 Major levels of abstraction:
– specification;
– architecture;
– logic design;
– circuit design;
– layout.
(Figure annotation beside the levels: FPGA-based system design.)



Dealing with complexity

 Divide-and-conquer: limit the number of components you deal with at any one time.
 Group several components into larger
components:
– transistors form gates;
– gates form functional units;
– functional units form processing elements;
– etc.
Hierarchical name

 Interior view of a component:
– components and wires that make it up.
 Exterior view of a component = type:
– body;
– pins.
(Figure: full adder type with pins a, b, cin, sum and cout.)



Instantiating component types

 Each instance has its own name:
– add1 (type full adder);
– add2 (type full adder).
 Each instance is a separate copy of the type:
(Figure: instances Add1 and Add2 of the full adder type, each with pins a, b, cin, sum and cout; pins are named per instance, e.g. Add1.a and Add2.a.)
A hierarchical logic design

(Figure: hierarchical logic design containing boxes box1 and box2 and signal x.)



Net lists and component lists

 Net list:
net1: top.in1 i1.in
net2: i1.out xxx.B
topin1: top.n1 xxx.xin1
topin2: top.n2 xxx.xin2
botin1: top.n3 xxx.xin3
net3: xxx.out i2.in
outnet: i2.out top.out

 Component list:
top: in1=net1 n1=topin1 n2=topin2 n3=botin1 out=outnet
i1: in=net1 out=net2
xxx: xin1=topin1 xin2=topin2 xin3=botin1 B=net2 out=net3
i2: in=net3 out=outnet
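The two lists carry the same connectivity, indexed differently: the net list maps each net to the component pins it touches, while the component list maps each component's pins to nets. The sketch below inverts one into the other, assuming terminals are written as component.pin and that the first net connects to instance i1; `to_component_list` is a hypothetical helper.

```python
# Net list from the slide: net name -> terminals written as "component.pin".
NET_LIST = {
    "net1": ["top.in1", "i1.in"],
    "net2": ["i1.out", "xxx.B"],
    "topin1": ["top.n1", "xxx.xin1"],
    "topin2": ["top.n2", "xxx.xin2"],
    "botin1": ["top.n3", "xxx.xin3"],
    "net3": ["xxx.out", "i2.in"],
    "outnet": ["i2.out", "top.out"],
}


def to_component_list(net_list):
    """Invert a net list into a component list: component -> {pin: net}."""
    components = {}
    for net, terminals in net_list.items():
        for terminal in terminals:
            component, pin = terminal.split(".", 1)
            components.setdefault(component, {})[pin] = net
    return components


if __name__ == "__main__":
    for component, pins in to_component_list(NET_LIST).items():
        print(component + ":", " ".join(f"{p}={n}" for p, n in pins.items()))
```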



Component hierarchy

(Figure: component hierarchy with top at the root and i1, xxx and i2 as its children.)



Hierarchical names

 Typical hierarchical name:
– top/i1.foo, where top/i1 names the component and foo names the pin.
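A hierarchical name can be taken apart mechanically. This sketch assumes the convention in the example: "/" separates levels of the component hierarchy and the last "." separates the component from its pin; `parse_hierarchical_name` is a hypothetical helper.

```python
def parse_hierarchical_name(name: str):
    """Split e.g. 'top/i1.foo' into (['top', 'i1'], 'foo').

    Assumes '/' separates hierarchy levels and the final '.' introduces the pin.
    """
    component, sep, pin = name.rpartition(".")
    if not sep or not component or not pin:
        raise ValueError(f"expected 'path.pin', got {name!r}")
    return component.split("/"), pin


if __name__ == "__main__":
    path, pin = parse_hierarchical_name("top/i1.foo")
    print("component path:", path, "pin:", pin)
```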



Layout and its abstractions

 Layout for dynamic latch:



Stick diagram



Transistor schematic



Mixed schematic

(Figure: mixed schematic combining transistors with an inverter symbol.)



Levels of abstraction

 Specification: function, cost, etc.


 Architecture: large blocks.
 Logic: gates + registers.
 Circuits: transistor sizes for speed, power.
 Layout: determines parasitics.



Circuit abstraction

 Continuous voltages and time:



Digital abstraction

 Discrete levels, discrete time:



Register-transfer abstraction

 Abstract components, abstract data types:

(Figure: adders operating on 4-bit words, e.g. 0010 + 0001 = 0011, with a second adder producing 0100.)



Top-down vs. bottom-up design

 Top-down design adds functional detail.


– Create lower levels of abstraction from upper
levels.
 Bottom-up design creates abstractions from
low-level behavior.
 Good design needs both top-down and
bottom-up efforts.



Design abstractions
Model                 Level               Measured by
English               specification       function, cost
Executable program    behavior            throughput, design time
Sequential machines   register-transfer   function units, clock cycles
Logic gates           logic               literals, logic depth
Transistors           circuit             nanoseconds
Rectangles            layout              microns



FPGA design

 FPGA manufacturer creates an FPGA fabric; system designer uses the fabric.
 FPGA fabric design issues:
– Study sample user designs.
– Select interconnect topology.
– Create logic element structures.
– Design circuits, layout.



Why do we care about layout?

 We won’t design layout.


 Layout determines:
– Logic delay.
– Interconnect delay.
– Energy consumption.
 We want to understand sources of FPGA
characteristics.



Design validation

 Must check at every step that errors haven’t been introduced: the longer an error remains, the more expensive it becomes to remove it.
 Forward checking: compare results of less-
and more-abstract stages.
 Back annotation: copy performance
numbers to earlier stages.



Topics

 Basic fabrication steps.


 Transistor structures.
 Basic transistor behavior.
 Latch-up.



Fabrication processes

 IC built on silicon substrate:


– some structures diffused into substrate;
– other structures built on top of substrate.
 Substrate regions are doped with n-type and p-type impurities (n+ = heavily doped).
 Wires are made of polycrystalline silicon (poly) and multiple layers of aluminum/copper (metal).
 Silicon dioxide (SiO2) is the insulator.



Simple cross section

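(Figure: cross section with n+ and p+ diffusions in the substrate, a transistor with a poly gate, and metal1, metal2 and metal3 layers connected by vias and separated by SiO2.)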
Photolithography

Mask patterns are put on the wafer using photosensitive material:



Process steps

First place tubs to provide properly-doped substrate for n-type and p-type transistors:
(Figure: p-tub and n-tub formed in the substrate.)



Process steps, cont’d.

Pattern polysilicon before diffusion regions:

(Figure: poly gates on gate oxide over the p-tub and the n-tub.)



Process steps, cont’d

Add diffusions, performing self-masking:

(Figure: n+ diffusions in the p-tub and p+ diffusions in the n-tub, on either side of the poly gates.)



Process steps, cont’d

Start adding metal layers:

(Figure: metal 1 connected through vias; poly gates, n+ diffusions in the p-tub, p+ diffusions in the n-tub.)



Level 2 metal

 Polish SiO2 before adding metal 2:
(Figure: metal 2 above metal 1; vias, poly gates, n+ diffusions in the p-tub, p+ diffusions in the n-tub.)



Transistor structure

n-type transistor:



Transistor layout

n-type (tubs may vary):



Drain current characteristics



Drain current

 Linear region ($V_{ds} < V_{gs} - V_t$):
– $I_d = k' \frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{1}{2}V_{ds}^2\right]$
 Saturation region ($V_{ds} \ge V_{gs} - V_t$):
– $I_d = \frac{1}{2}k' \frac{W}{L}(V_{gs} - V_t)^2$
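The two region formulas can be coded as one piecewise function. This is a sketch of the long-channel model above for an n-type device; treating Vgs ≤ Vt as zero current is an assumption the slide does not state, and the example values are the 90 nm n-type parameters with W/L = 3/2. At Vds = Vgs - Vt the two expressions agree, which is a quick check on the linear-region formula.

```python
def drain_current(vgs, vds, k_prime, w_over_l, vt):
    """Long-channel drain current (A) of an n-type transistor.

    Linear:     Id = k' (W/L) [(Vgs - Vt) Vds - Vds^2 / 2]   for Vds < Vgs - Vt
    Saturation: Id = (k'/2) (W/L) (Vgs - Vt)^2               for Vds >= Vgs - Vt
    Vgs <= Vt is treated as cutoff (zero current), an assumption not on the slide.
    """
    vov = vgs - vt  # overdrive voltage
    if vov <= 0:
        return 0.0
    if vds < vov:
        return k_prime * w_over_l * (vov * vds - 0.5 * vds ** 2)
    return 0.5 * k_prime * w_over_l * vov ** 2


if __name__ == "__main__":
    k, wl, vt, vgs = 13e-6, 1.5, 0.14, 1.0  # 90 nm n-type values, W/L = 3/2
    for vds in (0.1, 0.43, 0.86, 1.0):
        print(f"Vds = {vds:.2f} V: Id = {drain_current(vgs, vds, k, wl, vt) * 1e6:.3f} uA")
```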



90 nm transconductances

Typical parameters:
 n-type:
– $k_n' = 13\ \mu\mathrm{A/V^2}$
– $V_{tn} = 0.14\ \mathrm{V}$
 p-type:
– $k_p' = 7\ \mu\mathrm{A/V^2}$
– $V_{tp} = -0.21\ \mathrm{V}$



Current through a transistor

Use 90 nm n-type parameters. Let W/L = 3/2.
Measure at the boundary between the linear and saturation regions.
 Vgs = 0.25 V:
$I_d = \frac{1}{2}k'\frac{W}{L}(V_{gs} - V_t)^2 = 0.12\ \mu\mathrm{A}$
 Vgs = 1 V:
$I_d = 7.2\ \mu\mathrm{A}$
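The two operating points can be reproduced directly. A sketch using the 90 nm n-type values from the previous slide; `saturation_current` is a hypothetical helper name.

```python
# 90 nm n-type parameters quoted in the slides.
KN_PRIME = 13e-6   # A/V^2
VTN = 0.14         # V
W_OVER_L = 3 / 2


def saturation_current(vgs, k_prime=KN_PRIME, w_over_l=W_OVER_L, vt=VTN):
    """Id at the linear/saturation boundary: 0.5 k' (W/L) (Vgs - Vt)^2, in amperes."""
    return 0.5 * k_prime * w_over_l * (vgs - vt) ** 2


if __name__ == "__main__":
    for vgs in (0.25, 1.0):
        print(f"Vgs = {vgs:.2f} V: Id = {saturation_current(vgs) * 1e6:.2f} uA")
```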



Basic transistor parasitics

 Gate to substrate, also gate to source/drain.


 Source/drain capacitance, resistance.



Basic transistor parasitics, cont’d

 Gate capacitance Cg, determined by the active area.
 Source/drain overlap capacitances Cgs, Cgd, determined by the source/gate and drain/gate overlaps; independent of transistor L.
– $C_{gs} = C_{ol} W$
 Gate/bulk overlap capacitance.



Latch-up

 CMOS ICs have parasitic silicon-controlled rectifiers (SCRs).
 When powered up, SCRs can turn on, creating a low-resistance path from power to ground. The current can destroy the chip.
 Early CMOS problem. Can be solved with proper circuit/layout structures.



Parasitic SCR

(Figure panels: circuit; I-V behavior.)


Parasitic SCR structure



Solution to latch-up

Use tub ties to connect the tub to the power rail. Use enough of them to create a low-resistance connection.



Topics

 Logic gate delay.


 Logic gate power consumption.
 Driving large loads.



Logic levels

 Solid logic 0/1 defined by VSS/VDD.


 Inner bounds of logic values VL/VH are not
directly determined by circuit properties, as
in some other logic families.
(Figure: voltage scale from VSS to VDD; logic 0 from VSS up to VL, unknown between VL and VH, logic 1 from VH up to VDD.)



Logic level matching

 Levels at the output of one gate must be sufficient to drive the next gate.



Transfer characteristics

 Transfer curve shows the static input/output relationship: hold the input voltage, measure the output voltage.



Inverter transfer curve



Logic thresholds

 Choose threshold voltages at the points where the slope of the transfer curve is -1.
 The inverter has high gain between the VIL and VIH points and low gain in the outer regions of the transfer curve.
 Note that the logic 0 and 1 regions are not equal in size: in this case, the high pullup resistance leads to a smaller logic 1 range.
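The unity-gain points can be found numerically from any sampled transfer curve. The curve below is a hypothetical, symmetric logistic stand-in for a real inverter (VDD = 1 V, switching point 0.5 V and steepness 10 are assumed values); a measured or simulated curve could be substituted for `vout`.

```python
import math

VDD = 1.0    # V, assumed supply
VM = 0.5     # V, assumed switching point
GAIN = 10.0  # assumed steepness of the transition


def vout(vin: float) -> float:
    """Hypothetical inverter transfer curve (falls from VDD to 0)."""
    return VDD / (1.0 + math.exp(GAIN * (vin - VM)))


def unity_gain_points(f, lo: float, hi: float, steps: int = 10_000):
    """Inputs where the forward-difference slope of f crosses -1."""
    h = (hi - lo) / steps
    points = []
    prev = None
    for i in range(steps):
        x = lo + i * h
        slope = (f(x + h) - f(x)) / h
        if prev is not None and (prev + 1.0) * (slope + 1.0) < 0:
            points.append(x)
        prev = slope
    return points


VIL, VIH = unity_gain_points(vout, 0.0, VDD)

if __name__ == "__main__":
    print(f"VIL = {VIL:.3f} V, VIH = {VIH:.3f} V")
```

Because this stand-in curve is symmetric, the logic 0 and 1 regions come out equal; an asymmetric curve, like the high-pullup case described above, would not.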



Noise margin

 Noise margin = voltage difference between the output of one gate and the input of the next. Noise must exceed the noise margin to make the second gate produce the wrong output.
 In static gates, output voltages are VDD and VSS, so the noise margins are VDD - VIH and VIL - VSS.
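The noise-margin definitions translate directly into code. A minimal sketch, assuming full-swing outputs as stated above; the threshold values in the example and the helper names are hypothetical.

```python
def noise_margins(vdd: float, vss: float, vil: float, vih: float):
    """(NM_H, NM_L) for static gates whose outputs swing fully to VDD and VSS."""
    return vdd - vih, vil - vss


def logic_value(v: float, vil: float, vih: float):
    """Interpret an input voltage: 0, 1, or None in the unknown region."""
    if v <= vil:
        return 0
    if v >= vih:
        return 1
    return None


if __name__ == "__main__":
    # Hypothetical thresholds for a 1 V supply.
    nm_h, nm_l = noise_margins(1.0, 0.0, vil=0.35, vih=0.65)
    print(f"NM_H = {nm_h:.2f} V, NM_L = {nm_l:.2f} V")
```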



Delay

 Assume ideal input (step), RC load.



Delay assumptions

 Assume that only one transistor is on at a time. This gives two cases:
– rise time, pullup on;
– fall time, pullup off.
 Assume a resistor model for the transistor. This ignores the saturation region and mischaracterizes the linear region, but the results are acceptable.



Current through transistor

 Transistor starts in the saturation region, then moves to the linear region.



Resistive model for transistor

 Average V/I at two voltages:
– maximum output voltage;
– middle of linear region.
 Voltage is Vds; current is the Id given at that drain voltage. A step input means that Vgs = VDD always.



Resistive approximation



Ways of measuring gate delay

 Delay: time required for the gate’s output to reach 50% of its final value.
 Transition time: time required for the gate’s output to move between 10% and 90% of its swing (90% to 10% for a fall to logic 0, 10% to 90% for a rise to logic 1).
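Both measurements can be read off a sampled waveform. The sketch below assumes a falling output that decays as an ideal RC exponential with the time constant normalized to 1; it recovers the standard results that the 50% delay is about 0.69τ (ln 2) and the 90%-to-10% transition is about 2.2τ (ln 9). The helper name is hypothetical.

```python
import math

TAU = 1.0   # normalized RC time constant
VDD = 1.0
DT = 0.001
TIMES = [i * DT for i in range(10_001)]              # 0 to 10 tau
VALUES = [VDD * math.exp(-t / TAU) for t in TIMES]   # falling output


def crossing_time(times, values, level):
    """First time a falling waveform reaches `level`, by linear interpolation."""
    for i in range(len(times) - 1):
        v0, v1 = values[i], values[i + 1]
        if v0 >= level >= v1:
            if v0 == v1:
                return times[i]
            return times[i] + (v0 - level) * (times[i + 1] - times[i]) / (v0 - v1)
    raise ValueError("waveform never reaches the requested level")


delay = crossing_time(TIMES, VALUES, 0.5 * VDD)
fall = crossing_time(TIMES, VALUES, 0.1 * VDD) - crossing_time(TIMES, VALUES, 0.9 * VDD)

if __name__ == "__main__":
    print(f"50% delay   = {delay:.3f} tau (ln 2 = {math.log(2):.3f})")
    print(f"90-10% fall = {fall:.3f} tau (ln 9 = {math.log(9):.3f})")
```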



Inverter delay circuit

 Load is a resistor + capacitor; the driver is a resistor.



Inverter delay with τ model

 τ model: gate delay based on RC time constant τ.
 $V_{out}(t) = V_{DD}\, e^{-t/[(R_n + R_L) C_L]}$
 $t_f = 2.2\, R\, C_L$
 For pullup time, use pullup resistance.



τ model inverter delay

 90 nm process:
– $R_n = 11.1\ \mathrm{k\Omega}$
– $C_L = 0.12\ \mathrm{fF}$
 So
– $t_f = 2.2 \times 11.1 \times 10^{3} \times 0.12 \times 10^{-15} = 2.9\ \mathrm{ps}$.
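The same arithmetic as a small sketch using only the slide's values; `tau_model_fall_time` is a hypothetical helper name.

```python
import math

R_N = 11.1e3     # ohms, pulldown resistance (from the slide)
C_L = 0.12e-15   # farads, load capacitance (from the slide)


def tau_model_fall_time(r: float, c: float) -> float:
    """t_f = 2.2 R C; the 2.2 is approximately ln 9, the 90%-to-10% interval."""
    return 2.2 * r * c


if __name__ == "__main__":
    tf = tau_model_fall_time(R_N, C_L)
    print(f"tf = {tf * 1e12:.2f} ps")   # about 2.9 ps
    print(f"ln 9 = {math.log(9):.3f}")
```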



Quality of RC approximation



Quality of step input approximation



Results of using small pullup



Other models

 Current source model (used in power/delay studies):
– $t_f = \dfrac{C_L (V_{DD} - V_{SS})}{I_d} = \dfrac{C_L (V_{DD} - V_{SS})}{\frac{1}{2} k' (W/L)(V_{DD} - V_{SS} - V_t)^2}$
 Fitted model: fit a curve to measured circuit characteristics.
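The current-source model is easy to evaluate once Id is taken as the saturation current with Vgs = VDD - VSS, as in the formula above. A sketch; the example uses the 90 nm n-type values quoted earlier plus an assumed 1 V supply, and its point is the scaling (tf proportional to CL and inversely proportional to W/L) rather than the absolute number.

```python
def current_source_fall_time(c_load, vdd, vss, k_prime, w_over_l, vt):
    """t_f = C_L (VDD - VSS) / Id, with Id = 0.5 k' (W/L) (VDD - VSS - Vt)^2."""
    swing = vdd - vss
    i_d = 0.5 * k_prime * w_over_l * (swing - vt) ** 2
    return c_load * swing / i_d


if __name__ == "__main__":
    base = current_source_fall_time(0.12e-15, 1.0, 0.0, 13e-6, 1.5, 0.14)
    wider = current_source_fall_time(0.12e-15, 1.0, 0.0, 13e-6, 3.0, 0.14)
    print(f"doubling W/L scales tf by {wider / base:.2f}")  # 0.50
```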



Body effect and gates

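 A difference between the source and substrate voltages causes body effect.
 The source of a transistor in the middle of a series network may not be at the substrate voltage:
(Figure: series chain with inputs at 0; an intermediate transistor’s source sits above VSS.)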



Body effect and gate input ordering

 To minimize body effect, put early-arriving signals on the transistors closest to the power supply:
(Figure: series chain with the early-arriving signal on the transistor nearest the supply.)



Power consumption analysis

 Dynamic power consumption comes from switching behavior.
 Static power dissipation comes from leakage currents.
 Surprising result: dynamic power consumption is independent of the sizes of the pullups and pulldowns.



Power consumption circuit

 Input is square wave.



Power consumption

 A single cycle requires one charge and one discharge of the capacitor: $E = C_L (V_{DD} - V_{SS})^2$.
 Clock frequency $f = 1/t$, where t is the clock period.
 Power $P = E \times f = f\, C_L (V_{DD} - V_{SS})^2$.
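The power formula in code, with a hypothetical operating point (0.12 fF load, 1 V swing, 1 GHz clock). Halving the clock halves the power while `energy_per_cycle` stays the same, which is the power-versus-energy distinction drawn for slower circuits. Helper names are hypothetical.

```python
def dynamic_power(c_load: float, vdd: float, vss: float, freq: float) -> float:
    """P = f * C_L * (VDD - VSS)^2 for one charge and one discharge per cycle."""
    return freq * c_load * (vdd - vss) ** 2


def energy_per_cycle(c_load: float, vdd: float, vss: float) -> float:
    """E = C_L * (VDD - VSS)^2, independent of clock frequency."""
    return c_load * (vdd - vss) ** 2


if __name__ == "__main__":
    # Hypothetical operating point: 0.12 fF load, 1 V swing, 1 GHz clock.
    p = dynamic_power(0.12e-15, 1.0, 0.0, 1e9)
    print(f"P = {p * 1e6:.3f} uW")   # 0.120 uW
```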



Observations on power consumption

 Resistance of the pullup/pulldown drops out of the energy calculation.
 Power consumption depends on operating frequency.
– Slower-running circuits use less power (but not less energy to perform the same computation).
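The claim that resistance drops out can be checked numerically. The sketch below integrates the power dissipated in the pullup resistance while a step charges the load capacitor; whatever the resistance, the loss comes out to ½CV², half of the CV² drawn from the supply (the stored half is dissipated in the pulldown on discharge). The capacitance and resistance values are hypothetical.

```python
import math


def charging_loss(r: float, c: float, v: float, steps: int = 20_000,
                  span: float = 20.0) -> float:
    """Energy dissipated in r while charging c to v from a step input.

    Integrates i(t)^2 * r, with i(t) = (v / r) * exp(-t / (r c)), using the
    trapezoidal rule over `span` time constants.
    """
    tau = r * c
    dt = span * tau / steps
    total = 0.0
    prev = (v / r) ** 2 * r  # instantaneous power at t = 0
    for k in range(1, steps + 1):
        i = (v / r) * math.exp(-k * dt / tau)
        p = i * i * r
        total += 0.5 * (prev + p) * dt
        prev = p
    return total


if __name__ == "__main__":
    c, v = 1e-15, 1.0  # hypothetical load and swing
    for r in (1e3, 1e4, 1e5):
        ratio = charging_loss(r, c, v) / (0.5 * c * v * v)
        print(f"R = {r:8.0f} ohm: loss / (0.5 C V^2) = {ratio:.6f}")
```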



Speed-power product

 Also known as power-delay product.


 Helps measure quality of a logic family.
 For static CMOS:
– $SP = P/f = C V^2$.
 Static CMOS speed-power product is
independent of operating frequency.
– Voltage scaling depends on this fact.
– Considers only dynamic power.



Sources of leakage

 Weak inversion current (subthreshold current)


 Gate-induced drain leakage at the gate/drain
overlap.
 Drain-induced barrier lowering of the source.
 Punchthrough currents.
 Reverse-biased pn junctions.
 etc.



Subthreshold leakage current

 Strong function of the threshold voltage Vt.


 Important in 90 nm and below technologies.
 Can adjust threshold by changing substrate
bias.
 Leakage through a chain of transistors is
lower than leakage through a single
transistor.



Driving large loads

 Sometimes, large loads must be driven:


– off-chip;
– long wires on-chip.
 Sizing up the driver transistors only pushes
back the problem—driver now presents
larger capacitance to earlier stage.



Cascaded driver circuit



Optimal sizing

 Use a chain of inverters in which each stage has transistors α times larger than the previous stage.
 Minimize total delay through the driver chain:
– $t_{tot} = n\,(C_{big}/C_g)^{1/n}\, t_{min}$.
 Optimal number of stages:
– $n_{opt} = \ln(C_{big}/C_g)$.
 Driver sizes are exponentially tapered with size ratio α.
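The optimum can be confirmed by brute force over integer stage counts. A sketch; the capacitance ratio of 1000 is hypothetical. With n = ln(Cbig/Cg) the per-stage ratio α works out to e, and the best integer n sits next to ln(Cbig/Cg).

```python
import math


def total_delay(n: int, c_ratio: float, t_min: float = 1.0) -> float:
    """t_tot = n (Cbig/Cg)^(1/n) t_min for an n-stage tapered driver chain."""
    return n * c_ratio ** (1.0 / n) * t_min


def best_integer_stages(c_ratio: float, max_stages: int = 50) -> int:
    """Integer stage count that minimizes total_delay, found by brute force."""
    return min(range(1, max_stages + 1), key=lambda n: total_delay(n, c_ratio))


if __name__ == "__main__":
    ratio = 1000.0  # hypothetical Cbig/Cg
    n = best_integer_stages(ratio)
    print(f"n_opt = ln(ratio) = {math.log(ratio):.2f}; best integer n = {n}")
    print(f"stage ratio alpha = {ratio ** (1.0 / n):.2f} (continuous optimum is e)")
```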



Topics

 Wire and via structures.


 Wire parasitics.



Wires and vias

(Figure: metal 1, metal 2 and metal 3 layers connected by vias, above poly gates and n+ diffusions in a p-tub.)



Metal migration

 Current-carrying capacity of a metal wire depends on its cross-section. Height is fixed, so width determines the current limit.
 Metal migration: when current is too high, electron flow pushes metal grains around. Higher resistance increases metal migration, leading to destruction of the wire.



Metal migration problems and solutions

 Marginal wires will fail after a short operating period: infant mortality.
 Normal wires must be sized to accommodate maximum current flow:
$I_{max} = 1.5\ \mathrm{mA}/\mu\mathrm{m}$ of metal width.
 Mainly applies to VDD/VSS lines.
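Sizing a supply line then reduces to a division. A sketch using the slide's 1.5 mA/µm limit; the 10 mA current in the example is hypothetical.

```python
I_MAX_PER_UM = 1.5e-3  # A per micron of metal width (from the slide)


def min_wire_width_um(current_a: float) -> float:
    """Minimum metal width in microns that keeps current_a within Imax."""
    return current_a / I_MAX_PER_UM


if __name__ == "__main__":
    # Hypothetical 10 mA supply branch.
    print(f"{min_wire_width_um(10e-3):.2f} um")   # 6.67 um
```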



Diffusion wire capacitance

 Capacitances formed by p-n junctions:
(Figure: n+ diffusion (doping ND) in a substrate (doping NA); the depletion region between them forms sidewall capacitances and a bottomwall capacitance.)



Depletion region capacitance

 Zero-bias depletion capacitance:
– $C_{j0} = \varepsilon_{si}/x_{d0}$.
 Depletion region width:
– $x_{d0} = \sqrt{\left(\frac{1}{N_A} + \frac{1}{N_D}\right)\frac{2\varepsilon_{si} V_{bi}}{q}}$.
 Junction capacitance is a function of the voltage across the junction:
– $C_j(V_r) = C_{j0}/\sqrt{1 + V_r/V_{bi}}$
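A numeric sketch of the depletion formulas. The silicon permittivity (11.7 ε0) and the electron charge are standard constants; the doping levels and built-in voltage in the example are hypothetical, and Vbi is taken as an input rather than computed. A reverse bias of 3 Vbi halves the junction capacitance, since sqrt(1 + 3) = 2.

```python
import math

Q = 1.602e-19               # C, electron charge
EPS_SI = 11.7 * 8.854e-12   # F/m, permittivity of silicon


def zero_bias_width(na: float, nd: float, vbi: float) -> float:
    """x_d0 = sqrt((1/NA + 1/ND) * 2 eps_si Vbi / q); dopings in m^-3, result in m."""
    return math.sqrt((1.0 / na + 1.0 / nd) * 2.0 * EPS_SI * vbi / Q)


def junction_cap(vr: float, cj0: float, vbi: float) -> float:
    """Cj(Vr) = Cj0 / sqrt(1 + Vr / Vbi), in the same units as cj0."""
    return cj0 / math.sqrt(1.0 + vr / vbi)


if __name__ == "__main__":
    na, nd, vbi = 1e23, 1e26, 0.9   # hypothetical: 1e17 and 1e20 cm^-3, 0.9 V
    xd0 = zero_bias_width(na, nd, vbi)
    cj0 = EPS_SI / xd0              # F/m^2
    print(f"x_d0 = {xd0 * 1e9:.0f} nm, Cj0 = {cj0 * 1e3:.2f} fF/um^2")
    for vr in (0.0, 0.9, 2.7):
        print(f"Vr = {vr} V: Cj = {junction_cap(vr, cj0, vbi) * 1e3:.2f} fF/um^2")
```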



Poly/metal wire capacitance

 Two components:
– parallel plate;
– fringe.
(Figure: wire cross section showing the parallel-plate and fringe components of its capacitance.)



Metal coupling capacitances

 Wires can couple to adjacent wires on the same layer and to wires on the layers above and below:
(Figure: two metal 1 wires side by side beneath a metal 2 wire.)



Wire resistance

 Resistance of any size square is constant:
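Because a square of wire has the same resistance at any size, a wire's resistance is its sheet resistance times the number of squares it contains, length divided by width. A sketch; the sheet-resistance value and helper name are hypothetical.

```python
def wire_resistance(sheet_res: float, length: float, width: float) -> float:
    """R = R_sq * (length / width): sheet resistance times number of squares."""
    return sheet_res * (length / width)


if __name__ == "__main__":
    r_sq = 0.1  # ohms per square, hypothetical
    print(wire_resistance(r_sq, 1.0, 1.0), wire_resistance(r_sq, 50.0, 50.0))
    print(wire_resistance(r_sq, 100.0, 2.0))  # 50 squares
```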



Metal mean-time-to-failure

 MTF for metal wires = time required for 50% of the wires to fail.
 Depends on current density:
– proportional to $j^{-n} e^{Q/kT}$;
– j is current density;
– n is a constant between 1 and 3;
– Q is diffusion activation energy;
– k is Boltzmann’s constant and T is absolute temperature.
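Because the proportionality constant is unknown, the relation is most useful for comparing operating conditions. A sketch of that ratio (this form is commonly known as Black's equation); n = 2 and Q = 0.7 eV in the example are hypothetical values, not numbers from the text.

```python
import math

BOLTZMANN_EV = 8.617e-5  # eV/K


def mtf_ratio(j1, j2, n, q_ev, t1, t2):
    """MTF(j2, t2) / MTF(j1, t1) for MTF proportional to j^-n exp(Q / kT).

    The unknown proportionality constant cancels in the ratio.
    """
    current_term = (j2 / j1) ** (-n)
    thermal_term = math.exp(q_ev / BOLTZMANN_EV * (1.0 / t2 - 1.0 / t1))
    return current_term * thermal_term


if __name__ == "__main__":
    # Hypothetical: n = 2, Q = 0.7 eV.
    print(mtf_ratio(1.0, 2.0, n=2, q_ev=0.7, t1=300.0, t2=300.0))  # 0.25
    print(mtf_ratio(1.0, 1.0, n=2, q_ev=0.7, t1=300.0, t2=350.0))
```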



Skin effect

 At low frequencies, most of a copper conductor’s cross section carries current.
 As frequency increases, current moves to the skin of the conductor.
– Back EMF induces a counter-current in the body of the conductor.
 Skin effect is most important at gigahertz frequencies.
Skin effect, cont’d

 Isolated conductor.  Conductor and ground.
(Figure: current distribution at low and high frequency, for an isolated conductor and for a conductor near ground.)
Skin depth

 Skin depth is the depth at which the conductor’s current is reduced to $1/e \approx 37\%$ of its surface value:
 $\delta = 1/\sqrt{\pi f \mu \sigma}$
– f = signal frequency;
– μ = magnetic permeability;
– σ = wire conductivity.
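A quick evaluation of the skin-depth formula for copper. A sketch, assuming σ ≈ 5.8 × 10^7 S/m and μ equal to the free-space value (copper is essentially non-magnetic); at 1 GHz the skin depth comes out to about 2.1 µm.

```python
import math

MU_0 = 4e-7 * math.pi   # H/m, permeability of free space
SIGMA_CU = 5.8e7        # S/m, approximate conductivity of copper


def skin_depth(freq_hz: float, sigma: float = SIGMA_CU, mu: float = MU_0) -> float:
    """delta = 1 / sqrt(pi f mu sigma), in meters."""
    return 1.0 / math.sqrt(math.pi * freq_hz * mu * sigma)


if __name__ == "__main__":
    for f in (1e6, 1e8, 1e9, 1e10):
        print(f"{f:8.0e} Hz: {skin_depth(f) * 1e6:7.3f} um")
```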



Effect on resistance

 Low-frequency resistance of wire:
– $R_{dc} = 1/(\sigma w t)$, with w and t the wire width and thickness
 High-frequency resistance with skin effect:
– $R_{hf} = 1/[2\sigma\delta(w + t)]$
 Resistance per unit length:
– $R_{ac} = \sqrt{R_{dc}^2 + k R_{hf}^2}$
 Typically k = 1.2.
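The three resistance expressions combine as follows. A sketch reusing the skin-depth formula with assumed copper values; the 1 µm × 0.5 µm wire and the helper names are hypothetical. Resistances are per unit length (ohm/m).

```python
import math

MU_0 = 4e-7 * math.pi  # H/m
SIGMA_CU = 5.8e7       # S/m, approximate copper conductivity


def skin_depth(freq_hz, sigma=SIGMA_CU, mu=MU_0):
    return 1.0 / math.sqrt(math.pi * freq_hz * mu * sigma)


def r_dc(w, t, sigma=SIGMA_CU):
    """Low-frequency resistance per unit length: 1 / (sigma w t)."""
    return 1.0 / (sigma * w * t)


def r_hf(w, t, delta, sigma=SIGMA_CU):
    """Skin-effect resistance per unit length: 1 / (2 sigma delta (w + t))."""
    return 1.0 / (2.0 * sigma * delta * (w + t))


def r_ac(w, t, freq_hz, k=1.2, sigma=SIGMA_CU):
    """Combined estimate sqrt(Rdc^2 + k Rhf^2)."""
    delta = skin_depth(freq_hz, sigma)
    return math.sqrt(r_dc(w, t, sigma) ** 2 + k * r_hf(w, t, delta, sigma) ** 2)


if __name__ == "__main__":
    w, t = 1e-6, 0.5e-6  # hypothetical 1 um x 0.5 um copper wire
    for f in (1e8, 1e9, 1e10, 1e11):
        print(f"{f:8.0e} Hz: Rac = {r_ac(w, t, f) / 1e6:.4f} ohm/um")
```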



Wire capacitance and resistance

Layer     C to ground (aF/µm)   Coupling C (aF/µm)   Resistance/length (Ω/µm)
Metal 3   18                    9                    0.2
Metal 2   47                    24                   0.3
Metal 1   76                    36                   0.3



Gate delay vs. wire delay

 Minimum-size inverter delay: 2.9 ps.
 Length of wire with equal delay: assume a wire whose capacitance equals the inverter input capacitance, 0.12 fF.
– Metal 3 length is 6.7 µm.
– About 75 times the width of a minimum-size transistor.
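The 6.7 µm figure follows from the wire table: divide the inverter input capacitance by metal 3's capacitance to ground per micron. A sketch that, like the slide, counts only capacitance to ground and ignores coupling and resistance; the dictionary is a hypothetical encoding of the table.

```python
# Per-micron wire parasitics from the 90 nm table above.
WIRE_TABLE = {
    "metal3": {"c_ground_af": 18, "c_coupling_af": 9, "r_ohm": 0.2},
    "metal2": {"c_ground_af": 47, "c_coupling_af": 24, "r_ohm": 0.3},
    "metal1": {"c_ground_af": 76, "c_coupling_af": 36, "r_ohm": 0.3},
}
INVERTER_INPUT_C_AF = 120.0  # 0.12 fF


def length_for_capacitance(layer: str, c_target_af: float) -> float:
    """Length in microns at which the wire's ground capacitance equals c_target_af."""
    return c_target_af / WIRE_TABLE[layer]["c_ground_af"]


if __name__ == "__main__":
    for layer in WIRE_TABLE:
        um = length_for_capacitance(layer, INVERTER_INPUT_C_AF)
        print(f"{layer}: {um:.2f} um")
```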



Topics

 Driving long wires.



Wire delay

 Wires have parasitic resistance, capacitance.


 Parasitics start to dominate in deep-
submicron wires.
 Distributed RC introduces time of flight
along wire into gate-to-gate delay.



RC transmission line

 Assumes that the dominant capacitive coupling is to ground and that inductance can be ignored.
 Elemental values are ri, ci.



Elmore delay

 Elmore defined delay through a linear network as the first moment of the network’s impulse response.



RC Elmore delay

 Can be computed as a sum of sections:
$\delta_E = \sum_{i=1}^{n} r(n - i)c = \frac{1}{2} r c\, n(n-1)$
 Resistor ri must charge all downstream capacitors.
 Delay grows as the square of wire length.
 Minimizing the rc product minimizes the growth of delay with increasing wire length.
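The sum and its closed form can be checked against each other in a few lines. A sketch under the slide's assumptions (identical sections of resistance r and capacitance c, resistor i charging the n - i capacitors after it); doubling n roughly quadruples the delay, matching the quadratic growth noted above. Helper names are hypothetical.

```python
def elmore_ladder(rs, cs):
    """Elmore delay of an RC ladder using the slide's indexing.

    Resistor i (1-based) charges the n - i capacitors downstream of it.
    """
    n = len(rs)
    return sum(rs[i] * sum(cs[i + 1:]) for i in range(n))


def elmore_uniform(n, r, c):
    """Closed form for identical sections: 0.5 * r * c * n * (n - 1)."""
    return 0.5 * r * c * n * (n - 1)


if __name__ == "__main__":
    for n in (10, 20, 40):
        print(n, elmore_ladder([1.0] * n, [1.0] * n), elmore_uniform(n, 1.0, 1.0))
```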



RC transmission lines

 More complex analysis.
 Step response:
– $V(t) \approx 1 + K_1 e^{-\sigma_1 t/RC}$.



Wire sizing

 Wire length is determined by the layout architecture, but we can choose wire width to minimize delay.
 Wire width can vary with distance from the driver to adjust the resistance that drives the downstream capacitance.



Optimal wiresizing

 The wire with minimum delay has an exponential taper.
 Optimal tapering improves delay by about 8%.



Approximate tapering

Can approximate optimal tapering with a few rectangular segments.



Tapering of wiring trees

Different branches of the tree can be set to different widths to optimize delay.
(Figure: wiring tree from a source to sink 1 and sink 2.)



Spanning tree

A spanning tree has segments that go directly between sources and sinks.
(Figure: source connected directly to sink 1 and sink 2.)



Steiner tree

A Steiner point is an intermediate point for the creation of new branches.
(Figure: source connected to sink 1 and sink 2 through a Steiner point.)
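For a source and two sinks, the two kinds of tree can be compared by total wirelength. A sketch, assuming rectilinear (Manhattan) routing and hypothetical coordinates; for exactly three terminals a rectilinear Steiner tree can be built through the point whose coordinates are the medians of the terminals' x and y values. Helper names are hypothetical.

```python
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def spanning_tree_length(points):
    """Total Manhattan length of a minimum spanning tree (Prim's algorithm)."""
    in_tree = [points[0]]
    remaining = list(points[1:])
    total = 0
    while remaining:
        dist, nxt = min((manhattan(a, b), b) for a in in_tree for b in remaining)
        total += dist
        in_tree.append(nxt)
        remaining.remove(nxt)
    return total


def steiner_length_three(points):
    """Rectilinear Steiner tree for three terminals via the median point."""
    xs = sorted(p[0] for p in points)
    ys = sorted(p[1] for p in points)
    steiner_point = (xs[1], ys[1])
    return sum(manhattan(p, steiner_point) for p in points), steiner_point


if __name__ == "__main__":
    source, sink1, sink2 = (0, 0), (4, 2), (4, -2)  # hypothetical coordinates
    pts = [source, sink1, sink2]
    print("spanning tree:", spanning_tree_length(pts))
    print("Steiner tree: ", steiner_length_three(pts))
```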



RC trees

Generalization of RC transmission line.



Buffer insertion in RC transmission lines

 Assume RC transmission line.


 Assume R0 is driver’s resistance, C0 is
driver’s input capacitance.
 Want to divide line into k sections of length
l. Each buffer is of size h.



Buffer insertion analysis

 Assume h = 1:
– $k = \sqrt{\dfrac{0.4 R_{int} C_{int}}{0.7 R_0 C_0}}$
 Assume arbitrary h:
– $k = \sqrt{\dfrac{0.4 R_{int} C_{int}}{0.7 R_0 C_0}}$
– $h = \sqrt{\dfrac{R_0 C_{int}}{R_{int} C_0}}$
– $T_{50\%} = 2.5\sqrt{R_0 C_0 R_{int} C_{int}}$



Buffer insertion example

 10x minimum-size inverter drives a metal 3 wire of 5000λ × 3λ.
– Driver: R0 = 1.11 kΩ, C0 = 1.2 fF.
– Wire: Rint = 100 Ω, Cint = 135 fF.
 Then
– k = 2.4, approximately 2;
– h = 35.4;
– T50% = 11 ps.
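Plugging the example values into the formulas is a useful check. In this sketch the driver resistance is taken as R0 = 1.11 kΩ, one tenth of the 11.1 kΩ minimum-size value, since a 10x inverter's resistance should scale down just as its input capacitance (1.2 fF = 10 × 0.12 fF) scales up; with that value the formulas reproduce k ≈ 2.4, h ≈ 35 and T50% ≈ 11 ps. `buffer_insertion` is a hypothetical helper.

```python
import math


def buffer_insertion(r0, c0, r_int, c_int):
    """Number of sections k, buffer size h and delay T50% from the formulas above."""
    k = math.sqrt((0.4 * r_int * c_int) / (0.7 * r0 * c0))
    h = math.sqrt((r0 * c_int) / (r_int * c0))
    t50 = 2.5 * math.sqrt(r0 * c0 * r_int * c_int)
    return k, h, t50


if __name__ == "__main__":
    # 10x minimum-size driver: one tenth the resistance, ten times the capacitance.
    k, h, t50 = buffer_insertion(r0=1.11e3, c0=1.2e-15, r_int=100.0, c_int=135e-15)
    print(f"k = {k:.2f}, h = {h:.1f}, T50% = {t50 * 1e12:.1f} ps")
```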



RC crosstalk

 Crosstalk slows down signals and increases settling noise.
 Two nets in analysis:
– aggressor net causes interference;
– victim net is interfered with.



Aggressors and victims

(Figure: aggressor net running next to a victim net.)



Wire cross-section

 Victim net is surrounded by two aggressors.


(Figure: cross section of a victim wire between two aggressor wires above the substrate, with wire width W, thickness T and spacing S.)



Crosstalk delay vs. wire aspect
(Figure: relative RC delay ratio vs. increasing aspect ratio, with curves for increased spacing.)



Crosstalk delay

 There is an optimum wire width for any given wire spacing, at the bottom of the U-shaped curve.
 Optimum width increases as the spacing between wires increases.



Topics

 Latches and flip-flops.


 RAMs and ROMs.



Register

 Stores a value as controlled by clock.


 May have load signal, etc.
 In CMOS, memory is created by:
– capacitance (dynamic);
– feedback (static).



Variations in registers

 Form of required clock signal.


 How behavior of data input around clock
affects the stored value.
 When the stored value is presented to the
output.
 Whether there is ever a combinational path
from input to output.



Register terminology

 Latch: transparent when internal memory is being set from input.
 Flip-flop: not transparent; reading input and changing output are separate events.
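The difference shows up when both elements see the same inputs. A behavioral sketch, assuming a latch that is transparent while the clock is 1 and a flip-flop that samples on the rising clock edge; the edge choice, class names and input sequences are assumptions for illustration.

```python
class DLatch:
    """Level-sensitive latch: transparent while clk == 1, holds while clk == 0."""

    def __init__(self):
        self.q = 0

    def step(self, clk: int, d: int) -> int:
        if clk:
            self.q = d
        return self.q


class DFlipFlop:
    """Edge-triggered flip-flop: samples d only on a 0 -> 1 clock transition."""

    def __init__(self):
        self.q = 0
        self._prev_clk = 0

    def step(self, clk: int, d: int) -> int:
        if clk and not self._prev_clk:
            self.q = d
        self._prev_clk = clk
        return self.q


def run(element, clks, ds):
    return [element.step(c, d) for c, d in zip(clks, ds)]


CLK = [0, 1, 1, 1, 0, 0, 1, 1]
D = [1, 1, 0, 1, 0, 0, 0, 1]

if __name__ == "__main__":
    print("latch:    ", run(DLatch(), CLK, D))
    print("flip-flop:", run(DFlipFlop(), CLK, D))
```

While the clock is high the latch output tracks every change on D; the flip-flop output changes only at the two rising edges.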



Clock terminology

 Clock edge: rising or falling transition.


 Duty cycle: fraction of clock period for
which clock is active (e.g., for active-low
clock, fraction of time clock is 0).



Register parameters

 Setup time: time before the clock during which the data input must be stable.
 Hold time: time after the clock event for which the data input must remain stable.
(Figure: clock and data waveforms marking the setup and hold times.)
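Setup and hold define a window around each clock event in which data must not change. A sketch, assuming times in arbitrary units (say ns) and treating a transition exactly on the window boundary as legal; the function name and numbers are hypothetical.

```python
def timing_violations(data_edges, clock_edge, t_setup, t_hold):
    """Data transitions that fall strictly inside the setup/hold window."""
    window_start = clock_edge - t_setup
    window_end = clock_edge + t_hold
    return [t for t in data_edges if window_start < t < window_end]


if __name__ == "__main__":
    edges = [1.0, 9.5, 10.2, 15.0]  # hypothetical data transitions, ns
    print(timing_violations(edges, clock_edge=10.0, t_setup=0.8, t_hold=0.3))
```

Here 9.5 violates setup and 10.2 violates hold; the other transitions are far enough from the clock event.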



Dynamic latch

Stores charge on inverter gate capacitance:



Latch characteristics

 Uses a complementary transmission gate to ensure that the storage node is always strongly driven.
 Latch is transparent when the transmission gate is closed.
 Storage capacitance comes primarily from the inverter gate capacitance.



Latch operation

 φ = 0: transmission gate is off; inverter output is determined by the storage node.
 φ = 1: transmission gate is on; inverter output follows the D input.
 Setup and hold times are determined by the transmission gate: the value stored through the transmission gate must be solid.



Stored charge leakage

 Stored charge leaks away due to reverse-bias leakage current.
 Stored value is good for about 1 ms.
 Value must be rewritten to remain valid.
 If the latch is not loaded every cycle, it must be loaded often enough to keep the data valid.



Multiplexer dynamic latch



Non-dynamic latches

 Must use feedback to restore value.


 Some latches are static on one phase
(pseudo-static)—load on one phase, activate
feedback on other phase.



Recirculating latch

Static on one phase:



Edge-triggered flip-flop

(Figure: edge-triggered flip-flop with data input D and output Q.)



Master-slave operation

 φ = 0: master latch is disabled; slave latch is enabled, but the master latch output is stable, so the output does not change.
 φ = 1: master latch is enabled, loading a value from the input; slave latch is disabled, maintaining the old output value.
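A master-slave flip-flop can be modeled as two level-sensitive latches driven by opposite clock phases. A behavioral sketch following the phase assignment on this slide (master enabled when φ = 1, slave enabled when φ = 0); with that assignment the output updates when φ falls. Class names and input sequences are hypothetical.

```python
class Latch:
    """Level-sensitive latch: follows d while enabled, holds otherwise."""

    def __init__(self):
        self.q = 0

    def step(self, enable: bool, d: int) -> int:
        if enable:
            self.q = d
        return self.q


class MasterSlaveFF:
    """Master loads while phi = 1; slave copies the master while phi = 0."""

    def __init__(self):
        self.master = Latch()
        self.slave = Latch()

    def step(self, phi: int, d: int) -> int:
        m = self.master.step(phi == 1, d)
        return self.slave.step(phi == 0, m)


def simulate(phis, ds):
    ff = MasterSlaveFF()
    return [ff.step(p, d) for p, d in zip(phis, ds)]


PHI = [1, 1, 0, 0, 1, 1, 0, 0]
D = [1, 0, 0, 1, 1, 1, 0, 0]

if __name__ == "__main__":
    print(simulate(PHI, D))
```

The 1 presented at the fourth step, while φ = 0, never reaches the output: only the value held by the master when φ falls is passed on.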



High-density memory architecture



Memory operation

 Address is divided into row, column.


– Row may contain full word or more than one
word.
 Selected row drives/senses bit lines in
columns.
 Amplifiers/drivers read/write bit lines.
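Splitting an address is a matter of bit fields. A sketch, assuming the low-order bits select the column and the remaining high-order bits select the row; real memories may assign the fields differently, and the sizes in the example are hypothetical.

```python
def split_address(addr: int, col_bits: int):
    """Return (row, column) with the low col_bits bits taken as the column."""
    return addr >> col_bits, addr & ((1 << col_bits) - 1)


if __name__ == "__main__":
    # Hypothetical 10-bit address with 6 row bits and 4 column bits.
    row, col = split_address(0b1011010110, col_bits=4)
    print(f"row = {row}, column = {col}")  # row = 45, column = 6
```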



Read-only memory (ROM)

 ROM core is organized as NOR gates: the pulldown transistors of the NOR determine programming.
 Mask-programmable ROM uses pulldowns to determine ROM contents.

Flash memory

 Flash: electrically erasable PROM that can be programmed with standard voltages.
 Uses dual capacitor structure.
 Available in some digital processes for
integrated memory, but raises the price of
the manufacturing process.

ROM core circuit

SRAM critical path

[Figure: SRAM critical path through the core and the sense amp.]

Row decoders

 Decode row using NORs:
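For a 2-bit row address, NOR-based decoding can be sketched with Verilog gate primitives. The module name `row_decoder_2to4` is hypothetical. Each row line is the NOR of the address literals that must be 0 for that row, so exactly one row line goes high for each address.

```verilog
// Hypothetical 2-to-4 row decoder built from NOR gates.
module row_decoder_2to4 (input wire [1:0] a, output wire [3:0] row);
  wire [1:0] a_n = ~a;              // complemented address lines

  // A row line is high only when every literal feeding its NOR is 0.
  nor g0 (row[0], a[1],   a[0]);    // selected by address 00
  nor g1 (row[1], a[1],   a_n[0]);  // selected by address 01
  nor g2 (row[2], a_n[1], a[0]);    // selected by address 10
  nor g3 (row[3], a_n[1], a_n[0]);  // selected by address 11
endmodule
```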

Static RAM (SRAM)

 Core cell uses six-transistor circuit to store value.
 Value is stored symmetrically: both true and complement are stored on cross-coupled transistors.
 SRAM retains value as long as power is
applied to circuit.

SRAM core cell

SRAM core operation

 Read:
– precharge bit and bit’ high;
– set select line high from row decoder;
– one bit line will be pulled down.
 Write:
– set bit/bit’ to desired (complementary) values;
– set select line high;
– drive on bit lines will flip state if necessary.

SRAM sense amp

Sense amp operation

 Differential pair: takes advantage of the complementarity of the bit lines.
 When one bit line goes low, that arm of the differential pair reduces its current, causing a compensating increase in current in the other arm.
 Sense amp can be cross-coupled to increase
speed.

3-transistor dynamic RAM (DRAM)

 First form of DRAM; modern commercial DRAMs use a one-transistor cell.
 3-transistor cell can easily be made with a
digital process.
 Dynamic RAM loses value due to charge
leakage—must be refreshed.

3-T DRAM core cell

1-T RAM

 1 transistor + 1 capacitor:
[Figure: 1-T cell connected to the word line and the bit line.]

1-T DRAM with trench capacitor

1-T DRAM with stacked capacitor

Embedded DRAM

 Embedded DRAM is integrated with logic.
 DRAM and logic processes are hard to
make compatible.
– Capacitor requires high temperatures that
destroy fine-line transistors.
 Embedded DRAM is less dense than
commodity DRAM.

3-T DRAM operation
 Value is stored on gate capacitance of t1.
 Read:
– read = 1, write = 0, read_data’ is precharged;
– t1 will pull down read_data’ if 1 is stored.
 Write:
– read = 0, write = 1, write_data = value;
– guard transistor writes value onto gate capacitance.
 Cannot support full connectivity between all data path
elements—must choose number of transfers per cycle
allowed.
 A bus circuit is a specialized multiplexer circuit.
 Two major choices: pseudo-nMOS, precharged.

Topics

 FPGA fabric architecture concepts.

Elements of an FPGA fabric

 Logic.
 Interconnect.
 I/O pins.
[Figure: array of LEs connected by interconnect and surrounded by IOBs.]

Terminology

 Configuration: bits that determine logic function + interconnect.
 CLB: configurable logic block = logic element (LE).
 LUT: Lookup table = SRAM used for truth
table.
 I/O block (IOB): I/O pin + associated logic
and electronics.

Logic element

 Programmable:
– Input connections.
– Internal function.
 Coarser-grained than logic gates.
– Typically 4 inputs.
 Generally includes register.
 May provide specialized logic.
– Adder carry chain.

Example logic element

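 Lookup table: inputs a and b address a small memory whose contents are the out column of the truth table.
[Figure: two-input lookup table, its memory contents, and its truth table with columns a, b, out.]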

Logic synthesis

 How do we break the function into logic elements?
 How do we implement an operation within
a logic element?

Placement

 Where do we put each piece of logic in the array of logic elements?
[Figure: array of LEs.]

Programmable wiring

 Organized into channels.
– Many wires per channel.
 Connections between wires made at
programmable interconnection points.
 Must choose:
– Channels from source to destination.
– Wires within the channels.

Programmable interconnection point

Programmable wiring paths

Choosing a path

[Figure: two LEs and alternative wiring paths between them.]

Routing problems

 Global routing:
– Which combination of channels?
 Local routing:
– Which wire in each channel?
 Routing metrics:
– Net length.
– Delay.

Segmented wiring

[Figure: wire segments of length 1 and length 2.]

Offset segments

I/O

 Fundamental selection: input, output, or three-state?
 Additional features:
– Register.
– Voltage levels.
– Slew rate.

Programming technologies

 SRAM.
– Can be programmed many times.
– Must be programmed at power-up.
 Antifuse.
– Programmed once.
 Flash.
– Similar to SRAM but using flash memory.

Configuration

 Must set control bits for:
– LE.
– Interconnect.
– I/O blocks.
 Usually configured off-line.
– Separate burn-in step (antifuse).
– At power-up (SRAM).

Configuration vs. programming

 FPGA configuration:
– Bits stay at the device they program.
– A configuration bit controls a switch or a logic bit.
 CPU programming:
– Instructions are fetched from a memory.
– Instructions select complex operations.
[Figure: instruction "add r1, r2" fetched from memory into the CPU's IR.]

Reconfiguration

 Some FPGAs are designed for fast configuration.
– A few clock cycles, not thousands of clock
cycles.
 Allows hardware to be changed on-the-fly.

FPGA fabric architecture questions

 Given limited area budget:
– How many logic elements?
– How much interconnect?
– How many I/O blocks?

Logic element questions

 How many inputs?
 How many functions?
– All functions of n inputs or eliminate some
combinations?
– What inputs go to what pieces of the function?
 Any specialized logic?
– Adder, etc.
 What register features?

Interconnect questions

 How many wires in each channel?
 Uniform distribution of wiring?
 How should wires be segmented?
 How rich is interconnect between channels?
 How long is the average wire?
 How much buffering do we add to wires?

I/O block questions

 How many pins?
– Maximum number of pins determined by
package type.
 Are pins programmed individually or in
groups?
 Can all pins perform all functions?
 How many logic families do we support?

Topics

 SRAM-based FPGA fabrics:
– Xilinx.
– Altera.

SRAM-based FPGAs

 Program logic functions, interconnect using SRAM.
 Advantages:
– Re-programmable;
– dynamically reconfigurable;
– uses standard processes.
 Disadvantages:
– SRAM burns power.
– Possible to steal or disrupt configuration bits.

Logic elements

 Logic element includes combinational function + register(s).
 Use SRAM as lookup table for
combinational function.

LUT-based logic element

[Figure: lookup table with n inputs, configuration bits, and 1 output.]
The LUT can multiplex the configuration bits at its output or decode the address at its input.
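A behavioral sketch of a LUT, assuming a hypothetical `lut` module whose `CONFIG` parameter holds the 2^N configuration bits. The N inputs act as the read address, so the selected configuration bit is the function's value for that input combination. For example, `CONFIG = 16'h6996` configures a 4-input LUT as 4-input parity, because bit i of 16'h6996 is the parity of i.

```verilog
// Hypothetical N-input lookup table. CONFIG holds the 2^N configuration
// bits; bit i is the output when the inputs, read as a binary number, equal i.
module lut #(
  parameter N = 4,
  parameter [(1<<N)-1:0] CONFIG = 0
) (
  input  wire [N-1:0] in,
  output wire         out
);
  // The inputs address the configuration memory.
  assign out = CONFIG[in];
endmodule
```

Any function of N inputs is just a different `CONFIG` value, which is why every function costs the same area and delay in a LUT.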


Example

Evaluation of SRAM-based LUT

 An n-input LUT can implement any function of n inputs, using 2^n configuration bits.
 All logic functions take the same amount of space.
 All functions have the same delay.
 SRAM is larger than static gate equivalent of
function.
 Burns power at idle.
 Want to selectively add register to LE:

Registers in logic elements

 Register may be selected into the circuit:

[Figure: a configuration bit selects either the LUT output or the D flip-flop output as LE out.]
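Selecting the register with a configuration bit can be sketched as follows. The module `logic_element` and its parameters `LUT_CONFIG` and `USE_REG` are hypothetical stand-ins for the LUT contents and for the configuration bit that picks the registered or combinational output. They do not correspond to any vendor's primitive.

```verilog
// Hypothetical logic element: a 4-input LUT followed by a D flip-flop,
// with a configuration bit (USE_REG) choosing which one drives le_out.
module logic_element #(
  parameter [15:0] LUT_CONFIG = 16'h0000,  // LUT contents (truth table)
  parameter        USE_REG    = 1'b0        // 1: registered, 0: combinational
) (
  input  wire       clk,
  input  wire [3:0] in,
  output wire       le_out
);
  wire lut_out = LUT_CONFIG[in];   // combinational function

  reg q;
  always @(posedge clk)
    q <= lut_out;                  // register on the LUT output

  // The configuration bit selects the register into the circuit or bypasses it.
  assign le_out = USE_REG ? q : lut_out;
endmodule
```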

Other LE features

 Multiple logic functions in an LE.
 Addition logic:
– carry chain.
 Partitioned lookup tables.

Xilinx Spartan-II CLB

 Each CLB has two identical slices.
 Slice has two logic cells:
– LUT.
– Carry logic.
– Registers.

Spartan-II CLB details

 Each lookup table can be used as a 16-bit synchronous RAM or 16-bit shift register.
 Arithmetic logic includes an XOR gate.
 Each slice includes a mux to combine the results of the two function generators in the slice.
 Register can be configured as DFF or latch.
 Has three-state drivers (BUFTs) for on-chip
busses.

Spartan-II CLB operation

 Arithmetic:
– Carry block includes XOR gate.
– Use LUT for carry, XOR for sum.
 Each slice uses an F5 mux to combine the outputs of its two function generators.
 F6 mux combines outputs of F5 muxes.
 Registers can be FF/latch; clock and clock enable.
 Includes three-state output for on-chip bus.

Altera APEX II logic element

 Each logic array block has 10 logic elements.
 Logic elements share some logic.

APEX II LE modes

 Modes of operation:
– Normal.
– Arithmetic.
– Counter.

APEX-II LE normal mode

APEX-II LE arithmetic mode

APEX-II LE counter mode

APEX-II LE control logic

Programmable interconnect

 MOS switch controlled by configuration bit:

Programmable vs. fixed interconnect

 Switch adds delay.
 Transistor off-state leakage is worse in advanced technologies.
 FPGA interconnect has extra length = added
capacitance.

Interconnect strategies

 Some wires will not be utilized.
 Congestion will not be the same throughout the chip.
 Types of wires:
– Short wires: local LE connections.
– Global wires: long-distance, buffered
communication.
– Special wires: clocks, etc.

Paths in interconnect

 Connection may be long, complex:

[Figure: a connection routed through several wiring channels between rows of LEs.]

Interconnect architecture

 Connections from wiring channels to LEs.
 Connections between wires in the wiring channels.
[Figure: wiring channel between LEs.]

Interconnect richness

 Within a channel:
– How many wires.
– Length of segments.
– Connections from LE to channel.
 Between channels:
– Number of connections between channels.
– Channel structure.

Switchbox

[Figure: switchbox connecting four channels.]

Spartan-II interconnect

 Types of interconnect:
– local;
– general-purpose;
– dedicated;
– I/O pin.

Spartan-II general-purpose network

 Provides majority of routing resources:
– General routing matrix (GRM) connects
horizontal/vertical channels and CLBs.
– Interconnect between adjacent GRMs.
– Hex lines connect GRM to GRMs six blocks
away.
– 12 longlines span the chip.

Spartan-II routing

 Relationship between GRM, hex lines, and local interconnect:

Spartan-II three-state bus

 Horizontal on-chip busses:

Spartan-II clock distribution

APEX II interconnect

[Figure: APEX II row and column interconnect.]

Spartan-II I/O

 Supports multiple I/O standards:
– LVTTL, PCI, LVCMOS2, AGP2X, etc.
 Provides registers.
 Programmable delay for pin-dependent hold
time.
 Programmable weak keeper circuit.

Spartan-II I/O block diagram

Configuration

 Need to set all configuration SRAM bits:
– minimum pin cost;
– reasonable speed.
 Configuration can also be read back for
testing.

Configuration ROM

 Configured on start-up from ROM:

[Figure: FPGA loading its configuration from a configuration memory.]

Spartan-II configuration

 Configuration length depends on size of chip:
– 200,000 to 1.3 million bits.
 Configuration modes:
– Master serial for first chip in chain.
– Slave serial for follow-on chips.
– Slave parallel.
– Boundary-scan.

Scan chain

 Scan chain: shift register used to access internal state.
 Level-sensitive scan design (LSSD): scan structure that shares hardware between normal mode and scan.
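A scan chain can be sketched as a register whose flip-flops are chained through a per-bit multiplexer. This sketch uses the common multiplexed-D scan style, not LSSD, and the module `scan_reg4` and its port names are hypothetical. With `scan_en` = 0 the register loads `d` normally. With `scan_en` = 1 it shifts one bit per clock from `scan_in` toward `scan_out`, so the internal state can be read out serially.

```verilog
// Hypothetical 4-bit register with a multiplexed-D scan chain.
module scan_reg4 (
  input  wire       clk,
  input  wire       scan_en,   // 1: shift through the chain, 0: normal load
  input  wire       scan_in,   // serial input to the first flip-flop
  input  wire [3:0] d,         // normal-mode data
  output reg  [3:0] q,
  output wire       scan_out   // serial output from the last flip-flop
);
  always @(posedge clk)
    if (scan_en) q <= {q[2:0], scan_in};  // shift one position per clock
    else         q <= d;                  // normal operation

  assign scan_out = q[3];
endmodule
```

Usage: load a value with `scan_en` = 0, then hold `scan_en` = 1 for four clocks; `scan_out` presents the captured bits most-significant first.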

JTAG boundary scan

 JTAG: Joint Test Action Group.
 Boundary scan:
– provide scan chain at pins;
– allow control of chip interior;
– decouple chip from rest of board for test.

Chip-on-board testing

 Boundary scan decouples chips:

[Figure: chips on a board, isolated from one another by boundary scan.]

Boundary scan concepts

 TAP: test access port.
– Requires four pins not shared with other logic: test clock, test mode select, test data in, test data out.
– An optional fifth pin provides test reset.
 TAP controller recognizes pins, controls
boundary scan registers.
 Instruction register defines boundary scan
mode.

Topics

 Antifuse-based FPGA fabrics:
– Actel.
 Flash-based FPGAs.

Antifuses

 Permanently programmed.
 Make a connection with electrical signal.
– More reliable than breaking a connection.
– Avoids shrapnel.
 Resistance of about 100 Ω.

Antifuse structure

[Figure: antifuse cross-section showing metal 2, the antifuse, a via, metal 1, and the substrate.]

Flash-programmed FPGA

 Flash is electrically-erasable EPROM.
 Allows reprogramming without boot-up
procedure.

Flash-programmed switch

Logic blocks

 Program by making connections.
 Based on multiplexing.
[Figure: 2-to-1 multiplexer with data inputs d0 and d1, select input a, and output out.]
Truth table:
a out
0 d0
1 d1
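The truth table above means that a 2-to-1 multiplexer with constants or signals on its inputs can implement ordinary logic functions. The sketch below is illustrative only: the module names are hypothetical, and it is not the Actel logic module itself. It builds NOT, AND, OR, and XOR from `mux2` instances.

```verilog
// 2-to-1 multiplexer: a = 0 selects d0, a = 1 selects d1.
module mux2 (input wire a, input wire d0, input wire d1, output wire out);
  assign out = a ? d1 : d0;
endmodule

// Logic functions obtained by choosing what drives each mux input.
module mux_logic_demo (input wire a, input wire b,
                       output wire y_and, output wire y_or, output wire y_xor);
  wire b_n;
  mux2 m_not (.a(b), .d0(1'b1), .d1(1'b0), .out(b_n));    // b ? 0 : 1 = NOT b
  mux2 m_and (.a(a), .d0(1'b0), .d1(b),    .out(y_and));  // a ? b : 0
  mux2 m_or  (.a(a), .d0(b),    .d1(1'b1), .out(y_or));   // a ? 1 : b
  mux2 m_xor (.a(a), .d0(b),    .d1(b_n),  .out(y_xor));  // a ? ~b : b
endmodule
```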

Larger logic block

Actel 54SX logic element

Actel 54SX adder logic

 Uses two C-cells in SuperCluster.
 Adds bits A0
and A1.
 Carry in FCI,
carry out FCO.
 Active when
CFN is high.

Actel 54SX R cell

Actel 54SX LE

 C/R cells organized into clusters.
– Type 1 cluster: CRC.
– Type 2 cluster: CRR.
 Clusters grouped into superclusters.
– Type 1: two type 1 clusters.
– Type 2: one type 1, one type 2.

Actel ProASIC 500K logic gate
 Uses switches to connect inputs, feedback, etc.

Actel 54SX interconnect

 FastConnect provides horizontal connections between logic modules.
– Within a supercluster.
– To supercluster below.
 DirectConnect is within a supercluster:
– connects C-cell to R-cell neighbor.
 Generic global wiring in segmented
channels.

I/O pins

 Need programmable pins:
– Input or output.
– Three-state.
 Other features:
– Registers.
– Slew rate.
– Voltage levels.
– Double-data rate (DDR) support.
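A programmable bidirectional pin can be sketched in Verilog as follows. The module `io_block` is hypothetical. Its `OUT_REG` parameter stands in for a configuration bit that selects a registered or direct output, and `oe` controls the three-state driver. Slew rate, voltage levels, and DDR support are not modeled.

```verilog
// Hypothetical bidirectional I/O block with an optional output register.
module io_block #(parameter OUT_REG = 1'b1) (
  input  wire clk,
  input  wire oe,      // output enable: 1 = drive the pad, 0 = high impedance
  input  wire out_d,   // value from the fabric to send off-chip
  output wire in_q,    // value read from the pad into the fabric
  inout  wire pad
);
  reg out_r;
  always @(posedge clk)
    out_r <= out_d;                          // optional output register

  wire out_val = OUT_REG ? out_r : out_d;    // configuration-selected path
  assign pad  = oe ? out_val : 1'bz;         // three-state output driver
  assign in_q = pad;                         // input path always sees the pad
endmodule
```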

Altera APEX II I/O

 Supports SDRAM and double-data rate (DDR) memory.
 Six registers and latch.
 Bidirectional buffers.
 Two inputs and two outputs.

APEX II I/O

Antifuse programming

 Need to be able to apply programming voltage to every antifuse.
– Path from VDD to GND.
 Programming can be performed slowly.
– Don’t need a lot of parallelism.
 Use the wiring network to gain access to the
antifuses.
– Access transistors control path to antifuse.

Antifuse programming access transistors

Topics

 Circuit design for FPGAs:
– Logic elements.
– Interconnect.

Multiplexers as logic elements
[Figure: multiplexer-based logic element with inputs A, B, CLR, and CLK, a latch, and example functions (AB)’ and A^B.]

Using antifuses

Static CMOS gate vs. LUT

 Number of transistors:
– NAND/NOR gate has 2n transistors.
– 4-input LUT has 128 transistors in SRAM, 96 in
multiplexer.
 Delay:
– 4-input NAND gate has 9t delay.
– SRAM decoding has 21t delay.
 Power:
– Static gate’s power depends on activity.
– SRAM always burns power.

Lookup table circuitry

 Demultiplexer or multiplexer?
[Figure: LUT read through an address demultiplexer vs. through an output multiplexer.]

Traditional RAM/ROM

 Cell drives long bit line:

[Figure: addressed memory cell driving a long bit line.]

Lookup memory

 Multiplexer presents smaller load to memory cells.
– Allows smaller memory cells.

Multiplexer styles

[Figure: multiplexers built from static gates and from pass transistors.]

Multiplexer design

 Pass transistor multiplexer uses fewer transistors than fully complementary gates.
 Pass transistor is somewhat faster than
complementary switch:
– Equal-strength p-type is 2.5X n-type width.
– Total resistance is 0.5X, total capacitance is
3.5X.
– RC delay is 0.5 x 3.5 = 1.75 times n-type
switch.

Static gate four-input mux
 Delay through n-input NAND is (n+2)/3.
 lg b + 1 inputs at first level, so delay is (lg b + 3)/3.
 Delay at second level is (b+2)/3.
 Delay grows as b lg b.

Pass-transistor-based four-input mux

 Must include decode logic in total delay.

Tree-based four-input mux

 Delay proportional to square of path length.
 Delay grows as (lg b)^2.

LE output drivers

 Must drive load:
– Wire;
– Destination LE.
 Different types of wiring present different
loads.

Avoiding programming hazards

 Want to disable connections to routing channel before programming.
[Figure: connection from an LE to the routing channel, controlled by config and progb.]

Interconnect circuits

 Why so many types of interconnect?
– Provide a choice of delay alternatives.
 Sources of delay:
– Wires.
– Programming points.

Styles of programmable interconnection point

[Figure: pass transistor and three-state programmable interconnection points.]

Pass transistor programmable interconnect point

 Small area.
 Resistive switch.
 Delay grows as the
square of the number
of switches.

Three-state programmable interconnection point

 Larger area.
 Regenerative driver.

Switch area * wire delay vs. buffer size (Betz & Rose)

© 1999 IEEE

Switch area * wire delay vs. pass transistor width (Betz & Rose)

© 1999 IEEE

Wire delay vs. switch sizes (Chandra and Schmidt)

 Delay vs. switch size for various driver sizes.
 U-shaped curve:
– Resistance initially decreases.
– Increased capacitance eventually dominates.
© 2002 IEEE

Clock drivers

 Clock driver tree:

Clock nets

 Must drive all LEs.
 Design parameters:
– number of fanouts;
– load per fanout;
– wiring tree capacitance.
 Determine optimal buffer sizes.

H tree

 Regular layout structure.
– Recursive.

Topics

 The logic design process.

Combinational logic networks

 Functionality.
 Other requirements:
– Size.
– Power.
– Performance.
[Figure: combinational logic between primary inputs and primary outputs.]

Non-functional requirements

 Performance:
– Clock speed is generally a primary requirement.
 Size:
– Determines manufacturing cost.
 Power/energy:
– Energy related to battery life, power related to heat.
– Many digital systems are power- or energy-limited.

Mapping into an FPGA

 Must choose the FPGA:
– Capacity.
– Pinout/package type.
– Maximum speed.

Hardware description languages
 Structural description:
– A connection of components.
 Functional description:
– A set of Boolean formulas, state transitions, etc.
 Simulation description:
– A program designed for simulation.
 Major languages:
– Verilog.
– VHDL.
[Figure: NAND gate example.]
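To illustrate the difference between structural and functional descriptions, here is the same function written both ways in Verilog. Two-input XOR is chosen here only as an example, and the module names are hypothetical. The structural version connects four NAND primitives; the functional version is a single Boolean formula.

```verilog
// Structural description: a connection of components (NAND gate primitives).
module xor_structural (input wire a, input wire b, output wire y);
  wire n1, n2, n3;
  nand g1 (n1, a, b);
  nand g2 (n2, a, n1);
  nand g3 (n3, b, n1);
  nand g4 (y, n2, n3);   // classic four-NAND XOR
endmodule

// Functional description: the same function as a Boolean formula.
module xor_functional (input wire a, input wire b, output wire y);
  assign y = a ^ b;
endmodule
```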

Logic optimization

 Must transform Boolean expressions into a form that can be implemented.
– Use available primitives (gates).
– Meet delay, size, energy/power requirements.
 Logic gates implement expressions.
– Must rewrite logic to use the expressions provided by
the logic gates.
 Maintain functionality while meeting non-
functional requirements.

Macros

 Larger modules designed to fit into a particular FPGA.
– Hard macro includes placement.
– Soft macro does not include placement.

Physical design

 Placement:
– Place logic components into FPGA fabric.
 Routing:
– Choose connection paths through the fabric.
 Configuration generation:
– Generate bits required to configure FPGA.

Example: parity

 Simple parity function:
– P = a0 XOR a1 XOR a2 XOR a3.
 Implement with Xilinx ISE.

Xilinx ISE main screen

[Screenshot: Xilinx ISE main screen with panes for sources in project, source window, processes for source, and output.]

New project

New project info

Create HDL file

I/O description

I/O info

Empty Verilog description

module parity(a,p);
input [31:0] a;
output p;

endmodule

Verilog with functional code

module parity(a,p);
input [31:0] a;
output p;

assign p = ^a;

endmodule

RTL schematic: top-level

RTL model: implementation

Example: simulation

 Apply stimulus/test vectors.
 Look at response/output vectors.
 Can’t exhaustively simulate but we can
exercise the module.
 Simulation before synthesis is faster and
easier than simulating the mapped design.
– Sometimes want to simulate the mapped
design.

Testbench

[Figure: testbench that applies stimulus to the unit under test (UUT) and collects its response.]

Automatically-created testbench
module parity_testbench_v_tf();
// DATE: 11:48:13 11/07/2003
// MODULE: parity
// DESIGN: parity
// FILENAME: testbench.v
// PROJECT: parity
// VERSION:
// Inputs
reg [31:0] a;
// Outputs
wire p;
// Bidirs
// Instantiate the UUT
parity uut (
.a(a),
.p(p)
);
// Initialize Inputs
`ifdef auto_init
initial begin
a = 0;
end
`endif
endmodule

Test vector application code
initial begin
$monitor("a = %b, parity=%b\n",a,p);
#10 a = 0;
#10 a = 1;
#10 a = 2'b10;
#10 a = 2'b11;
#10 a = 3'b100;
#10 a = 3'b101;
#10 a = 3'b110;
#10 a = 3'b111;

#10 a = 1024;
#10 a = 1025;
#10 a = 16'b1010101010101010;
#10 a = 17'b11010101010101010;
#10 a = 17'b10010101010101010;
#10 a = 32'b10101010101010101010101010101010;
#10 a = 32'b11101010101010101010101010101010;
#10 a = 32'b10101010101010101010101010101011;
$finish;
end

Project summary

Topics

 Modeling with hardware description languages (HDLs).

Hardware description languages

 Textual languages for describing hardware:
– structure;
– function.
 Most people today use textual languages
rather than schematics for most digital
design.
– Schematics make poor use of screen space.

Major HDLs

 Two major HDLs designed for simulation:
– VHDL;
– Verilog.
– Similar capabilities but somewhat different
language philosophies.
 EDIF is a standard netlist format.

Simulation vs. programming

 Simulation tags computations with times.
– Must know when signals change to properly
simulate hardware.
 Simulation is parallel.
– Many statements can execute at the same
(simulation) time.
– Just like hardware.

Types of simulation

 Compiled code simulation.
– Generate program that evaluates a hardware
block.
– Operational details within the hardware block
are lost.
 Event-driven simulation.
– Propagate events through simulation.
– Don’t simulate a block until its inputs change.

Event-driven simulation

 An event is a change in a net’s value.
 An event has two components:
– value;
– time.
[Figure: event net1=0 @ 35 ns on net net1.]

Events on a gate

 Propagate events only when nets change value.
 If an input change doesn’t cause an output change, no event is propagated.
[Figure: gate whose input change produces no output event.]
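The no-event rule can be observed in a Verilog simulator. In this sketch, the module `nand_events` is hypothetical: a NAND gate with one time unit of delay drives net `c`, and an `always @(c)` block counts value changes on `c`. Changing `a` while `b` = 0 leaves `c` at 1, so the counter does not advance.

```verilog
// Hypothetical example: a NAND gate plus a counter of events on its output.
module nand_events (input wire a, input wire b, output wire c,
                    output reg [7:0] c_events);
  nand #1 g (c, a, b);          // gate with 1 time unit of delay

  initial c_events = 0;
  always @(c)                   // wakes up only when c changes value
    c_events = c_events + 1;
endmodule
```

Driving a = 0, b = 0, then a = 1, then b = 1 should produce two events on `c`: the initial change to 1 and the final change to 0. The middle input change produces none.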

Timewheel

 The timewheel is a data structure in the simulator that efficiently determines the order of events processed.
 Events are placed on the timewheel in time
order.
 Events are taken out of the head of the
timewheel to process them in order.

Timewheel operation

(Figure: a gate netlist with nets a, b, and c, and a timewheel holding the events a=1 @ 0 ns, b=1 @ 1 ns, and c=0 @ 2 ns in time order.)
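The timewheel mechanism can be sketched in Python, with the standard `heapq` module standing in for the timewheel. This is a minimal illustration and not how any particular simulator is built. The two-gate netlist (c = NAND(a, b), d = NOT(c)), the 1 ns gate delays, and the names `GATES` and `simulate` are all hypothetical. An event whose value equals the net's current value is discarded, so an input change that leaves a gate output unchanged generates no further activity.

```python
import heapq

# Hypothetical netlist: c = NAND(a, b) and d = NOT(c), each with a 1 ns delay.
# Each entry is (function, input nets, output net, delay in ns).
GATES = [
    (lambda a, b: int(not (a and b)), ("a", "b"), "c", 1),
    (lambda c: int(not c), ("c",), "d", 1),
]


def simulate(initial, stimulus):
    """Event-driven simulation with a heap acting as the timewheel.

    initial:  starting value of every net (assumed to be consistent).
    stimulus: primary-input events as (time, net, value) tuples.
    Returns the events that actually changed a net, in processing order.
    """
    values = dict(initial)
    wheel = []   # pending events, ordered by time
    seq = 0      # tie-breaker: equal-time events keep insertion order
    for t, net, v in stimulus:
        heapq.heappush(wheel, (t, seq, net, v))
        seq += 1
    processed = []
    while wheel:
        t, _, net, v = heapq.heappop(wheel)  # take the earliest event
        if values[net] == v:
            continue                         # value unchanged: not an event
        values[net] = v
        processed.append((t, net, v))
        # Evaluate only the gates that read the net that just changed.
        for fn, ins, out, delay in GATES:
            if net in ins:
                new = fn(*(values[i] for i in ins))
                heapq.heappush(wheel, (t + delay, seq, out, new))
                seq += 1
    return processed


start = {"a": 0, "b": 0, "c": 1, "d": 0}
print(simulate(start, [(0, "a", 1), (1, "b", 1)]))
# [(0, 'a', 1), (1, 'b', 1), (2, 'c', 0), (3, 'd', 1)]
```

With only a=1 @ 0 ns applied, the NAND output stays at 1. The scheduled c=1 event is therefore dropped and nothing reaches d. The sketch does not model inertial delay or event cancellation.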


Order of evaluation

• Order of evaluation is important.
– Causality must be obeyed.
• Evaluating events in the wrong order can cause inaccurate results.


Order of evaluation example

(Figure: a gate netlist with nets a through e, and a timewheel holding b=1 @ 1 ns, d=1 @ 2 ns, and e=0 @ 4 ns, which must be processed in that order.)


Compiled simulation

• A block of code is generated to simulate a block of hardware.
– A compiler can be used to optimize the code.
• The code ignores much of the temporal behavior within the block.
– Must still evaluate events in the right order.
– Must generate times at the interface to the event-driven model.

Modeling

• Structural modeling describes the connections between components.
– Netlists are structural models.
• Behavioral models describe the functional relationship between inputs and outputs.
– Similar to programming, but values are events.


HDL language constructs

• Must be able to define component types.
– A model may be behavioral or structural.
• May be able to define abstract data types.
– A wire may carry an enumerated value.
– Multi-valued simulation may be defined using abstract data types.
• May be able to define modules to control the scope of names.

Testbenches

• A testbench is a model used to exercise a simulation.
– Provides stimulus.
– Checks outputs.
• Testbenches help automate design verification.
– Rerun an edited module against the testbench.
– Run models at the behavioral and RT levels against the same testbench.

Synthesis subsets

• VHDL and Verilog were designed for simulation.
• A synthesis subset is the part of the language that:
– can be synthesized;
– produces simulation results consistent with the synthesized hardware.
• Different tools may use different synthesis subsets.


Register-transfer synthesis

• Most common type of synthesis.
• Synthesizes gates from an abstract RT model.
– Registers are explicit.
– Some tools will infer storage elements; be careful.
• Optimizes for performance, area, and power.


Topics

• HDL coding for synthesis.
– Verilog.
– VHDL.


Synthesis vs. simulation semantics

• Simulation:
– Events are interpreted during simulation.
• Synthesis:
– Logic/memory is extracted from the description.


Logic synthesis

• Synthesis = translation + optimization.
– Translated from HDL or from a direct Boolean network.
– Ideally, translation includes don't-cares.
– Optimization rewrites the logic to satisfy objective functions: area, speed, power.


Syntax-directed translation

x = a and b;
(Figure: translates to an AND gate with inputs a and b and output x.)

if (a or b)
begin
x = c;
end;
(Figure: translates to the corresponding logic, in which a and b control the value of x.)


Verilog simulation and synthesis

• Continuous signal assignments use the assign keyword:
– assign sig3 = sig1 & sig2;


Verilog structural descriptions

• Build a structure by wiring together components:
input [7:0] a, b;
input carryin;
output [7:0] sum;
output carryout;
wire [7:1] carry;

fulladd a0(a[0],b[0],carryin,sum[0],carry[1]);
fulladd a1(a[1],b[1],carry[1],sum[1],carry[2]);
fulladd a2(a[2],b[2],carry[2],sum[2],carry[3]);

• In each instantiation, fulladd is the type name and a0, a1, a2 are instance names.


VHDL for Synopsys synthesis

• Each process should start with a sensitivity list:
foo: process (a, b, in1, in2)
• Use at least two processes:
– combinational;
– sequential.
• The sequential process includes
wait until clock…

Initializing variables

• All variables used must be initialized, i.e., assigned a value on every path through a combinational process.
• A variable left unassigned on some path causes a latch to be introduced to hold its old value: BAD.


State machines

• Use a case statement (or casex/casez, which treat x/z bits as don't-cares) to decode the current state:
initial begin : Init s0 = 3'b000; end
case (curr)
2'b00:
if (in1 == 1'b0) begin o1 = a | b;
end
2'b01: ...
endcase


Process structure

• How many combinational processes?
– separate datapath;
– single process for data and control.
• Comparison:
– single process is simpler;
– separate datapath uses less logic.
(Figure: a single combinational process feeding a sequential process, vs. separate control (ctrl) and datapath (dp) combinational processes feeding a sequential process.)

Multiplexing a datapath element

always @(*) begin
case (muxctrl)
1'b0: muxout = a;
1'b1: muxout = b;
endcase
foo = muxout | c;
end


Arithmetic

• Can generate logic by hand.
• Operators (+, -, <, >, *, +1, -1, etc.) can be mapped onto library elements.
– May be limited to standard widths.


General synthesis hints

• Check all warnings carefully.
• An early synthesis run keeps you from debugging a simulation that won't synthesize.


The synthesis process

• Synthesis is driven by a script:
compile -map_effort med
report_fpga > TOP + ".fpga"
• The script may be customized for the design.
– Verilog file foo.v, script file foo.script.
– Typically start with a standard script.


Timing constraints

• Clock period.
– Duty cycle, etc.
• Group path timing.
– Cells or ports that share the same timing behavior.
• Input/output delay.
– End-to-end delay.


Hierarchical design and logic optimization

• The Boolean network model does not reach across component boundaries.
• Tools generally won't automatically flatten logic.
– Size may blow up.
• You may direct the tool to flatten a group of components.
– Heuristic flattening algorithms may be used.

Instantiating memory

• Use a memory model:
– primitive memories based on LUTs;
– larger memories synthesized from multiple logic elements.
• Synthesis can't handle a memory described behaviorally.
– Can handle a behavioral ROM.


I/O configuration

• Synthesis can automatically determine the types of many I/O blocks and configure them appropriately.
• Some things need to be specified:
– indirect three-state activity;
– I/O pin location;
– registered bidirectional I/O.


Timing model

• The synthesis system reads a wire load model from a technology library.
– The model depends on the part and speed grade.


Attribute passing

• FPGA Compiler allows attributes to be passed to EDIF:
– BUFG X(.I(a),.O(b)); // synopsys attribute LOC BR


Results and reports

• Save design as:
– database;
– EDIF.
• Types of reports:
– Default synthesis report.
– Configuration report.
» Describes LEs, IOBs, etc.
– Timing report.

Fun with CAD tools

• Array renaming between tools (the same element may appear as):
– <0>
– [0]
– _0_


Topics

• Combinational network delay.
• Combinational network energy/power.


Delay characteristics

• Measured from a change in the inputs to a change in the outputs.
• Data-dependent:
– Some inputs give longer delays than others.
• May exercise different paths through the network.


Timing diagram

$t_c \ge t_x + t_y$

(Figure: timing diagram of signals A, B, C, S, and X switching between 0 and 1 over time.)


Sources of delay

• Gate delay:
– intrinsic;
– drive;
– load.
• Wire:
– lumped load;
– transmission line.


Basic gate delay model

• Gate delay $t_g$.
• Wire delay $t_w$.
(Figure: a logic element (LE) driving another LE through a programmable interconnection point (PIP).)


Optimizing a single link

• Custom design can improve gate delay:
– Transistor sizing.
– Gate topology.
• FPGA or custom design can improve wire delay:
– Shorten wire length.
– Choose wire category.
– Increase driver size.


Fanout

• Fanout adds capacitance.
(Figure: one source driving several sinks.)


Driving fanout

• Adding gates adds capacitance.


Ways to drive large fanout

• Increase the sizes of the driver transistors. Must take into account the rules for driving large loads.
• Add intermediate buffers. This may require or allow restructuring of the logic.


Buffers



Wire capacitance

• Use layers with lower capacitance.
• Redesign the layout to reduce the length of wires with excessive delay.


Path delay

• Combinational network delay is measured over paths through the network.
• Can trace a causality chain from the inputs to the worst-case output.


Path delay example

(Figure: a combinational network and its graph model.)


Critical path

• Critical path = the path that creates the longest delay.
• Can trace the transitions that cause the delays making up the critical delay path.


Delay model

• Nodes represent gates.
• Assign delays to edges: a signal may have a different delay to each sink.
• Lump gate and wire delay into a single value.


Critical path through delay graph

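Finding the critical path in such a graph is a longest-path computation on a directed acyclic graph. The Python sketch below visits nodes in topological order (Kahn's algorithm). For each node it keeps the latest arrival time and the predecessor that produced it. The example graph, its node names, and its delays are made up for illustration. Nodes with no incoming edges are treated as primary inputs arriving at time 0.

```python
from collections import defaultdict


def critical_path(edges):
    """Longest path through a delay graph.

    edges: (u, v, delay) tuples; each delay lumps gate and wire delay, so the
    same signal may have a different delay to each sink.
    Returns (worst arrival time, list of nodes on the critical path).
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    arrival = {n: 0 for n in nodes}      # primary inputs arrive at time 0
    pred = {}
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:                         # process nodes in topological order
        u = ready.pop()
        for v, d in succ[u]:
            if arrival[u] + d > arrival[v]:
                arrival[v] = arrival[u] + d
                pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    end = max(nodes, key=arrival.get)    # latest-arriving node
    path = [end]
    while path[-1] in pred:              # walk back along the critical edges
        path.append(pred[path[-1]])
    return arrival[end], path[::-1]


edges = [("i1", "g1", 2), ("i2", "g1", 1), ("i2", "g2", 3),
         ("g1", "g3", 2), ("g2", "g3", 2), ("g3", "o", 1)]
print(critical_path(edges))   # (6, ['i2', 'g2', 'g3', 'o'])
```

In this example the path through g1 reaches g3 at time 4, while the path through g2 reaches it at 5. Speeding up the i1 to g1 edge would therefore not change the circuit delay, because that edge is off the critical path.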


Reducing critical path length

• To reduce circuit delay, must speed up the critical path; reducing delay off the path doesn't help.
• There may be more than one path with the same delay. Must speed up all equivalent paths to speed up the circuit.
• Must speed up a cutset through the critical path.


False paths

• Logic gates are not simple nodes: some input changes don't cause output changes.
• A false path is a path that cannot be exercised due to Boolean gate conditions.
• False paths cause pessimistic delay estimates.


False path example



Another false path example

(Figure: a network of gates with delays d = 10 and d = 20, with one long path marked as a false path.)


Placement and delay

• Placement helps determine routing.
• Routing determines wire length.
• Wire length determines capacitive load.


Placement and wire capacitance

(Figure: a driver dvr fanning out to gates g1, g2, g3, and g4; one placement gives an unbalanced load, another a more balanced load.)

Optimizing network delay

• Identify the longest path.
• Improve delay along the longest path:
– Driver delay.
– Wire delay.
– Logic restructuring.


Example: adder placement and delay

• N-bit adder:
(Figure: a chain of one-bit adders.)


Bad placement and routing

(Figure: placement and routing.)

Better placement and routing

(Figure: placement and routing.)


Logic rewrites

(Figure: deep logic rewritten as shallow logic.)


Logic transformations

• Can rewrite logic by using subexpressions.
– Simplifications affect the cost of rewrites.
• Flattening logic increases gate fanin.
• Logic rewrites may affect gate placement.


Power optimization

• Transitions cause power consumption.
• Logic network design helps control power consumption by:
– minimizing capacitance;
– eliminating unnecessary glitches.


Glitching example

• Gate network:


Glitching example behavior

• The NOR gate produces a 0 output at the beginning and at the end:
– beginning: the bottom input is 1;
– end: the NAND output is 1.
• The difference in delay between the application of the primary inputs and the generation of the new NAND output causes a glitch.


Adder chain glitching

(Figure: a bad adder arrangement (unbalanced chain) and a good one (balanced tree).)


Explanation

• The unbalanced chain has signals arriving at different times at each adder.
• A glitch upstream propagates all the way downstream.
• The balanced tree introduces multiple glitches simultaneously, reducing total glitch activity.


Power estimation tools

• A power estimator approximates power consumption from:
– the gate network;
– primary input transition probabilities;
– capacitive loading.
• May be based on switch/logic simulation or use statistical models.


Factorization for low power

• Proper factorization reduces glitching.
(Figure: a bad and a good factorization of the same logic.)

Factorization techniques

• In the example, a has a high transition probability; b and c have low probabilities.
• Reduce the number of logic levels through which high-probability signals must travel in order to reduce the propagation of glitches.


Layout for low power

• Place and route to minimize the capacitance of nodes with high glitching activity.
• Feed wiring capacitance values back to power analysis for better estimates.


Topics

• Number representation.
• Shifters.
• Adders and ALUs.


Signed number representations

• One's complement:
– 3 = 0011
– -3 = ~(0011) = 1100
– Two zeroes: 0000, 1111
• Two's complement:
– 3 = 0011
– -3 = ~(0011) + 1 = 1101
– One zero: 0000
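The 4-bit encodings above can be checked with a few lines of Python. This is an illustrative sketch, and the function names and the 4-bit default width are assumptions. Python integers are unbounded, so each result is masked to the word width.

```python
def ones_complement(x, bits=4):
    """One's complement: a negative number is the bitwise NOT of its magnitude."""
    mask = (1 << bits) - 1
    return x & mask if x >= 0 else ~(-x) & mask


def twos_complement(x, bits=4):
    """Two's complement: a negative number is NOT(magnitude) + 1."""
    mask = (1 << bits) - 1
    return x & mask if x >= 0 else (~(-x) + 1) & mask


for value in (3, -3):
    print(value,
          format(ones_complement(value), "04b"),
          format(twos_complement(value), "04b"))
# 3 0011 0011
# -3 1100 1101
```

In one's complement, 1111 (the NOT of 0000) is a second representation of zero. Two's complement has no such duplicate.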


Representations and arithmetic

• $N = \sum_i 2^i b_i$
• Test for zero: all bits are 0.
• Test for negative: sign bit is 1.
• Subtraction: negate, then add.
– a - b = a + (-b) = a + (~b + 1)
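Negate-then-add subtraction and the zero and sign tests can be sketched as follows for an assumed 8-bit word. The names `BITS`, `sub`, `to_signed`, `is_zero`, and `is_negative` are hypothetical. Operands are assumed to be 8-bit patterns in the range 0 to 255.

```python
BITS = 8
MASK = (1 << BITS) - 1


def sub(a, b):
    """a - b computed as a + (~b + 1), i.e. negate b and add; keep BITS bits."""
    neg_b = (~b + 1) & MASK
    return (a + neg_b) & MASK


def to_signed(x):
    """Read a BITS-bit pattern as a two's-complement integer."""
    return x - (1 << BITS) if x & (1 << (BITS - 1)) else x


def is_zero(x):
    return (x & MASK) == 0                # all bits are 0


def is_negative(x):
    return ((x >> (BITS - 1)) & 1) == 1   # sign bit is 1


print(sub(5, 3), to_signed(sub(3, 5)))   # 2 -2
```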


Combinational shifters

• Useful for arithmetic operations, bit-field extraction, etc.
• A latch-based shift register can shift only one bit per clock cycle.
• A multiple-shift shifter requires additional connectivity.


Barrel shifter

• Can perform n-bit shifts in a single cycle.


Barrel shifter structure

• Accepts 2n data inputs and n control signals, producing n data outputs.
(Figure: two n-bit data inputs, data 1 and data 2, feed the shifter, which produces an n-bit output.)


Barrel shifter operation

• Selects an arbitrary contiguous n bits out of the 2n input bits.
• Examples:
– right shift: data into top, 0 into bottom;
– left shift: 0 into top, data into bottom;
– rotate: data into top and bottom.
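The selection view of the barrel shifter can be modelled in Python by concatenating the two n-bit inputs and extracting an n-bit window. In this sketch the inputs are called `hi` and `lo`. That naming is ours and may not match the top/bottom orientation of the original figure. The window offset `s` stands in for the control inputs, and for a left shift by s the window starts at bit n - s. All function names are hypothetical.

```python
def window(hi, lo, s, n):
    """Return the n contiguous bits starting at bit s of the 2n-bit word hi:lo."""
    word = (hi << n) | lo
    return (word >> s) & ((1 << n) - 1)


def shift_right(data, s, n):
    return window(0, data, s, n)          # zeros fill from the high side


def shift_left(data, s, n):
    return window(data, 0, n - s, n)      # zeros fill from the low side


def rotate_right(data, s, n):
    return window(data, data, s, n)       # the same data on both halves


print(format(rotate_right(0b00000001, 1, 8), "08b"))   # 10000000
```

A hardware barrel shifter evaluates the selected window with multiplexers in a single pass. This function only mirrors its input/output behavior.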


Verilog for barrel shifter
module shifter(data,b,result);
parameter Nminus1 = 31; /* 32-bit shifter */
input [Nminus1:0] data; /* data to be shifted */
input [4:0] b; /* amount to shift (0 to 31) */
output [Nminus1:0] result; /* shift result */

assign result = data << b; /* logical left shift */

endmodule


Adders

• Adder delay is dominated by the carry chain.
• Carry chain analysis must consider transistor and wiring delay.
• Modern VLSI favors adder designs that have compact carry chains.


Full adder

• Computes a one-bit sum and carry:
– $s_i = a_i \oplus b_i \oplus c_i$
– $c_{i+1} = a_i b_i + a_i c_i + b_i c_i$
• A half adder adds two bits (no carry in), producing a sum and a carry.
• Ripple-carry adder: an n-bit adder built from full adders.
• The delay of a ripple-carry adder goes through all the carry bits.
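The full-adder equations and the ripple-carry structure translate directly into a bit-level Python model. This is a behavioral sketch, and the names `full_add` and `ripple_add` are ours. The loop makes explicit that each stage must wait for the carry from the stage below, which is why ripple-carry delay grows linearly with n.

```python
def full_add(a, b, c):
    """One-bit full adder: s = a XOR b XOR c, carry out = ab + ac + bc."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def ripple_add(x, y, n, cin=0):
    """n-bit ripple-carry adder: n full adders chained LSB first.
    Each stage uses the carry produced by the stage below it."""
    carry, total = cin, 0
    for i in range(n):
        s, carry = full_add((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry        # n-bit sum and final carry out


print(ripple_add(200, 100, 8))   # (44, 1): 300 = 256 + 44
```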


Verilog for full adder

module fulladd(a,b,carryin,sum,carryout);
input a, b, carryin; /* add these bits */
output sum, carryout; /* results */

assign {carryout, sum} = a + b + carryin; /* compute the sum and carry */

endmodule


Verilog for ripple-carry adder
module nbitfulladd(a,b,carryin,sum,carryout);
input [7:0] a, b; /* add these bits */
input carryin; /* carry in */
output [7:0] sum; /* result */
output carryout;
wire [7:1] carry; /* transfers the carry between bits */

fulladd a0(a[0],b[0],carryin,sum[0],carry[1]);
fulladd a1(a[1],b[1],carry[1],sum[1],carry[2]);
fulladd a2(a[2],b[2],carry[2],sum[2],carry[3]);
fulladd a3(a[3],b[3],carry[3],sum[3],carry[4]);
fulladd a4(a[4],b[4],carry[4],sum[4],carry[5]);
fulladd a5(a[5],b[5],carry[5],sum[5],carry[6]);
fulladd a6(a[6],b[6],carry[6],sum[6],carry[7]);
fulladd a7(a[7],b[7],carry[7],sum[7],carryout);
endmodule


Carry-lookahead adder

• First compute carry propagate and generate:
– $P_i = a_i + b_i$
– $G_i = a_i b_i$
• Compute sum and carry from P and G:
– $s_i = c_i \oplus P_i \oplus G_i$
– $c_{i+1} = G_i + P_i c_i$


Carry-lookahead expansion

• Can recursively expand the carry formula:
– $c_{i+1} = G_i + P_i (G_{i-1} + P_{i-1} c_{i-1})$
– $c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} c_{i-2})$
• The expanded formula does not depend on intermediate carries.
• This allows the carry for each bit to be computed independently.
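The fully expanded lookahead formula can be evaluated in Python to confirm that every carry can be computed from the P and G signals and c_0 alone. The sketch uses the OR form of propagate given above (P_i = a_i OR b_i) and the sum formula s_i = c_i XOR P_i XOR G_i. Function names are hypothetical, and the code models logic values only, not gate delays.

```python
def lookahead_carries(a, b, c0):
    """Carries c_1..c_n from the expanded lookahead formula.

    c_{i+1} = G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + ... + P_i ... P_0 c_0,
    with P_j = a_j OR b_j and G_j = a_j AND b_j. No carry is computed from
    another carry. a, b are bit lists, least-significant bit first.
    """
    P = [x | y for x, y in zip(a, b)]
    G = [x & y for x, y in zip(a, b)]
    carries = []
    for i in range(len(a)):
        c, prod = 0, 1                 # prod = P_i P_{i-1} ... P_{j+1}
        for j in range(i, -1, -1):
            c |= prod & G[j]
            prod &= P[j]
        carries.append(c | (prod & c0))   # prod is now P_i ... P_0
    return carries                     # carries[i] is c_{i+1}


def cla_add(x, y, n, c0=0):
    """n-bit addition using lookahead carries and s_i = c_i XOR P_i XOR G_i."""
    a = [(x >> i) & 1 for i in range(n)]
    b = [(y >> i) & 1 for i in range(n)]
    c = [c0] + lookahead_carries(a, b, c0)     # c[i] is c_i
    total = 0
    for i in range(n):
        total |= (c[i] ^ (a[i] | b[i]) ^ (a[i] & b[i])) << i
    return total, c[n]


print(cla_add(15, 1, 4))   # (0, 1)
```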


Depth-4 carry-lookahead



Analysis

• The deepest carry expansion requires gates with large fanin, which are large and slow.
• A carry-lookahead unit requires complex wiring between the adders and the lookahead unit: values must be routed back from the lookahead unit to the adder.
• Layout is even more complex with multiple levels of lookahead.

Verilog for carry-lookahead carry block
module carry_block(a,b,carryin,carry);
input [3:0] a, b; /* add these bits */
input carryin; /* carry into the block */
output [3:0] carry; /* carries for each bit in the block */
wire [3:0] g, p; /* generate and propagate */

assign g[0] = a[0] & b[0]; /* generate 0 */
assign p[0] = a[0] ^ b[0]; /* propagate 0 */
assign g[1] = a[1] & b[1]; /* generate 1 */
assign p[1] = a[1] ^ b[1]; /* propagate 1 */
assign g[2] = a[2] & b[2]; /* generate 2 */
assign p[2] = a[2] ^ b[2]; /* propagate 2 */
assign g[3] = a[3] & b[3]; /* generate 3 */
assign p[3] = a[3] ^ b[3]; /* propagate 3 */

assign carry[0] = g[0] | (p[0] & carryin);
assign carry[1] = g[1] | p[1] & (g[0] | (p[0] & carryin));
assign carry[2] = g[2] | p[2] &
(g[1] | p[1] & (g[0] | (p[0] & carryin)));
assign carry[3] = g[3] | p[3] &
(g[2] | p[2] & (g[1] | p[1] & (g[0] | (p[0] & carryin))));

endmodule


Verilog for carry-lookahead sum unit

module sum(a,b,carryin,result);
input a, b, carryin; /* add these bits */
output result; /* sum */

assign result = a ^ b ^ carryin; /* compute the sum */

endmodule

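Verilog for carry-lookahead adder
module carry_lookahead_adder(a,b,carryin,sum,carryout);
input [15:0] a, b; /* add these together */
input carryin;
output [15:0] sum; /* result */
output carryout;
wire [16:1] carry; /* intermediate carries */

assign carryout = carry[16]; /* for simplicity */

/* build the carry-lookahead units */
carry_block b0(a[3:0],b[3:0],carryin,carry[4:1]);
carry_block b1(a[7:4],b[7:4],carry[4],carry[8:5]);
carry_block b2(a[11:8],b[11:8],carry[8],carry[12:9]);
carry_block b3(a[15:12],b[15:12],carry[12],carry[16:13]);
/* build the sum */
sum a0(a[0],b[0],carryin,sum[0]);
sum a1(a[1],b[1],carry[1],sum[1]);
sum a2(a[2],b[2],carry[2],sum[2]);
sum a3(a[3],b[3],carry[3],sum[3]);
sum a4(a[4],b[4],carry[4],sum[4]);
sum a5(a[5],b[5],carry[5],sum[5]);
sum a6(a[6],b[6],carry[6],sum[6]);
sum a7(a[7],b[7],carry[7],sum[7]);
sum a8(a[8],b[8],carry[8],sum[8]);
sum a9(a[9],b[9],carry[9],sum[9]);
sum a10(a[10],b[10],carry[10],sum[10]);
sum a11(a[11],b[11],carry[11],sum[11]);
sum a12(a[12],b[12],carry[12],sum[12]);
sum a13(a[13],b[13],carry[13],sum[13]);
sum a14(a[14],b[14],carry[14],sum[14]);
sum a15(a[15],b[15],carry[15],sum[15]);
endmodule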


Carry-skip adder

• Looks for cases in which the carry out of a set of bits is identical to the carry in.
• Typically organized into b-bit stages.
• Can bypass the carry through all stages in a group when all propagates are true: $P_i P_{i+1} \cdots P_{i+b-1}$.
– The group produces a carry out when the last bit in the group produces a carry out or the carry is bypassed.


Two-bit carry-skip structure

(Figure: $c_i$ and the propagates $P_i, P_{i+1}, \ldots, P_{i+b-1}$ feed an AND gate; its output is ORed with $C_{i+b-1}$ to form the group's carry out.)


Carry-skip structure

(Figure: a chain of groups of b adder stages; each group has skip logic driven by its group propagate, P[0,b-1], P[b,2b-1], and P[2b,3b-1], and passes its carry out to the next group.)


Worst-case carry-skip

• The worst-case carry-propagation path ripples through the first and last stages and skips the groups in between.


Verilog for carry-skip add with P
module fulladd_p(a,b,carryin,sum,carryout,p);
input a, b, carryin; /* add these bits */
output sum, carryout, p; /* results including propagate */

assign {carryout, sum} = a + b + carryin; /* compute the sum and carry */
assign p = a | b; /* propagate */
endmodule


Verilog for carry-skip adder
module carryskip(a,b,carryin,sum,carryout);
input [7:0] a, b; /* add these bits */
input carryin; /* carry in */
output [7:0] sum; /* result */
output carryout;
wire [8:1] carry; /* transfers the carry between bits */
wire [7:0] p; /* propagate for each bit */
wire cs4; /* final carry for first group */

fulladd_p a0(a[0],b[0],carryin,sum[0],carry[1],p[0]);
fulladd_p a1(a[1],b[1],carry[1],sum[1],carry[2],p[1]);
fulladd_p a2(a[2],b[2],carry[2],sum[2],carry[3],p[2]);
fulladd_p a3(a[3],b[3],carry[3],sum[3],carry[4],p[3]);
assign cs4 = carry[4] | (p[0] & p[1] & p[2] & p[3] & carryin);
fulladd_p a4(a[4],b[4],cs4,sum[4],carry[5],p[4]);
fulladd_p a5(a[5],b[5],carry[5],sum[5],carry[6],p[5]);
fulladd_p a6(a[6],b[6],carry[6],sum[6],carry[7],p[6]);
fulladd_p a7(a[7],b[7],carry[7],sum[7],carry[8],p[7]);
assign carryout = carry[8] | (p[4] & p[5] & p[6] & p[7] & cs4);
endmodule


Delay analysis

• Assume that skip delay = 1 bit carry delay.
• Delay of a k-bit adder with block size b:
– $T = (b-1) + 0.5 + (k/b - 2) + (b-1)$, whose terms are: ripple through block 0, OR gate, skips, last block.
• For equal-sized blocks, the optimal block size is $\sqrt{k/2}$.
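Collecting terms gives $T(b) = 2b + k/b - 3.5$. Setting $dT/db = 2 - k/b^2$ to zero gives $b = \sqrt{k/2}$. The short Python check below evaluates the delay model for every integer block size and compares the best one with $\sqrt{k/2}$. The choice k = 32 is only an example, and block sizes that do not divide k are allowed purely to keep the sketch simple. Function names are hypothetical.

```python
import math


def skip_delay(k, b):
    """Delay of a k-bit carry-skip adder with block size b, in one-bit carry
    delays: ripple through block 0, an OR gate, skip the middle blocks,
    ripple through the last block (skip delay assumed = 1 bit carry delay)."""
    return (b - 1) + 0.5 + (k / b - 2) + (b - 1)


def best_block_size(k):
    return min(range(1, k + 1), key=lambda b: skip_delay(k, b))


k = 32
print(best_block_size(k), math.sqrt(k / 2), skip_delay(k, best_block_size(k)))
# 4 4.0 12.5
```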


Carry-select adder

• Computes two results in parallel, each for a different carry input assumption.
• Uses the actual carry in to select the correct result.
• Reduces the carry delay through each block to that of a multiplexer.
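A behavioral Python model of the carry-select idea is sketched below. Each block is added twice before its real carry in is known, and the real carry only chooses between the two results. For brevity, each block's own addition uses Python's `+`. The 4-bit block size, the assumption that n is a multiple of the block size, and the function names are all illustrative choices.

```python
def block_add(x, y, cin, width):
    """Add two width-bit values and a carry in; return (sum, carry out)."""
    t = x + y + cin
    return t & ((1 << width) - 1), t >> width


def carry_select_add(x, y, n, block=4, cin=0):
    """Carry-select adder model; n is assumed to be a multiple of block.

    Each block's sum is computed twice, for carry in = 0 and carry in = 1,
    before the real carry is known; the real carry then only selects one
    of the two precomputed results, like a multiplexer.
    """
    mask = (1 << block) - 1
    total, carry = 0, cin
    for lsb in range(0, n, block):
        xb, yb = (x >> lsb) & mask, (y >> lsb) & mask
        if_zero = block_add(xb, yb, 0, block)    # assumes carry in = 0
        if_one = block_add(xb, yb, 1, block)     # assumes carry in = 1
        s, carry = if_one if carry else if_zero  # multiplexer
        total |= s << lsb
    return total, carry


print(carry_select_add(0xFFFF, 0x0001, 16))   # (0, 1)
```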


Carry-select structure



Carry-save adder

• Useful in multiplication.
• Input: 3 n-bit operands.
• Output: n-bit partial sum, n-bit carry.
– Use a carry-propagate adder for the final sum.
• Operations (for each bit position):
– $s = (x + y + z) \bmod 2$
– $c = [(x + y + z) - s] / 2$
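Applied to whole words, the per-bit rules give s as the XOR of the three operands and c as their bitwise majority. No carry moves between bit positions. The carry vector has weight 2, so one final carry-propagate addition of s and 2c recovers x + y + z. A minimal Python sketch follows; the operand values are arbitrary and the function name is hypothetical.

```python
def carry_save(x, y, z):
    """Bitwise carry-save addition of three operands.

    Per bit: s = (x + y + z) mod 2 and c = [(x + y + z) - s] / 2, i.e. the
    majority of the three bits. No carry ripples between bit positions.
    """
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c


s, c = carry_save(13, 7, 9)
total = s + (c << 1)      # final carry-propagate addition
print(s, c, total)        # 3 13 29
```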


FPGA delay model

• Xing/Yu ripple-carry adder model:
– n-stage adder divided into x blocks;
– each block has n/x stages;
– block k, $1 \le k \le x$, has $y_k$ stages.
• Delays ($l_1$, $l_2$ constant; $d$ = delay of a single stage):
– ripple-carry $R(y_k) = l_1 + d\,y_k$
– carry-generate $G(y_k) = l_2 + d\,(y_k - 1)$
– carry-terminate $T(y_k) = G(y_k)$

Carry-skip delay model

• Consider only inter-CLB delay.
• Delay dominated by interconnect:
– $S(y_k) = l_3 + b\,l_2$
• Wire length l is proportional to the number of carry-skip layers.


Adder comparison

• The ripple-carry adder has the highest performance/cost.
• Optimized adders are most effective at very long bit widths (> 48 bits).


(Figure: operational time (ns), cost (CLBs), and performance-cost ratio vs. bits (8 to 80) for ripple, complete, CLA, skip, and RC-select adders. © 1998 IEEE)

Serial adder

• May be used in signal-processing arithmetic where fast computation is important but latency is unimportant.
• Data format (LSB first):
(Figure: a bit stream such as 0 1 1 0, with the LSB transmitted first.)


Serial adder structure

• An LSB control signal clears the carry shift register.
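A cycle-by-cycle Python model of the serial adder is sketched below. It assumes words arrive least-significant bit first, and that an `lsb` flag (standing in for the LSB control signal) is 1 on the cycle that carries each word's LSB. The names are hypothetical. The point is that one full adder plus a one-bit carry register handles words of any length, and clearing the carry at each LSB keeps consecutive words independent.

```python
def serial_add(a_bits, b_bits, lsb_flags):
    """Bit-serial adder: one full-adder evaluation per clock cycle.

    a_bits, b_bits: input streams, least-significant bit of each word first.
    lsb_flags: 1 on the cycle holding a word's LSB; it clears the carry.
    Returns the output bit stream, also LSB first.
    """
    carry = 0                                  # the carry register
    out = []
    for a, b, lsb in zip(a_bits, b_bits, lsb_flags):
        if lsb:
            carry = 0                          # LSB control clears the carry
        out.append(a ^ b ^ carry)              # sum bit for this cycle
        carry = (a & b) | (a & carry) | (b & carry)
    return out


# Two 4-bit words back to back: 6 + 3 = 9, then 5 + 5 = 10 (bits LSB first).
print(serial_add([0, 1, 1, 0, 1, 0, 1, 0],
                 [1, 1, 0, 0, 1, 0, 1, 0],
                 [1, 0, 0, 0, 1, 0, 0, 0]))   # [1, 0, 0, 1, 0, 1, 0, 1]
```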


ALUs

• An ALU computes a variety of logical and arithmetic functions based on an opcode.
• May offer the complete set of functions of two variables or a subset.
• The ALU is built around the adder, since the carry chain determines delay.


ALU as multiplexer

• Compute the functions, then select the desired one:
(Figure: AND, OR, NOT, and SUM units feeding a multiplexer controlled by the opcode.)


Verilog for ALU
`define PLUS 0
`define MINUS 1
`define AND 2
`define OR 3
`define NOT 4

module alu(fcode,op0,op1,result,oflo);
parameter n=16, flen=3;
input [flen-1:0] fcode;
input [n-1:0] op0, op1;
output [n-1:0] result;
output oflo;

assign
{oflo,result} =
(fcode == `PLUS) ? (op0 + op1) :
(fcode == `MINUS) ? (op0 - op1) :
(fcode == `AND) ? (op0 & op1) :
(fcode == `OR) ? (op0 | op1) :
(fcode == `NOT) ? {1'b0, ~op0} : /* keep oflo at 0 for NOT */
0;
endmodule


Topics

• Switch networks.
• Combinational testing.


Boolean functions and switches

(Figure: switches in series form a pseudo-AND; switches in parallel form a pseudo-OR.)


Driving switch outputs

• If the switch network output is not connected to a power supply through a switch path, the output will float.
• Switch network inputs may be connected to power supplies or to logic signals.


Switching logic signals

(Figure: a switch network controlled by a and a' that passes b' or b to the output, computing ab' + a'b.)


Switch multiplexer



Charge sharing

• Interior nodes in a switch network may not be driven.
• Charge can accumulate on small parasitic capacitances.
• Shared charge can produce erroneous output values.


Charge division

• At undriven nodes, charge is divided according to the capacitance ratio.


Charge sharing example

• Long chains of switches have intermediate nodes that may be disconnected from the power supplies.
(Figure: a chain of switches with intermediate node capacitances Cia, Cab, and Cbc.)


Charge over time

i is the input value; a, b, and c are the switch controls; Cia, Cab, Cbc, and C are the node voltages.

time  i  Cia  a  Cab  b  Cbc  c  C
0     1  1    1  1    1  1    1  1
1     0  0    1  0    0  1    0  1
2     0  0    0  1/2  1  1/2  0  1
3     0  0    0  1/2  0  3/4  1  3/4
4     0  0    0  0    0  3/4  0  3/4
5     0  0    0  3/8  1  3/8  0  3/4
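Each step in the charge-over-time table applies charge conservation. When two undriven nodes are connected, the shared voltage is the capacitance-weighted average $V = (C_1 V_1 + C_2 V_2)/(C_1 + C_2)$. The table does not give the capacitance values, so this sketch assumes all node capacitances are equal. Under that assumption, the Python below reproduces the 1/2, 3/4, and 3/8 entries. The function name `share` is hypothetical.

```python
from fractions import Fraction


def share(v1, c1, v2, c2):
    """Voltage on two undriven nodes after a switch connects them.
    Total charge C1*V1 + C2*V2 is conserved and spread over C1 + C2."""
    return (c1 * v1 + c2 * v2) / (c1 + c2)


C = Fraction(1)                                   # equal capacitances (assumed)
step2 = share(Fraction(0), C, Fraction(1), C)     # Cab = 0 meets Cbc = 1
step3 = share(step2, C, Fraction(1), C)           # Cbc = 1/2 meets C = 1
step5 = share(Fraction(0), C, Fraction(3, 4), C)  # Cab = 0 meets Cbc = 3/4
print(step2, step3, step5)                        # 1/2 3/4 3/8
```

With unequal capacitances the larger node dominates. For example, a node at 1 with capacitance 1 shared with a node at 0 with capacitance 3 settles at 1/4.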


Avoiding charge sharing

• Make sure that for every input combination there is a path from a power supply to the output.


Manufacturing testing

• Errors are introduced during manufacturing.
• Testing verifies that the chip corresponds to the design.
• Varieties of testing:
– functional testing;
– performance testing (binning chips by speed).
• Testing also weeds out infant-mortality failures.


Testing and faults

• Fault model:
– possible locations of faults;
– I/O behavior produced by the fault.
• Good news: if we have a fault model, we can test the network for every possible instantiation of that type of fault.
• Bad news: it is difficult to enumerate all types of manufacturing faults.

Stuck-at-0/1 faults

• Stuck-at-0/1: the logic gate output is always stuck at 0 or 1, independent of input values.
• Correspondence to manufacturing defects depends on the logic family.
• Experiments show that 100% stuck-at-0/1 fault coverage corresponds to high overall fault coverage.


Testing procedure

• Testing procedure:
– set gate inputs;
– observe gate output;
– compare fault-free and observed gate output.
• Test vector: set of gate inputs applied to a system.


Stuck-at faults in gates

NAND
a  b  OK  SA0  SA1
0  0  1   0    1
0  1  1   0    1
1  0  1   0    1
1  1  0   0    1

NOR
a  b  OK  SA0  SA1
0  0  1   0    1
0  1  0   0    1
1  0  0   0    1
1  1  0   0    1


Testing single gates

• There are three ways to test a NAND for stuck-at-0, but only one way to test it for stuck-at-1.
• There are three ways to test a NOR for stuck-at-1, but only one way to test it for stuck-at-0.
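The counts above follow from a simple rule: an input vector detects an output stuck-at-v fault exactly when the fault-free output differs from v. A small Python enumeration makes this concrete for two-input NAND and NOR gates. The function names are hypothetical.

```python
from itertools import product


def nand(a, b):
    return int(not (a and b))


def nor(a, b):
    return int(not (a or b))


def stuck_at_tests(gate, stuck):
    """Input vectors that detect the gate output stuck at `stuck` (0 or 1):
    exactly those whose fault-free output differs from the stuck value."""
    return [(a, b) for a, b in product((0, 1), repeat=2) if gate(a, b) != stuck]


for name, gate in (("NAND", nand), ("NOR", nor)):
    print(name, "SA0:", stuck_at_tests(gate, 0), "SA1:", stuck_at_tests(gate, 1))
```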


Testing combinational networks

• 100% coverage: test every gate for
– stuck-at-0;
– stuck-at-1.
• Assume that there is only one faulty gate per network.
• Most networks require more than one test vector to test all gates.


Multiple test example



Example

• Can test both NANDs for stuck-at-0 simultaneously (abc = 000).
• Cannot test both NANDs for stuck-at-1 simultaneously due to the inverter. Must use two vectors.
• Must also test the inverter.


Stuck-at-open/closed model

• Models transistors that are always off (stuck-open) or always on (stuck-closed).


Stuck-open behavior

• If t1 is stuck open (the switch cannot be closed), there can be no path from VDD to the output capacitance.
• Testing requires two cycles:
– discharge the capacitor;
– try to operate t1 to see if the capacitor can be charged.


Delay fault

 Delay falls outside acceptable limits:


– gate delay fault assumes that all delays are
lumped into one gate;
– path delay fault models delay problems along
path through network.
 Delay problems reduce yield:
– performance problems;
– functional problems in some types of circuits.


Combinational network testing

Two parts to testing:
– controlling the inputs of (possibly interior)
gates;
– observing the outputs of (possibly interior)
gates.


Combinational testing example


Testing procedure

 Goal: test gate D for stuck-at-0 fault.
 First step: justify 0 values on gate inputs.
 Work backward from gate to primary
inputs:
– w1 = 0 (A output = 0);
– i1 = i2 = 1.


Testing procedure, cont’d

 Observe the fault at a primary output:
– o1 gives different values if D is true/faulty.
 Work forward and backward:
– F’s other input must be 0 so that o1 shows whether D is true or faulty.
– Justify 0 at E’s output.
 In general, may have to propagate fault
through multiple levels of logic to primary
outputs.


Fault masking

Redundant logic can mask faults:


Redundancy example

 Testing NOR for SA0 requires setting both
inputs to 0.
 Network topology ensures that one NOR
input will always be 1.
 Function reduces to 0:
– f = ((ab)’ + b)’ = (a’ + b’ + b)’ = 1’ = 0.


Redundancies and testing

 Redundant logic cannot be controlled.
 Observations requiring control of redundant
logic may not be possible.
 Logic should be minimized to eliminate
redundancy. Redundancies can also
introduce delay faults and other problems.


Topics

 Multipliers.


Elementary school algorithm

      0110   multiplicand
    x 1001   multiplier
      0110
    +0000    partial product
     00110
   +0000
    000110
  +0110
   0110110   product (6 × 9 = 54)


Word serial multiplier

[Figure: word-serial multiplier datapath with a register.]
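The elementary-school algorithm is what a word-serial multiplier executes: one adder and a register accumulate one shifted partial product per step. A minimal Python sketch (names hypothetical; register widths and clocking are not modeled) that reproduces the 0110 × 1001 example:

```python
def shift_add_multiply(multiplicand, multiplier, n):
    """Unsigned shift-and-add multiply over n multiplier bits.

    Returns the product and the running sum after each step; `total`
    plays the role of the register in a word-serial multiplier.
    """
    total, steps = 0, []
    for i in range(n):
        # partial product: the multiplicand or 0, shifted left i places
        partial = multiplicand if (multiplier >> i) & 1 else 0
        total += partial << i
        steps.append(total)
    return total, steps

product, steps = shift_add_multiply(0b0110, 0b1001, 4)
print(format(product, "07b"), steps)   # 0110110 [6, 6, 6, 54]
```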


Combinational multiplier

Uses n-1 adders, eliminates registers:


Array multiplier

 Array multiplier is an efficient layout of a
combinational multiplier.
 Array multipliers may be pipelined to
decrease clock period at the expense of
latency.


Array multiplier organization

[Figure: the 0110 x 1001 example laid out as an array. The multiplicand
and multiplier enter the array, the partial-product rows are skewed to
give a rectangular layout, and the product 0110110 emerges at the bottom.]


Unsigned array multiplier

[Figure: array of adder cells. The top row forms x2y0, x1y0, x0y0; lower
rows add in x1y1, x0y1, x1y2, x0y2, ..., xn-1yn-1; the bottom row of
adders produces the product bits P(2n-1), P(2n-2), ..., P0.]


Unsigned array multiplier, cont’d


Array multiplier critical path


Verilog for multiplier row
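module multrow(part,x,ym,yo,cin,s,cout);
/* A row of one-bit multiplies */
input [2:0] part;   // partial sums from the previous row
input [3:0] x;      // multiplicand
input ym, yo;       // multiplier bit for this row (ym) and for the previous row (yo)
input [2:0] cin;    // carries from the previous row
output [2:0] s;     // sums passed to the next row (s[0] is a product bit)
output [2:0] cout;  // carries passed to the next row

assign {cout[0],s[0]} = part[1] + (x[0] & ym) + cin[0];
assign {cout[1],s[1]} = part[2] + (x[1] & ym) + cin[1];
assign {cout[2],s[2]} = (x[3] & yo) + (x[2] & ym) + cin[2];
endmodule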


Verilog for last multiplier row
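module lastrow(part,cin,s,cout);
/* Last row of adders with full carry chain. */
input [2:0] part;   // partial sums from the last multrow, plus the top product
input [2:0] cin;    // carries from the last multrow
output [2:0] s;     // upper product bits
output cout;        // most significant product bit
wire [1:0] carry;   // ripple carries between the three adders

assign {carry[0],s[0]} = part[0] + cin[0];
assign {carry[1],s[1]} = part[1] + cin[1] + carry[0];
assign {cout,s[2]} = part[2] + cin[2] + carry[1];
endmodule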


Verilog for multiplier
module array_mult(x,y,p);
input [3:0] x;    // multiplicand
input [3:0] y;    // multiplier
output [7:0] p;   // product
wire [2:0] row0, row1, row2, row3, c0, c1, c2, c3;

/* generate first row of products */
assign row0[2] = x[2] & y[0]; assign row0[1] = x[1] & y[0];
assign row0[0] = x[0] & y[0]; assign p[0] = row0[0];
assign c0 = 3'b000;   // no carries into the first adder row
/* each multrow adds one more partial product; its low sum bit is a product bit */
multrow p0(row0,x,y[1],y[0],c0,row1,c1); assign p[1] = row1[0];
multrow p1(row1,x,y[2],y[1],c1,row2,c2); assign p[2] = row2[0];
multrow p2(row2,x,y[3],y[2],c2,row3,c3); assign p[3] = row3[0];
/* final ripple-carry row produces p[7:4] */
lastrow l({x[3] & y[3],row3[2:1]},c3,p[6:4],p[7]);
endmodule
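One way to check the row wiring is a bit-level software model. The Python sketch below mirrors multrow, lastrow and array_mult (bit lists are LSB first, a convention of this sketch rather than of the Verilog) and compares the result with ordinary multiplication for all 256 input pairs. It is a cross-check of the connections, not a substitute for simulating the Verilog itself.

```python
def multrow(part, x, ym, yo, cin):
    """Model of the multrow module: three full adders."""
    terms = [(part[1], x[0] & ym, cin[0]),
             (part[2], x[1] & ym, cin[1]),
             (x[3] & yo, x[2] & ym, cin[2])]
    s = [sum(t) & 1 for t in terms]      # sum bits
    cout = [sum(t) >> 1 for t in terms]  # carry bits
    return s, cout

def lastrow(part, cin):
    """Model of the lastrow module: a 3-bit ripple-carry adder."""
    s, carry = [], 0
    for i in range(3):
        total = part[i] + cin[i] + carry
        s.append(total & 1)
        carry = total >> 1
    return s, carry

def array_mult(xv, yv):
    x = [(xv >> i) & 1 for i in range(4)]
    y = [(yv >> i) & 1 for i in range(4)]
    p = [0] * 8
    row = [x[0] & y[0], x[1] & y[0], x[2] & y[0]]  # row0
    p[0], c = row[0], [0, 0, 0]                    # c0 = 3'b000
    for k in range(1, 4):                          # instances p0, p1, p2
        row, c = multrow(row, x, y[k], y[k - 1], c)
        p[k] = row[0]
    # Verilog {x[3] & y[3], row3[2:1]} becomes LSB-first [row3[1], row3[2], x3y3]
    s, cout = lastrow([row[1], row[2], x[3] & y[3]], c)
    p[4:7], p[7] = s, cout
    return sum(bit << i for i, bit in enumerate(p))

print(all(array_mult(a, b) == a * b for a in range(16) for b in range(16)))  # True
```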


Baugh-Wooley multiplier

 Algorithm for two’s-complement
multiplication.
 Adjusts partial products to maximize
regularity of multiplication array.
 Moves partial products with negative signs
to the last steps; also adds negation of
partial products rather than subtracts.


Booth multiplier

 Encoding scheme to reduce number of
stages in multiplication.
 Performs two bits of multiplication at
once, so it requires half the stages.
 Each stage is slightly more complex than
simple multiplier, but adder/subtracter is
almost as small/fast as adder.


Booth encoding

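 Two’s-complement form of an (n+1)-bit multiplier:
– $y = -2^n y_n + 2^{n-1} y_{n-1} + 2^{n-2} y_{n-2} + \cdots + 2^0 y_0$
 Rewrite using $2^a = 2^{a+1} - 2^a$ (with $y_{-1} = 0$):
– $y = 2^n (y_{n-1} - y_n) + 2^{n-1} (y_{n-2} - y_{n-1}) + 2^{n-2} (y_{n-3} - y_{n-2}) + \cdots + 2^0 (y_{-1} - y_0)$
 Consider first two terms: together they equal
$2^{n-1} (y_{n-2} + y_{n-1} - 2 y_n)$, so by looking at three
bits of y, we can determine whether to
add/subtract x, 2x to partial product.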


Booth actions

yi yi-1 yi-2 | increment
-------------+----------
 0   0    0  |     0
 0   0    1  |     x
 0   1    0  |     x
 0   1    1  |    2x
 1   0    0  |   -2x
 1   0    1  |    -x
 1   1    0  |    -x
 1   1    1  |     0


Booth example

 x = 011001 (decimal 25), y = 101110 (decimal -18); y-1 = 0.
 y1 y0 y-1 = 100: P1 = P0 - (10 × 011001) =
11111001110 (-50).
 y3 y2 y1 = 111: P2 = P1 + 0 = 11111001110.
 y5 y4 y3 = 101: P3 = P2 - 0110010000 (x shifted left 4 places) =
11000111110 (-450 = 25 × -18).
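The Booth actions table translates directly into a radix-4 Booth multiplier. In this Python sketch (names hypothetical), y is an n-bit two's-complement multiplier with n even, the implicit bit y-1 is 0, and each step adds 0, ±x or ±2x shifted left by the step's bit position. It reproduces the partial products of the example.

```python
BOOTH_TABLE = {  # (y[i+1], y[i], y[i-1]) -> multiple of x, from the table above
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_multiply(x, y, n):
    """Radix-4 Booth product of x and the n-bit two's-complement y (n even).

    Returns the product and the partial products P1, P2, ... after each step.
    """
    ybits = [(y >> i) & 1 for i in range(n)]  # >> sign-extends negative y
    prev, p, trace = 0, 0, []                 # prev holds y[i-1]; y[-1] = 0
    for i in range(0, n, 2):
        multiple = BOOTH_TABLE[(ybits[i + 1], ybits[i], prev)]
        p += (multiple * x) << i              # add/subtract 0, x or 2x, shifted
        trace.append(p)
        prev = ybits[i + 1]
    return p, trace

product, trace = booth_multiply(25, -18, 6)
print(product)                                     # -450
print([format(t & 0x7FF, "011b") for t in trace])  # 11-bit two's complement
```

The second line prints ['11111001110', '11111001110', '11000111110'], matching P1, P2 and P3.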


Booth structure


Wallace tree

 Reduces depth of adder chain.
 Built from carry-save adders:
– three inputs a, b, c;
– produces two outputs y, z such that y + z = a + b + c.
 Carry-save equations:
– $y_i = \mathrm{parity}(a_i, b_i, c_i)$
– $z_{i+1} = \mathrm{majority}(a_i, b_i, c_i)$, $z_0 = 0$ (each carry moves one bit position left)


Wallace tree structure


Wallace tree operation

 At each stage, i numbers are combined to
form ceil(2i/3) sums.
 Final adder completes the summation.
 Wiring is more complex.
 Can build a Booth-encoded Wallace tree
multiplier.
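A Python sketch of the reduction (names hypothetical). csa() applies the carry-save equations to whole words, shifting the majority (carry) bits one position left so that y + z = a + b + c. wallace_sum() applies layers of CSAs, passing the i mod 3 leftover operands through unchanged, until two numbers remain for the final adder. For the 16 partial products of a 16 × 16 multiply the operand count goes 16, 11, 8, 6, 4, 3, 2: six CSA stages.

```python
def csa(a, b, c):
    """Carry-save adder on non-negative integers: returns (y, z), y + z == a + b + c."""
    y = a ^ b ^ c                              # bitwise parity
    z = ((a & b) | (a & c) | (b & c)) << 1     # bitwise majority, at carry weight
    return y, z

def wallace_sum(operands):
    """Reduce operands with layers of CSAs; return (total, number of CSA stages)."""
    ops, stages = list(operands), 0
    while len(ops) > 2:
        groups = len(ops) // 3
        nxt = []
        for g in range(groups):
            nxt.extend(csa(*ops[3 * g:3 * g + 3]))
        nxt.extend(ops[3 * groups:])   # leftover operands pass through
        ops, stages = nxt, stages + 1
    return sum(ops), stages            # final carry-propagate adder

x, y = 0xBEEF, 0x1234
partials = [(x << i) if (y >> i) & 1 else 0 for i in range(16)]
total, stages = wallace_sum(partials)
print(total == x * y, stages)   # True 6
```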


Serial-parallel multiplier

 Used in serial-arithmetic operations.
 Multiplicand can be held in place by a
register.
 Multiplier is shifted into array.
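A cycle-level Python sketch of one common serial-parallel organization. It assumes that multiplier bits arrive LSB first and that the array behaves like an accumulator that shifts one product bit out per cycle; the names are hypothetical and the array in the figure may differ in detail.

```python
def serial_parallel_multiply(x, y, n):
    """Unsigned x * y, with the n bits of y arriving serially, LSB first.

    Each cycle: add the multiplicand x (held in its register) if the
    incoming multiplier bit is 1, emit the accumulator's low bit as the
    next product bit, and shift the accumulator right.
    """
    acc, low_bits = 0, []
    for cycle in range(n):
        if (y >> cycle) & 1:        # serial multiplier bit
            acc += x
        low_bits.append(acc & 1)    # product bit `cycle` leaves the array
        acc >>= 1
    low = sum(bit << i for i, bit in enumerate(low_bits))
    return (acc << n) | low, low_bits   # acc now holds the upper product bits

print(serial_parallel_multiply(6, 9, 4))   # (54, [0, 1, 1, 0])
```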


Serial-parallel multiplier
structure


Topics

 Logic synthesis.
 Placement and routing.


Logic optimization

 Logic synthesis programs transform
Boolean expressions into logic gate
networks in a particular library.
 Optimization goals: minimize area, meet
delay constraint; some programs also optimize power.


Syntax-directed translation

 Translate HDL into logic directly.
– Example: ab + ac is built as written, with two AND gates and an OR gate.
 Generally requires optimization, e.g. factoring ab + ac into a(b + c).


Macros

 Pre-designed logic.
– Generally identified by language features.
– Example: the + operator.
– Example: a call such as xxx().
 Hard macro: includes placement.
 Soft macro: no placement.


Logic synthesis phases

 Technology-independent optimizations
work on logic representations that do not
directly model logic gates.
 Technology-dependent optimizations work
in the available set of logic gates.
 Transformation from technology-
independent to technology-dependent is
called library binding.


Technology-independent
optimizations

 Works on Boolean expressions equivalent to the logic.
 Estimates size based on number of literals.
 Uses factorization, resubstitution,
minimization, etc. to optimize logic.
 Technology-independent phase uses simple
delay models.


Technology-dependent
optimizations

 Maps Boolean expressions into a particular
cell library.
 Mapping may take into account area, delay.
 May perform some optimizations in
addition to simple mapping.
 Allows more accurate delay models.


Boolean network

 A Boolean network is the main
representation of the logic functions for
technology-independent optimizations.
 Each node can be represented as sum-of-products
(or product-of-sums).
 Provides multi-level structure, but functions
in the network need not correspond to logic
gates.


Boolean network example
Primary outputs:
– out1 = k2 + x2’
– out2 = k3 + x1
Intermediate nodes:
– k2 = x1’ x2 x4 + k1
– k3 = k1 x4’
– k1 = x2 + x3
Primary inputs: x1, x2, x3, x4


Terms

 Support: set of variables used by a function.
 Transitive fanout: all the primary outputs
and intermediate variables that use a function,
directly or indirectly.
 Transitive fanin: all the primary inputs and
intermediate variables used by a function.
Transitive fanin determines a cone of logic.
[Figure: a cone of logic spreading from an output back to the primary inputs.]
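Applied to the Boolean network example, these definitions reduce to graph traversal. The Python sketch below (names hypothetical) stores each node's fanin; complemented literals such as x2’ are recorded by variable name, since complementing does not change the support.

```python
# node -> variables its function uses (Boolean network example)
NETWORK = {
    "out1": ["k2", "x2"],
    "out2": ["k3", "x1"],
    "k2": ["x1", "x2", "x4", "k1"],
    "k3": ["k1", "x4"],
    "k1": ["x2", "x3"],
}

def support(node):
    return set(NETWORK.get(node, []))   # primary inputs have empty support

def transitive_fanin(node):
    """Every variable that node depends on, directly or indirectly."""
    seen, stack = set(), list(support(node))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(support(v))
    return seen

def transitive_fanout(node):
    """Every node that depends on node, directly or indirectly."""
    return {n for n in NETWORK if node in transitive_fanin(n)}

print(sorted(transitive_fanin("out2")))   # the cone of logic for out2
print(sorted(transitive_fanout("k1")))    # ['k2', 'k3', 'out1', 'out2']
```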


Technology-independent logic
optimization

 Simplification rewrites a node to simplify its
form.
 Network restructuring introduces new nodes
for common factors, collapses several nodes
into one new node.
 Delay restructuring changes factorization to
reduce path length.


Cost in the Boolean network

 Don’t know exact gate structure, but can
estimate final network cost:
– area estimated by number of literals (true or
complement forms of variables);
– delay estimated by path length.
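For the Boolean network example, both estimates are easy to compute. In this Python sketch (names hypothetical), each node is a sum of products, area is the total literal count, and delay is the number of nodes on the longest path from a primary input. It gives 12 literals and a depth of 3 for both outputs.

```python
# node -> list of cubes; each cube is a list of literals ("'" = complement)
NODES = {
    "out1": [["k2"], ["x2'"]],
    "out2": [["k3"], ["x1"]],
    "k2": [["x1'", "x2", "x4"], ["k1"]],
    "k3": [["k1", "x4'"]],
    "k1": [["x2"], ["x3"]],
}

def literal_count(nodes):
    """Area estimate: number of literals over all nodes."""
    return sum(len(cube) for cubes in nodes.values() for cube in cubes)

def depth(var, nodes):
    """Delay estimate: longest path, in nodes, from a primary input to var."""
    if var not in nodes:   # primary input
        return 0
    fanin = {lit.rstrip("'") for cube in nodes[var] for lit in cube}
    return 1 + max(depth(v, nodes) for v in fanin)

print(literal_count(NODES), depth("out1", NODES), depth("out2", NODES))   # 12 3 3
```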


Partially-specified functions

 Don’t-cares can be assigned to either
the on-set or the off-set.
 Don’t-cares provide the greatest
opportunities for minimization in many
cases.


Partially-specified function
example

[Figure: 3-cube over x1, x2, x3 with vertices marked 1, 0 and don’t care.]


Don’t-cares in Boolean networks

 In a two-level function, don’t-cares are
defined at the primary outputs.
 In Boolean network, structure of network
itself introduces don’t-cares.
 Types of structural don’t-cares:
– satisfiability;
– observability.


Satisfiability don’t-cares

 Occur when an intermediate variable’s value
is inconsistent with the inputs of its function.
Since this can’t happen, we don’t care.
 Example: g = ab, y = g, f = yc.
– a = b = 0 with f = 1 can’t happen.
– Don’t-care for f: y’g + yg’.
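Enumerating the network variables shows which combinations the SDC y’g + yg’ covers. A Python sketch for the example g = ab, y = g, f = yc (function name hypothetical):

```python
from itertools import product

def satisfiability_dont_cares():
    """(a, b, c, y) combinations that violate y = g = ab.

    These satisfy y'g + yg' and can never occur, so f = yc may take
    any value there.
    """
    return [(a, b, c, y)
            for a, b, c, y in product((0, 1), repeat=4)
            if y != (a & b)]          # y'g + yg' = 1 exactly when y != g

points = satisfiability_dont_cares()
print(len(points), "of 16 points are don't-cares")   # 8 of 16 ...
print((0, 0, 1, 1) in points)   # True: a = b = 0 with f = yc = 1 can't happen
```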
Observability don’t-cares

 Occur when an intermediate variable’s
value doesn’t affect the network primary
outputs.
 Example (figure: inputs a and b drive a gate with output x):
if a = 1, then b is a don’t-care.


Optimizations

 Simplification.
– Changing the way a function is represented.
 Network restructuring.
– Adding and removing nodes.
 Delay restructuring.
– Optimizations that reduce the height of critical
paths.


Cube representation

 On-set, off-set, don’t-care set, cover:
[Figure: 3-cube over x1, x2, x3.]


Espresso example

[Figure: Espresso applied to a function on the 3-cube over x1, x2, x3.]


Partial collapsing

[Figure: before, nodes f1, f2, f3, f4; after, f1 and f2 collapsed into
a single node F alongside f3 and f4.]


Factorization

 Based on division:
– formulate candidate divisor;
– test how it divides into the function;
– if g = f/c, we can use c as an intermediate
function for f.
 Algebraic division does not take Boolean
simplification into account. It is less
expensive than Boolean division.


Factorization using division

 Three steps:
– generate potential common factors and compute
literal savings if used;
– choose factors to substitute into network;
– restructure the network to use the new factors.
 Algebraic/Boolean division can be used to
implement first step.
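A minimal Python sketch of algebraic (weak) division, assuming single-letter literal names; the function names are hypothetical. The quotient is the intersection, over the divisor's cubes, of the cubes of f that contain each divisor cube, with that cube removed; the remainder is whatever quotient × divisor does not cover. Dividing f = ac + ad + bc + bd + e by a + b gives quotient c + d and remainder e, so f = (a + b)(c + d) + e, cutting the literal count from 9 to 5.

```python
def cube(literals):
    """A cube as a frozenset of single-letter literals, e.g. cube("ac")."""
    return frozenset(literals)

def algebraic_divide(f, d):
    """Weak division f = q*d + r on sets of cubes.

    Cubes are treated as plain sets of literals; no Boolean identities
    such as a*a' = 0 are applied, which is what makes it algebraic.
    """
    quotient = None
    for dc in d:
        partial = {c - dc for c in f if dc <= c}   # cubes containing dc, dc removed
        quotient = partial if quotient is None else quotient & partial
    quotient = quotient or set()
    covered = {q | dc for q in quotient for dc in d}
    return quotient, set(f) - covered

def show(cubes):
    return " + ".join(sorted("".join(sorted(c)) for c in cubes)) or "0"

f = {cube("ac"), cube("ad"), cube("bc"), cube("bd"), cube("e")}
q, r = algebraic_divide(f, {cube("a"), cube("b")})
print(show(q), "|", show(r))   # c + d | e
```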


Technology mapping

 Cover the function:


FPGA tech mapping

 Cost (number of
inputs) doesn’t always
increase with added
functions:


FPGAs vs. custom logic

 Cost metric for static gates is the literal:
– ax + bx’ has four literals and requires 8 transistors
(two per literal).
 Cost metric for FPGAs is logic element:
– All functions that fit in an LE have the same
cost.


LUT-based logic synthesis

 Find the largest logic cone that will fit into
the LUT:
– r = q + s’
– q = g’ + h, s = d’
– d = a + b
 Collapsing the cone gives r = g’ + h + a + b,
a function of four inputs.
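A Python sketch (names hypothetical) that collapses the cone rooted at r, finds its leaves and, if there are at most k of them, returns the truth table that would be loaded into a k-input LUT. For this cone the leaves are a, b, g and h, so r fits in one 4-input LUT.

```python
from itertools import product

# node -> (fanin names, function of a dict of fanin values)
CONE = {
    "d": (("a", "b"), lambda v: v["a"] | v["b"]),
    "s": (("d",), lambda v: 1 - v["d"]),
    "q": (("g", "h"), lambda v: (1 - v["g"]) | v["h"]),
    "r": (("q", "s"), lambda v: v["q"] | (1 - v["s"])),
}

def leaves(node):
    """Primary inputs at the bottom of the cone rooted at node."""
    if node not in CONE:
        return {node}
    return set().union(*(leaves(i) for i in CONE[node][0]))

def evaluate(node, values):
    if node in values:
        return values[node]
    fanin, fn = CONE[node]
    return fn({i: evaluate(i, values) for i in fanin})

def lut_contents(node, k=4):
    """(inputs, truth table) of the collapsed cone, or None if it needs > k inputs."""
    inputs = sorted(leaves(node))
    if len(inputs) > k:
        return None
    return inputs, [evaluate(node, dict(zip(inputs, bits)))
                    for bits in product((0, 1), repeat=len(inputs))]

inputs, table = lut_contents("r")
print(inputs, table)   # only a=0, b=0, g=1, h=0 gives 0
```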


Placement and routing

 Two critical phases of layout design:
– placement of components on the chip;
– routing of wires between components.
 Placement and routing interact, but
separating layout design into phases helps
us understand the problem and find good
solutions.


Placement metrics

 Quality metrics for layout:
– area;
– delay.
 Area and delay determined partly by wiring.
 How do we judge a placement without
wiring? Estimate wire length without
actually performing routing.
 Design time may be important for FPGAs.


Wire length as a quality metric

[Figure: bad placement vs. good placement.]


Wire length measures

 Estimate wire length by
distance between
components.
 Possible distance measures:
– Euclidean distance: $\sqrt{\Delta x^2 + \Delta y^2}$;
– Manhattan distance: $|\Delta x| + |\Delta y|$.
 Multi-point nets must be
broken up into trees for
good estimates.
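A Python sketch of these estimates (names hypothetical). For a multi-point net it builds a minimum spanning tree over the pins with Prim's algorithm, one simple way to break the net into a tree; a Steiner tree, which may add extra branch points, can be shorter.

```python
import math

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def mst_length(pins, dist=manhattan):
    """Wire-length estimate for a multi-point net: minimum spanning tree (Prim)."""
    pins = list(set(pins))            # ignore duplicate pins
    if len(pins) < 2:
        return 0
    in_tree, total = {pins[0]}, 0
    while len(in_tree) < len(pins):
        d, nxt = min((dist(a, b), b) for a in in_tree
                     for b in pins if b not in in_tree)
        in_tree.add(nxt)
        total += d
    return total

print(manhattan((0, 0), (3, 4)), euclidean((0, 0), (3, 4)))   # 7 5.0
print(mst_length([(0, 0), (2, 0), (1, 1)]))                   # 4
```

For those three pins a rectilinear Steiner tree through the Steiner point (1, 0) has length 3, shorter than the spanning tree's 4.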
Wiring trees

[Figure: a wiring tree with a Steiner point.]


Placement techniques

 Can construct an initial solution or improve an
existing solution.
 Pairwise interchange is a simple
improvement method:
– Interchange a pair; keep the swap if it reduces
wire length.
– Heuristic determines which two components to
swap.
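A Python sketch of pairwise interchange (names hypothetical) for two-pin nets with Manhattan wire length. The heuristic here, trying every pair in turn until a full sweep finds no improving swap, is only one possible choice.

```python
from itertools import combinations

def total_length(place, nets):
    """Sum of Manhattan lengths of two-pin nets; place maps cell -> (x, y)."""
    return sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
               for a, b in nets)

def pairwise_interchange(place, nets):
    """Keep any pair swap that shortens total wire length; stop when none does."""
    place = dict(place)
    best, improved = total_length(place, nets), True
    while improved:
        improved = False
        for a, b in combinations(sorted(place), 2):
            place[a], place[b] = place[b], place[a]       # try the swap
            cost = total_length(place, nets)
            if cost < best:
                best, improved = cost, True               # keep it
            else:
                place[a], place[b] = place[b], place[a]   # undo
    return place, best

NETS = [("A", "B"), ("B", "C"), ("C", "D")]               # a simple chain
START = {"A": (0, 0), "C": (1, 0), "B": (2, 0), "D": (3, 0)}
placement, length = pairwise_interchange(START, NETS)
print(total_length(START, NETS), "->", length)            # 5 -> 3
```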


Placement by partitioning

 Works well for components of fairly
uniform size.
 Partition netlist to minimize total wire
length using min-cut criterion.
 Partitioning may be interpreted as 1-D or
2-D layout.


Recursive partitioning


Min-cut bisecting partitioning

[Figure: nodes A, B, C, D divided between partition 1 and partition 2;
net counts of 1 and 3 are marked.]


Min-cut bisecting partitioning,
cont’d

 Swapping A and B:
– B drags 1 net;
– A drags 3 nets;
– total cut increase: 3 nets.
 Conclusion: probably not a good swap, but
must be compared with other pairs.


Kernighan-Lin algorithm

 Compute min-cut criterion:
– count total net cut change.
 Algorithm exchanges sets of nodes to
perform hill-climbing: it finds
improvements where no single swap will
improve the cut.
 Recursively subdivide to determine
placement detail.
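A compact Python sketch of one Kernighan-Lin pass (names hypothetical), simplified to a graph with unit-weight two-pin nets. For clarity it recomputes the cut for each trial swap instead of keeping the usual D-value bookkeeping. Each step commits the best available swap even if it worsens the cut and locks both nodes; at the end only the prefix of swaps with the largest total gain is kept, which is how the pass climbs out of a local minimum.

```python
from itertools import accumulate

def cut_size(edges, side):
    """Number of nets whose two endpoints lie in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])

def kl_pass(edges, side):
    """One Kernighan-Lin pass; side maps node -> 0 or 1. Returns a new side map."""
    work, locked, swaps, gains = dict(side), set(), [], []
    n_pairs = min(list(side.values()).count(0), list(side.values()).count(1))
    for _ in range(n_pairs):
        base, best = cut_size(edges, work), None
        for a in work:
            if a in locked or work[a] != 0:
                continue
            for b in work:
                if b in locked or work[b] != 1:
                    continue
                work[a], work[b] = 1, 0                    # trial swap
                gain = base - cut_size(edges, work)
                work[a], work[b] = 0, 1                    # undo
                if best is None or gain > best[0]:
                    best = (gain, a, b)
        gain, a, b = best
        work[a], work[b] = 1, 0      # commit tentatively, even if gain < 0
        locked.update((a, b))
        swaps.append((a, b))
        gains.append(gain)
    prefix = list(accumulate(gains))
    result = dict(side)
    if prefix:
        k = max(range(len(prefix)), key=lambda i: prefix[i])
        if prefix[k] > 0:            # apply the best prefix of swaps
            for a, b in swaps[:k + 1]:
                result[a], result[b] = result[b], result[a]
    return result

# Two fully connected clusters, each split across the initial cut.
clusters = (["a", "b", "e", "f"], ["c", "d", "g", "h"])
EDGES = [(u, v) for cl in clusters for i, u in enumerate(cl) for v in cl[i + 1:]]
SIDE = {"a": 0, "b": 0, "c": 0, "d": 0, "e": 1, "f": 1, "g": 1, "h": 1}
new_side = kl_pass(EDGES, SIDE)
print(cut_size(EDGES, SIDE), "->", cut_size(EDGES, new_side))   # 8 -> 0
```

The first swap of the pass gains only 2 and the second gains 6; the remaining swaps lose, so the pass keeps just the first two.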
Simulated annealing

 Powerful but CPU-intensive optimization
technique.
 Analogy to annealing of metals:
– temperature determines probability of a
component jumping position;
– probabilistically accept moves.
– start at high temperature, cool to lower
temperature to try to reach good placement.
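A minimal Python sketch of simulated annealing for a one-dimensional placement. The names, the cost function and the cooling parameters are illustrative assumptions. Moves are random swaps of two cells; a move that raises cost by Δ is accepted with probability exp(-Δ/T), and T is lowered geometrically.

```python
import math
import random

def linear_cost(order, nets):
    """Wire length of two-pin nets when cells occupy slots 0, 1, 2, ..."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in nets)

def anneal(order, nets, t_start=10.0, t_end=0.01, alpha=0.95,
           moves_per_t=100, seed=1):
    """Return the best order seen and its cost."""
    rng = random.Random(seed)
    cur = list(order)
    cur_cost = linear_cost(cur, nets)
    best, best_cost = list(cur), cur_cost
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            i, j = rng.sample(range(len(cur)), 2)
            cur[i], cur[j] = cur[j], cur[i]                  # propose a swap
            delta = linear_cost(cur, nets) - cur_cost
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur_cost += delta                            # accept
                if cur_cost < best_cost:
                    best, best_cost = list(cur), cur_cost
            else:
                cur[i], cur[j] = cur[j], cur[i]              # reject: undo
        t *= alpha                                           # cool down
    return best, best_cost

NETS = [(i, i + 1) for i in range(7)]      # a chain of 8 cells
START = [3, 7, 0, 5, 1, 6, 2, 4]
best, cost = anneal(START, NETS)
print(linear_cost(START, NETS), "->", cost)
```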


Routing

 Major phases in routing:
– global routing assigns nets to routing areas;
– detailed routing designs the routing areas.
 Net ordering is a major problem. The order in
which nets are routed determines the quality of the
result. Net ordering is chosen heuristically.


Global routing

 Choose a sequence of channels.
– Not tracks within a channel.
 Must take capacity into account.
 Channel graph allows path algorithms to be
used for global routing.


Channel graph
[Figure: logic elements (LEs) in a grid separated by routing channels;
switch boxes join the channels at each intersection.]


Maze routing

 Will find shortest path for a single wire, if
such a path exists.
 Two phases:
– Label nodes with distance, radiating from
source.
– Use distances to trace from sink to source,
choosing a path that always decreases distance
to source.
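A Python sketch of the two phases on a grid (names hypothetical; '#' marks blocked cells). Phase one is a breadth-first wavefront from the source; phase two traces back from the sink, always stepping to a neighbor whose label is one smaller.

```python
from collections import deque

def lee_route(grid, source, sink):
    """Lee maze router; returns the cells from source to sink, or None."""
    rows, cols = len(grid), len(grid[0])
    dist, frontier = {source: 0}, deque([source])
    # Phase 1: label cells with their distance, radiating out from the source.
    while frontier and sink not in dist:
        r, c = frontier.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != "#" and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1
                frontier.append((nr, nc))
    if sink not in dist:
        return None                      # no path exists
    # Phase 2: trace from sink to source through decreasing labels.
    path = [sink]
    while path[-1] != source:
        r, c = path[-1]
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if dist.get(nxt) == dist[(r, c)] - 1:
                path.append(nxt)
                break
    return path[::-1]

GRID = [".#..",
        ".#.#",
        "...."]
print(lee_route(GRID, (0, 0), (0, 2)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2)]
```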


Lee (wavefront) router


FPGA issues

 Often want a fast answer. May be willing to
accept a lower-quality result for less
place/route time.
 May be interested in knowing wirability
without needing the final configuration.
 Fast placement: constructive placement,
iterative improvement through simulated
annealing.


FPGA routing

 Finding a route in a given interconnection
network.
 Global routing assigns to channels.
 Local routing selects the programming
points used to make the connections.


FPGA routing techniques

 Nair: route based on congestion, not
distance. Route in two passes:
– Estimate congestion.
– Final routing.
 Triptych: more gradual penalty for
congestion.


Topics

 16 x 16 multiplier example.


The FPGA design process

 Translation from HDL.
– (synthesis, translation)
 Logic synthesis.
– (mapping)
 Placement and routing.
– (place and route)
 Configuration generation.
– (program file generation)


Design experiments

 Synthesize with no constraints.
 Synthesize with timing constraint.
– Tighten timing constraint.
 Synthesize with placement constraints.


Post-translation simulation model
 HDL model in terms of FPGA primitives.
 Example:
X_LUT4 \p12_Madd__n0015_Mxor_Result_Xo<1>1 (
.ADR0(x_7_IBUF),
.ADR1(y_13_IBUF),
.ADR2(c12[7]),
.ADR3(row12[8]),
.O(row13[7])
);


Mapping report
Design Summary
--------------
Number of errors: 0
Number of warnings: 0
Logic Utilization:
Number of 4 input LUTs: 501 out of 1,024 48%
Logic Distribution:
Number of occupied Slices: 255 out of 512 49%
Number of Slices containing only related logic: 255 out of 255 100%
Number of Slices containing unrelated logic: 0 out of 255 0%
*See NOTES below for an explanation of the effects of unrelated logic
Total Number 4 input LUTs: 501 out of 1,024 48%

Number of bonded IOBs: 64 out of 92 69%

Total equivalent gate count for design: 3,006


Additional JTAG gate count for IOBs: 3,072
Peak Memory Usage: 64 MB


Static timing analysis report

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 99.999 uS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)
Maximum delay is 20.916ns.
--------------------------------------------------------------------------------


Static timing report: delays along
paths
Data Sheet report:
-----------------
All values displayed in nanoseconds (ns)

Pad to Pad
------------------+----------------------+-----------+
Source Pad |Destination Pad| Delay |
------------------+----------------------+-----------+
x<0> |p<0> | 5.824|
x<0> |p<10> | 10.675|
x<0> |p<11> | 11.214|
x<0> |p<12> | 11.753|


Routing report
Phase 1: 1975 unrouted; REAL time: 11 secs

Phase 2: 1975 unrouted; REAL time: 11 secs

Phase 3: 619 unrouted; REAL time: 12 secs

Phase 4: 619 unrouted; (0) REAL time: 12 secs

Phase 5: 619 unrouted; (0) REAL time: 12 secs

Phase 6: 619 unrouted; (0) REAL time: 12 secs

Phase 7: 0 unrouted; (0) REAL time: 12 secs


The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is:
0


Static timing after routing

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 99.999 uS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)
Maximum delay is 38.424ns.
--------------------------------------------------------------------------------


Timing constraint

 Use timing constraint editor:


Post-map static timing report

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 32 nS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)
Maximum delay is 20.916ns.


Post-routing static timing report

Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 32 nS ;

20135312 items analyzed, 0 timing errors detected. (0 setup errors, 0 hold errors)
Maximum delay is 31.984ns.


Tighter timing constraints

 Tighten requirement to 25 ns.
 Post-place-route timing report:
Timing constraint: TS_P2P = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "PADS" 25 nS ;

20135312 items analyzed, 11 timing errors detected. (11 setup errors, 0 hold errors)
Maximum delay is 31.128ns.


Report on a violated path
Slack: -6.128ns (requirement - data path)
Source: y<0> (PAD)
Destination: p<30> (PAD)
Requirement: 25.000ns
Data Path Delay: 31.128ns (Levels of Logic = 31)

Data Path: y<0> to p<30>


Location Delay type Delay(ns) Physical Resource
Logical Resource(s)
------------------------------------------------- -------------------
K5.I Tiopi 0.825 y<0>
y<0>
y_0_IBUF
SLICE_X2Y11.G4 net (fanout=31) 1.792 y_0_IBUF
SLICE_X2Y11.Y Tilo 0.439 c2<5>
p0_Madd__n0017_Mxor_Result_Xo<1>1
SLICE_X2Y11.F4 net (fanout=2) 0.304 row1<6>
SLICE_X2Y11.X Tilo 0.439 c2<5>
p1_Madd__n0019_Cout1
SLICE_X5Y16.F3 net (fanout=2) 0.784 c2<5>
SLICE_X5Y16.X Tilo 0.439 c3<5>
p2_Madd__n0019_Cout1
SLICE_X2Y18.G4 net (fanout=2) 0.668 c3<5>
SLICE_X2Y18.Y Tilo 0.439 row5<4>
p3_Madd__n0019_Mxor_Result_Xo<1>1


Power report
Power summary: I(mA) P(mW)
----------------------------------------------------------------
Total estimated power consumption: 333
---
Vccint 1.50V: 0 0
Vccaux 3.30V: 100 330
Vcco33 3.30V: 1 3
---
Inputs: 0 0
Logic: 0 0
Outputs:
Vcco33 0 0
Signals: 0 0
---
Quiescent Vccaux 3.30V: 100 330
Quiescent Vcco33 3.30V: 1 3

Thermal summary:
----------------------------------------------------------------
Estimated junction temperature: 36C
Ambient temp: 25C
Case temp: 35C
Theta J-A: 34C/W


Power report: decoupling
capacitance
Decoupling Network Summary: Cap Range (uF) #
----------------------------------------------------------------
Capacitor Recommendations:
Total for Vccint : 4
470.0 - 1000.0 : 1
0.0100 - 0.0470 : 1
0.0010 - 0.0047 : 2
---
Total for Vccaux : 4
470.0 - 1000.0 : 1
0.0100 - 0.0470 : 1
0.0010 - 0.0047 : 2
---
Total for Vcco33 : 10
470.0 - 1000.0 : 1
0.470 - 2.200 : 1
0.0470 - 0.2200 : 2
0.0100 - 0.0470 : 3
0.0010 - 0.0047 : 3


Improving area

 Floorplanner window:

[Figure: Floorplanner window showing the chip floorplan and its LEs.]


Rat’s nest wiring


Routing editor view


Adding placement constraints

 Must add attributes to the Verilog:
// synthesis attribute rloc of p0 is X0Y0
multrow p0(row0,x,y[1],y[0],c0,row1,c1);


Editing constraints

 Use constraints editor to place constraints:


Design browser pane


Drag and drop constraints


Change the shape of constraints


Full set of placement constraints


Placement results


New timing report

 After placement constraints:
19742142 items analyzed, 0 timing errors
detected. (0 setup errors, 0 hold errors)
Maximum delay is 29.934ns.
 Compares to about 31 ns for unconstrained
placement.


Detailed routing constraints


Topics

 Basics of sequential machines.
 Sequential machine specification.
 Sequential machine design processes.


Sequential machines

 Use registers to make primary output
values depend on state + primary inputs.
 Varieties:
– Mealy: outputs are a function of present state and
inputs;
– Moore: outputs depend only on state.


FSM structure


Constraints on structure

 No combinational cycles.
 All components must have bounded delay.


Synchronous design

 Controlled by clock(s).
– State changes at time determined by the clock.
– Inputs to registers settle in time for state change.
– Primary inputs settle in time for combinational delay
through logic.
 Machine state is determined solely by registers.
– Don’t have to worry about timing constraints, events
outside the registers.


Non-functional requirements and
optimization

 Performance:
– Clock period is determined by combinational logic
delay.
 Area:
– Combinational logic size usually dominates area.
 Energy/power:
– Often dominated by combinational logic.
– May be improved by latching values.


Models of state machines

 Register-transfer:
– Combinational equations for inputs to registers.
 State transition graph/table:
– Next-state, output functions described
piecewise.


State transition graph

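 Each transition describes part of the next-state
and output functions:
[Figure: from state S1, a transition labeled 0/010 goes to S2 and one
labeled 1/1-0 goes to S3 (labels are input/output).]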


Register-transfer structure

 Registers fed by combinational logic:

[Figure: D flip-flops feed combinational logic, which feeds another set of D flip-flops.]


Block diagram

 Purely structural description:

[Figure: blocks A, B1 and B2.]


Symbolic values

 A sequential machine description may use
symbolic, not binary values.
– Symbolic values must be encoded during
implementation.
 Encoding may optimize implementation
characteristics:
– Area.
– Performance.
– Energy.


STG vs. register-transfer

 Each representation is easier for some types
of machines.
 Example: counter.


Counter state transition graph

 Cyclic structure:

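[Figure: states 0, 1, ..., 6, 7 in a cycle; transitions labeled 1/1, 1/2,
..., 1/7 advance the count, and 1/0 returns from 7 to 0.]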


Counter register-transfer function

 Specify using addition:
– Next_count = count + 1.
 Regular structure of logic.


Example: 01 string recognizer

 Recognize 01 sequence in input string:

Example streams through the recognizer:
input  output
  0      0
  0      0
  1      1
  1      0
  0      0
  1      1


Recognizer state transition graph

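[Figure: two states, Bit 1 and Bit 2. Bit 1 loops on 1/0 and goes to
Bit 2 on 0/0; Bit 2 loops on 0/0 and returns to Bit 1 on 1/1.]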
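The recognizer can be written as a Mealy machine table. This Python sketch assumes two states, Bit 1 (last input was 1, or no input yet) and Bit 2 (last input was 0), with output 1 only on a 1 that follows a 0; names are hypothetical. The input stream 0, 0, 1, 1, 0, 1 produces 0, 0, 1, 0, 0, 1.

```python
# (state, input) -> (next state, output)
RECOGNIZER = {
    ("Bit 1", 0): ("Bit 2", 0),
    ("Bit 1", 1): ("Bit 1", 0),
    ("Bit 2", 0): ("Bit 2", 0),
    ("Bit 2", 1): ("Bit 1", 1),   # a 1 after a 0: sequence 01 recognized
}

def run_mealy(machine, start, inputs):
    """Apply the next-state and output functions once per input symbol."""
    state, outputs = start, []
    for symbol in inputs:
        state, out = machine[(state, symbol)]
        outputs.append(out)
    return outputs

print(run_mealy(RECOGNIZER, "Bit 1", [0, 0, 1, 1, 0, 1]))   # [0, 0, 1, 0, 0, 1]
```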


Mealy vs. Moore machine

 Moore machine:
– Output a function of state.
 Mealy machine:
– Output a function of primary inputs + state.


Sequential machine definition

 Machine computes next state N, primary
outputs O from current state S, primary
inputs I.
 Next-state function:
– $N = \delta(I, S)$.
 Output function (Mealy):
– $O = \lambda(I, S)$.


Reachability

 State is reachable if there is a path to it from
a given state.
 Unreachable states may be created by state encoding:
[Figure: states s0, s1, s2, s3.]
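A breadth-first search finds the reachable states. The Python sketch below uses a hypothetical three-state machine encoded in two bits, so the fourth code, s3, is a state created by the encoding that no transition enters.

```python
from collections import deque

def reachable(next_state, start):
    """States reachable from start; next_state maps (state, input) -> state."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for (src, _inp), dst in next_state.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return seen

# Hypothetical machine: s0 -> s1 -> s2 -> s0 on input 1, hold on input 0.
# s3 is the unused fourth code; it exits to s0, but nothing leads into it.
NEXT = {("s0", 0): "s0", ("s0", 1): "s1", ("s1", 0): "s1", ("s1", 1): "s2",
        ("s2", 0): "s2", ("s2", 1): "s0", ("s3", 0): "s0", ("s3", 1): "s0"}

print(sorted(reachable(NEXT, "s0")))   # ['s0', 's1', 's2']
```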


Homing sequence

 Sequence of inputs that drives the machine
to a given state.
[Figure: states s0, s1, s2.]


Equivalent states

 States are equivalent if they cannot be
distinguished by any input sequence:
[Figure: machine with states s1, s2, s3, s4; transitions labeled 0/0, 1/0, -/1 and -/0.]

FPGA-Based System Design: Chapter 1 Copyright  2004 Prentice Hall PTR


Networks of FSMs

 Functions can be built up from interconnected FSMs:
[Figure: machines M1 and M2; external connections I1, O1 on M1 and I2, O2 on M2; internal connections x and y between M1 and M2.]


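Illegal composition of Mealy machines
[Figure: two Mealy machines, each made of combinational logic and a D flip-flop, whose combinational logic blocks drive each other directly; the path through the two output functions forms a combinational loop with no register in it.]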


Communicating FSM states
[Figure: machine M1 with states s1 and s2, machine M2 with states s3 and s4; transitions labeled 0/0, -/1, 1/1, 1/0, -/0, and 0/0.]

Product machine

 Two connected machines:
[Figure: machines R and S connected through signals i1, o1, i2, o2.]


Component STGs



Behavior of connected machines

[Figure: machines R and S connected through i1, o1, i2, o2, with their signal values and states over successive cycles:]
0 R1 0 S1 1
0 R2 1 S2 0
0 R3 0 S1 0
0 R3 0 S1 0

Forming product machine

 Form Cartesian product of states:
– R1S1, R1S2, R2S1, R2S2, R3S1, R3S2.
 For each product state, determine the
combined behavior of each product
transition:
– Required inputs.
– Produced output.
– Next product state.



State assignment

 Find a binary code for symbolic values in the machine.
– Optimize area, performance.
– May be done on inputs, outputs as well.


Optimizing state assignments

 Codes affect the next-state and output logic.
– Compute conditions based on state.
 Best code depends on the input and output logic and its interaction with state computations.


Encoding a shift register

 Symbolic state transition table for shift register:
input  present state  next state  output
0      S00            S00         0
1      S00            S10         0
0      S01            S00         1
1      S01            S10         1
0      S10            S01         0
1      S10            S11         0
0      S11            S01         1
1      S11            S11         1

Bad encoding

 Let S00 = 00, S01 = 01, S10 = 11, S11 = 10.
 Logic:
– Output = S1 S0’ + S1’ S0
– N1 = I
– N0 = I S1’ + I’ S1


Good encoding

 Let S00 = 00, S01 = 01, S10 = 10, S11 = 11.
 Logic:
– Output = S0
– N1 = I
– N0 = S1
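A minimal Verilog sketch of the good encoding, with hypothetical module and port names. The state bits are `S1` and `S0` as in the equations above, and the next-state and output logic is exactly N1 = I, N0 = S1, Output = S0, with no gates at all. The output therefore repeats the input from two clock edges earlier.

```verilog
// Sketch: shift-register machine with S00=00, S01=01, S10=10, S11=11.
// Module and port names are hypothetical.
module shift_fsm (
    input  wire clk,
    input  wire rst,
    input  wire I,
    output wire out
);
    reg S1, S0;
    assign out = S0;          // Output = S0
    always @(posedge clk) begin
        if (rst) begin
            S1 <= 1'b0;       // start in S00
            S0 <= 1'b0;
        end else begin
            S1 <= I;          // N1 = I
            S0 <= S1;         // N0 = S1
        end
    end
endmodule
```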



One-hot code

 An n-state machine has an n-bit encoding.
 The ith bit is 1 if the machine is in state i.
 Comparison:
– Easy to tell what state the machine is in.
– Many codes are unused (0000, 1111, etc.), so it is easy for the machine to get into an illegal state.
– Uses a lot of registers.
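A minimal one-hot sketch in Verilog, assuming a hypothetical four-state ring A, B, C, D advanced by an input `go`. Testing for a state is a single bit (for example `state[2]` for C), and the `legal` check shows one possible way to recover from the illegal codes mentioned above; it is not the only one.

```verilog
// Sketch: 4-state machine with a one-hot code; each state owns one flip-flop.
// The A -> B -> C -> D -> A cycle and the "go" input are hypothetical.
module onehot4 (
    input  wire       clk,
    input  wire       rst,
    input  wire       go,
    output reg  [3:0] state     // bit i is 1 only in state i
);
    localparam A = 4'b0001;

    // Legal code: not all zeros and at most one bit set.
    // (x & (x - 1)) clears the lowest 1 bit; the result is 0 only if one bit was set.
    wire legal = (state != 4'b0000) &&
                 ((state & (state - 4'b0001)) == 4'b0000);

    always @(posedge clk) begin
        if (rst || !legal)
            state <= A;                        // also recovers from 0000, 1111, ...
        else if (go)
            state <= {state[2:0], state[3]};   // rotate: move to the next state
    end
endmodule
```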



Common factors in state coding

 Consider this set of transitions:
– 0, s1 OR s2 -> s3, 1
 Want to choose a code that easily produces s1 OR s2.
– s1 = 00, s2 = 01: the codes differ only in the last bit, so s1 OR s2 is simply the condition that the first state bit is 0.


State codes in n-space

[Figure: state codes as vertices of the Boolean 3-cube; s1 = 111 and s2 = 110 are adjacent, differing only in the last bit, so s1 OR s2 depends only on the first two bits.]


State codes and delay



Topics

 Verilog styles for sequential machines.
 Flip-flops and latches.


Verilog always statement

 Use always to wait for clock edge:
always @(posedge clock) // start execution at the clock edge
begin
  // insert combinational logic here
end


Verilog state machine
`define state0 1'b0
`define state1 1'b1
always @(posedge clock) // start execution at the clock edge
begin
  if (rst == 1)
  begin
    // reset code
    state <= `state0;
  end
  else begin // state machine
    case (state)
      `state0: begin
        o1 <= 0;
        state <= `state1;
      end
      `state1: begin
        if (i1) o1 <= 1; else o1 <= 0;
        state <= `state0;
      end
    endcase
  end // state machine
end

Traffic light controller

 Intersection of two roads:
– highway (busy);
– farm (not busy).
 Want to give green light to highway as much as possible.
 Want to give green to farm when needed.
 Must always have at least one red light.


Traffic light system

[Figure: intersection of the highway and the farm road, with a traffic light and a sensor on the farm road.]


System operation

 Sensor on farm road indicates when cars on farm road are waiting for green light.
 Must obey required lengths for green, yellow lights.


Traffic light machine

 Build controller out of two machines:
– sequencer, which sets colors of lights, etc.;
– timer, which is used to control durations of lights.
 Separate counter isolates logical design from clock period.
 Separate counter greatly reduces number of states in sequencer.

Sequencer state transition graph
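Transitions are labeled condition / count_reset, followed by the highway and farm light colors:
– hwy-green to hwy-green: (cars & long)’ / 0; green, red.
– hwy-green to hwy-yellow: cars & long / 1; green, red.
– hwy-yellow to hwy-yellow: short’ / 0; yellow, red.
– hwy-yellow to farm-green: short / 1; yellow, red.
– farm-green to farm-green: cars & long’ / 0; red, green.
– farm-green to farm-yellow: cars’ + long / 1; red, green.
– farm-yellow to farm-yellow: short’ / 0; red, yellow.
– farm-yellow to hwy-green: short / 1; red, yellow.
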
Verilog model of controller
module sequencer(rst,clk,cars,long,short,hg,hy,hr,fg,fy,fr,count_reset);
  input rst, clk;     // reset and clock
  input cars;         // high when a car is present at the farm road
  input long, short;  // long and short timers
  output hg, hy, hr;  // highway light: green, yellow, red
  output fg, fy, fr;  // farm light: green, yellow, red
  reg hg, hy, hr, fg, fy, fr; // remember these outputs
  output count_reset; // reset the counter
  reg count_reset;    // register this value for simplicity
  // define the state codes
  `define HWY_GREEN 0
  `define HWY_YX 1
  `define HWY_YELLOW 2
  `define HWY_YY 3
  `define FARM_GREEN 4
  `define FARM_YX 5
  `define FARM_YELLOW 6
  `define FARM_YY 7

  reg [2:0] state; // state of the sequencer

  always @(posedge clk)
  begin
    if (rst == 1)
    begin
      state = `HWY_GREEN; // default state
      count_reset = 1;
    end

Verilog model of controller, cont’d.
    else begin // state machine
      count_reset = 0;
      case (state)
        `HWY_GREEN: begin
          if (~(cars & long)) state = `HWY_GREEN;
          else begin
            state = `HWY_YX;
            count_reset = 1;
          end
          hg = 1; hy = 0; hr = 0; fg = 0; fy = 0; fr = 1;
        end
        `HWY_YX: begin
          state = `HWY_YELLOW;
          hg = 0; hy = 1; hr = 0; fg = 0; fy = 0; fr = 1;
        end
        `HWY_YELLOW: begin
          if (~short) state = `HWY_YELLOW;
          else begin
            state = `FARM_YY;
            count_reset = 1; // restart the timer for the farm-green interval
          end
          hg = 0; hy = 1; hr = 0; fg = 0; fy = 0; fr = 1;
        end
        `FARM_YY: begin
          state = `FARM_GREEN;
          hg = 0; hy = 0; hr = 1; fg = 1; fy = 0; fr = 0;
        end

Verilog model of timer
module timer(rst,clk,long,short);
  input rst, clk;     // reset and clock
  output long, short; // long and short timer outputs
  reg long, short;    // assigned in an always block, so they must be regs

  reg [3:0] tval; // current state of the timer

  always @(posedge clk) // update the timer and outputs
    if (rst == 1)
    begin
      tval <= 4'b0000;
      short <= 0;
      long <= 0;
    end // reset
    else begin
      {long,tval} <= tval + 1; // long pulses high when tval rolls over
      if (tval == 4'b0100)
        short <= 1'b1; // raise short once tval reaches 2^2; stays high until reset
    end // state machine
endmodule


Verilog model of system
module tlc(rst,clk,cars,hg,hy,hr,fg,fy,fr);
input rst, clk; // reset and clock
input cars; // high when a car is present at the farm road
output hg, hy, hr; // highway light: green, yellow, red
output fg, fy, fr; // farm light: green, yellow, red

wire long, short, count_reset; // long and short timers + counter reset

sequencer s1(rst,clk,cars,long,short,
hg,hy,hr,fg,fy,fr,count_reset);
timer t1(count_reset,clk,long,short);

endmodule

Verilog model of controller, cont’d.
        `FARM_GREEN: begin
          if (cars & ~long) state = `FARM_GREEN;
          else begin
            state = `FARM_YX;
            count_reset = 1;
          end
          hg = 0; hy = 0; hr = 1; fg = 1; fy = 0; fr = 0;
        end
        `FARM_YX: begin
          state = `FARM_YELLOW;
          hg = 0; hy = 0; hr = 1; fg = 0; fy = 1; fr = 0; // farm light turns yellow
        end
        `FARM_YELLOW: begin
          if (~short) state = `FARM_YELLOW;
          else begin
            state = `HWY_YY;
            count_reset = 1; // restart the timer for the highway-green interval
          end
          hg = 0; hy = 0; hr = 1; fg = 0; fy = 1; fr = 0;
        end
        `HWY_YY: begin
          state = `HWY_GREEN;
          hg = 1; hy = 0; hr = 0; fg = 0; fy = 0; fr = 1;
        end
      endcase
    end // state machine
  end // always
endmodule



The synchronous philosophy

 All operation is controlled by the clock.
– All timing is relative to clock.
– Separates functional, performance optimizations.
 Put a lot of work into designing the clock network so you don’t have to worry about it throughout the combinational logic.


Register characteristics

 Form of clock signal used to trigger the register.
 How the behavior of data around the clock trigger affects the stored value.
 When the stored value is presented at the output.
 Whether there is ever a combinational path from input to output.


Types of registers

 Latch: transparent when internal memory is being set.
 Flip-flop: not transparent, reading and changing output are separate.


Types of registers

 D-type (data). Q output is determined by the D input at the clocking event.
 T-type (toggle). Toggles its state at input event.
 SR-type (set/reset). Set or reset by inputs (S=R=1 not allowed).
 JK-type. Allows both J and K to be 1 (the flip-flop then toggles), otherwise similar to SR.


Clock event

 Change in clock signal that controls register behavior.
– 0-1 transition or 1-0 transition.
 Data must generally be held constant around the clock event.


Setup and hold times

[Figure: clock and D waveforms over time; D may change outside the setup/hold window but must be stable from the setup time before the clock event until the hold time after it.]

Duty cycle

 Percentage of time that clock is active.

[Figure: clock waveform with a 50% duty cycle.]


Topics

 Clocking disciplines.
– Flip-flops.
– Latches.



Clocking disciplines

 Rules for constructing sequential machines.
– Combinations of registers and gates.
– Behavior of clocks and primary inputs over time.
 Rules are sufficient to guarantee that the system will work at some clock rate.
– May not be as fast as we want.


Qualified clock

 Clock logically combined with signal:
[Figure: D flip-flop whose clock input is the clock combined with the signal sig1 through a logic gate.]


Flip-flop-based sequential machines


Flip-flop rules

 Primary inputs change after the clock (φ) edge.
 Primary inputs must stabilize before the next clock edge.
 Rules allow changes to propagate through combinational logic for next cycle.
 Flip-flop outputs hold current-state values for next-state computation.


Signals in flip-flop system

[Figure: signal timing in a flip-flop system relative to the positive clock edge.]


Latch-based machines

 Latches do not cut combinational logic when clock is active.
 Latch-based machines must use multiple ranks of latches.
 Multiple ranks require multiple phases of clock.


Two-sided latch constraint

 Latch must be open less than the shortest combinational delay.
 Period between latching operations must be longer than the longest combinational delay.
 Note: difference between shortest and longest combinational delay may be large (e.g., sum bit 0 vs. sum bit 31 of an adder).


Latch shoot-through

Latch may allow data to shoot through:



Strict two-phase clocking discipline

 Strict two-phase discipline is conservative but works.
 Can be relaxed later with proper knowledge of constraints.
 Strict two-phase machine makes latch-based machine behave more like flip-flop design, but requires multiple phases.


Strict two-phase architecture



Two-phase clock

Phases must not overlap:
[Figure: φ1 and φ2 waveforms separated by non-overlap regions.]


Why it works

 Each phase has a one-sided constraint: phase must be long enough for all combinational delays.
 If there are no combinational loops, phases can always be stretched to make that section of the machine work.
 Total clock period depends on sum of phase periods.

Clocking types

 Logic on different phases operates at different times, so signals from different phases can’t be mixed.
 Primary inputs must obey the same rules as internal signals.
 Clocking types are bookkeeping that helps us ensure that machine structure is valid.


Stable signals

 A logic signal is always stable during one phase: the phase in which the latch that produced it is not active.
 Easiest to think of machine behavior in terms of stable signals, though signals propagate while not stable.


Signal types

 Clocks are a separate type: φ1, φ2.
 Two types of stable data signal:
– stable φ1 (sφ1)
– stable φ2 (sφ2)
 A stable signal has a complementary valid signal:
– stable φ2 (sφ2) = valid φ1 (vφ1)


Stable data signal

[Figure: a stable φ2 signal becomes valid at the end of φ1 and stays stable, while its clock is inactive, until the latch feeding this logic goes active.]

How clocking types combine



Clocking types in the two-phase machine
[Figure: two-phase machine. A latch clocked by φ1 drives combinational logic whose signals are stable φ2: input I1(sφ2), output O1(sφ2). A latch clocked by φ2 drives combinational logic whose signals are stable φ1: input I2(sφ1), output O2(sφ1).]

Clocking type propagation

 Combinational logic does not change type of signal.
 Primary inputs must be compatible.
 Latches change signals from one clock type to another.
 In strict system, never mix clocks with data signals in combinational logic.


Two-coloring

[Figure: the two-phase machine with each signal colored by its clocking type: I1, O1, and the φ1 latch output are sφ2; I2, O2, and the φ2 latch output are sφ1.]

Example: shift register

 Want to displace bit by n registers in n cycles.
 Each register requires two phases:


Shift register operation

1 = 1, 2 = 0

FPGA-Based System Design: Chapter 1 1 = 0, 2 = 1 Copyright  2004 Prentice Hall PTR


Non-strict disciplines

 Some relaxation of the rules can be useful:
– reduce area;
– increase performance.
 Rules must be relaxed in a way that ensures the machine will still work.


Qualified clocks

 Use logic to generate a clock signal that is not always active.
 Qualification must not introduce glitches into the clock: glitches violate the fundamental definition of a clock by introducing extra edges.
 Use stable signals to qualify clocks.


Uses of qualified clocks

 May want to conditionally load a register.
 May qualify a clock to turn off machine for low-power operation.
 Latch must not lose its value during inactive period.
 Difficult to ensure that logic value will
come high in time—use quasi-static latch.
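On FPGAs a conditional load is commonly written with a clock enable rather than a qualified clock: the clock net stays ungated and the load condition only steers the flip-flop's data. This is a different technique from clock qualification, swapped in here for comparison; the sketch assumes hypothetical names `load_reg`, `load`, `d`, `q` and a width parameter `W`. Synthesis tools typically map `if (load)` onto the flip-flop's enable input or a feedback multiplexer, so no logic appears in the clock path and no qualification skew is introduced.

```verilog
// Sketch: conditional load with a clock enable instead of a qualified clock.
// The clock is never gated; 'load' only selects between new data and hold.
module load_reg #(parameter W = 8) (
    input  wire         clk,
    input  wire         load,   // 1 = capture d, 0 = hold current value
    input  wire [W-1:0] d,
    output reg  [W-1:0] q
);
    always @(posedge clk)
        if (load)
            q <= d;             // no else branch: q keeps its value
endmodule
```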



Qualified clocks and skew

 Logic in the clocking path introduces delay.
 Delay can cause clock to arrive at latches at different times, violating clocking assumptions.
 When designing qualification logic:
– minimize and check skew;
– sharpen clock edge.


Qualification skew example



Topics

 Performance analysis.



Unbalanced delays

Logic with unbalanced delays leads to inefficient use of logic:
[Figure: logic with unbalanced delays under a short clock period and a long clock period.]


Flip-flop-based system performance analysis


Flip-flop-based system model

 Clock signal is perfect (no rise/fall), period P.
 Clock event on rising edge.
 Setup time s.
– Minimum time from arrival of the combinational logic output at the flip-flop input to the clock event.
 Propagation time p.
– Time for value to go from flip-flop input to output.
 Worst-case combinational delay C.
– Time from output of flip-flop to input of the next flip-flop.


Clock parameters



Clock period constraint

 $P \ge C + s + p$.
[Figure: one clock period divided into the intervals p, C, and s.]


Clock with rise/fall



Rise/fall clock period constraint

 $P \ge C + s + p + t_r$, where $t_r$ is the clock rise time.

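A worked instance of the two clock period constraints, using hypothetical values C = 8 ns, s = 0.5 ns, p = 1 ns, and t_r = 0.5 ns:

```latex
\begin{aligned}
P &\ge C + s + p = 8 + 0.5 + 1 = 9.5\ \text{ns}, & f_{\max} &= 1/P \approx 105\ \text{MHz},\\
P &\ge C + s + p + t_r = 9.5 + 0.5 = 10\ \text{ns}, & f_{\max} &= 100\ \text{MHz}.
\end{aligned}
```

With these numbers the finite rise time costs about 5% of the achievable clock rate.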


Min-max delays

 Delays may vary:
– Manufacturing variations.
– Temperature variations.
 Min/max delays compound over paths.
– Delays within a chip are correlated.


Latch system clock period

 For each phase, phase period must be longer than sum of:
– combinational delay;
– latch propagation delay.
 Phase period depends on longest path.


Latch-based system model



Two-phase timing parameters



Clock period constraint

 Total clock period (both phases):
– $P \ge C_1 + C_2 + 2s + 2p$.
 Each phase must meet timing for its own
latch.
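A worked instance with hypothetical delays C1 = 6 ns, C2 = 2 ns, s = 0.5 ns, and p = 1 ns, showing that each phase carries its own setup and propagation overhead:

```latex
\begin{aligned}
t_{\phi_1} &\ge C_1 + s + p = 6 + 0.5 + 1 = 7.5\ \text{ns},\\
t_{\phi_2} &\ge C_2 + s + p = 2 + 0.5 + 1 = 3.5\ \text{ns},\\
P &\ge C_1 + C_2 + 2s + 2p = 7.5 + 3.5 = 11\ \text{ns}.
\end{aligned}
```

The logic on the short phase sits idle for much of the period; that imbalance is what blurring the phase boundaries tries to recover.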



Latch-based system model



Advanced performance analysis

 Latch-based systems always have some idle logic.
 Can increase performance by blurring phase boundaries. Results in cycle time closer to average of phases.


Example with unbalanced phases

One phase is much longer than the other:



Spreading out a phase

Compute only part of long paths in one phase:



Spreading out a phase, cont’d.

Use other phase for end of long logic block and all of short logic block:


Problems

 Hard to debug: can’t stop the system.
 Hard to initialize system state.
 More sensitive to process variations.


Timing and glitches in FSMs

 If inputs don’t change, can outputs glitch?
[Figure: FSM with input, output logic, and a D flip-flop in the feedback path.]


Skew

 Skew: relative delay between events.
 Signal skew: most important for asynchronous, timing-dependent logic.
 Clock skew: can harm any sequential system.


Signal skew

Machine data signals must obey setup and hold times, so signal skew must be avoided.


Signal skew example



Clock skew

Clock must arrive at all memory elements in time to load data.


Clock skew example



Clock skew in system

[Figure: two D flip-flops connected through logic; the clock reaches the second flip-flop after an additional delay d.]


Clock skew and qualified clocks



Clock skew analysis model

$s_{21} = d_2 - d_1$, $s_{12} = d_1 - d_2$, where $d_1$ and $d_2$ are the clock arrival times at FF1 and FF2.


Skew and clock period

 Assume that each flip-flop operates instantaneously:
– $T \ge D_2 + s_{12}$, where $D_2$ is the combinational delay from FF1 to FF2.
 If clock arrives at FF2 after FF1, then we have more time to compute.
 Given clock period, determine allowable skew:
– $s_{12} \le T - D_2$

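A worked instance, assuming $s_{12} = d_1 - d_2$ with $d_1$, $d_2$ the clock arrival times at FF1 and FF2, and hypothetical values $D_2 = 4$ ns and $T = 5$ ns:

```latex
\begin{aligned}
s_{12} &\le T - D_2 = 5 - 4 = 1\ \text{ns},\\
d_1 = 0.3,\ d_2 = 0.1\ \text{ns} &\Rightarrow s_{12} = 0.2\ \text{ns} \le 1\ \text{ns}\ \text{(meets timing)},\\
d_1 = 0.1,\ d_2 = 0.3\ \text{ns} &\Rightarrow s_{12} = -0.2\ \text{ns},\ \text{so only } T \ge D_2 + s_{12} = 3.8\ \text{ns is required}.
\end{aligned}
```

The second case is the one in which the clock reaches FF2 later than FF1, which gives the logic more time.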


Timing through logic

 As skew increases, we
have less time to get
the signal through the
logic.



Clock distribution

 Often one of the hardest problems in clock design:
– Fast edges.
– Minimum skew.


Clock skew example

[Figure: clock distribution to three D flip-flops through wire segments with delays of 10 ps, 20 ps, and 30 ps.]


Retiming

Retiming moves registers through combinational logic:


Retiming properties

 Retiming changes encoding of values in registers, but proper values can be reconstructed with combinational logic.
 Retiming may increase number of registers required.
 Retiming must preserve number of registers around a cycle; this may not be possible with reconvergent fanout.

Topics

 Basics of register-transfer design:
– data paths and controllers.
 High-level synthesis.


Register-transfer design

 A register-transfer system is a sequential machine.
 Register-transfer design is structural: complex combinations of state machines may not be easily described solely by a large state transition graph.
 Register-transfer design concentrates on functionality, not details of logic design.

Register-transfer system example

A register-transfer machine has combinational logic connecting registers:
[Figure: D flip-flop registers separated by blocks of combinational logic.]


Block diagrams

Block diagrams specify structure:
[Figure: block diagram with a wire bundle of width 5.]


Data path-controller systems

 One good way to structure a system is as a data path and a controller:
– data path executes regular operations (arithmetic, etc.), holds registers with data-oriented state;
– controller evaluates irregular functions, sets control signals for data path.


Data and control are equivalent

 We can rewrite control into data and vice versa:
– control: if i1 = '0' then o1 <= a; else o1 <= b; end if;
– data: o1 <= ((i1 = '0') and a) or ((i1 = '1') and b);
 Data/control distinction is useful but not
fundamental.
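A minimal Verilog sketch of the two descriptions above, assuming 8-bit `a`, `b`, `o1` and hypothetical module names. The first form branches on `i1`; the second builds the same multiplexer from AND and OR operations on a replicated select bit. A synthesis tool should produce equivalent logic from either.

```verilog
// Sketch: the same 2-to-1 selection written as control and as data.
// Module names and the 8-bit width are hypothetical.
module sel_control (
    input  wire       i1,
    input  wire [7:0] a, b,
    output reg  [7:0] o1
);
    always @* begin
        if (i1 == 1'b0) o1 = a;   // control: branch on i1
        else            o1 = b;
    end
endmodule

module sel_data (
    input  wire       i1,
    input  wire [7:0] a, b,
    output wire [7:0] o1
);
    // data: AND each operand with a replicated select bit, then OR
    assign o1 = ({8{~i1}} & a) | ({8{i1}} & b);
endmodule
```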
Data and control

[Figure: data and control signals in a carry-select adder, including the ctrl signal.]


Data operators

 Arithmetic operations are easy to spot in hardware description languages:
– x <= a + b;
 Multiplexers are implied by conditionals. Must evaluate entire program to determine the sources of data for each register.
 Multiplexers also come from sharing adders, etc.

Conditionals and multiplexers

if x = '0' then
reg1 <= a;
else
reg1 <= b;
end if;
[Figure: the code above beside its register-transfer implementation, in which x controls a multiplexer that selects a or b as the input of reg1.]

Alternate data path-controller systems
[Figure: one controller with one data path, and two communicating data path-controller systems.]


Pipelines

 Provide higher utilization of logic:
[Figure: combinational logic divided into stages by pipeline registers.]


Pipeline metrics

 Throughput: rate at which new values enter the system.
– Initiation interval: time between successive inputs.
 Latency: delay from input to output.


Simple pipelines

 Pure pipelines have no control.
 Choose latency, throughput.
 Choose register locations with retiming.
 Overhead:
– Setup, hold times.
– Power.
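A minimal pure pipeline in Verilog, assuming a hypothetical computation y = a*b + c split into a multiply stage and an add stage. It has no control logic; its latency is two clock cycles and its initiation interval is one cycle, since a new set of inputs can enter at every edge. The extra register `c_d` keeps `c` aligned with the product it is added to.

```verilog
// Sketch: two-stage pipeline computing y = a*b + c.
// Widths and the stage split are hypothetical.
module pipe2 (
    input  wire        clk,
    input  wire [7:0]  a, b, c,
    output reg  [16:0] y
);
    reg [15:0] prod;   // stage-1 register: the product
    reg [7:0]  c_d;    // c delayed one cycle to match its product
    always @(posedge clk) begin
        prod <= a * b;         // stage 1
        c_d  <= c;
        y    <= prod + c_d;    // stage 2 uses last cycle's stage-1 values
    end
endmodule
```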



Complex pipelines

 Actions in pipeline depend on data or external events.
 Actions on pipe:
– Stall values.
– Abort operation.
– Bypass values.


High-level synthesis

 Sequential operation is not the most abstract description of behavior.
 We can describe behavior without assigning operations to particular clock cycles.
 High-level synthesis (behavioral synthesis) transforms an unscheduled behavior into a register-transfer behavior.


Tasks in high-level synthesis

 Scheduling: determines clock cycle on which each operation will occur.
 Allocation: chooses which function units will execute which operations.


Functional modeling code in Verilog

o1 = i1 | i2;
if (!i3) begin
  o1 = 1'b1;
  // a clock cycle boundary can be moved here (or elsewhere)
  // to design different register transfers
  o2 = a + b;
end
else begin
  o1 = 1'b0;
end


Data dependencies

 Data dependencies describe relationships between operations:
– x <= a + b; value of x depends on a, b
 High-level synthesis must preserve data dependencies.


Data flow graph

 Data flow graph (DFG) models data dependencies.
 Does not require that operations be performed in a particular order.
 Models operations in a basic block of a functional model, with no conditionals.
 Requires single-assignment form.


Data flow graph construction

original code:    single-assignment form:
x <= a + b;       x1 <= a + b;
y <= a * c;       y <= a * c;
z <= x + d;       z <= x1 + d;
x <= y - d;       x2 <= y - d;
x <= x + c;       x3 <= x2 + c;


Data flow graph construction, cont’d

Data flow forms directed acyclic graph (DAG):


Goals of scheduling and allocation

 Preserve behavior: at end of execution, should have received all outputs, be in proper state (ignoring exact times of events).
 Utilize hardware efficiently.
 Obtain acceptable performance.


Data flow to data path-controller

One feasible schedule for last DFG:



Binding values to registers

[Figure: registers fall on clock cycle boundaries.]


Register lifetimes

[Figure: lifetimes of values a, b, c, d, and x across clock cycles.]


Allocation creates multiplexers

 Same unit used for different values at different times.
– Function units.
– Registers.
 Multiplexer controls which value has access to the unit.


Choosing function units

[Figure: muxes allow function units to be shared for several operations.]


Building the sequencer

[Figure: the sequencer requires three states, even with no conditionals.]


Verilog for data path
module dp(reset,clock,a,b,c,d,muxctrl1,muxctrl2,muxctrl3,
          muxctrl4,loadr1,loadr2,loadr3,loadr4,x3,z);
  parameter n=7;
  input reset;
  input clock;
  input [n:0] a, b, c, d;               // data primary inputs
  input muxctrl1, muxctrl2, muxctrl4;   // mux control
  input [1:0] muxctrl3;                 // 2-bit mux control
  input loadr1, loadr2, loadr3, loadr4; // register control
  output [n:0] x3, z;

  reg [n:0] r1, r2, r3, r4; // registers
  wire [n:0] mux1out, mux2out, mux3out, mux4out, mult1out, mult2out;

  assign mux1out = (muxctrl1 == 0) ? a : r1;
  assign mux2out = (muxctrl2 == 0) ? b : r4;
  assign mux3out = (muxctrl3 == 0) ? a : (muxctrl3 == 1 ? r4 : r3);
  assign mux4out = (muxctrl4 == 0) ? c : r2;
  assign mult1out = mux1out * mux2out; // product truncated to n+1 bits
  assign mult2out = mux3out * mux4out;
  assign x3 = mult2out;
  assign z = mult1out;

  always @(posedge clock)
  begin
    if (reset) begin
      r1 <= 0; r2 <= 0; r3 <= 0; r4 <= 0;
    end
    else begin
      if (loadr1) r1 <= mult1out;
      if (loadr2) r2 <= mult2out;
      if (loadr3) r3 <= c;
      if (loadr4) r4 <= d;
    end
  end
endmodule


Choices during high-level synthesis

 Scheduling determines number of clock cycles required; binding determines area, cycle time.
 Area tradeoffs must consider shared function units vs. multiplexers, control.
 Delay tradeoffs must consider cycle time vs. number of cycles.


Finding schedules

 Two simple schedules:
– As-soon-as-possible (ASAP) schedule puts every operation as early in time as possible.
– As-late-as-possible (ALAP) schedule puts every operation as late in schedule as possible.
 Many schedules exist between ALAP and ASAP extremes.


ASAP and ALAP schedules

[Figure: ASAP and ALAP schedules.]


Verilog model of ASAP schedule
reg [n-1:0] w1reg, w2reg, w6reg1, w6reg2, w6reg3,
            w6reg4, w3reg1, w3reg2, w4reg, w5reg;

// Nonblocking (<=) assignments make each wNreg a register, so a value
// computed in one cycle is read in the next; inputs i1..i8 are assumed
// to be held constant for the five cycles.
always @(posedge clock)
begin
  // cycle 1
  w1reg <= i1 + i2;
  w3reg1 <= i4 + i5;
  w6reg1 <= i7 + i8;
  // cycle 2
  w2reg <= w1reg + i3;
  w3reg2 <= w3reg1;
  w6reg2 <= w6reg1;
  // cycle 3
  w4reg <= w3reg2 + w2reg;
  w6reg3 <= w6reg2;
  // cycle 4
  w5reg <= i6 + w4reg;
  w6reg4 <= w6reg3;
  // cycle 5
  o1 <= w6reg4 + w5reg;
end


Verilog of ALAP schedule
reg [n-1:0] w1reg, w2reg, w3reg, w4reg, w5reg, w6reg;
// o1 is assumed to be declared elsewhere as an output reg.

always @(posedge clock)
begin
  // cycle 1
  w1reg <= i1 + i2;
  // cycle 2
  w2reg <= w1reg + i3;
  w3reg <= i4 + i5;
  // cycle 3
  w4reg <= w3reg + w2reg;
  // cycle 4: w6 is computed as late as possible, so it needs no delay registers
  w5reg <= i6 + w4reg;
  w6reg <= i7 + i8;
  // cycle 5
  o1 <= w6reg + w5reg;
end



Critical path of schedule

The longest path through the data flow graph
determines the minimum schedule length.
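A small sketch of how the critical path can be read off the two schedules, assuming the ASAP and ALAP steps from the Verilog models above (operations indexed w1 through w6, then o1). An operation's mobility, its ALAP step minus its ASAP step, is zero exactly when it lies on a longest path. The names are illustrative.

```c
#include <assert.h>

/* Operations w1..w6, o1 (indices 0..6); steps copied from the ASAP and
   ALAP Verilog models. */
#define NOPS 7
static const int asap_step[NOPS] = {1, 2, 1, 3, 4, 1, 5};
static const int alap_step[NOPS] = {1, 2, 2, 3, 4, 4, 5};

/* Mobility: how many steps an operation can slide without lengthening
   the schedule. */
int mobility(int op) {
    assert(op >= 0 && op < NOPS);
    return alap_step[op] - asap_step[op];
}

/* Writes the indices of zero-mobility (critical) operations to crit[]
   and returns how many there are. */
int critical_ops(int crit[NOPS]) {
    int n = 0;
    for (int op = 0; op < NOPS; op++)
        if (mobility(op) == 0)
            crit[n++] = op;
    return n;
}
```

The zero-mobility operations are w1, w2, w4, w5, and o1: a chain of five additions, which is why both schedules need five cycles. w3 has one step of slack and w6 has three.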



Operator chaining

 Several operations may be executed in
sequence in one cycle; this is operator chaining.
 The delay through chained function units
may not be additive, as when several adders
are chained.



Control implementation

 Clock cycles are also known as control
steps.
 A longer schedule means more states in the
controller.
 The cost of the controller may be hard to judge
from casual inspection of its state transition
graph.



Controllers and scheduling

Functional model:
x <= a + b;
y <= c + d;
[Figure: the two assignments executed in one controller state or in two states.]
Distributed control

[Figure: one centralized controller vs. two distributed controllers.]


Synchronized communication between FSMs

To pass values between two machines, the output of
one machine must be scheduled to coincide with the
input expected by the other.



Hardwired vs. microcoded control

 Hardwired control has a state register and
“random logic.”
 A microcoded machine has a state register
that points into a microcode memory.
 The two styles are functionally equivalent; the
choice depends on implementation considerations.
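To illustrate the equivalence, here is a sketch of one hypothetical three-state controller written both ways: a `switch` stands in for the random logic, and a table indexed by the state register and one input stands in for the microcode memory. Real microcode formats usually encode branches in fields of the microword rather than indexing by every input combination; that simplification is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical controller: state 0 waits for go; states 1 and 2 drive
   the data path. ctrl is the control word sent to the data path. */

/* Hardwired style: next state and outputs computed by logic. */
void hardwired(int state, int go, int *next, int *ctrl) {
    switch (state) {
    case 0:  *next = go ? 1 : 0; *ctrl = 0; break;
    case 1:  *next = 2;          *ctrl = 1; break;
    default: *next = 0;          *ctrl = 2; break;
    }
}

/* Microcoded style: the state register (plus the input) addresses a
   memory whose words hold the next state and the outputs. */
struct microword { int next; int ctrl; };
static const struct microword ucode[3][2] = {
    /*  go = 0     go = 1  */
    { {0, 0},    {1, 0} },   /* state 0 */
    { {2, 1},    {2, 1} },   /* state 1 */
    { {0, 2},    {0, 2} },   /* state 2 */
};

void microcoded(int state, int go, int *next, int *ctrl) {
    assert(state >= 0 && state < 3 && (go == 0 || go == 1));
    *next = ucode[state][go].next;
    *ctrl = ucode[state][go].ctrl;
}

/* Returns 1 if both implementations agree on every state/input pair. */
int styles_agree(void) {
    for (int s = 0; s < 3; s++)
        for (int g = 0; g < 2; g++) {
            int n1, c1, n2, c2;
            hardwired(s, g, &n1, &c1);
            microcoded(s, g, &n2, &c2);
            if (n1 != n2 || c1 != c2)
                return 0;
        }
    return 1;
}
```

Changing the behavior of the microcoded version means editing table entries, while the hardwired version needs new logic; that difference, not functionality, usually drives the choice.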



Data path-controller delay

Watch out for long delay paths created by the
combination of data path and controller.



Topics

 Low-power design.
 Pipelining.



Rules for reducing power consumption

 Turn it off.
– Eliminates leakage current.
 Slow it down, reduce voltage.
– Performance is linear with clock frequency.
– Power is proportional to $V^2$.
 Don’t change its inputs.
– Dynamic power is activity-dependent.



Energy and power

 Energy = power * time.
 Energy consumption is critical for battery-powered
systems.
 Power consumption is critical for systems limited
by heat dissipation.



Energy and performance

 In many cases, high performance = low
energy.
– Efficiency pays off in both arenas.
 In some cases, energy can be saved by
reducing performance.
– $P = \frac{1}{2} C V^2 f$; lowering f allows V to be lowered too.
– Power goes down faster than performance.
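A back-of-the-envelope sketch, assuming the dynamic-power model above and a fixed task of a given number of clock cycles; the capacitance, voltage, and frequency values in the usage are invented for illustration.

```c
#include <assert.h>

/* Dynamic power of CMOS logic: P = 1/2 * C * V^2 * f (watts). */
double dynamic_power(double c_farads, double v_volts, double f_hz) {
    return 0.5 * c_farads * v_volts * v_volts * f_hz;
}

/* Energy = power * time; a task of `cycles` clock cycles runs for
   cycles / f seconds. */
double task_energy(double c_farads, double v_volts, double f_hz, double cycles) {
    assert(f_hz > 0.0);
    return dynamic_power(c_farads, v_volts, f_hz) * (cycles / f_hz);
}
```

For example, going from 2 V at 100 MHz to 1 V at 50 MHz cuts power to one eighth while the task takes twice as long, so energy per task drops to one quarter.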



Levels of abstraction

 Physical:
– Minimize capacitance.
 Gate:
– Use low leakage gates.
 Combinational:
– Avoid glitches.
 Register-transfer:
– Avoid using units unnecessarily.
 Architecture:
– Slow things down, turn them off.
Sources of energy consumption

 Static:
– Leakage.
 Dynamic:
– Switching activity.



Physical optimizations

 Assuming equal signal probabilities, dynamic
power consumption is proportional to total
wire capacitance.
 Shorter wires mean less power consumption.
 More active nets should be shortened first.
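A sketch of the same idea with unequal activity, assuming each net's dynamic power is proportional to its activity (average transitions per clock cycle) times its wire capacitance. The `struct net` layout and the sample nets in the usage are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-net data. */
struct net {
    const char *name;
    double activity;   /* average transitions per clock cycle */
    double cap_pf;     /* wire capacitance in picofarads */
};

/* Total switched capacitance; dynamic power is proportional to it. */
double total_switched_cap(const struct net *nets, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += nets[i].activity * nets[i].cap_pf;
    return sum;
}

/* Removing dC of wire from net i saves activity[i] * dC, so the most
   active net gives the largest saving per unit of wire shortened. */
size_t first_to_shorten(const struct net *nets, size_t n) {
    assert(n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (nets[i].activity > nets[best].activity)
            best = i;
    return best;
}
```

With nets of activity 0.05, 0.5, and 0.25 and capacitances 4, 2, and 3 pF, the total switched capacitance is 1.95 pF, and the second net (activity 0.5) is the one to shorten first even though it is the shortest wire.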



How to reduce wire length

 Use hard macros where possible.
 Add placement constraints.
 Use design hierarchy to guide placement
search.
 Use nets with small drivers where possible.
– Don’t drive a net faster than it needs to go.



Logic/circuit optimizations

 Turn off gates where possible.
– Not an option in most FPGAs, but it should be.
 Operate gates at low voltage.
– Speed decreases linearly, while power decreases
as $V^2$.



Combinational optimizations

 Design the network to avoid unnecessary
glitching where possible.
– Balance delays across paths.
 Can duplicate logic to reduce wire lengths.
– Does the duplicate logic use less power than the
wire?



Register-transfer optimizations

 Hold inputs when a unit’s output will not be
used.
– Put registers at the unit’s inputs.
 Turn off units when they won’t be used for
several cycles.
– Can’t selectively turn off LEs in most FPGAs.



Architectures for low power

 Two important methods:
– architecture-driven voltage scaling
– power-down modes



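Architecture-driven voltage scaling

 Add extra logic to increase parallelism so
that the system can run at a lower clock rate,
and therefore at a lower supply voltage.
 Power of n parallel units relative to the
reference design at voltage $V_{ref}$:
– $P_n(n) = \left[1 + \frac{C_i(n)}{n C_{ref}} + \frac{C_x(n)}{C_{ref}}\right]\left(\frac{V}{V_{ref}}\right)^2$
[Figure: parallel architecture with clock labels 25 MHz, 50 MHz, and 25 MHz.]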
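A minimal sketch that evaluates the $P_n(n)$ expression above; the capacitance terms and voltages are caller-supplied assumptions, not values taken from the slide.

```c
#include <assert.h>

/* Power of an n-way parallel implementation relative to the reference:
   Pn(n) = [1 + Ci(n)/(n*Cref) + Cx(n)/Cref] * (V/Vref)^2.
   ci and cx are Ci(n) and Cx(n) for the chosen n. */
double parallel_power_ratio(int n, double ci, double cx, double cref,
                            double v, double vref) {
    assert(n >= 1 && cref > 0.0 && vref > 0.0);
    double vr = v / vref;
    return (1.0 + ci / (n * cref) + cx / cref) * vr * vr;
}
```

With no overhead capacitance and the supply halved, the ratio is 0.25. With n = 2, Ci(2) = 0.2 Cref, Cx(2) = 0.15 Cref, and V/Vref = 0.58, it is about 0.42: the extra hardware eats into, but does not erase, the saving.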
Power-down modes

 CMOS consumes only leakage power when it is
not transitioning. Many systems can incorporate
power-down modes:
– condition the clock on power-down mode;
– add states to the controller for power-down mode;
– modify the control logic to ensure that power-down
and power-up don’t corrupt the control state.



Pipelines

 Provide higher utilization of logic.
[Figure: combinational logic split into pipeline stages by registers.]



Pipeline metrics

 Throughput: rate at which new values enter
the system.
– Initiation interval: time between successive
inputs.
 Latency: delay from input to output.



Simple pipelines

 Pure pipelines have no control.
 Choose latency, throughput.
 Choose register locations with retiming.
 Overhead:
– Setup, hold times.
– Power.



Complex pipelines

 Actions in the pipeline depend on data or
external events.
 Actions on the pipeline:
– Stall values.
– Abort operation.
– Bypass values.



Pipeline metrics

 Ignore register delay. Notation:
– Combinational logic delay D.
– Latency L.
– Throughput T.
 Unpipelined system:
– L = D.
– T = 1/D.



Adding pipeline stages

 Add a pipeline stage:
– Latency remains L = D.
– Throughput increases: T = 2/D.
 n-stage pipeline:
– Throughput increases: T = n/D.
 Clock period:
– P = D/n.
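A sketch of these formulas in C, with one addition beyond the slide, labeled here as an assumption: a per-stage register overhead (setup time plus clock-to-output delay). Passing 0 reproduces the idealized $P = D/n$, $T = n/D$, $L = D$; a nonzero overhead shows why throughput stops growing linearly as stages are added.

```c
#include <assert.h>

/* Metrics for an n-stage pipeline around combinational logic of delay d.
   reg_overhead is the assumed per-stage register delay (0 = ideal). */
struct pipe_metrics {
    double period;      /* P = d/n + reg_overhead */
    double throughput;  /* T = 1/P */
    double latency;     /* L = n * P */
};

struct pipe_metrics pipeline(double d, int n, double reg_overhead) {
    assert(d > 0.0 && n >= 1 && reg_overhead >= 0.0);
    struct pipe_metrics m;
    m.period = d / n + reg_overhead;
    m.throughput = 1.0 / m.period;
    m.latency = n * m.period;
    return m;
}
```

For D = 10 and four stages, the ideal period is 2.5 and the latency stays 10; with 0.5 of register overhead per stage, the period becomes 3.0 and the latency grows to 12.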



Performance vs. pipeline stages

[Plot: throughput and clock period vs. number of stages.]
Adding pipeline stages

 Must add a pipeline stage that cuts the logic.
– The new registers must form a cutset of the graph
from primary inputs (PI) to primary outputs (PO).
 Can use retiming to position the registers in
the logic.



Cutsets



Bad pipeline



Pipeline utilization

 Need to fill up the pipeline.
– Later stages are unused as the pipeline fills up.
 Assume D values of valid data and n total
stages.
– Utilization $U = D/(D+n)$.
 As D grows, utilization approaches 1.
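The utilization formula as a small C function, following the slide's $U = D/(D+n)$ with D valid data values and n stages:

```c
#include <assert.h>

/* Utilization of an n-stage pipeline that processes d valid data values. */
double utilization(int d, int n) {
    assert(d >= 0 && n >= 1);
    return (double)d / (d + n);
}
```

Five values through a five-stage pipeline give 0.5; 995 values give 0.995.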



Pipelines with control

 A pipeline may do different things at different
times.
– CPU control flow.
 Must make sure that the pipeline operates
properly in all cases.



Sending a control signal forward



Sending a control signal backward

 Make sure control arrives at the right cycle.



Combining signals from multiple cycles

 Different stages can’t use the ALU in the
same cycle.



Distributed pipeline control



Pipeline control logic

 An ideal pipeline needs no significant control.
[Figure: single state s1 with the transition -/ALU = op.]



Simple decisions

 A simple decision doesn’t add states.
[Figure: single state s1 with transitions 1/ALU = - and 0/ALU = +.]



Controlling a pipeline flush



Product machine for pipeline flush



Hardware sharing control



Pipeline verification

 Extensive simulation is required to exercise
the pipeline.
– The states of the pipeline stages interact.
 Symbolic simulation: simulate names, not
particular values.



Topics

 Design methodologies.



Design methodologies

 Every company has its own design
methodology.
 Methodology depends on:
– size of chip;
– design time constraints;
– cost/performance;
– available tools.



Design teams

 Almost all interesting projects are too big
for one person to handle.
 Need a team of people with varying skills.
 Who is in charge?



Documents

 Documents are critical:
– Writing a document helps you decide what to do.
– Minimizes risk of hit-by-a-truck syndrome.
– Provides information for maintenance, next
generation.
 Each document serves as the contract
between the provider of the document and
the consumer of the document.
Major documents

 Requirements.
 Specification.
 Architecture.
 Module designs.
 Reference manual, user manual.



Types of information

 Functional description.
 Non-functional description: cycle time,
power, etc.
 Timetables.
 Design verification methods.
 Quality metrics.
 Job assignments.



Starting the project

 Requirements: an English description of what
is to be done.
– Customer-oriented.
– High-level.
 May be written by marketing.
 The author of the requirements should verify
that the requirements are accurate.



Specifications

 The specification is the contract between
marketing and the design team.
 The specification is more technical than the
requirements:
– delays, etc.
 An ideal specification would contain no
architectural information, but that goal may be
hard to achieve in practice.
– The specification says what to do, not how to do it.



Specification and planning

 Driven by contradictory impulses:
– customer-centric concerns about cost,
performance, etc.;
– forecasts of feasibility of cost and performance.
 Features, performance, power, etc. may be
negotiated at early stages; negotiation at
later stages creates problems.



Architecture

 The architecture document is the contract
between the system designers and the
component designers.
 Specifies major subsystems and their
interactions.
 Makes important design decisions.
 Isn’t a full implementation.



Module designs

 Specifies details of a module:
– functionality;
– non-functional parameters;
– design verification.



Do documents reflect the product?

 In a word, no.
– Things change.
– People don’t have time to conform documents
to the final design.
 Some amount of updating is important for
maintenance and future generations.



Design reviews

 Have other designers (team + non-team)
evaluate a design.
– Relatively simple.
– Proven to work.
 Must walk through the design in detail to
look for problems, improvements.



Generic design flow

[Figure: design flow from detailed specs through architectural, register-transfer, logic, and physical design to the final configuration, with simulation, timing/area budgets, and verification along the way.]



Estimation and planning

 Estimation techniques vary with the module:
– memories may be generated once size is
known;
– data paths may be estimated from previous
design;
– controllers are hard to estimate without details.
 Estimates must include speed, area, power.



Floorplanning and budgeting

 We want some early physical design
information: area, delay, power, etc.
 Ways to get info:
– previous designs;
– quick design runs.



Architecture

 Need to build an executable model of the
architecture.
– Run vectors on architecture.
– Use as golden design for comparison with later
stages.
 Modeling languages:
– C: easier to write, less detailed.
– Verilog: harder to write, synthesizable with
effort.
Logic design

 For controllers, good state assignment
usually requires CAD tools.
 Logic synthesis is an option:
– very good for non-critical logic;
– can work well for speed-critical logic.
 Logic synthesis system may be sensitive to
changes in the input specification.



Place and route

 The most computationally expensive stage.
 Metrics take more time to judge than
functional vectors.
 Deciding how to fix a problem may take
effort.
– How to change placement, etc.



Design verification

 Functional verification:
– run a reasonable set of vectors.
 Non-functional verification:
– performance;
– power.



Functional verification

 At all levels of hierarchy: module,
subsystem, system.
 At every level of abstraction.
– Compare to previous level of abstraction,
golden model.
 Must check interfaces.
– Half of bugs are at the interface to other
modules.
Functional verification input

 Sources of vectors:
– Previous designs.
– Vectors from higher levels of abstraction.
– Vectors designed previously for this stage.
– Inputs from other modules.



Non-functional verification

 Performance:
– Static timing analysis.
 Power:
– Some information from timing analysis.
– Power analysis tools.



Breadboards

 May build a board to test an FPGA-based
design.
– Takes some time.
– May allow running the design against the real
I/O device.



Topics

 Bus interfaces.
 Platform FPGAs.



Bus interfaces

 Requirements:
– High performance.
– Variable signal environment.
 Techniques:
– Asynchronous logic.
– Handshaking-oriented protocols.



Timing diagrams

[Figure: timing diagram for signals a, b, and c, showing 0/1 values, changing and stable intervals, and a timing constraint.]



Asynchronous logic

 Distribute timing information with values.
– No global clock.
 Timing signal paths must have the same
delay as the data values they accompany.



Latching an asynchronous signal

[Figure: adrs captured in a D flip-flop clocked by adrs_ready.]



Asynchronous timing constraints

 Must satisfy setup and hold times.
[Figure: setup and hold times around a change in adrs.]



Bus system design

 Requirements:
– Imposed by the other side of the system.
 Constraints:
– Imposed by this side of the system.
[Figure: requirements and constraints on the interface between sides a and b.]
Views of the bus

 Hardware:
[Figure: register a feeding combinational logic that drives register b.]



Views of bus system, cont’d.

 Timing diagram:
[Figure: timing of events x and y along the path from register a through combinational logic to register b.]



Bus protocols

 Basic transaction:
– four-cycle handshake.



Handshake machine

 Each side is an FSM (possibly
asynchronous).
[Figure: machines a and b, each with states 0 and 1, exchanging enq and ack; machine a is started by Go.]
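A behavioral sketch of the four-cycle handshake in C, assuming machine a is the sender (started by Go) and machine b the receiver; the names are hypothetical and wire delays are ignored. Each loop iteration lets both machines react once, and a transfer completes after exactly four wire transitions: enq rises, ack rises, enq falls, ack falls.

```c
#include <assert.h>

/* Wires between the two machines, plus the data bus. */
struct link {
    int enq, ack;   /* handshake signals */
    int data;       /* driven by the sender */
    int received;   /* latched by the receiver */
    int go;         /* request for machine a to start a transfer */
};

/* Machine a (sender): raise enq with valid data; drop it once ack is seen. */
static void sender(struct link *l, int value) {
    if (l->go && !l->enq && !l->ack) {   /* enq rises */
        l->data = value;
        l->enq = 1;
    } else if (l->enq && l->ack) {       /* enq falls */
        l->enq = 0;
        l->go = 0;
    }
}

/* Machine b (receiver): latch data and raise ack; drop ack once enq falls. */
static void receiver(struct link *l) {
    if (l->enq && !l->ack) {             /* ack rises */
        l->received = l->data;
        l->ack = 1;
    } else if (!l->enq && l->ack) {      /* ack falls */
        l->ack = 0;
    }
}

/* Runs one transfer on an idle link; returns the number of wire transitions. */
int transfer(struct link *l, int value) {
    assert(!l->enq && !l->ack);
    int transitions = 0;
    l->go = 1;
    do {
        int enq = l->enq, ack = l->ack;
        sender(l, value);
        receiver(l);
        transitions += (l->enq != enq) + (l->ack != ack);
    } while (l->enq || l->ack);
    return transitions;
}
```

Because the link returns to enq = ack = 0 after every transfer, back-to-back transfers need no extra reset step.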



Basic protocols

 Handshake transmits data:



Box 1 logic



Box 2 logic



Bus timing

$t_1 = t_{c1} - t_{d1} \ge t_r$
$t_2 = t_{ack1} - t_{c1} \ge t_h$
$t_3 = t_{c2} - t_{ack1} \ge t_h$

where:
$t_{d1}$ = d becomes stable
$t_{d2}$ = d becomes not stable
$t_{c1}$ = c rises
$t_{c2}$ = c falls
$t_{ack1}$ = ack rises
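A sketch that checks the three constraints for one transaction, assuming the event times are known (for example, from a simulation trace). The struct and function names are hypothetical, and the numbers in the usage are invented.

```c
#include <assert.h>

/* Event times (ns) from one bus transaction; names follow the slide. */
struct bus_events {
    double td1;    /* d becomes stable */
    double tc1;    /* c rises */
    double tack1;  /* ack rises */
    double tc2;    /* c falls */
};

/* Returns 1 if t1 >= tr, t2 >= th, and t3 >= th all hold. */
int bus_timing_ok(const struct bus_events *e, double tr, double th) {
    assert(tr >= 0.0 && th >= 0.0);
    double t1 = e->tc1 - e->td1;
    double t2 = e->tack1 - e->tc1;
    double t3 = e->tc2 - e->tack1;
    return t1 >= tr && t2 >= th && t3 >= th;
}
```

With d stable at 0 ns, c rising at 10, ack rising at 25, and c falling at 40, the constraints tr = 5 and th = 10 are met; if d only becomes stable at 8 ns, t1 = 2 violates tr.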



Busses and systems

 Microprocessor systems often have several
busses running at different rates.
[Figure: CPU and memory on a high-speed bus, connected through a bridge to a low-speed I/O bus.]



Basic signals in a bus



Bus characteristics

 Physical
– Connector size, etc.
 Electrical
– Voltages, currents, timing.
 Protocol
– Sequence of events.



Advanced transactions

 Multi-cycle transfers:
– Several values on one handshake.
– May use implicit addressing.



PCI bus

 Used for box-level system interconnect.
 Two versions:
– 33 MHz.
– 66 MHz.
 Supports advanced transactions.



PCI bus read



Multi-rate systems

 Logic blocks running at different clock rates
may communicate:
– Multi-chip.
– Single-chip.
» Slow bus connects to fast logic.
[Figure: Logic 1 at 100 MHz communicating with Logic 2 at 33 MHz.]



Metastability

 Registers capturing
transitioning signals
may take an
arbitrarily long time
to settle.



Resynchronization

 Use cascaded registers to minimize the
chance of using a metastable value.
[Figure: d passes through two cascaded D flip-flops to produce dout.]
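A cycle-level sketch in C of the two-register synchronizer in the figure. It models only the extra latency: metastability itself is an analog effect that a logic model like this cannot show. The benefit in hardware is that the first register gets a full clock period to settle before the second one samples it. The names are hypothetical.

```c
#include <assert.h>

/* Two cascaded registers clocked by the receiving domain. */
struct sync2 {
    int q1;   /* first register: the one that may go metastable in hardware */
    int q2;   /* second register: drives dout */
};

/* One receiving-clock edge: both registers load at once. Returns dout. */
int sync2_clock(struct sync2 *s, int d) {
    assert(d == 0 || d == 1);
    s->q2 = s->q1;
    s->q1 = d;
    return s->q2;
}
```

A change on d appears on dout two receiving-clock edges later; that latency is the price of the lower metastability risk.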



Platform FPGAs

 Put all the logic for a system on one FPGA.
 Requires large FPGAs plus:
– Specialized logic:
» I/O support;
» memory interface.
– CPUs.



Example: Virtex II Pro

 Major features:
– Large FPGA fabric.
– High-speed I/O.
– PowerPC.



Virtex II Pro High-speed I/O

 Rocket I/O:
– parallel/serial or serial/parallel transceiver.
 Clock recovery circuitry.
 Transceivers for multiple standards: Gigabit
Ethernet, Fibre Channel, etc.
 Programmable decoding features.
 Interface to FPGA fabric.



Virtex II Pro CPUs

 Up to 4 PowerPC 405s per chip:
– 5-stage pipeline, static branch prediction, etc.
 Separate instruction, data caches.
 MMU.
 Timers.
 Scan-based debug support.



PowerPC CoreConnect



Altera Stratix

 Combines FPGA fabric, memory blocks, and
multipliers.



Stratix DSP block



Topics

 Hardware/software co-design.



Why put CPUs on FPGAs?

 Shrink a board to a chip.
 What CPUs do best:
– Irregular code.
– Code that takes advantage of a highly
optimized datapath.
 What FPGAs do best:
– Data-oriented computations.
– Computations with local control.
System design

 True concurrency increases system
performance.
– CPU and accelerator should run in parallel.
 CPU cost is a non-linear function of
performance.
– Accelerator will be smaller, faster, lower
power.



Hardware/software partitioning
if (foo < 8) {
for (i=0; i<N; i++)
x[i] = y[i]*z[i];
}

[Figure: the if test runs on the CPU and the loop on the accelerator.]



Methodology

 Measure the application.
 Identify what to put onto the accelerator.
 Build interfaces.



Concurrency

 Concurrent applications provide the most
speedup.
[Figure: if (a > b) ... runs on the CPU while x[i] = y[i] * z[i] runs on the accelerator, with no data dependencies between them.]



Concurrency analysis

 Data dependencies.
z = x * y;
w = z - v;
 Control dependencies.
if (a < b)
u = r + s;



Partitioning

 Can divide the application into several
processes that run concurrently.
 Process partitioning exposes opportunities
for parallelism.
if (i>b) …                Process 1
for (i=0; i<N; i++) …     Process 2
for (j=0; j<N; j++) ...   Process 3



Partitioning programs

 Reasonable partitioning points:
– If statements, etc.
– Loop nests.



Multi-threaded systems

[Figure: single-threaded vs. multi-threaded execution.]



Performance analysis

 Single-threaded:
– Find the longest possible execution path.
 Multi-threaded with no synchronization:
– Find the longest of several execution paths.
 Multi-threaded with synchronization:
– Find the worst-case synchronization conditions.
Multi-threaded performance analysis

 Synchronization causes the delay along one
path to affect the delay along another.
[Figure: two threads with delays ta and tb before a synchronization point and tc and td after it.]
Delay = max(ta, tb) + td


Control

 Need to signal between CPU and
accelerator.
– Data ready.
– Complete.
 Implementations:
– Shared memory.
– Handshake.



Keeping the accelerator fed

 Must get data in, must get data out.
 Data transfer costs:
– flush CPU cache;
– device driver;
– bus transactions.



Memory buffers

 Must keep the accelerator fed.
– Buffer size in accelerator depends on amount of
data needed at a time, delays in obtaining
needed values.
 Streaming generally requires small buffers:
– x[i] = y[i] * z[i];
 Values with long lifetimes need more buffer
space.
Allocation

 How do we decide what goes on the CPU and
what goes on the FPGA?
 Allocation puts functions on the CPU or
FPGA.



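Speedup

 Time saved for one iteration:
– $t_{SW} - t_{HW} - t_I - t_O$
 May be able to set up many iterations at
once:
– $N(t_{SW} - t_{HW}) - t_I - t_O$
 Here $t_{SW}$ and $t_{HW}$ are the software and
accelerator execution times; $t_I$ and $t_O$ are the
input and output transfer times.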
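A sketch of the two savings expressions in C, with times in any consistent unit; the values in the usage are invented. As on the slide, the batched version charges the transfer times only once, which assumes they do not grow with N.

```c
#include <assert.h>

/* Time saved by moving one iteration to the accelerator.
   t_sw: software time, t_hw: accelerator time,
   t_in / t_out: time to move inputs in and results out. */
double savings_one(double t_sw, double t_hw, double t_in, double t_out) {
    return t_sw - t_hw - t_in - t_out;
}

/* n iterations set up together pay the transfer cost once. */
double savings_batch(int n, double t_sw, double t_hw,
                     double t_in, double t_out) {
    assert(n >= 1);
    return n * (t_sw - t_hw) - t_in - t_out;
}
```

With t_SW = 10, t_HW = 2, and 5 units each for input and output, a single iteration loses 2 units, while 100 iterations set up together save 790.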



Drivers

 Need an interface between the CPU and the
accelerator:
– transfer data values;
– start, stop computation.
 If computation time is very predictable, a
simpler communication scheme may be
possible.



Debugging

 Hard to test a CPU/accelerator system:
– Hard to control and observe the accelerator
without the CPU.
– Software on CPU may have bugs.
 Build separate test benches for the CPU code
and the accelerator.
 Test integrated system after components
have been tested.
Topics

 Multi-FPGA systems.



Issues

 Types of multi-FPGA systems.
 Multi-FPGA networks.
 Multi-FPGA partitioning.



Types of systems

 Can build a specialized multi-FPGA
system.
– Wired for one purpose.
 Can build reusable multi-FPGA system.
– Emulators, other debugging systems.



Networks

 Ad hoc.
– Best suited for specialized systems.
 Crossbar.
– Fully connected.
 Specialized crossbars.
 Multi-stage.
– Not often used in multi-FPGA systems.



Crossbar

 Fully connected:
[Figure: crossbar connecting lines w, x, y, z to lines a, b, c, d.]
Properties of crossbar

 Fully connected:
– Single source/destination.
– Multi-point.
 $n^2$ area.



Clos network

 A system of crossbars that has less than $n^2$
area.
 Fully connected for single-destination
connections.
– Not fully connected for multiple destinations.
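A sketch comparing crosspoint counts, assuming the standard three-stage Clos construction (an assumption about the organization in the next figure): r ingress switches of size n × m, m middle switches of size r × r, and r egress switches of size m × n, for N = rn ports per side. Clos's classic result is that m ≥ 2n − 1 makes the network strictly nonblocking for point-to-point connections.

```c
#include <assert.h>

/* Crosspoint count of an N x N crossbar. */
long crossbar_points(long n_ports) {
    return n_ports * n_ports;
}

/* Three-stage Clos network: r ingress switches (n x m), m middle
   switches (r x r), r egress switches (m x n); N = r * n. */
long clos_points(long n, long m, long r) {
    assert(n >= 1 && m >= 1 && r >= 1);
    return 2 * r * n * m + m * r * r;
}

/* Clos condition for a strictly nonblocking network with
   single-destination connections. */
int clos_strictly_nonblocking(long n, long m) {
    return m >= 2 * n - 1;
}
```

For N = 1024 with n = r = 32 and m = 63, the Clos network needs 193,536 crosspoints against 1,048,576 for a full crossbar. For small N (for example N = 16 with n = r = 4 and m = 7) it needs more, 336 against 256, so the savings appear only at large sizes.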



Clos network organization



Net size distribution

 Most nets are small, making Clos networks
feasible for logic.
[Plot: number of nets vs. number of pins per net (1, 2, 3, ...).]
Partial crossbar

 Takes advantage of FPGA
reprogrammability.
 Several small crossbars.
– If your crossbar doesn’t have room for the
connection, reprogram to use another crossbar
on another pin.



Trees and fat trees

 Trees allow
communication
between leaves.
 Fat trees provide
more bandwidth
near root.



Multi-chip partitioning

 Somewhat similar to partitioning for LE
placement.
 Differences:
– k-way partitioning;
– pins are a major cost;
– must handle large problems.



K-way partitioning

 Direct:
– Divide into k sets.
 Iterative:
– Extract one set, then another, etc.



Clustering-based partitioning

 Grow a cluster to form a partition.
– Start with a seed for the cluster.
– Choose new nodes to add to the cluster.
 Next move depends on the quality of the
previous moves.



Fiduccia-Mattheyses partitioning

 Can deal with variable-sized blocks.
 Related to Kernighan-Lin partitioning.
– Uses a new data structure to determine the best cell to
move.
 Uses an improved algorithm for updating cell
gains after a move.
– Total gain recomputation can be performed by a set of
constant-time gain increments/decrements.
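A sketch of the FM gain computation for a two-way partition, in C. It covers only the gain of a single move, not a full FM pass: there is no bucket structure, no cell locking, and no balance or cell-size constraint. The `struct net` layout and the `MAX_PINS` limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PINS 8

/* A net lists the cells it connects; side[c] is 0 or 1 for each cell. */
struct net {
    int ncells;
    int cells[MAX_PINS];
};

/* Number of nets with cells on both sides of the partition. */
int cut_size(const struct net *nets, size_t nnets, const int *side) {
    int cut = 0;
    for (size_t i = 0; i < nnets; i++) {
        int on0 = 0, on1 = 0;
        for (int k = 0; k < nets[i].ncells; k++) {
            if (side[nets[i].cells[k]]) on1++; else on0++;
        }
        if (on0 && on1) cut++;
    }
    return cut;
}

/* Gain of moving cell c to the other side: +1 for each net where c is
   the only cell on its side (the net leaves the cut), -1 for each net
   lying entirely on c's side (the net enters the cut). */
int fm_gain(const struct net *nets, size_t nnets, const int *side, int c) {
    int gain = 0;
    for (size_t i = 0; i < nnets; i++) {
        int has_c = 0, from = 0, to = 0;
        for (int k = 0; k < nets[i].ncells; k++) {
            int x = nets[i].cells[k];
            if (x == c) has_c = 1;
            if (side[x] == side[c]) from++; else to++;
        }
        if (!has_c) continue;
        if (from == 1) gain++;   /* c alone on its side */
        if (to == 0) gain--;     /* net was uncut before the move */
    }
    return gain;
}
```

For nets {0,1}, {1,2}, {2,3}, {0,2} with cells 0 and 1 on side 0 and cells 2 and 3 on side 1, the cut is 2; `fm_gain` returns 1 for cell 2, and moving it lowers the cut to 1. FM keeps such gains in buckets and adjusts them incrementally after each move instead of recomputing them as this sketch does.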



Topics

 Coarse-grained FPGAs.
 Reconfigurable systems.
 Reconfigurable ASICs.



FPGA granularity

 Typical LEs implement a small amount of
logic.
– Waste a lot of space/power on connecting logic
elements.
– Specialized adder logic tries to solve this
problem for a special case.
 Can build FPGAs with larger elements.



Granularity issues

 How big is the logic element?
 How flexible should it be?
 What interconnection network is needed?
 How do you program it?



Reconfigurable systems

 Reconfigure logic on-the-fly:
– application characteristics may change over
time.
 Issues:
– Reconfiguration time.
– Reconfiguration memory cost.
– Power consumption.
– Synthesis for reconfiguration.
PipeRench

 Reconfigurable pipeline:
– Each stage of the pipeline can be reconfigured
quickly and independently.
 Allows virtual pipeline that is longer than
physical pipeline.



PipeRench pipeline operation



RaPiD architecture

 Coarse-grained computational architecture:
– Soft control can be reconfigured on every cycle.
– Hard control can be reconfigured only in
configuration mode.
 Interconnect network allows computational
elements to be arranged in pipelines.



RaPiD pipeline



Reconfigurable ASICs

 Problems with ASICs:
– Mask cost.
– Manufacturing time.
 Solution: mix ASIC and FPGA.
– Reconfigurable logic on bottom.
– Custom wiring on top.

