Why VLSI?
Moore’s Law.
Why FPGAs?
The VLSI and system design process.
Microprocessors:
– personal computers;
– microcontrollers.
DRAM/SRAM/flash.
Audio/video and other consumer systems.
Telecommunications.
[Figure: mask cost ($), on an axis from 0 to 1,000,000, versus process generation: 0.25, 0.18, 0.13, and 0.09 micron.]
LE LE
Interconnect
LE LE
network
LE LE
b b
cin cin
FPGA-Based System Design: Chapter 1 Copyright 2004 Prentice Hall PTR
A hierarchical logic design
box1 box2 x
top
i1 xxx i2
component pin
inverter
0010
+
0001
+ 0011
0100
SiO2 metal3
metal2
transistor metal1
via
poly
n+ n+
p+
substrate
substrate
Photolithography
p-tub n-tub
substrate
p-tub n-tub
poly poly
n+ p-tub n+ p+ n-tub p+
metal 1 metal 1
n+ p-tub n+ p+ n-tub p+
metal 1 metal 1
n+ p-tub n+ p+ p-tub p+
n-type transistor:
Typical parameters:
n-type:
– kn’ = 13 µA/V^2
– Vtn = 0.14 V
p-type:
– kp’ = 7 µA/V^2
– Vtp = -0.21 V
VSS logic 0
90 nm process:
– Rn = 11.1 kΩ
– Cl = 0.12 fF
So
– tf = 2.2 x 11.1E3 x 0.12E-15 = 2.9 ps.
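The fall-time arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function name is made up for illustration, and the factor 2.2 is taken as the usual approximation of ln 9 for a 90%-to-10% swing of an RC discharge.

```python
def fall_time(r_ohms: float, c_farads: float) -> float:
    """Return the RC fall time tf = 2.2 * R * C, in seconds."""
    return 2.2 * r_ohms * c_farads

RN = 11.1e3      # effective n-type resistance from the slide, in ohms
CL = 0.12e-15    # load capacitance from the slide, in farads

tf = fall_time(RN, CL)
print(f"tf = {tf * 1e12:.1f} ps")
```

Changing RN or CL shows how directly the delay tracks the transistor resistance and the load.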
0
Source above VSS
0
metal 3
metal 2
vias
metal 1
poly poly
n+ p-tub n+
n+ (ND)
bottomwall
substrate (NA) capacitance
Two components:
– parallel plate;
– fringe.
fringe
plate
metal 1 metal 1
Low frequency
Low frequency
High frequency
High frequency
Skin depth
Metal 2 47 24 0.3
Metal 1 76 36 0.3
sink 1
sink 2
sink 1
sink 2
Steiner point
sink 1
sink 2
Assume h = 1:
– k = sqrt{(0.4 Rint Cint)/(0.7R0 C0)}
Assume arbitrary h:
– k = sqrt{(0.4 Rint Cint)/(0.7R0 C0)}
– h = sqrt{(R0 Cint)/(Rint C0)}
– T50% = 2.5 sqrt{R0 C0 Rint Cint}
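The three formulas above are easy to evaluate numerically. The sketch below assumes k is the number of buffered segments and h is the buffer size relative to a minimum-size buffer; the slide does not define either symbol, so that reading is an assumption. The wire and buffer values are made up for illustration.

```python
import math

def repeater_plan(r_int, c_int, r0, c0):
    """Evaluate the slide's formulas for k, h, and the 50% delay T50%."""
    k = math.sqrt((0.4 * r_int * c_int) / (0.7 * r0 * c0))
    h = math.sqrt((r0 * c_int) / (r_int * c0))
    t50 = 2.5 * math.sqrt(r0 * c0 * r_int * c_int)
    return k, h, t50

# Hypothetical values: a 1 kOhm, 1 pF wire and a 10 kOhm, 1 fF minimum buffer.
k, h, t50 = repeater_plan(r_int=1e3, c_int=1e-12, r0=10e3, c0=1e-15)
print(f"k = {k:.2f}, h = {h:.0f}, T50% = {t50 * 1e12:.0f} ps")
```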
aggressor net
victim net
substrate
increased spacing
data
D Q
core
Sense
amp
Read:
– precharge bit and bit’ high;
– set select line high from row decoder;
– one bit line will be pulled down.
Write:
– set bit/bit’ to desired (complementary) values;
– set select line high;
– drive on bit lines will flip state if necessary.
SRAM sense amp
1 transistor + 1 capacitor:
word
bit
LE LE LE
Programmable:
– Input connections.
– Internal function.
Coarser-grained than logic gates.
– Typically 4 inputs.
Generally includes register.
May provide specialized logic.
– Adder carry chain.
0 0
0 1
a 0010 0 1 0 0
memory out
b 1001 1 0 1 0
1 1 0 1
LE LE LE
LE LE
… LE
LE LE LE
D Q
LE
LE
Global routing:
– Which combination of channels?
Local routing:
– Which wire in each channel?
Routing metrics:
– Net length.
– Delay.
Length 1
Length 2
SRAM.
– Can be programmed many times.
– Must be programmed at power-up.
Antifuse.
– Programmed once.
Flash.
– Similar to SRAM but using flash memory.
Lookup 1
table out
configuration
bits
1, 1, 1, 0,
0, 1, 1, 0,
1, 0,
1, 10 0 1
Configuration bit
LUT LE out
D Q
Arithmetic:
– Carry block includes XOR gate.
– Use LUT for carry, XOR for sum.
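A lookup table is just a small memory addressed by the logic inputs, so the "LUT for carry, XOR for sum" split can be modeled in a few lines. This is a behavioral sketch only; the helper names are hypothetical and it does not model any particular vendor's logic element.

```python
def make_lut(func, n_inputs):
    """Fill a 2**n-entry truth table by evaluating func on every input combination."""
    table = []
    for address in range(2 ** n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *bits):
    """Use the input bits as the memory address (bit 0 is the first input)."""
    address = sum(bit << i for i, bit in enumerate(bits))
    return table[address]

# Configuration bits for the carry function (majority of a, b, cin).
CARRY_LUT = make_lut(lambda a, b, cin: (a & b) | (a & cin) | (b & cin), 3)

def add_bit(a, b, cin):
    """One adder bit: dedicated XOR for the sum, LUT lookup for the carry."""
    total = a ^ b ^ cin
    carry = lut_read(CARRY_LUT, a, b, cin)
    return total, carry
```

Reprogramming the element means storing a different table; the read path stays the same.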
Each slice uses F5 mux to combine results of
multiplexers.
F6 mux combines outputs of F5 muxes.
Registers can be FF/latch; clock and clock enable.
Includes three-state output for on-chip bus.
Modes of operation:
– Normal.
– Arithmetic.
– Counter.
D Q
LE LE LE LE LE
Wiring channel
Wiring channel
LE LE LE LE LE
LE LE LE LE LE
LE LE
Within a channel:
– How many wires.
– Length of segments.
– Connections from LE to channel.
Between channels:
– Number of connections between channels.
– Channel structure.
Length 1
Length 2
channel
channel channel
channel
Types of interconnect:
– local;
– general-purpose;
– dedicated;
– I/O pin.
Relationship
between
GRM, hex
lines, and
local
interconnect:
row
column
Configuration
memory
FPGA
board
Permanently programmed.
Make a connection with electrical signal.
– More reliable than breaking a connection.
– Avoids shrapnel.
Resistance of about 100 Ω.
Metal 2
antifuse
via
Metal 1
substrate
d0 a out
out 0 d0
d1 1 d1
a Truth table
10 10 0 0
Actel 54SX logic element
A
CLR
1
0
0
D (AB)’
A^B
latch
A
CLR
0
1
B
0
0
CLK
Number of transistors:
– NAND/NOR gate has 2n transistors.
– 4-input LUT has 128 transistors in SRAM, 96 in
multiplexer.
Delay:
– 4-input NAND gate has 9τ delay.
– SRAM decoding has 21τ delay.
Power:
– Static gate’s power depends on activity.
– SRAM always burns power.
Lookup table circuitry
Demultiplexer or multiplexer?
adrs
adrs
LUT LUT
Bit line
adrs
static gates
pass transistors
Multiplexer design
Delay proportional to square of path length.
Delay grows as (lg b)^2.
config
progb
Routing channel
Small area.
Resistive switch.
Delay grows as the
square of the number
of switches.
Larger area.
Regenerative driver.
© 1999 IEEE
© 1999 IEEE
Regular layout
structure.
– Recursive.
Performance:
– Clock speed is generally a primary requirement.
Size:
– Determines manufacturing cost.
Power/energy:
– Energy related to battery life, power related to heat.
– Many digital systems are power- or energy-limited.
Placement:
– Place logic components into FPGA fabric.
Routing:
– Choose connection paths through the fabric.
Configuration generation:
– Generate bits required to configure FPGA.
Sources in project
Source window
Output
module parity(a,p);
input [31:0] a;
output p;
endmodule
module parity(a,p);
input [31:0] a;
output p;
assign p = ^a;
endmodule
RTL schematic: top-level
Stimulus
Unit
Under
Test
(UUT)
Response
testbench
An event is a change in a net’s value.
An event has two components:
– value;
– time.
Example event: net1 = 0 @ 35 ns.
Events on a gate
c=0 @ 2 ns
a
1 c
1 0 b=1 @ 1 ns time
0 1
b
a=1 @ 0 ns
netlist timewheel
a e=0 @ 4 ns
0 c 1
1 0
0 1 e d=1 @ 2 ns time
b
0 1 b=1 @ 1 ns
d
netlist timewheel
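The timewheel idea can be sketched as a priority queue of (time, net, value) events. The example below assumes a single NAND gate c = NAND(a, b) with a 1 ns delay; the gate type and delay are assumptions chosen so the trace matches the event labels a=1 @ 0 ns, b=1 @ 1 ns, c=0 @ 2 ns.

```python
import heapq

def simulate(initial, stimuli, gate_delay=1):
    """Event-driven simulation of c = NAND(a, b); returns the events that occurred."""
    nets = dict(initial)
    queue = list(stimuli)            # pending (time_ns, net, value) entries
    heapq.heapify(queue)             # the timewheel: earliest time comes out first
    log = []
    while queue:
        time, net, value = heapq.heappop(queue)
        if nets[net] == value:
            continue                 # value did not change, so this is not an event
        nets[net] = value
        log.append((time, net, value))
        if net in ("a", "b"):        # an input changed: re-evaluate the gate
            new_c = 1 - (nets["a"] & nets["b"])
            heapq.heappush(queue, (time + gate_delay, "c", new_c))
    return log

events = simulate({"a": 0, "b": 0, "c": 1}, [(0, "a", 1), (1, "b", 1)])
print(events)
```

Only nets whose values actually change generate further work, which is what makes event-driven simulation efficient on mostly idle circuits.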
x = a and b;
a
x
b
if (a | b)
begin
x = c;
end
fulladd a0(a[0],b[0],carryin,sum[0],carry[1]);
fulladd a1(a[1],b[1],carry[1],sum[1],carry[2]);
fulladd a2(a[2],b[2],carry[2],sum[2],carry[3]);
case (muxctrl)
1'b0: muxout = a;
1'b1: muxout = b;
endcase
foo = muxout | c;
Clock period.
– Duty cycle, etc.
Group path timing.
– Cells or ports that share the same timing
behavior.
Input/output delay.
– End-to-end delay.
tc >= tx + ty
A
0
1
S
B C
X
time
Gate delay:
– intrinsic;
– drive;
– load.
Wire:
– lumped load;
– transmission line.
LE PIP LE
sink
source sink
sink
network
graph model
d = 10 d = 10
d = 20 d = 20
False path
g1 g3
dvr
g2 g4
unbalanced load
g1 g3
dvr
g2 g4
more balanced
Optimizing network delay
N-bit adder:
+ + + +
placement routing
Better placement and routing
placement routing
shallow
deep logic logic
Gate network:
bad
good
bad good
Factorization techniques
Number representation.
Shifters.
Adders and ALUs.
One’s complement:
– 5 = 0101
– -5 = ~(0101) = 1010
– Two zeroes: 0000, 1111
Two’s complement:
– 5 = 0101
– -5 = ~(0101) +1 = 1011
– One zero: 0000
N = Σ 2^i b_i (summed over bit positions i)
Test for zero: all bits are 0.
Test for negative: sign bit is 1.
Subtraction: negate then add.
– a – b = a + (-b) = a + (~b+1)
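The negate-then-add rule can be tried out on fixed-width words. The sketch below is a plain Python model with hypothetical helper names; it masks results to n bits to mimic hardware word width.

```python
def negate(value, n_bits):
    """Two's-complement negation: invert every bit, then add 1."""
    mask = (1 << n_bits) - 1
    return (~value + 1) & mask

def subtract(a, b, n_bits):
    """Compute a - b as a + (~b + 1), keeping only n_bits of the result."""
    mask = (1 << n_bits) - 1
    return (a + negate(b, n_bits)) & mask

def to_signed(value, n_bits):
    """Interpret an n-bit pattern as a signed number (sign bit set means negative)."""
    if value & (1 << (n_bits - 1)):
        return value - (1 << n_bits)
    return value

print(format(negate(0b0101, 4), "04b"))   # bit pattern of the negated word
print(to_signed(subtract(3, 5, 4), 4))    # signed value of 3 - 5 in 4 bits
```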
n bits
output
n bits
data 2
n bits
module fulladd(a,b,carryin,sum,carryout);
input a, b, carryin; /* add these bits*/
output sum, carryout; /* results */
fulladd a0(a[0],b[0],carryin,sum[0],carry[1]);
fulladd a1(a[1],b[1],carry[1],sum[1],carry[2]);
…
fulladd a7(a[7],b[7],carry[7],sum[7],carryout);
endmodule
endmodule
module sum(a,b,carryin,result);
input a, b, carryin; /* add these bits*/
output result; /* sum */
ci
Pi
Pi+1 AND
…
Pi+b-1
OR
Ci+b-1
fulladd_p a0(a[0],b[0],carryin,sum[0],carry[1],p[0]);
fulladd_p a1(a[1],b[1],carry[1],sum[1],carry[2],p[1]);
fulladd_p a2(a[2],b[2],carry[2],sum[2],carry[3],p[2]);
fulladd_p a3(a[3],b[3],carry[3],sum[3],carry[4],p[3]);
assign cs4 = carry[4] | (p[0] & p[1] & p[2] & p[3] & carryin);
fulladd_p a4(a[4],b[4],cs4, sum[4],carry[5],p[4]);
…
assign carryout = carry[8] | (p[4] & p[5] & p[6] & p[7] & cs4);
endmodule
Useful in multiplication.
Input: 3 n-bit operands.
Output: n-bit partial sum, n-bit carry.
– Use carry propagate adder for final sum.
Operations:
– s = (x + y + z) mod 2.
– c = [(x + y + z) – s] / 2.
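A carry-save adder treats each bit position as an independent full adder, so no carry ripples between positions. The following is a minimal word-level sketch with hypothetical function names; the final carry-propagate step is modeled with ordinary integer addition.

```python
def carry_save_add(x, y, z, n_bits):
    """Reduce three n-bit operands to a partial-sum word and a carry word."""
    mask = (1 << n_bits) - 1
    partial_sum = (x ^ y ^ z) & mask                  # per-bit (x + y + z) mod 2
    carry = ((x & y) | (x & z) | (y & z)) & mask      # per-bit carry, not yet shifted
    return partial_sum, carry

def resolve(partial_sum, carry):
    """Final carry-propagate addition; each carry bit has twice the weight of its sum bit."""
    return partial_sum + (carry << 1)

s, c = carry_save_add(0b0110, 0b0011, 0b0101, 4)
print(resolve(s, c))
```

In a multiplier, many partial products are reduced this way and only the last step pays for carry propagation.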
Xing/Yu---ripple-carry adder:
– n-stage adder divided into x blocks;
– each block has n/x stages;
– block k, 1<= k <= x.
Delays, where yk is the number of stages in block k, l1 and l2 are constants, and d is the delay of a single stage:
– ripple-carry R(yk) = l1 + d yk
– carry-generate G(yk) = l2 + d(yk-1)
– carry-terminate T(yk) = G(yk)
Carry-skip delay model
[Figure: cost (CLBs), operational time (ns), and performance-cost ratio of ripple, complete CLA, skip, and RC-select adders.]
0 1 1 0
LSB
opcode
AND
OR
NOT
SUM
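module alu(fcode,op0,op1,result,oflo);
parameter n=16, flen=3;
input [flen-1:0] fcode; /* operation code */
input [n-1:0] op0, op1; /* operands */
output [n-1:0] result; /* operation result */
output oflo; /* overflow flag */
/* `PLUS, `MINUS, `AND, `OR, `NOT are opcode macros assumed to be `define'd elsewhere */
assign
{oflo,result} =
(fcode == `PLUS) ? (op0 + op1) :
(fcode == `MINUS) ? (op0 - op1) :
(fcode == `AND) ? (op0 & op1) :
(fcode == `OR) ? (op0 | op1) :
(fcode == `NOT) ? (~op0) : 0;
endmodule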
Switch networks.
Combinational testing.
pseudo-AND
pseudo-OR
b’
a
b ab’ + a’b
a’
Fault model:
– possible locations of faults;
– I/O behavior produced by the fault.
Good news: if we have a fault model, we
can test the network for every possible
instantiation of that type of fault.
Bad news: it is difficult to enumerate all
types of manufacturing faults.
Stuck-at-0/1 faults
Testing procedure:
– set gate inputs;
– observe gate output;
– compare fault-free and observed gate output.
Test vector: set of gate inputs applied to a
system.
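The testing procedure can be illustrated on a single two-input NAND gate. The sketch assumes the fault sits on the gate output, which is only one of the possible fault locations; a vector detects the fault when the fault-free output differs from the stuck value.

```python
from itertools import product

def nand(a, b):
    """Fault-free two-input NAND."""
    return 1 - (a & b)

def detecting_vectors(stuck_value):
    """Input vectors whose fault-free output differs from the stuck-at output."""
    return [(a, b) for a, b in product((0, 1), repeat=2)
            if nand(a, b) != stuck_value]

print("stuck-at-0 detected by:", detecting_vectors(0))
print("stuck-at-1 detected by:", detecting_vectors(1))
```

For the NAND output, only one vector exposes a stuck-at-1 fault, while three expose stuck-at-0.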
NAND NOR
Multipliers.
    0110    multiplicand
  x 1001    multiplier
    0110
+  0000     partial product
   00110
+ 0000
  000110
+0110
 0110110
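The elementary-school method above maps directly onto a shift-and-add loop. This is a behavioral sketch with a hypothetical function name; a hardware multiplier would form the same partial products with AND gates and adders.

```python
def shift_add_multiply(multiplicand, multiplier, n_bits):
    """Unsigned multiply: add a shifted multiplicand for every 1 bit of the multiplier."""
    product = 0
    for i in range(n_bits):
        if (multiplier >> i) & 1:          # this multiplier bit selects a partial product
            product += multiplicand << i   # shifted left by the bit's position
    return product

result = shift_add_multiply(0b0110, 0b1001, 4)
print(format(result, "07b"))
```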
Word serial multiplier
[Figure: word-serial multiplier with multiplicand, multiplier, and product registers, shown with the same 0110 x 1001 example.]
Skew array for rectangular layout.
Unsigned array multiplier
+ x1y1 + x0y1
+ x1y2 + x0y2
xn-1yn-1
+ + 0
P(2n-1) P(2n-2) P0
Unsigned array multiplier, cont’d
Logic synthesis.
Placement and routing.
Pre-designed logic.
– Generally identified by language features.
– + operator.
– xxx()
Hard macro: includes placement.
Soft macro: no placement.
Technology-independent optimizations
work on logic representations that do not
directly model logic gates.
Technology-dependent optimizations work
in the available set of logic gates.
Transformation from technology-
independent to technology-dependent is
called library binding.
Technology-independent
optimizations
k2 = x1’ x2 x4 + k1
k3 = k1 x4’
k1 = x2 + x3
primary inputs
x1 x2 x3 x4
Terms
x2
1 don’t care
x1
0
1
x3
Don’t-cares in Boolean networks
g=ab
a b c
Observability don’t-cares
Simplification.
– Changing the way a function is represented.
Network restructuring.
– Adding and removing nodes.
Delay restructuring.
– Optimizations that reduce the height of critical
paths.
x1
x3
Espresso example
x2
x1
x3
f1 f4 F f4
f2 f3 f3
before after
Based on division:
– formulate candidate divisor;
– test how it divides into the function;
– if g = f/c, we can use c as an intermediate
function for f.
Algebraic division: doesn’t take into account
Boolean simplification. Less expensive than
Boolean division.
Factorization using division
Three steps:
– generate potential common factors and compute
literal savings if used;
– choose factors to substitute into network;
– restructure the network to use the new factors.
Algebraic/Boolean division can be used to
implement the first step.
Cost (number of
inputs) doesn’t always
increase with added
functions:
q = g’ + h s = d’
d=a+b
Steiner point
1 net B
A
3 nets
C
D
partition 1 partition 2
Min-cut bisecting partitioning,
cont’d
Swapping A and B:
– B drags 1 net;
– A drags 3 nets;
– total cut increase: 3 nets.
Conclusion: probably not a good swap, but
must be compared with other pairs.
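Evaluating a candidate swap amounts to counting cut nets before and after it. The sketch below uses a made-up netlist, not the one in the figure, and hypothetical function names; a negative gain means the swap increases the cut.

```python
def cut_size(nets, side):
    """Count nets whose cells do not all lie in the same partition."""
    return sum(1 for net in nets if len({side[cell] for cell in net}) > 1)

def swap_gain(nets, side, a, b):
    """Reduction in cut size obtained by exchanging cells a and b."""
    swapped = dict(side)
    swapped[a], swapped[b] = side[b], side[a]
    return cut_size(nets, side) - cut_size(nets, swapped)

# Hypothetical example: A has three nets inside partition 1, B has one inside partition 2.
side = {"A": 1, "C": 1, "D": 1, "E": 1, "B": 2, "F": 2}
nets = [("A", "C"), ("A", "D"), ("A", "E"), ("B", "F"), ("C", "F")]

print(cut_size(nets, side), swap_gain(nets, side, "A", "B"))
```

A min-cut partitioner would compute this gain for many candidate pairs and apply the best one.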
16 x 16 multiplier example.
Pad to Pad
------------------+----------------------+-----------+
Source Pad |Destination Pad| Delay |
------------------+----------------------+-----------+
x<0> |p<0> | 5.824|
x<0> |p<10> | 10.675|
x<0> |p<11> | 11.214|
x<0> |p<12> | 11.753|
Thermal summary:
----------------------------------------------------------------
Estimated junction temperature: 36C
Ambient temp: 25C
Case temp: 35C
Theta J-A: 34C/W
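The thermal figures above are consistent with the standard relation Tj = Ta + Theta_JA x P. Assuming the tool uses that relation (an assumption, since the report does not say), the implied power dissipation can be backed out as follows.

```python
def dissipated_power(junction_c, ambient_c, theta_ja):
    """Solve Tj = Ta + theta_JA * P for P, in watts."""
    return (junction_c - ambient_c) / theta_ja

# Values from the thermal summary: 36C junction, 25C ambient, 34C/W.
power_w = dissipated_power(36.0, 25.0, 34.0)
print(f"P = {power_w:.2f} W")
```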
Floorplanner window:
Chip
floorplan
LEs
No combinational cycles.
All components must have bounded delay.
Controlled by clock(s).
– State changes at time determined by the clock.
– Inputs to registers settle in time for state change.
– Primary inputs settle in time for combinational delay
through logic.
Machine state is determined solely by registers.
– Don’t have to worry about timing constraints, events
outside the registers.
Performance:
– Clock period is determined by combinational logic
delay.
Area:
– Combinational logic size usually dominates area.
Energy/power:
– Often dominated by combinational logic.
– May be improved by latching values.
Register-transfer:
– Combinational equations for inputs to registers.
State transition graph/table:
– Next-state, output functions described
piecewise.
0/010 S2
S1
1/1-0 S3
D Q D Q
D Q Combinational D Q
logic
D Q D Q
A B1
B2
Cyclic structure:
1/0
recognizer
0 0
0 0
1 1
1 0
0 0
1 1
Recognizer state transition graph
1/0 0/0
0/0
Bit 1 Bit 2
1/1
Moore machine:
– Output a function of state.
Mealy machine:
– Output a function of primary inputs + state.
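The difference between the two machine types shows up in where the output is computed. The sketch below uses a hypothetical recognizer for a 0 followed by a 1, not the recognizer from the slides. In the Moore version the output is read from the state after each update, a simplification of the hardware timing.

```python
def mealy_recognizer(bits):
    """Mealy machine: output depends on the current state and the current input."""
    state, outputs = "start", []
    for bit in bits:
        outputs.append(1 if state == "saw0" and bit == 1 else 0)
        state = "saw0" if bit == 0 else "start"
    return outputs

def moore_recognizer(bits):
    """Moore machine: output depends only on the state, so it needs a 'found' state."""
    state, outputs = "start", []
    for bit in bits:
        if bit == 0:
            state = "saw0"
        elif state == "saw0":
            state = "found"
        else:
            state = "start"
        outputs.append(1 if state == "found" else 0)
    return outputs

print(mealy_recognizer([0, 1, 1, 0, 1]))
print(moore_recognizer([0, 1, 1, 0, 1]))
```

The Moore version needs an extra state to carry the information that the Mealy version reads straight from the input.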
s0 s1
s2 s3
s0 s1
s2
I1 x I2
M1 y M2
O1 O2
Internal connections
External connections
Combinational
Combinational
logic
logic
D
Q
Q
D
s1 s3
-/1 1/1
1/0 -/0
s2 s4 0/0
M1 M2
Product machine
i1 o1 i2 o2
R S
i1 o1 i2 o2
R S
0 R1 0 S1 1
0 R2 1 S2 0
0 R3 0 S1 0
0 R3 0 S1 0
Forming product machine
1
s2 code = 110
s1 code = 111
0 1
sensor
short’ / hwy-
farm- short’ /
0 red yellow yellow yellow 0 yellow red
sequencer s1(rst,clk,cars,long,short,
hg,hy,hr,fg,fy,fr,count_reset);
timer t1(count_reset,clk,long,short);
endmodule
event
setup
hold
clock
changing stable
D
time
Duty cycle
50%
Clocking disciplines.
– Flip-flops.
– Latches.
D Q
sig1
non-overlap region
inactive clock
I1(s 2) s 2
combinational
D Q logic
O1(s 2)
1
combinational I2(s 1)
s 1 logic Q D
O2(s 1)
2
Clocking type propagation
I1(s 2) s 2
combinational
D Q logic
O1(s 2)
1
combinational I2(s 1)
s 1 logic Q D
O2(s 1)
2
Example: shift register
φ1 = 1, φ2 = 0
Performance analysis.
P >= C + s + p.
P >= C + s + p + tr.
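The two period constraints can be evaluated directly. The sketch below reads C as the combinational logic delay, s as the register setup time, p as the register propagation delay, and tr as the clock rise time; the slide does not define the symbols, so these readings and the numbers used are assumptions.

```python
def min_clock_period(logic_delay, setup, propagation, rise_time=0.0):
    """Smallest period P satisfying P >= C + s + p (+ tr), in the units given."""
    return logic_delay + setup + propagation + rise_time

# Hypothetical delays in nanoseconds.
ideal = min_clock_period(2.0, 0.1, 0.2)
with_rise = min_clock_period(2.0, 0.1, 0.2, rise_time=0.05)
print(f"P >= {ideal:.2f} ns, or {with_rise:.2f} ns with clock rise time")
```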
input output
logic
Q D
D Q
logic
d
D Q
As skew increases, we
have less time to get
the signal through the
logic.
D Q D Q
10 ps 10 ps
20 ps 20 ps
D Q
30 ps 30 ps
combinational D Q combinational D Q
logic logic
ctrl
carry select
if x = '0' then
reg1 <= a;
else
reg1 <= b;
end if;
code
register-transfer
Alternate data path-controller
systems
Combinational logic
assign o1 = i1 | i2;
if (!I3) begin
o1 = 1'b1;
o2 = a + b;
end else begin
o1 = 1'b0;
end
Clock cycle boundary can be moved to design different register transfers.
registers fall on
clock cycle
boundaries
a b c d
x x x
x x
muxes allow
function units
to be shared
for several
operations
r1 = 0; r2 = 0; r3 = 0; r4 = 0;
end
if (loadr1) r1 = mult1out;
if (loadr2) r2 = mult2out;
if (loadr3) r3 = c;
if (loadr4) r4 = d;
end
endmodule
ASAP
ALAP
functional
model:
x <= a + b;
one state
y <= c + d;
two states
Distributed control
Turn it off.
– Eliminates leakage current.
Slow it down, reduce voltage.
– Performance is linear with clock frequency.
– Power is proportional to V^2.
Don’t change its inputs.
– Activity-dependent.
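The payoff from slowing down and lowering the voltage can be estimated with the standard dynamic-power relation P = a C V^2 f, where a is switching activity and C is switched capacitance. The sketch below holds a and C fixed and compares two operating points; the scale factors are made up for illustration.

```python
def relative_dynamic_power(freq_scale, voltage_scale):
    """Dynamic power relative to the baseline, from P proportional to f * V**2."""
    return freq_scale * voltage_scale ** 2

# Hypothetical operating point: half the clock rate at 70% of the supply voltage.
ratio = relative_dynamic_power(freq_scale=0.5, voltage_scale=0.7)
print(f"dynamic power falls to {ratio:.1%} of the baseline")
```

Frequency reduction alone gives a linear saving; the quadratic voltage term is where most of the reduction comes from.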
Physical:
– Minimize capacitance.
Gate:
– Use low leakage gates.
Combinational:
– Avoid twitches.
Register-transfer:
– Avoid using units.
Architecture:
– Slow things down, turn them off.
Sources of energy consumption
Static:
– Leakage.
Dynamic:
– Switching activity.
Clock = 25 MHz
Clock = 50 MHz
Clock = 25 MHz
Power-down modes
Combinational logic
throughput
clock period
# stages
Adding pipeline stages
Different stages
can’t use ALU on
same cycle.
Ideal pipeline
needs no
-/ALU = op
significant
control:
s1
Simple decision
doesn’t add
1 /ALU = - 0 /ALU = +
states:
s1
Design methodologies.
Requirements.
Specification.
Architecture.
Module designs.
Reference manual, user manual.
Functional description.
Non-functional description: cycle time,
power, etc.
Timetables.
Design verification methods.
Quality metrics.
Job assignments.
In a word, no.
– Things change.
– People don’t have time to conform documents
to the final design.
Some amount of updating is important for
maintenance, future generations.
architectural
simulation
detailed register-transfer logic
specs design design
Timing/area
budget
Final physical
configuration design design
verification
Functional verification:
– runs reasonable set of vectors.
Non-functional verification:
– performance;
– power.
Sources of vectors:
– Previous designs.
– Vectors from higher levels of abstraction.
– Vectors designed previously for this stage.
– Inputs from other modules.
Performance:
– Static timing analysis.
Power:
– Some information from timing analysis.
– Power analysis tools.
Bus interfaces.
Platform FPGAs.
Requirements:
– High performance.
– Variable signal environment.
Techniques:
– Asynchronous logic.
– Handshaking-oriented protocols.
0 1
a
changing
b stable
Timing constraint
c
adrs
adrs D Q
adrs_ready
adrs
Hold time
Setup time
Requirements:
– Imposed by the other side of the system.
Constraints:
– Imposed by this side of the system.
requirements
a b
constraints
Views of the bus
Hardware:
D Q D Q
a b
Combinational
logic
Timing diagram:
x y
x
D Q D Q
a b
y Combinational
logic
Basic transaction:
– four-cycle handshake.
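A four-cycle handshake can be written out as the sequence of signal levels on the two wires. The sketch uses the signal names enq and ack from the figure labels; which side drives which signal is an assumption here.

```python
def four_cycle_handshake():
    """Return the (enq, ack) pairs seen during one four-cycle handshake."""
    enq, ack = 0, 0
    trace = [(enq, ack)]          # idle: both signals low
    enq = 1                       # 1. sender raises enq to request a transfer
    trace.append((enq, ack))
    ack = 1                       # 2. receiver raises ack once it has the data
    trace.append((enq, ack))
    enq = 0                       # 3. sender sees ack and drops enq
    trace.append((enq, ack))
    ack = 0                       # 4. receiver drops ack, returning to idle
    trace.append((enq, ack))
    return trace

print(four_cycle_handshake())
```

Each step waits for the other side's previous transition, so the protocol works regardless of the relative speed of the two sides.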
Go enq enq
0 a 1 0 b 1
ack ack
ack
CPU
mem
High-speed
I/O bridge
Low-speed
Physical
– Connector size, etc.
Electrical
– Voltages, currents, timing.
Protocol
– Sequence of events.
Multi-cycle transfers:
– Several values on one handshake.
– May use implicit addressing.
Logic blocks running at different clock rates may communicate:
– Multi-chip.
– Single-chip.
» Slow bus connects to fast logic.
[Figure: Logic 1 (100 MHz) communicating with Logic 2 (33 MHz).]
Registers capturing
transitioning signals
may take an
arbitrarily long time
to settle.
d D Q D Q dout
Major features:
– Large FPGA fabric.
– High-speed I/O.
– PowerPC.
Rocket I/O:
– parallel/serial or serial/parallel transceiver.
Clock recovery circuitry.
Transceivers for multiple standards: Gigabit
Ethernet, Fibre Channel, etc.
Programmable decoding features.
Interface to FPGA fabric.
Hardware/software co-design.
CPU accelerator
CPU accelerator
Data dependencies.
z= x * y;
w = z - v;
Control dependencies.
if (a < b)
u = r + s;
ta tb
synchronization point
tc td
Multi-FPGA systems.
Ad hoc.
– Best suited for specialized systems.
Crossbar.
– Fully connected.
Specialized crossbars.
Multi-stage.
– Not often used in multi-FPGA systems.
Fully connected:
w
x
y
z
a b c d
Properties of crossbar
Fully connected:
– Single source/destination.
– Multi-point.
n^2 area.
1 2 3 # pins
Partial crossbar
Trees allow
communication
between leaves.
Fat trees provide
more bandwidth
near root.
…
Direct:
– Divide into k sets.
Iterative:
– Extract one set, then another, etc.
Coarse-grained FPGAs.
Reconfigurable systems.
Reconfigurable ASICs.
Reconfigurable pipeline:
– Each stage of the pipeline can be reconfigured
quickly and independently.
Allows virtual pipeline that is longer than
physical pipeline.