
Field Programmable Gate Array (FPGA)

Classification of the Programmable Devices:
- IC semiconductor technology used
- Volatile and non-volatile
- Programmability
- Flexibility
- Capacity
- Routing method
- Characteristics of basic cell (virtual classification)

The use of programmable devices in DSP Systems


- Most DSP systems use gate-cell devices because they allow the Boolean expressions to be reduced.
Example: 16-bit adder
- It needs about 73,000,000,000 bits when implemented with a bit-cell device such as a ROM. Each bit cell needs 1.5 transistors on average, so the bit-cell implementation requires about 109,500,000,000 transistors.
- It can be implemented on an FPGA with only 20 cells, each built from fewer than 100 transistors, so it needs fewer than 2,000 transistors.
- PLA and PAL devices (SPLDs) have low flexibility and low capacity (a maximum of 32 inputs and 32 outputs); they are therefore used to implement only small DSP circuits.
- The Complex Programmable Logic Device (CPLD) is the first generation of the FPGA; it has high flexibility but low capacity and is used to implement small circuits and medium systems with small control circuits.
- The FPGA is a newer type of programmable device that has both high flexibility and high capacity. Its internal architecture differs from one vendor to another (such as Altera, Xilinx, etc.).
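The ROM figure in the 16-bit adder example can be checked with a short calculation. This is a minimal sketch that assumes the adder has two n-bit operands and no carry-in (2n address lines) and produces n sum bits plus a carry-out (n + 1 data bits per word). Those assumptions are not stated on the slide, but they reproduce the quoted figure of about 73 billion bits. The function names are illustrative.

```python
def rom_bits_for_adder(n: int) -> int:
    """Bits in a ROM that stores the full truth table of an n-bit adder.

    Assumption: two n-bit operands, no carry-in -> 2n address lines,
    and n sum bits plus one carry-out -> n + 1 data bits per word.
    """
    return (2 ** (2 * n)) * (n + 1)


def rom_transistors(bits: int, transistors_per_bit: float = 1.5) -> float:
    """Transistor estimate using the average of 1.5 transistors per bit cell."""
    return bits * transistors_per_bit


bits = rom_bits_for_adder(16)
print(f"ROM bits:         {bits:,}")
print(f"ROM transistors:  {rom_transistors(bits):,.0f}")
# FPGA estimate from the text: 20 cells, fewer than 100 transistors each.
print(f"FPGA upper bound: {20 * 100:,} transistors")
```

The exponential term 2^(2n) is what makes the bit-cell approach impractical for wide operands, while the gate-cell count grows far more slowly with n.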

Architecture of Field Programmable Gate Arrays (FPGA)
- An FPGA consists of an array of logic blocks that can be programmably interconnected to realize different designs.
- An FPGA is programmed via electrically programmable switches, much the same as traditional programmable logic devices (PLDs).
- FPGAs can be used to implement just about any hardware design.
- The common feature of all FPGAs is a matrix of gates with free or semi-free interconnection.
- FPGA logic blocks differ greatly in size and implementation capability. A logic block can be as small as the two-transistor block used in the Crosspoint FPGA or as large as the look-up table block used in the Xilinx 3000 series FPGA.

Programming Technologies
1- SRAM Programming Technology
- FPGA connections are made using pass transistors, transmission gates, or multiplexers that are controlled by SRAM cells.
- It is used in devices from Xilinx, Altera, Plessey, Algotronix, Concurrent Logic and Toshiba.
Disadvantage:
Its large area. It takes at least five transistors to implement an SRAM cell, plus at least one transistor to serve as a programmable switch.
Advantages:
1- Fast re-programmability (the FPGA can be programmed an unlimited number of times).
2- It requires only standard integrated circuit process technology.

2- Antifuse Programming Technology

- An antifuse is a two-terminal device whose unprogrammed state presents a very high resistance between its terminals.
- When a high voltage (from 11 to 20 volts, depending on the type of antifuse) is applied across its terminals, the antifuse blows and creates a low-resistance link.
- Antifuse technology is used in the FPGAs from Actel, QuickLogic, and Crosspoint.
Advantages:
1- Its small size. This advantage is somewhat reduced by the large size of the necessary programming transistors, which must be able to handle large currents, and by the isolation transistors that are sometimes needed to protect low-voltage transistors from the high programming voltages.
2- Its relatively low series resistance.
Disadvantage:
The device can be programmed only once (one-time programmable, OTP).

3- Floating Gate Programming Technology

- The floating gate (or EPROM/E2PROM) programming technology is the one found in ultraviolet-erasable EPROM and electrically erasable E2PROM devices. It is used in devices from Altera and Plus Logic.
- The programmable switch is a transistor that can be permanently disabled. This is accomplished by injecting a charge on the floating gate (gate 2 in the Figure) using a high voltage between the control gate (gate 1) and the drain of the transistor. This charge increases the threshold voltage of the transistor so that it turns off.
Advantages:
1- Its re-programmability.
2- No external permanent memory (as is needed with SRAM) is required to program the chip on power-up.
Disadvantage:
An E2PROM cell is roughly twice the size of an EPROM cell.

Logic Block Architecture

Fine-Grain Logic Blocks


1- The Crosspoint FPGA
- The FPGA from Crosspoint Solutions uses a single transistor pair as the logic block.
- Since the transistors are connected together in rows, two two-input NAND gates are isolated from each other by turning off the pair of transistors between the gates.

2- The Plessey FPGA


- The main advantage of
using fine grain logic
blocks is that the useable
blocks are fully
utilized. This is because
it is easier to use small
logic gates efficiently.
- The main disadvantage of fine
grain blocks is that they require a
relatively large number of wire
segments and programmable
switches. Such routing resources
are costly in delay and area.
- As a result, FPGAs employing
fine grain blocks are in general
slower and achieve lower
densities than those employing
coarse grain blocks

Coarse-Grain Logic Blocks


1- Actel Logic Block
Act-1: 702 logic functions can be realized
Act-2: 766 logic functions can be realized

- The 16-bit adder needs about 3000 Act-2 logic blocks, each having 14 transistors. Hence, it needs about 42,000 transistors.

2- QuickLogic Logic Block

- The logic block in the FPGA from QuickLogic is similar to the Actel logic block in that it employs a four-to-one multiplexer, but each input of the multiplexer is fed by an AND gate.
- The QuickLogic logic block is unique among FPGA architectures in that it offers up to 14-input-wide gating functions. This allows many logic functions to be accomplished in a single block delay that would require two or more delays with other architectures.

Programmed QuickLogic FPGA for the logic function

3- The Altera Logic Block

- The architecture of the Altera FPGA has evolved from the PLA-based architecture of traditional PLDs, with its logic block consisting of wide AND gates (20 to over 100 inputs) feeding into an OR gate with three to eight inputs.
The Altera 5000 Series logic block
- The advantage of this type of block is that the wide AND gate can be used to form logic functions with few levels of logic blocks, reducing the need for programmable interconnect. The logic connections also serve as the routing function.
- A disadvantage of the wired-AND configuration is its use of pull-up devices, which consume static power. An array full of these pull-ups consumes a significant amount of power. To mitigate this, each gate in the MAX 7000 series block can be programmed to consume about 60% less power, at the expense of about a 40% increase in delay.
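The AND-OR structure described above can be modelled as a sum of products. The sketch below is illustrative only: it assumes that both the true and the inverted form of each input can be programmed into an AND gate, which the slide does not state, and the names `product_term` and `logic_block` are hypothetical. It programs the block as the carry-out of a full adder (ab + ac + bc) and checks the result against the arithmetic definition.

```python
from itertools import product

MAX_OR_INPUTS = 8  # the text gives an OR gate with three to eight inputs


def product_term(inputs, literals):
    """Wide AND gate. `literals` maps input index -> required value (0 or 1).

    Inputs that are not listed are treated as unconnected. Allowing a
    required value of 0 assumes an inverted input is available.
    """
    return int(all(inputs[i] == want for i, want in literals.items()))


def logic_block(inputs, terms):
    """OR of the programmed product terms."""
    if len(terms) > MAX_OR_INPUTS:
        raise ValueError("more product terms than OR-gate inputs")
    return int(any(product_term(inputs, t) for t in terms))


# Carry-out of a full adder as three product terms: ab + ac + bc.
carry_terms = [{0: 1, 1: 1}, {0: 1, 2: 1}, {1: 1, 2: 1}]
for a, b, c in product((0, 1), repeat=3):
    assert logic_block((a, b, c), carry_terms) == int(a + b + c >= 2)
print("carry-out realized in one logic block level")
```

Because each AND gate can see many inputs at once, the whole function fits in a single block level, which is the advantage the slide attributes to this architecture.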

4- The Xilinx Logic Block

- The basis for the Xilinx logic block is an SRAM functioning as a look-up table (LUT). The truth table for a K-input logic function is stored in a $2^K \times 1$ SRAM.
- The advantage of look-up tables is their high functionality: a K-input LUT can implement any function of K inputs, and there are $2^{2^K}$ such functions.

The Xilinx 3000 logic block

Lookup table-based logic

                      Generally     For 4-input LUT   For 5-input LUT
LUT inputs            $n$           4                 5
PROM bits required    $2^n$         16                32
Possible functions    $2^{2^n}$     65,536            4,294,967,296

- Its disadvantage is that LUTs become quite large for more than about five inputs, since the number of memory cells needed for a K-input LUT is $2^K$.
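The look-up table idea can be illustrated in a few lines. This is a minimal sketch: the class name `LUT` and the choice of the first input as the most significant address bit are assumptions made for illustration, not a description of any particular Xilinx device.

```python
from itertools import product


class LUT:
    """K-input look-up table: a 2**K x 1 memory addressed by the inputs."""

    def __init__(self, k: int, func):
        self.k = k
        # One stored bit per input combination (2**K memory cells).
        self.bits = [int(bool(func(*combo))) for combo in product((0, 1), repeat=k)]

    def read(self, *inputs: int) -> int:
        # Treat the inputs as an address, first input = most significant bit.
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.bits[address]


def memory_cells(k: int) -> int:
    """Memory cells needed for a K-input LUT."""
    return 2 ** k


def possible_functions(k: int) -> int:
    """Number of distinct K-input functions a LUT can hold."""
    return 2 ** (2 ** k)


xor4 = LUT(4, lambda a, b, c, d: a ^ b ^ c ^ d)
print(xor4.read(1, 0, 1, 1))
for k in (4, 5):
    print(k, memory_cells(k), possible_functions(k))
```

Programming a different function only changes the stored bits, not the structure, which is why one LUT covers every function of its K inputs.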

The Xilinx 4000 logic block


The two 4-input LUTs can be used directly as an SRAM block. This allows small amounts of memory to be implemented more efficiently.
Another feature is the inclusion of circuitry that can be used to implement fast carry addition circuits.

Structure of a Virtex-E CLB

- Each Virtex-E CLB contains four logic cells (LCs), organized in two similar slices.
- The Virtex-E CLB contains logic that combines function generators to provide functions of five or six inputs.
- The F6 multiplexer combines the outputs of all four function generators in the CLB by selecting one of the F5-multiplexer outputs. This permits the implementation of any 6-input function, an 8:1 multiplexer, or selected functions of up to 19 inputs.
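The claim that four function generators plus the F5 and F6 multiplexers can realize any 6-input function follows from Shannon expansion. The sketch below is a behavioural illustration under stated assumptions: the fifth and sixth inputs (`e`, `f`) are assumed to drive the F5 and F6 select lines, and all names are hypothetical rather than taken from Xilinx documentation.

```python
from itertools import product


def make_lut4(func):
    """Truth table of a 4-input function, indexed by (a, b, c, d)."""
    return {combo: func(*combo) & 1 for combo in product((0, 1), repeat=4)}


def mux2(sel, in0, in1):
    """Two-to-one multiplexer."""
    return in1 if sel else in0


def build_6input(func6):
    """Split a 6-input function over four 4-input LUTs (Shannon expansion).

    Assumption: inputs e and f act as the F5 and F6 select lines.
    """
    luts = {
        (e, f): make_lut4(lambda a, b, c, d, e=e, f=f: func6(a, b, c, d, e, f))
        for e, f in product((0, 1), repeat=2)
    }

    def evaluate(a, b, c, d, e, f):
        # In each slice, an F5 multiplexer picks between its two LUTs using e.
        f5_slice0 = mux2(e, luts[(0, 0)][(a, b, c, d)], luts[(1, 0)][(a, b, c, d)])
        f5_slice1 = mux2(e, luts[(0, 1)][(a, b, c, d)], luts[(1, 1)][(a, b, c, d)])
        # The F6 multiplexer picks one of the two F5 outputs using f.
        return mux2(f, f5_slice0, f5_slice1)

    return evaluate


def majority6(*bits):
    """Example 6-input function: 1 when at least four inputs are 1."""
    return int(sum(bits) >= 4)


clb = build_6input(majority6)
assert all(clb(*v) == majority6(*v) for v in product((0, 1), repeat=6))
print("6-input function matches on all 64 input combinations")
```

Each LUT stores the function for one fixed value of (e, f), so the two multiplexer levels only have to select the right partial result.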

Comparative Study of Different Types of Programmable Devices

Type of IC     Type of cell   Cost of cell /   Total cost /       Total cost /
                              transistor       cell*              transistor*
ROM            bit            1.5              73,000,000,000     109,500,000,000
Plessey FPGA   Fine gate      35               500                17,500
Xilinx FPGA    Coarse gate    100              20                 2,000

* Approximate cost per 16-bit adder

- Circuits that have internal feedback must be built using gate-cell devices.
- Circuits with a forward transfer function (no internal feedback), such as an adder, must be built using either:
Bit-cell devices, for circuits with fewer than 16 input bits.
Gate-cell devices, for circuits with more than 16 input bits.

- A small circuit that has feedback in its internal design is best implemented on fine-grain devices.
- Any circuit with a large number of inputs and simple transfer functions (the outputs have short Boolean equations) is best implemented on fine-grain devices.
- Any circuit with a large number of inputs and complex transfer functions (the outputs have long and complex Boolean equations) is best implemented on coarse-grain devices.
- For a circuit with a large number of inputs and complex transfer functions, Xilinx FPGA devices are strongly recommended.

Routing Architecture
- The routing architecture of an FPGA is the manner in which the
programmable switches and wiring segments are positioned to
allow the programmable interconnection of the logic blocks.
A wire segment is a wire
unbroken by programmable
switches. One or more
switches may attach to the
wire segment
A track is a sequence of one
or more wire segments in a
line.
A routing channel is a
group of parallel tracks.
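The three definitions above nest naturally, and a small data-structure sketch can make the relationship concrete. The class names, the `length` unit (logic-block pitches) and the `switches` field are illustrative assumptions, not terms from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class WireSegment:
    """A wire unbroken by programmable switches."""
    length: int  # in logic-block pitches (illustrative unit)
    switches: list = field(default_factory=list)  # names of attached switches


@dataclass
class Track:
    """A sequence of one or more wire segments in a line."""
    segments: list

    def length(self) -> int:
        return sum(segment.length for segment in self.segments)


@dataclass
class RoutingChannel:
    """A group of parallel tracks."""
    tracks: list

    def width(self) -> int:
        return len(self.tracks)


# Two parallel tracks of the same total length with different segmentation.
short = [WireSegment(1), WireSegment(1), WireSegment(2)]
channel = RoutingChannel([Track([WireSegment(4)]), Track(short)])
print(channel.width(), [track.length() for track in channel.tracks])
```

A connection routed on the second track would cross more switch points than one on the first, which is why segmentation affects delay.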

A Connection Block (CB) provides connectivity from the inputs and outputs of a logic block to the wire segments in the channels.
A Switch Block (SB) provides connectivity between the horizontal and vertical wire segments.
- In some architectures the switch block and connection block are intermingled, and in others they are combined into a single structure.
- The switch block topology differs from one device to another; e.g., in topology 1 wires A and B cannot be connected, while in topology 2 they can.

The Xilinx Routing Architecture


- The connection block typically connects each pin to only two or three of the five tracks passing by a block, as shown in the expanded figure in the upper left corner. This saves area, which is costly with SRAM programming technology.
- On all four sides of the logic block there are connection blocks that connect a total of 11 different logic block pins to the wire segments.

Wire segment types

- General-purpose interconnect, consisting of wire segments that pass through switches in the switch block.
- Direct interconnect, consisting of wire segments that connect each logic block output directly to its four nearest neighbors (thick black lines).
- Long lines, which span the length or width of the chip, providing high fan-out, uniform-delay connections (dashed lines).
- A clock line, which is a single net that spans the entire chip and is driven by a high-drive buffer.

In the Xilinx 4000, double-length wires are used to span two CLBs, offering lower routing delay for moderately long connections.

The Actel Routing Architecture


- The routing architecture is asymmetric because there are more uncommitted general-purpose tracks in the horizontal direction than in the vertical.
- There is no clearly separable switch block in the Actel architecture. Instead, the switching is distributed throughout the horizontal channels.
- Each horizontal channel consists of
22 routing tracks, and each track is
broken up into segments of different
lengths
- This wide distribution of segment lengths makes it likely that a
segment of the exact or close length of any given connection can
be found, so that very few series programmable switches are
needed in any intra-channel connection

- In addition to the input segments and output segments, there are uncommitted vertical freeways that travel the entire height of the chip. This allows signals to travel longer vertical distances than permitted by the output segments.

- The routing architecture of the Crosspoint FPGA is similar to that of Actel. The QuickLogic architecture, which also uses antifuses, is again similar except that the segments are of two classes: short tracks of length one, and long tracks that traverse the entire chip.

The Altera Routing Architecture


- The routing architecture of the
Altera FPGA is novel in that it has
a two-level hierarchy
- At the first level of the
hierarchy, 16 or 32 of the logic
blocks are grouped into a logic
array block (LAB)
- The structure of the LAB is
very similar to a traditional
PLD. Each x in the figure
indicates a point where a
connection can be made

The tracks are dedicated to one of four types of connections:
1) Connections from the outputs of all logic blocks in this LAB.
2) Connections from the logic expanders.
3) Connections from the outputs of logic blocks in other LABs (the next level of the hierarchy, the PIA).
4) Connections from the I/O pads of the chip.
- The advantage of this scheme is that it makes the routing problem very easy, and the regularity of the physical design of the silicon allows it to be packed tightly and efficiently.
- A second advantage of this approach is that the delay
through the PIA is the same regardless of which track is
used since all tracks have identical loading.
- The disadvantage is that many switches are needed, and
these may add more capacitive load than necessary.

The Plessey Routing Architecture


- Programmable routing is achieved using only a multiplexer as a connection block on the inputs of the two-input NAND gate. The multiplexers are controlled by SRAM cells.
- The inputs to each multiplexer are connected to:
1) The output of the previous NAND gate in the row.
2) The output of the NAND gate above or below this logic block, whichever is closer.
3) A vertical long track.
4) One of the following three connections, depending on which NAND input the multiplexer drives:
- The NAND gate output two blocks previous to the current one (lower input of the Master block).
- A horizontal long track (the upper input).
- The output of the block diagonally away from the current one.

Large and Small Gate Cell

- Small gate devices give low-cost circuits because they have no unused inputs, but low speed because of the series connection of the small gates.
- Large gate devices give high-cost circuits because they generally have unused inputs. However, they have high speed because the large logic function is generated directly.
6-input AND gate example:
(a) 2-input AND gate cell devices need 6 AND gates with 5 routing paths. Hence, the circuit needs about 11 transistors and a delay of about 11 tg, where tg is the delay of one 2-input AND gate.
(b) 8-input AND gate cell devices need 1 AND gate without any routing path, but the circuit needs about 16 transistors and a delay of about 2 tg, using the same technology as the 2-input AND gate devices.

Effect of Logic Block Granularity on FPGA Density and Performance: Virtex-E Xilinx FPGA case study
- Divide a circuit with a large number of inputs into small sub-circuits with fewer than 7 inputs in each logic equation.
- The cost and delay of each circuit can be calculated from the following equations, where n is the number of inputs in the circuit:

$$\text{Cost} = \begin{cases} 2^{(n-4)} & \text{for } n > 4 \\ 1 & \text{for } n \le 4 \end{cases} \qquad \text{Delay} = \left\lceil \frac{n}{6} \right\rceil$$

4-input equations in sub circuits give a minimum cost,


because each such function needs one cell and one unit
delay
6-input equations in sub circuits give the maximum speed,
because each 6-input function needs four cells (one CLB) and
one unit delay
5-input equations in sub circuits give an optimal cost and
speed. This is because each 5-input function needs two cells
and one unit delay.
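The three statements above can be checked directly against the cost and delay model. In this minimal sketch the cost is taken as 2^(n-4) cells for n > 4 and one cell otherwise, and the delay is assumed to be n/6 rounded up to whole unit delays. The rounding is an assumption, chosen because it gives one unit delay for 4, 5 and 6 inputs as the text states.

```python
import math


def cost_cells(n: int) -> int:
    """Cells needed for one n-input function: 2**(n-4) for n > 4, else 1."""
    return 2 ** (n - 4) if n > 4 else 1


def delay_units(n: int) -> int:
    """Delay in unit delays, assuming Delay = ceil(n / 6)."""
    return math.ceil(n / 6)


for n in (4, 5, 6):
    print(f"{n}-input function: {cost_cells(n)} cell(s), {delay_units(n)} unit delay(s)")
```

Splitting into 4-input pieces keeps every function at one cell, while 6-input pieces use a full CLB of four cells but pack the most logic into a single unit delay.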

Design Example: 5-bit adder

1- Direct implementation

Output             S0   S1   S2   S3   S4    Co    Total
No. of variables   3    5    7    9    11    11    11
Cost/cell          1    2    8    32   128   128   299
Delay/tg
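The total of 299 cells for the direct implementation can be reproduced from the cost model. The sketch assumes that each sum bit S_k is written directly as a function of operand bits 0 to k of both inputs plus the carry-in (2(k+1)+1 variables) and that the carry-out depends on all 11 variables. This reading is an assumption, but it matches the surviving table entries 32, 128, 128 and 299.

```python
def cost_cells(n: int) -> int:
    """Cells for one n-input function: 2**(n-4) for n > 4, else 1."""
    return 2 ** (n - 4) if n > 4 else 1


N_BITS = 5

# Assumed variable count per output for a direct (flat) implementation.
variables = {f"S{k}": 2 * (k + 1) + 1 for k in range(N_BITS)}
variables["Co"] = 2 * N_BITS + 1

costs = {name: cost_cells(n) for name, n in variables.items()}
total = sum(costs.values())

for name in variables:
    print(f"{name}: {variables[name]:2d} variables, {costs[name]:3d} cells")
print("Total cells:", total)
```

The two widest outputs account for 256 of the 299 cells, which is the motivation for splitting the adder into narrower sub-circuits in the implementations that follow.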

2- Minimum cost implementation

Output             S0   S1   S2   S3   S4   Co   Total
No. of variables                                 11
Cost/cell                                        10
Delay/tg

3- Optimal cost/speed implementation

Output             S0   S1   S2   S3   S4   Co   Total
No. of variables                                 11
Cost/cell                                        12
Delay/tg

4- Another optimized implementation

Output

S0

S1

S2

S3

S4

Co

Total

No. of variables

11

Cost/cell

16

24

Delay/tg

FPGA Generic Design Flow


Design Entry: creation of design files using a schematic editor or a hardware description language.
Design Synthesis: creation of a lower level of logic abstraction using a library of primitives.
Partition (or Mapping): assigning to each logic element a specific physical element.
Place: mapping the logic into specific locations in the target FPGA chip.
Route: connecting the mapped logic.
Program Generation: generating a bitstream file to program the device.
Device Programming: downloading the bitstream to the FPGA.
Design Verification: simulation is used to check functionality.
