Challenges and opportunities for FPGA platforms


Ivo Bolsens
Xilinx Research Labs
Xilinx
Outline

• Intro
• ASICs buck the tide
• FPGAs ride the tide
• Programmable system platform
• Future

Programmable Logic

A Lookup Table (LUT)
• 16 words x 1 bit memory, addressed by 4 inputs
• Implements any function of 4 inputs
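A minimal Python sketch of the LUT idea (illustrative, not Xilinx code): the four inputs form a 4-bit address into a 16 x 1-bit memory, so "programming" the LUT means storing the 16-entry truth table of the desired function. The names `program_lut` and `lut4` are hypothetical.

```python
from typing import Callable, List

def program_lut(f: Callable[[int, int, int, int], int]) -> List[int]:
    """Fill a 16 x 1-bit memory with the truth table of any 4-input function."""
    # Input a is address bit 0, b bit 1, c bit 2, d bit 3.
    return [f(addr & 1, (addr >> 1) & 1, (addr >> 2) & 1, (addr >> 3) & 1) & 1
            for addr in range(16)]

def lut4(table: List[int], a: int, b: int, c: int, d: int) -> int:
    """Evaluate the LUT: the four inputs simply form a read address."""
    return table[a | (b << 1) | (c << 2) | (d << 3)]

# Two different functions, each fitting in one LUT.
xor4 = program_lut(lambda a, b, c, d: a ^ b ^ c ^ d)
and4 = program_lut(lambda a, b, c, d: a & b & c & d)
print(lut4(xor4, 1, 0, 1, 1))  # 1
print(lut4(and4, 1, 1, 1, 0))  # 0
```

Any other 4-input function costs exactly the same: only the 16 stored bits change.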

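Add a register to make a Logic Cell
[Figure: the 4-input LUT (16 words x 1 bit memory) drives a flip-flop (FF) with clock enable (CE) and reset (RST); multiplexers (M) select the output (Out)]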
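The registered logic cell can be sketched as a LUT feeding a D flip-flop. The Python model below is hypothetical: it assumes a synchronous, dominant reset (RST), a clock enable (CE) that makes the flip-flop hold its value when low, and an output multiplexer choosing between the flip-flop and the raw LUT output.

```python
from typing import List

class LogicCell:
    """Hypothetical model of a logic cell: 4-input LUT -> D flip-flop with CE and RST."""

    def __init__(self, table: List[int], registered: bool = True) -> None:
        self.table = table            # 16 x 1-bit LUT contents
        self.registered = registered  # output mux: FF output or direct LUT output
        self.q = 0                    # flip-flop state

    def lut(self, a: int, b: int, c: int, d: int) -> int:
        return self.table[a | (b << 1) | (c << 2) | (d << 3)]

    def clock_edge(self, a: int, b: int, c: int, d: int, ce: int = 1, rst: int = 0) -> None:
        """One rising clock edge; reset is assumed synchronous and dominant."""
        if rst:
            self.q = 0
        elif ce:
            self.q = self.lut(a, b, c, d)
        # CE low and RST low: the flip-flop holds its previous value.

    def out(self, a: int, b: int, c: int, d: int) -> int:
        return self.q if self.registered else self.lut(a, b, c, d)

# A registered 4-input AND: the output only changes on a clock edge with CE high.
and_table = [1 if addr == 0b1111 else 0 for addr in range(16)]
cell = LogicCell(and_table)
cell.clock_edge(1, 1, 1, 1)         # CE defaults to 1, so q becomes 1
cell.clock_edge(0, 0, 0, 0, ce=0)   # CE low: q holds 1
print(cell.out(0, 0, 0, 0))         # 1
cell.clock_edge(1, 1, 1, 1, rst=1)  # reset wins over data
print(cell.out(1, 1, 1, 1))         # 0
```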

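Memory and Arithmetic
[Figure: the logic cell gains write enable (WE) and data in (Din) ports, so the 16 words x 1 bit memory can be written as RAM, plus a carry input and multiplexer (M) for arithmetic]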
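In RAM mode the same 16 x 1-bit memory becomes writable through WE and Din, and the carry multiplexer speeds up addition. The Python sketch below models both under stated assumptions: a synchronous write with a combinational read for the 16 x 1 RAM, and a ripple-carry adder where, per bit, the LUT produces the propagate signal a XOR b, a carry multiplexer passes the incoming carry when propagate is 1 and otherwise passes a, and an XOR forms the sum. The names `DistributedRAM16x1` and `ripple_add` are hypothetical.

```python
from typing import List

class DistributedRAM16x1:
    """Hypothetical LUT in RAM mode: synchronous write, combinational read."""

    def __init__(self) -> None:
        self.bits: List[int] = [0] * 16

    def read(self, addr: int) -> int:
        return self.bits[addr & 0xF]

    def clock_edge(self, addr: int, din: int, we: int) -> None:
        if we:                          # write only when WE is high
            self.bits[addr & 0xF] = din & 1

def ripple_add(a: int, b: int, width: int = 4) -> int:
    """Add two unsigned numbers one bit per logic cell along a carry chain."""
    carry, result = 0, 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                     # propagate, computed in the LUT
        result |= (p ^ carry) << i      # sum bit = propagate XOR carry-in
        carry = carry if p else ai      # carry mux: pass carry-in or generate from a
    return result | (carry << width)    # final carry-out becomes the top bit

ram = DistributedRAM16x1()
ram.clock_edge(addr=5, din=1, we=1)     # write a 1 to word 5
ram.clock_edge(addr=5, din=0, we=0)     # WE low: nothing is written
print(ram.read(5), ram.read(6))         # 1 0
print(ripple_add(9, 7))                 # 16
```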

A Slice: 2 Logic Cells + F5 Multiplexer
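The F5 multiplexer is what lets a slice's two 4-input LUTs implement any 5-input function, by Shannon expansion on the fifth input. The Python sketch below shows that construction; the names `lut_table` and `slice_f5` are hypothetical, and the model ignores timing and the slice's other resources.

```python
from typing import Callable, List

def lut_table(f: Callable[..., int]) -> List[int]:
    """16-entry truth table of a 4-input function (address bit i = input i)."""
    return [f(*[(addr >> i) & 1 for i in range(4)]) & 1 for addr in range(16)]

def slice_f5(f5: Callable[..., int]) -> Callable[..., int]:
    """Build a 5-input function from two 4-input LUTs and an F5 mux.

    LUT F holds f5 with e = 0, LUT G holds f5 with e = 1,
    and the F5 multiplexer picks between them using e.
    """
    lut_f = lut_table(lambda a, b, c, d: f5(a, b, c, d, 0))
    lut_g = lut_table(lambda a, b, c, d: f5(a, b, c, d, 1))

    def evaluate(a: int, b: int, c: int, d: int, e: int) -> int:
        addr = a | (b << 1) | (c << 2) | (d << 3)
        return lut_g[addr] if e else lut_f[addr]   # the F5 mux

    return evaluate

parity5 = slice_f5(lambda a, b, c, d, e: a ^ b ^ c ^ d ^ e)
print(parity5(1, 1, 0, 0, 1))  # 1
```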

A CLB: 4 Slices
[Figure: four slices, each built from two 4-input LUTs]

A CLB: 4 Slices + Input/Output
[Figure: the four-slice CLB with its input/output connections]

Add Interconnect
[Figure: the CLB attached to the programmable interconnect]

Build an Array
[Figure: a two-dimensional array of CLBs joined by the interconnect]

ASICs buck the tide, FPGAs ride the tide

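Moore’s Law
A tale of two numbers
[Figure: MOS transistor cross-section (gate, source, drain, channel, substrate) showing gate leakage and substrate leakage, alongside the two scaling numbers: critical dimension (CD, 320, 240, 160 and 80 nm) and gate oxide thickness (Tox, 6.5, 4.5, 2.7 and 1.3 nm)]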

Trend: Line Widths Smaller Than the Wavelength of Light
[Chart: process geometry and optical processing wavelength (micron, 0.1 to 0.7) vs. year, 1988 to 2002]

Painting a one cm line with a three cm brush…
[Image courtesy: IBM]

Gate Oxide
[Image: cross-section of the polysilicon gate, the gate oxide and the silicon crystal]
• About 10 molecular layers of SiO2 for this 150nm example
• 90nm technology is about half the thickness

Mask Layers per Mask Set
[Chart: number of mask layers, metal layers, OPC layers and PSM layers per mask set (0 to 45) for the 250, 180, 150, 130 and 100 nm (est.) nodes]

Mask Set Price Trend vs. Technology
[Chart: relative mask set price (0 to 18) vs. technology node: 250, 180, 150, 130 and 90 nm]

FPGAs in The Forefront of The Technology Curve
[Chart, 1999 to 2005: Xilinx FPGAs at 180 nm (Virtex-EM), 150 nm (Virtex-II), 130 nm (Virtex-II Pro) and 90 nm (Spartan-3), with 65 nm and 45 nm on the FPGA roadmap, plotted against the ITRS roadmap; production moves from 200 mm wafers to 300 mm copper wafers]

Wafer Starts

Economy of ‘Scale’

A decade of progress
[Chart, log scale 1x to 1000x, 1/91 to 1/03: capacity, speed and price of the XC4000, Spartan, Virtex & Virtex-E and Virtex-II families (Virtex & Virtex-E and Virtex-II excluding Block RAM)]

FPGA/ASIC Crossover Changes
[Chart: cost vs. production volume for 150nm/200mm and 90nm/300mm FPGAs and ASICs; FPGAs have the cost advantage below each crossover volume and ASICs above it, and the crossover point changes between the 150nm/200mm and 90nm/300mm generations]
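The crossover can be sketched with a simple cost model: total cost = NRE + unit cost × volume, where the ASIC carries a large one-time NRE (mask set and design) and a lower unit cost, and the FPGA carries little NRE but a higher unit cost. Setting the two totals equal gives the break-even volume (NRE_ASIC - NRE_FPGA) / (unit_FPGA - unit_ASIC). All numbers in the sketch are hypothetical placeholders, chosen only to show how a rising mask-set price can push the crossover to higher volumes even when unit costs fall.

```python
def total_cost(nre: float, unit_cost: float, volume: float) -> float:
    """Total cost = one-time NRE + per-unit cost x volume."""
    return nre + unit_cost * volume

def crossover_volume(nre_asic: float, unit_asic: float,
                     nre_fpga: float, unit_fpga: float) -> float:
    """Volume at which ASIC and FPGA total costs are equal."""
    if unit_fpga <= unit_asic:
        # FPGA is never dearer per unit, so the ASIC NRE is never recovered.
        return float("inf")
    return (nre_asic - nre_fpga) / (unit_fpga - unit_asic)

# Hypothetical numbers: the newer node has a larger ASIC NRE and cheaper units.
older = crossover_volume(nre_asic=1_000_000, unit_asic=10, nre_fpga=0, unit_fpga=30)
newer = crossover_volume(nre_asic=3_000_000, unit_asic=5, nre_fpga=0, unit_fpga=20)
print(f"{older:,.0f} units -> {newer:,.0f} units")  # 50,000 units -> 200,000 units
```

Below the break-even volume the FPGA's lower NRE dominates; above it the ASIC's lower unit cost wins.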

Where are we today
Resource              XC2V8000   XC2VP125
Logic Cells           105K       125K
Block RAM             3Mb        10Mb
Multipliers           168        556
840Mb/s LVDS          340        442
3.125Gb/s MGTs        0          24
PowerPC CPUs          0          4
XC2V8000 = 350M transistors

Today’s Reconfigurable FPGA Platform
• PowerPC™ Processor, 400+ MHz
• High-speed 3.125 Gbps Serial Transceivers
• Programmable IO with selectable VCCIO and controlled impedance (Z)
• 10Mbit Dual-Port RAM
• 18 Bit and 36 Bit data widths
• >500 DSP datapaths
• Control
• 10 Million gates

Design Capability: FPGAs Meet Most Requirements
[Charts: share of designs by gate count (< 20K, 20-100K, 100K-1M, > 1M) and by clock frequency (< 50, 50-200, 200-500, > 500 MHz); bar values shown are 42%, 52%, 25%, 20%, 23%, 13%, 13% and 11%]
Source: Gartner Dataquest

FPGA Sweats the Details
• Xilinx programmable system platform gives designers the benefits of deep submicron
• Rather than focusing on getting the silicon to work, you can focus on getting the design to work

Complex ASIC Design: The Shrinking Window of Innovation
[Pie chart: share of complex ASIC design effort; labeled slices include design authoring 20%, place and route 17%, synthesis 16%, simulation 14%, gate simulation 7%, static timing analysis 5%, floorplanning 5% and extraction 5%, with the remainder in smaller simulation, interconnect, power and transistor analysis slices (3% to 5% each)]
• Average iterations between design and layout = 20
(Source: Electronic Systems, Jan 99)

Simpler/Faster Design Flows
ASIC Design Flow: Spec → Design and Verification → Design Freeze → Silicon Prototype → System Integration → Silicon Production
FPGA Design Flow: Spec → Design and Verification → System Integration (design freeze comes later in the flow)
• 2:1 proven Time-to-Market Advantage
• No silicon design or verification steps
• More design flexibility through later design freeze

Today’s Product Lifecycle
[Chart: profit vs. time; the first to market earns the most profit, latecomers earn reduced profit]
• 37% of new digital products were late to market
• Entering the market first can result in up to a 40% greater total profit contribution over the product’s life vs. the #2 entrant
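The profit chart can be quantified with a common textbook simplification, the triangular market-window model (an assumption here; the slide's 37% and 40% figures come from its own sources, not from this model). Revenue ramps linearly to a peak at time W and back to zero at 2W; a product that enters D late ramps with the same slope, peaks lower, and falls off on the same schedule. The lost fraction of revenue then works out to D(3W - D)/(2W²). The function name and the 26-week example are hypothetical.

```python
def revenue_loss_fraction(delay: float, half_life: float) -> float:
    """Fraction of lifetime revenue lost by entering `delay` time units late.

    Triangle model: on time, revenue peaks at `half_life` and ends at
    2 * half_life. A late entrant starts at `delay` with the same slope.
    """
    if not 0 <= delay <= half_life:
        raise ValueError("delay must lie between 0 and the peak time")
    on_time_area = half_life ** 2                       # 0.5 * 2W * W
    late_area = (2 * half_life - delay) * (half_life - delay) / 2
    return 1 - late_area / on_time_area                 # = D(3W - D) / (2W^2)

# A 52-week product life (peak at 26 weeks) entered 4 weeks late.
print(f"{revenue_loss_fraction(4, 26):.1%}")  # 21.9%
```

Even a short slip costs a disproportionate share of revenue, which is the point the slide makes about being first to market.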

Today’s Product Lifecycle
[Chart: profit vs. time; IRL (Internet Reconfigurable Logic) extends the product’s life in the market]

Design Starts: FPGAs Rule
[Chart: number of design starts (thousands), 2001 to 2004; FPGA bars between 400 and 500, ASIC bars at 5, 4, 2 and 1.5, a gap of about 100X]
Source: Gartner Group

FPGAs rise to the system-level design challenge
• PLATFORM
• FPGA
– Logic, Routing & I/O
• Soft CPUs
• Hard CPUs
• Bus hierarchies
• Memory hierarchies, multipliers
• Gigabit I/Os
• RTOS, drivers, network protocol stacks, embedded & real-time s/w
• APPLICATIONS
• Streaming multimedia
• Network processing
• Software defined radio
• Third+ generation base stations
• Storage area networks
• DISCIPLINES & DESIGNERS
• Systems architects, hardware & software engineers, DSP & communications specialists

Towards programmable platforms
IBM PowerPC™ 405 RISC CPU
• 32-bit RISC CPU, Harvard Architecture
• 130nm CMOS with 1.5V Operation
• 456 Dhrystone MIPS at 300MHz
• 32 x 32-bit General Purpose Registers
• Hardware Multiply / Divide
• 5-Stage Execution Pipeline
• 16KB D-Cache, 16KB I-Cache
• Memory Management Unit (MMU)
• High-Bandwidth Interface to Logic
• Built-In Hardware Timers
• Built-In JTAG Debug and Trace support
[Figure: PowerPC 405 block diagram: fetch & decode, timers and debug logic, 16KB I-cache, MMU, 16KB D-cache, execution unit (32x32b GPR, ALU, MAC)]
3.8 sq mm = 1% of 2VP100

“Low PowerPC”: 0.59mW/MIPS
[Chart: power (mW, 0 to 400) vs. performance (Dhrystone MIPS, 0 to 400) for the full-custom IBM CPU design in 1.5V 130nm CMOS technology with low-k dielectric and IP-Immersion]
• 100mW = 1 LED indicator …or 169 MIPS!
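The slide's headline numbers are consistent with each other: at 0.59 mW per Dhrystone MIPS, a 100 mW budget buys 100 ÷ 0.59 ≈ 169 MIPS. The sketch below reproduces that arithmetic and, assuming the mW/MIPS figure holds across the operating range (the plotted line looks close to straight, but this is still an assumption), estimates the power at the 456 Dhrystone MIPS quoted for 300MHz on the PowerPC 405 slide. The helper names are hypothetical.

```python
MW_PER_MIPS = 0.59      # power efficiency from the slide
DMIPS_AT_300MHZ = 456   # from the PowerPC 405 feature list

def power_mw(mips: float) -> float:
    """Power in mW, assuming a constant mW/MIPS ratio."""
    return MW_PER_MIPS * mips

def mips_for_budget(budget_mw: float) -> float:
    """Dhrystone MIPS available within a power budget, same assumption."""
    return budget_mw / MW_PER_MIPS

print(round(power_mw(169), 1))           # 99.7
print(round(mips_for_budget(100)))       # 169
print(round(power_mw(DMIPS_AT_300MHZ)))  # 269
```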

System Architecture Options
• “Logic-Centric Architecture”
– PowerPC Executes Entirely out of Cache
– No FPGA Logic, Memory, or I/O Used
– 10-20 Pages of C-Code or More
– Use as Complex Algorithmic Engine
  • Web Server
  • Encryption/Decryption
  • Packet Processor
• “CPU-Centric Architecture”
– PowerPC forms Heart of Embedded System
– On & Off-Chip Peripherals
– External Interfaces
  • e.g. PCI, 3GIO, Gb Ethernet, ZBT SRAM
– CoreConnect™ On-Chip Bus
  • Ties System Together
– Peripherals implemented in FPGA Logic
– Typically Runs Embedded OS
[Figures: for each option, the PPC core with its external devices and external interfaces]

IP-Immersion
Embed multiple IP blocks of arbitrary shape with high-bandwidth connectivity to FPGA core logic, memory & I/O
Technologies Enabling IP-Immersion
• Active Interconnect™
• Segmented Routing
• Metal ‘Headroom’
[Figure: cross-section from the silicon substrate and poly up through Metal 1 to Metal 9, with an advanced hard-IP block (e.g. PowerPC CPU) embedded in the fabric]

HW acceleration
[Figure: Virtex-II Pro running a C++ code stack of control tasks on the PowerPC processor, with a concatenated FEC engine (interleaver, Reed-Solomon, Viterbi) built in FPGA logic and RAM]
PowerPC with Application-Specific Hardware Acceleration: XTREME Processing™, The Virtex-II Pro Advantage
[Chart: processing time. Traditional: control tasks, Viterbi, interleave and Reed-Solomon executed one after another in software; Virtex-II Pro: control tasks on the PowerPC with the FEC stages in hardware, for a shorter processing time]
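The processing-time comparison above is essentially Amdahl's law: moving the FEC stages into FPGA logic shrinks them, but the control tasks left on the PowerPC bound the overall gain. The sketch below uses hypothetical per-stage times and speed-ups (not figures from the slide), and the `overlap` flag is an assumption modelling hardware stages that run concurrently with software.

```python
from typing import Dict

def accelerated_time(sw_times: Dict[str, float], hw_speedup: Dict[str, float],
                     overlap: bool = False) -> float:
    """Processing time after moving some stages into FPGA hardware.

    sw_times:   time of every stage when the whole chain runs in software
    hw_speedup: speed-up factor for each stage mapped to hardware
    overlap:    if True, hardware stages run concurrently with the remaining
                software (control) tasks instead of after them
    """
    sw = sum(t for stage, t in sw_times.items() if stage not in hw_speedup)
    hw = sum(sw_times[stage] / s for stage, s in hw_speedup.items())
    return max(sw, hw) if overlap else sw + hw

# Hypothetical per-block times in arbitrary units, not measurements.
stages = {"control": 20.0, "viterbi": 50.0, "interleave": 10.0, "reed_solomon": 20.0}
fec_in_hw = {"viterbi": 25.0, "interleave": 10.0, "reed_solomon": 20.0}

baseline = sum(stages.values())  # everything in software: 100.0
print(f"{baseline / accelerated_time(stages, fec_in_hw):.2f}x")                # 4.17x
print(f"{baseline / accelerated_time(stages, fec_in_hw, overlap=True):.2f}x")  # 5.00x
```

With these numbers no amount of FEC acceleration can exceed 5x, because the 20 units of control work stay in software.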

HW/SW Interfacing
• Provides Specialized Connectivity Between PowerPC & FPGA Logic
• Dual-Port BlockRAM Memory
– CPU & Logic Each Own 1 Port
• High-Bandwidth
– 6.4Gb/sec
• Low-Latency
• Non-Caching
– Designed for Communications Data Processing
• Enables PowerPC & FPGA Logic to Work together on Complex Problems
[Figure: PowerPC 405 core (fetch & decode, timers and debug logic, 16KB I-cache, MMU, 16KB D-cache, execution unit with 32x32b GPR, ALU, MAC) connected through 6.4Gb/sec links to BlockRAMs shared with acceleration logic]

Creating Complete Communications Solutions
[Figure: Gb Ethernet TCP/IP (1000BaseLX/SX/CX) protocol layers mapped onto the platform]
• Upper layers (ftp, telnet, rlogin, mail, etc.) on PowerPC
• TCP/IP stack on PowerPC
• Link layer (GbE MAC) in FPGA logic
• RocketIO is the PHY (1000Base-SX/LX)

The MicroBlaze High Performance Soft CPU
[Figure: MicroBlaze soft CPUs and PPC 405 hard cores (32-Bit RISC, 130nm process, 300+ MHz core, 420 D MIPS) sharing CoreConnect™ technology: a local bus and an OPB bus with arbiter, connecting peripherals such as a UART and an interrupt controller]

System Exploration in Platform FPGA
[Figure: example line-interface system. Tx blocks: payload assembly, payload qualify, data format, line coding. Rx blocks: payload buffer, payload quality, data alignment, line decoding. Both paths connect over the system bus to the system interfaces and payload processing]

Traditional Architecture
[Figure: the same Tx/Rx blocks built around a Motorola PowerQUICC MPC860, with its U-Bus and CPM (Communications Processor Module), a memory interface, an AAL5 processor, a G704 framer and a G703 LIU; system RAM, FLASH, EEPROM and other peripherals sit on the µP bus, and a PCI bridge device links to the PCI system bus; system interfaces and payload processing are generic design; data direction is indicated]

Optimized Architecture
[Figure: the same system re-partitioned around an FPGA boundary. Blocks: PowerPC processor (payload processing), MicroBlaze processor with dual-port block RAM, memory interface, G704 framer, G703 LIU, and a fast I/F with FIFO to the PCI bridge device, plus system RAM, FLASH, EEPROM and other peripherals on the µP bus; system interfaces are generic design]

Software Radio Architecture
[Figure: SDR functional model. Antenna → RF/IF processing → channel selector/combiner → baseband DSP processing → call/message processing & flow control → routing to NSS/network, voice/PSTN, data/IP and multimedia/WAP. Information (I), control/status (C) and optional auxiliary (Aux) paths run between stages, with representative information formats from real/complex RF and baseband signals to bits and text. Monitor/control, common equipment (clock/strobe, reference, power), remote and local control I/O, display and link processing control complete the model]
Aux: (optional) for antenna diversity, adaptive antenna control, selective encryption, etc.
I: Information; C: Control/Status; IF: Intermediate Freq; BB: Baseband; NSS: Network Switching System; PSTN: Public Switched Telephone Network
* Figure reproduced with permission of SDR forum: www.sdrforum.org


Re-invent the Signal Processing Platform
• Heterogeneous platform
• Address complex signal processing systems
[Figure: AD converters feed a polyphase transform and demodulators 1 to N (high-MIPS processing in the logic fabric); the radio PHY uses TCC and Viterbi blocks; the MAC (media access) runs on a PPC405 handling decision-oriented tasks, CORBA, Java Virtual Machine, NBAP and SCA; DA converters on the output side; 3.125Gb serial network connectivity and a 50 Ω XCITE impedance controller provide connectivity to other components and other FPGAs]

Platform-Based Design
[Figure: the application space (application instances) and the architectural space (platform instances) meet at the platform; algorithm development, system-level modeling and control functions connect the two]

System Generator for DSP
• Visual data flow paradigm
• Polymorphic block libraries
• Arbitrary precision fixed-point
• Bit and cycle true modeling
• Seamlessly integrated with Simulink and MATLAB
– Test bench and data analysis
• Automatic code generation
– Synthesizable VHDL
– IP cores
– HDL test bench
– Project and constraint files
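"Arbitrary precision fixed-point" and "bit true modeling" mean that simulation reproduces exactly the values the hardware will compute. As an illustration only, the helper below (a hypothetical function, not part of the System Generator, Simulink or MATLAB API) quantizes a value to a signed two's-complement fixed-point format with a chosen total width and binary-point position, using round-half-up or truncation and either saturation or wrap-around on overflow.

```python
import math

def quantize(x: float, total_bits: int, frac_bits: int,
             rounding: str = "round", overflow: str = "saturate") -> float:
    """Quantize x to a signed two's-complement fixed-point format.

    total_bits includes the sign bit; frac_bits lie right of the binary point.
    rounding: "round" (half up) or "truncate" (toward minus infinity).
    overflow: "saturate" (clamp) or "wrap" (two's-complement wrap-around).
    """
    scale = 1 << frac_bits
    scaled = x * scale
    q = math.floor(scaled + 0.5) if rounding == "round" else math.floor(scaled)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    if overflow == "saturate":
        q = min(max(q, lo), hi)
    else:
        q = (q - lo) % (1 << total_bits) + lo
    return q / scale

print(quantize(0.7071, 8, 6))                # 0.703125
print(quantize(3.0, 8, 6))                   # 1.984375 (saturated)
print(quantize(3.0, 8, 6, overflow="wrap"))  # -1.0
```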

Heterogeneous Implementation
Hardware in the loop co-simulation
HDL co-simulation

Where We Are Going
FPGA 2005
Process Technology 65 nm, 10 layers Cu
Transistors 1B
Logic Cells 200K
Block RAM 15Mb
IO Speed 10Gb/s
Embedded Processors Many
Embedded DSP Blocks Very Many
Embedded Mixed Signal Blocks Yes

Combining the Best of FPGA and ASIC
[Figure: a spectrum from 100% programmable (traditional FPGA market) to 100% fixed logic (traditional ASIC market); the platform FPGA combines an FPGA core and block RAM with embedded PowerPC cores and special functions]
• Traditional FPGA: flexible, but expensive
• Traditional ASIC: inflexible, but highest performance/integration

Reconfigurable Chips for Digital Signal Processing (FPGAs, PLDs, Reconfig. Data Paths)
[Chart: shipments ($M, 0 to 900), 2002 to 2007, with a 2002-2007 CAGR of 26.0%]
Source: Forward Concepts, February 2003

Power Analysis
• Typical design
– 5.9uW/CLB/MHz [FPGA00]
– Fabric power is ~69% of total power
– 2V6000 = 5.9uW/CLB/MHz ⋅ 8448 CLBs ⋅ 100MHz ÷ 69% ≈ 7.2W
[Pie chart: Fabric 69%, BRAM 13%, Mult 11%, IOB 7%]
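The power estimate is a single product of the figures above, assuming power scales linearly with CLB count and clock frequency. Worked through, 5.9 µW × 8448 × 100 ≈ 4.98 W of fabric power, which at a 69% fabric share gives about 7.2 W in total. The sketch below repeats the calculation and splits the total using the pie-chart shares; `total_power_w` is a hypothetical helper name.

```python
UW_PER_CLB_MHZ = 5.9   # typical-design fabric power per CLB per MHz
FABRIC_SHARE = 0.69    # fabric fraction of total power
SHARES = {"Fabric": 0.69, "BRAM": 0.13, "Mult": 0.11, "IOB": 0.07}

def total_power_w(clbs: int, f_mhz: float,
                  uw_per_clb_mhz: float = UW_PER_CLB_MHZ,
                  fabric_share: float = FABRIC_SHARE) -> float:
    """Total device power, assuming fabric power is linear in CLBs and MHz."""
    fabric_w = uw_per_clb_mhz * 1e-6 * clbs * f_mhz
    return fabric_w / fabric_share

p = total_power_w(clbs=8448, f_mhz=100)  # XC2V6000 at 100 MHz
print(f"{p:.2f} W")                      # 7.22 W
for name, share in SHARES.items():
    print(f"{name}: {p * share:.2f} W")
```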

Dynamic Power
• Normalized to 2001
– Best fit is a quadratic trend line
– Predicts 5X by 2007
[Chart: normalized dynamic power vs. year (1994 to 2008) for 1996: 4000EX, 1997: 4000XL, 1998: 4000XV, 1999: Virtex, 2000: Virtex-E, 2001: Virtex-II]

Static Power
• Normalized to 2001
– Best fit is a power trend
– Predicts 100X by 2007
• Future data points projected using linear trend for 1/VTH
[Chart: normalized static power, log scale 0.000001 to 1000, vs. year, 1996 to 2007]

Platform-Based DSP Design
[Figure: the application space (application instance, application specification in MATLAB) maps through the API platform (Simulink) and the architecture platform (System Generator, architecture exploration) to platform instances in the architectural space (HDL synthesis, FPGA implementation)]
Goal: Provide a software platform to support the Platform FPGA

Future use model
[Figure: IMEC Gecko demonstrator. An iPAQ connects to an FPGA whose interconnect network (ICN) links an interface tile and application tiles 1 and 2; a real-time operating system with hardware support swaps hardware tasks for multimedia applications such as (1) a video decoder and (2) a 3D game]

Partial Runtime Reconfiguration
[Figure: a PowerPC on the CoreConnect OPB drives control logic and the ICAP (Internal Configuration Access Port), which writes the FPGA configuration memory, alongside a dual-port block RAM]

Software Stack (EDK)
• Level 3: Application Code
• Level 2: ICAP API (hardware independent)
• Level 1: Device Drivers (hardware dependent), either on the embedded microprocessor or as an emulated ICAP running externally (on Windows/Unix)
• Level 0: ICAP Controller
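The value of the layered stack is that application code written against the Level 2 ICAP API does not change when the Level 1 driver is swapped between the real embedded ICAP and an emulated one on a Windows/Unix host. The Python sketch below illustrates that layering only; the class and method names (`ICAPDriver`, `EmulatedICAP`, `ICAPAPI`, `write_frame`, `read_frame`, `configure`, `readback`) are hypothetical, not the actual EDK or Xilinx driver API, and configuration memory is reduced to a dictionary of frames.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

class ICAPDriver(ABC):
    """Level 1: hardware-dependent driver interface (hypothetical)."""

    @abstractmethod
    def write_frame(self, address: int, words: List[int]) -> None: ...

    @abstractmethod
    def read_frame(self, address: int) -> List[int]: ...

class EmulatedICAP(ICAPDriver):
    """Level 1 stand-in for an external host: frames live in a dictionary."""

    def __init__(self) -> None:
        self.frames: Dict[int, List[int]] = {}

    def write_frame(self, address: int, words: List[int]) -> None:
        self.frames[address] = list(words)

    def read_frame(self, address: int) -> List[int]:
        return list(self.frames.get(address, []))

class ICAPAPI:
    """Level 2: hardware-independent API used by application code."""

    def __init__(self, driver: ICAPDriver) -> None:
        self.driver = driver

    def configure(self, partial_bitstream: Dict[int, List[int]]) -> None:
        for address, words in sorted(partial_bitstream.items()):
            self.driver.write_frame(address, words)

    def readback(self, address: int) -> List[int]:
        return self.driver.read_frame(address)

# Level 3: application code. Swapping EmulatedICAP for an embedded driver
# would leave these lines unchanged.
api = ICAPAPI(EmulatedICAP())
api.configure({0x10: [0xDEAD, 0xBEEF]})
print([hex(w) for w in api.readback(0x10)])  # ['0xdead', '0xbeef']
```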

Self Reconfiguration Under Linux
• Request & claim the ICAP device driver
• Configure & readback LUTs

Partial Reconfigurability
FPGA Flexibility for the Field
• Re-program part of an FPGA while it’s still running
[Figure: partially reconfigurable (PR) logic regions with user-definable boundaries, surrounded by fixed logic; a new bitstream (011011) loads into a PR region]

JBits 3.0 for Virtex-II is available (free of charge)

Market requirements
A mass market for one person…
[Figure: one device serving business, web, video, games and multimedia messaging]
► FPGA is reconfigurable

Technology That Will Change People’s Lives
Top Technologies That Will Change Our Lives: field programmable chameleon chips ranked #1, ahead of cloning! (BusinessWeek 50: Masters of Innovation, April 7, 2001)

#1 Chameleon chips
#2 Custom Kids
#3 Protein maps
#4 Fractal models
#5 Off-planet production
#6 Nanotechnology
#7 Virtual reality
#8 HIV Antivirals
#9 Optical computing
#10 Ambient Intelligence

Performance Scaling
[Chart: delay (ps, 0 to 45) vs. line width (nm, 0 to 1000): gate delay, wire delay and total delay, for Al and for Cu + low-k interconnect]

Architecture Requirements
[Figure: regular arrays of adders (+) with local storage at feature size λ, and the same structure after scaling to λ/2]
Best design practice to benefit from Moore’s Law
Regular architecture
Parallel architecture
Highly testable
Distributed memory
Highly pipelined
Heat/area $\propto V^3/\lambda^3$
Interconnect delay $\propto \rho\, l^2/\lambda^2$
FPGA is future proof
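The interconnect-delay term explains why this list favors distributed memory and deep pipelining. If a wire's width and thickness both scale with the feature size λ, its resistance goes as ρ·l/λ² while its capacitance stays roughly proportional to its length l, so RC delay goes as ρ·l²/λ². The sketch below (arbitrary units; the constant capacitance per unit length is a simplifying assumption, and `wire_rc_delay` is a hypothetical name) compares a global wire that keeps its length with a local wire that shrinks along with λ.

```python
def wire_rc_delay(rho: float, length: float, lam: float, c_per_len: float = 1.0) -> float:
    """Relative RC delay of a wire whose width and thickness both scale with lam.

    R = rho * length / lam**2 and C = c_per_len * length,
    so RC ~ rho * length**2 / lam**2. Only ratios are meaningful.
    """
    resistance = rho * length / lam ** 2
    capacitance = c_per_len * length
    return resistance * capacitance

base = wire_rc_delay(rho=1.0, length=1.0, lam=1.0)
# Global wire: the feature size halves but the wire spans the same distance.
print(wire_rc_delay(1.0, 1.0, 0.5) / base)  # 4.0
# Local wire: its length shrinks together with the feature size.
print(wire_rc_delay(1.0, 0.5, 0.5) / base)  # 1.0
```

Keeping data and computation close together (short, local wires) is what lets an architecture keep its speed as λ shrinks.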


Performance requirements
From sequential to spatial computing
[Figure: sequential: Data In → Reg → Data Out, one operation at a time. Spatial: Data In → Reg0 … Reg255, each with its own coefficient C0 … C255 → Data Out]
►FPGA provides highest MOPS/Watt/$
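The figure contrasts reusing one multiply-accumulate unit over time with laying out all 256 taps in space. The Python sketch below models a 256-tap FIR filter both ways and counts idealized clock cycles, assuming one MAC per cycle in the sequential version and one complete output per cycle in the spatial version (real hardware would pipeline the adder tree, adding latency but keeping that throughput). The coefficients, input samples and function names are hypothetical.

```python
from typing import List, Tuple

def fir_sequential(samples: List[float], coeffs: List[float]) -> Tuple[List[float], int]:
    """One multiply-accumulate unit reused for every tap of every sample."""
    taps = [0.0] * len(coeffs)
    out, cycles = [], 0
    for x in samples:
        taps = [x] + taps[:-1]          # shift the delay line
        acc = 0.0
        for c, t in zip(coeffs, taps):  # one MAC per clock cycle
            acc += c * t
            cycles += 1
        out.append(acc)
    return out, cycles

def fir_spatial(samples: List[float], coeffs: List[float]) -> Tuple[List[float], int]:
    """One multiplier per tap (C0..C255 on Reg0..Reg255): one output per cycle."""
    regs = [0.0] * len(coeffs)
    out, cycles = [], 0
    for x in samples:
        regs = [x] + regs[:-1]
        out.append(sum(c * r for c, r in zip(coeffs, regs)))  # all taps at once
        cycles += 1
    return out, cycles

coeffs = [1.0 / 256] * 256              # hypothetical 256-tap moving average
samples = [float(i % 7) for i in range(1000)]
seq, seq_cycles = fir_sequential(samples, coeffs)
par, par_cycles = fir_spatial(samples, coeffs)
assert all(abs(a - b) < 1e-9 for a, b in zip(seq, par))  # same results
print(seq_cycles, par_cycles)           # 256000 1000
```

Both versions compute identical outputs; the spatial one trades silicon area for a 256x higher sample rate at the same clock.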


Alternative Architectures…
• Cell based design
– Deep submicron ends ‘composability’ of design
• Processors
– Not scalable because of data transfer bottleneck
• Application Specific Standard Products (ASSP)
– Bull’s eye market
• ‘Structured’ ASICs
– Worst of both worlds

Conclusions
• Today: FPGAs ride the tide of Moore’s Law
• Future proof architecture
• Opportunities:
– Programmable System Platform
– DSP
• Challenges
– Low Power
– Design technology
– Use model to exploit time dimension