• Intro
• ASICs buck the tide
• FPGAs ride the tide
• Programmable system platform
• Future
Xilinx
Programmable Logic
A 4-input LookUp Table (LUT)
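A 4-input LUT behaves like a 16-entry truth table addressed by its inputs. The following is a minimal Python sketch of that idea; the helper names `make_lut4` and `lut4_read` and the bit ordering are illustrative assumptions, not anything defined in the slides.

```python
def make_lut4(func):
    """Precompute the 16 configuration bits of a 4-input LUT for `func`."""
    config = 0
    for address in range(16):
        # Unpack the address into four input bits (bit 0 is input a).
        a, b, c, d = ((address >> i) & 1 for i in range(4))
        if func(a, b, c, d):
            config |= 1 << address
    return config


def lut4_read(config, a, b, c, d):
    """Evaluate the LUT: the four inputs form an address into the stored bits."""
    address = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> address) & 1


# Example: program the LUT as a 4-input XOR (parity) function.
xor_config = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(xor_config), lut4_read(xor_config, 1, 0, 0, 0))
```

Any Boolean function of four inputs fits the same 16 bits, which is why reprogramming the device only means loading different configuration values.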
Add a register to make a Logic Cell
Memory and Arithmetic
[Figure: logic cell with 4 inputs; additional labels: Carry, M, WE, Din]
A Slice …
A Slice: 2 Logic Cells + F(5) …
A CLB: 4 Slices + …
A CLB: 4 Slices
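The hierarchy above multiplies out directly. The sketch below counts resources for an array of CLBs, assuming one LUT and one register per logic cell; the function name and the array dimensions in the usage line are hypothetical.

```python
LOGIC_CELLS_PER_SLICE = 2  # from "A Slice: 2 Logic Cells"
SLICES_PER_CLB = 4         # from "A CLB: 4 Slices"


def array_resources(rows, cols):
    """Count the building blocks in a rows x cols array of CLBs."""
    clbs = rows * cols
    slices = clbs * SLICES_PER_CLB
    logic_cells = slices * LOGIC_CELLS_PER_SLICE
    # Assumption: each logic cell holds exactly one 4-input LUT and one register.
    return {
        "clbs": clbs,
        "slices": slices,
        "logic_cells": logic_cells,
        "luts": logic_cells,
        "registers": logic_cells,
    }


# Illustrative dimensions only (a hypothetical 96 x 88 array).
print(array_resources(96, 88))
```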
A CLB: 4 Slices + Input/Output
Add Interconnect
Build an Array
ASICs buck the tide, FPGAs ride the tide
Moore’s Law
[Figure: transistor cross-section labeling gate, source, drain, channel and substrate, with gate leakage and substrate leakage; chart axes: CD (80nm to 320nm) and Tox (1.3 to 6.5 nm)]
Trend: Line Widths Smaller Than the Wavelength of Light
[Figure: process geometry (micron, 0 to 0.700) versus year, 1988 to 2002]
Painting a one cm line with a three cm brush…
Courtesy: IBM
Gate Oxide
[Figure: polysilicon gate, gate oxide, and silicon crystal]
Mask Layers per Mask Set
[Figure: number of mask layers, metal layers, OPC layers, and PSM layers per set (scale 0 to 45); series labeled 250, 180, 150, 130, and 100 (est.)]
Mask Set Price Trend vs. Technology
[Figure: relative price (0 to 18) versus technology: 250, 180, 150, 130, 90]
FPGAs in The Forefront of The Technology Curve
180 nm: Virtex-EM
150 nm: Virtex-II
130 nm: Virtex-II Pro
90 nm: Spartan-3
65 nm
45 nm
[Figure: FPGA roadmap plotted against the ITRS roadmap]
Wafer Starts
Economy of ‘Scale’
A decade of progress
[Figure: capacity, speed, and price on a log scale (1x to 1000x) versus year, 1/91 to 1/03; devices marked: XC4000, Spartan, Virtex & Virtex-E (excl. Block RAM), Virtex-II (excl. Block RAM)]
FPGA/ASIC Crossover Changes
[Figure: cost versus production volume for 150nm / 200mm ASICs, 150nm / 200mm FPGAs, and 90nm / 300mm FPGAs; regions labeled FPGA Cost Advantage and ASIC Cost Advantage]
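The crossover can be read as the volume at which two straight-line cost curves meet: one with high up-front cost and low unit cost (ASIC), one with low up-front cost and higher unit cost (FPGA). The sketch below assumes this simple linear model; every dollar figure in it is a hypothetical placeholder, not data from the slides.

```python
def total_cost(nre, unit_cost, volume):
    """Linear cost model: one-time engineering cost plus per-unit cost."""
    return nre + unit_cost * volume


def crossover_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume above which the ASIC becomes cheaper, or None if it never does."""
    if fpga_unit <= asic_unit:
        # The FPGA is no more expensive per unit, so the curves do not cross
        # in the ASIC's favor under this model.
        return None
    return max(0.0, (asic_nre - fpga_nre) / (fpga_unit - asic_unit))


# Hypothetical numbers, chosen only to show the direction of the effect.
before = crossover_volume(asic_nre=1_000_000, asic_unit=10, fpga_nre=0, fpga_unit=50)
after = crossover_volume(asic_nre=2_000_000, asic_unit=10, fpga_nre=0, fpga_unit=50)
print(before, after)
```

Doubling the ASIC's up-front cost, for example through a more expensive mask set, doubles the crossover volume in this model, while a cheaper FPGA unit cost pushes it out further still.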
Where are we today
[Figure: device capability chart; values shown: 125K, 105K, 10Mb, 3Mb, 556, 442, 340, 168, 24, 4]
• High-speed 3.125 Gbps Serial Transceivers
• Programmable IO (VCCIO, Z Impedance Control)
• 10Mbit Dual-Port™ RAM
• >500 DSP datapaths (18 Bit, 18 Bit, 36 Bit)
• 10 Million gates
Design Capability: FPGAs Meet Most Requirements
[Figure: distribution of designs by gate count (< 20K, 20-100K, 100K-1M, > 1M gates) and by clock speed (< 50, 50 - 200, 200-500, > 500 MHz); bar values shown: 52%, 25%, 20%, 23%, 13%, 13%, 11%]
Source: Gartner Dataquest
FPGA Sweats the Details
Complex ASIC Design
The Shrinking Window of Innovation
• Design authoring 20%
• Place and Route 17%
• Synthesis 16%
• Simulation 14%
• Gate Simulation 7%
• Transistor Simulation 5%
• Extraction 5%
• Floorplanning 5%
• Static Timing Analysis 5%
• Interconnect Analysis 3%
• Power Analysis 3%
Simpler/Faster Design Flows
ASIC Flow: Design Spec, Design and Verification, Silicon Prototype, System Integration, Silicon Production
[Figure: design-freeze points along the time axis of each flow; reduced profit for latecomers]
[Figure: FPGA versus ASIC, 2001 to 2004, scale 100 to 500; chart labels: 400, 100X, 5, 4, 2, 1.5]
Source: Gartner Group
FPGAs rise to the system-level design challenge
• PLATFORM
• FPGA
– Logic, Routing & I/O
• Soft CPUs
• Hard CPUs
• Bus hierarchies
• Memory hierarchies, multipliers
• Gigabit I/Os
• RTOS, drivers, network protocol stacks, embedded real-time s/w
• APPLICATIONS
• Streaming multimedia
• Network processing
• Software defined radio
• Third+ generation base stations
• Storage area networks
• DISCIPLINES & DESIGNERS
• Systems architects, hardware & software engineers, DSP & communications specialists
Towards programmable platforms
IBM PowerPC™ 405 RISC CPU
• 32-bit RISC CPU, Harvard Architecture
• 130nm CMOS with 1.5V Operation
• 456 Dhrystone MIPS at 300MHz
• 32 x 32-bit General Purpose Registers
• Hardware Multiply / Divide
• 5-Stage Execution Pipeline
• 16KB D-Cache, 16KB I-Cache
• Memory Management Unit (MMU)
• High-Bandwidth Interface to Logic
• Built-In Hardware Timers
• Built-In JTAG Debug and Trace support
3.8 sq mm = 1% of 2VP100
[Figure: CPU block diagram with fetch & decode logic, timers and debug logic, 16KB I-Cache, 16KB D-Cache, MMU, and execution unit (32x32b GPR, ALU, MAC)]
“Low PowerPC”: 0.59mW/MIPS
• Full-Custom IBM CPU Design
• 1.5V 130nm CMOS Technology
• Low-K Dielectric
• IP-Immersion
[Figure: power (mW), scale 100 to 400; 100mW = 1 LED Indicator …or 169 MIPS!]
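The 169 MIPS figure follows from dividing a 100 mW budget by 0.59 mW/MIPS. A short sketch of that arithmetic, using only the numbers quoted on these slides (the function name is an illustrative choice):

```python
MW_PER_MIPS = 0.59  # quoted power efficiency of the embedded PowerPC


def mips_for_budget(power_mw):
    """MIPS available within a given power budget at the quoted efficiency."""
    return power_mw / MW_PER_MIPS


# 100 mW, roughly one LED indicator according to the slide.
print(int(mips_for_budget(100)))

# Dhrystone MIPS per MHz from the quoted 456 DMIPS at 300 MHz.
dmips_per_mhz = 456 / 300
print(round(dmips_per_mhz, 2))
```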
• “CPU-Centric Architecture”
– PowerPC forms Heart of Embedded System
– On & Off-Chip Peripherals
– External Interfaces
• e.g. PCI, 3GIO, Gb Ethernet, ZBT SRAM
– CoreConnect™ On-Chip Bus
• Ties System Together
– Peripherals implemented in FPGA Logic
– Typically Runs Embedded OS
[Figure: two PPC blocks connected to external devices and external interfaces]
IP-Immersion
Embed multiple IP blocks of arbitrary shape with high-bandwidth connectivity to FPGA core logic, memory & I/O
Technologies Enabling IP-Immersion
• Active Interconnect™ Segmented Routing
• Metal ‘Headroom’
[Figure: layer stack from silicon substrate and poly up through Metal 1 to Metal 9, with PPC blocks shown as an advanced hard-IP block (e.g. PowerPC CPU)]
HW acceleration
Virtex-II Pro
[Figure: processing time of a concatenated FEC engine (Viterbi, interleaver, Reed-Solomon) plus control tasks, with a C++ code stack: PowerPC processor and RAM alone, compared with PowerPC with application-specific hardware acceleration]
HW/SW Interfacing
• Provides Specialized Connectivity Between PowerPC & FPGA Logic
• Dual-Port BlockRAM Memory
– CPU & Logic Each Own 1 Port
• High-Bandwidth
– 6.4Gb/sec
• Low-Latency
• Non-Caching
– Designed for Communications Data Processing
• Enables PowerPC & FPGA Logic to Work together on Complex Problems
[Figure: PowerPC block diagram with 6.4Gb/sec links to BlockRAMs and acceleration logic]
Creating Complete Communications Solutions
• Upper Layers (ftp, telnet, rlogin, mail, etc.) on PowerPC
• TCP/IP Stack on PowerPC
• Link Layer in FPGA Logic (GbE MAC)
• RocketIO is PHY (1000Base-SX/LX)
[Figure: protocol stack for TCP/IP over Gb Ethernet (1000BaseLX/SX/CX): upper layers, TCP, IP, MAC, PHY]
The MicroBlaze™ High Performance Soft CPU
[Figure: several MicroBlaze soft CPUs and two PPC 405 cores (32-Bit RISC, 130nm Process, 300+ MHz Core, 420 D MIPS) joined by CoreConnect™ Technology, with a UART, an interrupt controller, an arbiter, and a local OPB bus]
System Exploration in Platform FPGA
[Figure: blocks labeled System Bus, System Interfaces, Payload Processing, and Line]
Traditional Architecture
Payload processing chain: Payload Assembly, Payload Qualify, Data Format, Line Coding, Tx
[Figure: MPC860 (Motorola PowerQUICC) with U-Bus, CPM, memory interface, AAL5 processor, G704 framer, and G703 LIU along the data direction; system RAM, FLASH, and EEPROM on the µP bus; PCI bus and other peripherals; system interfaces; payload processing]
Optimized Architecture
Payload processing chain: Payload Assembly, Payload Qualify, Data Format, Line Coding, Tx
[Figure: inside the FPGA boundary, a PowerPC processor with memory interface, a MicroB processor with dual-port block RAM, a G704 framer, a G703 LIU, and payload processing; system RAM, FLASH, and EEPROM on the µP bus; PCI bus and other peripherals; system interfaces; generic design]
Software Radio Architecture
[Figure: signal chain from AIR through Antenna, RF/IF, Channel Selector/Combiner, Baseband Processing (DSP), and Call/Message Processing & Routing to the representative information flow formats Multimedia/WAP, Voice/PSTN, Data/IP, Flow Ctrl, and NSS/Network; each stage has I, C, and Aux I/O; Monitor/Control with remote control/display and local control; common system equipment with clock/strobe, ref, power, and ext. ref]
Legend:
Aux: (Optional) I/O for Antenna Diversity, Adaptive Antenna Control, Selective Encryption, etc.
I: Information
C: Control/Status
IF: Intermediate Freq
NSS: Network Switching System
PSTN: Public Switched Telephone Network
BB: Baseband
Platform-Based Design
Application Space
Application Instance
Algorithm Development
System Level Modeling Control Functions
Platform Instances
Architectural Space
System Generator for DSP
• Visual data flow paradigm
• Polymorphic block libraries
• Arbitrary precision fixed-point
• Bit and cycle true modeling
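Arbitrary-precision fixed-point modeling means each signal carries its own word length and binary point, so a software model can reproduce the hardware result bit for bit. The sketch below shows one common way to quantize a value to such a format; it is a generic illustration and does not reproduce System Generator's blocks or API, and the function name and rounding/saturation choices are assumptions.

```python
def quantize(value, word_bits, frac_bits, signed=True):
    """Quantize `value` to a fixed-point format, saturating on overflow.

    word_bits is the total width and frac_bits the number of fractional bits.
    Rounding uses Python's round(), which rounds exact ties to the even integer.
    """
    scale = 1 << frac_bits
    raw = round(value * scale)
    if signed:
        lowest, highest = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    else:
        lowest, highest = 0, (1 << word_bits) - 1
    raw = max(lowest, min(highest, raw))  # saturate instead of wrapping
    return raw / scale


# An 8-bit signed value with 6 fractional bits covers -2.0 to 1.984375.
print(quantize(0.3, 8, 6), quantize(3.0, 8, 6), quantize(-3.0, 8, 6))
```

Narrowing `word_bits` or `frac_bits` per signal trades precision for logic area, which is the kind of exploration a bit-true model makes cheap.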
Heterogeneous Implementation
HDL co-simulation
Where We Are Going
FPGA 2005
Process Technology: 65 nm, 10 layers Cu
Transistors: 1B
Logic Cells: 200K
Block RAM: 15Mb
IO Speed: 10Gb/s
Embedded Processors: Many
Embedded DSP Blocks: Very Many
Embedded Mixed Signal Blocks: Yes
Combining the Best of FPGA and ASIC
[Figure: chip combining a PowerPC core, an embedded FPGA core, special functions, and block RAM]
Reconfigurable Chips for Digital Signal Processing
(FPGAs, PLDs, Reconfig. Data Paths)
[Figure: shipments ($M, 0 to 900) by year, 2002 to 2007; 2002-2007 CAGR of 26.0%]
Source: Forward Concepts, February, 2003
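A 26.0% compound annual growth rate over the five years from 2002 to 2007 implies a little more than a tripling. A brief sketch of the compounding, using a normalized starting value of 1.0 because the chart's absolute starting figure is not recoverable here:

```python
def growth_factor(cagr, years):
    """Total growth multiple after compounding `cagr` for `years` years."""
    return (1 + cagr) ** years


def project(start, cagr, years):
    """Year-by-year values, starting from `start` in year 0."""
    return [start * (1 + cagr) ** n for n in range(years + 1)]


# Normalized to the 2002 value (an assumption, not a figure from the chart).
for year, value in zip(range(2002, 2008), project(1.0, 0.26, 5)):
    print(year, round(value, 2))
```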
Power Analysis
• Typical design
– 5.9uW/CLB/MHz [FPGA00]
– Fabric power is ~69% of total power
– 2V6000 = 5.9uW/CLB/MHz ⋅ 8448CLBs ⋅ 100MHz ÷ 69% = 7.5W
[Figure: power breakdown: Fabric 69%, BRAM 13%, Mult 11%, IOB 7%]
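The estimate above scales a per-CLB, per-MHz figure up to the whole device and then divides by the fabric's share of total power. The sketch below evaluates it with the stated inputs; taken exactly, they give about 7.2 W, so the 7.5 W quoted above presumably reflects rounding or slightly different inputs. The function name is an illustrative choice.

```python
def total_power_w(uw_per_clb_mhz, clbs, freq_mhz, fabric_share):
    """Estimate total device power from fabric power and the fabric's share."""
    fabric_w = uw_per_clb_mhz * 1e-6 * clbs * freq_mhz  # microwatts to watts
    return fabric_w / fabric_share


total = total_power_w(uw_per_clb_mhz=5.9, clbs=8448, freq_mhz=100, fabric_share=0.69)
print(round(total, 2))

# Split the total using the shares from the breakdown chart.
shares = {"Fabric": 0.69, "BRAM": 0.13, "Mult": 0.11, "IOB": 0.07}
for block, share in shares.items():
    print(block, round(total * share, 2))
```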
Dynamic Power
• Normalized to 2001
– Best fit is a quadratic trend line
– Predicts 5X by 2007
[Figure: dynamic power (0 to 6) versus year, 1994 to 2008; data points: 1996: 4000EX, 1997: 4000XL, 1998: 4000XV, 1999: Virtex, 2000: Virtex-E, 2001: Virtex-II]
Static Power
• Normalized to 2001
– Best fit is a power trend
– Predicts 100X by 2007
• Future data points projected using linear trend for 1/VTH
[Figure: static power on a log scale (0.000001 to 1000) versus year, 1996 to 2007]
Platform-Based DSP Design
[Figure: application space (application instance, application specification) meets architectural space (platform instances, architecture exploration) at the API platform and architecture platform; tool flow: MATLAB, Simulink, System Generator, HDL Synthesis, FPGA Implementation]
Future use model
[Figure: IMEC Gecko demonstrator: interconnect network on FPGA with an iPAQ - ICN interface tile and application tiles 1 and 2; real-time operating system with hardware support; swappable hardware tasks; multimedia applications (video decoder, 3D game)]
Partial Runtime Reconfiguration
[Figure: PowerPC, control logic, and dual-port block RAM on the CoreConnect OPB, with ICAP access to the FPGA configuration memory]
Software Stack
[Figure: layered software stack; labels: Application Code, EDK, Level 3, Emulated ICAP Device Drivers, Hardware Dependent Device Drivers, Level 1, ICAP Controller, Level 0]
Self Reconfiguration Under LINUX
Configure &
readback LUTs
Partial Reconfigurability
FPGA Flexibility for the Field
• Re-program part of an FPGA while it’s still running
[Figure: bitstream (011011) loaded into PR logic regions placed between fixed logic regions, with user definable boundaries]
JBits 3.0 for Virtex-II is available free of charge (FOC)
Market requirements
A mass market for one person…
Business
Web
Video
Games
Multimedia messaging
► FPGA is reconfigurable
Technology That Will Change People’s Lives
Top Technologies That Will Change Our Lives –
Field programmable chameleon chips ranked #1.
Ahead of cloning! BusinessWeek 50: Masters of Innovation, April 7, 2001
#1 Chameleon chips
#2 Custom Kids
#3 Protein maps
#4 Fractal models
#5 Off-planet production
#6 Nanotechnology
#7 Virtual reality
#8 HIV Antivirals
#9 Optical computing
#10 Ambient Intelligence
Performance Scaling
[Figure: delay (ps, 0 to 45) versus line width (nm, 0 to 1000); curves: gate delay, wire delay (Cu+Low k), total delay (Al), total delay (Cu+Low k)]
Architecture Requirements
• Distributed memory
• Highly pipelined
Interconnect Delay: $\rho \cdot l^2 / \lambda^2$
[Figure: array of adders (+) with store elements, shown at feature size λ and scaled to λ/2; columns C0, C1, C2, .... C255]
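The interconnect delay expression on this slide can be read as an RC model: wire resistance scales as ρ·l/λ² and capacitance as l, so delay scales as ρ·l²/λ². That reading is an assumption about the slide's intent. Under it, the sketch below compares how delay changes when the feature size shrinks, using a hypothetical helper name.

```python
def relative_wire_delay(length_scale, lambda_scale):
    """Delay relative to the baseline, assuming delay ~ rho * l**2 / lambda**2.

    length_scale and lambda_scale are the new wire length and feature size
    divided by their baseline values; resistivity rho is held constant.
    """
    return length_scale ** 2 / lambda_scale ** 2


# Same wire length, feature size halved: the wire is four times slower.
print(relative_wire_delay(1.0, 0.5))

# Wire length halved along with the feature size: delay is unchanged.
print(relative_wire_delay(0.5, 0.5))
```

Under this model, long wires get slower as the process shrinks while short local wires do not, which is consistent with the slide's call for distributed memory and heavy pipelining.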
Conclusions
• Today: FPGAs ride the tide of Moore’s Law
• Future-proof architecture
• Opportunities:
– Programmable System Platform
– DSP
• Challenges
– Low Power
– Design technology
– Use model to exploit time dimension