Introduction
EBV & Xilinx
Design flow: RTL -> HLS -> SDSoC
FPGA basics
– LUTs & Routing
– Clocking, Memory and DSP resources
C++
Library
Spend less time on the standard connectivity Vivado HLS
AXI I/F
– IPI: configure & generate a platform on a custom board
Verified RTL
Page 12 © Copyright 2016 Xilinx
Accelerate Functions in Hardware with SDSoC™
(Zynq)
No RTL Design Required
[Figure: Zynq processing system (ARM® core) with C/C++ functions placed between input and output. Step 1: System-Level Profiling.]
Total Power Reduction
– Ratio of Logic to Memory Optimized for Area and Power
Configurable Logic Block
LUT5 LUT3
32-bit Shift Register in One LUT
Versatile SRL-type shift registers
– Variable-length shift register
– Synchronous FIFOs
– Content-addressable memory (CAM)
– Pattern generator
– Compensate for delay / latency
Shift register length is determined by the address
– Constant value giving fixed delay line
– Dynamic addressing for elastic buffer
SRL Configurations in one Slice (4 LUTs)
– 16x1, 16x2, 16x4, 16x6, 16x8
– 128x1
[Figure: LUT configured as a 32-bit shift register with data input D, clock CLK, 5-bit address A driving a 32-to-1 MUX, addressed output Qn and cascade output Q31.]
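The addressing behavior described above can be illustrated with a small behavioral model. This is a minimal sketch in C++, not vendor code: the class name `Srl32` and its methods `shift`, `tap` and `q31` are hypothetical names chosen for this example, and the model assumes the clock enable is always asserted.

```cpp
#include <cstdint>

// Behavioral model of one LUT used as a 32-bit addressable shift register.
// Srl32, shift, tap and q31 are illustrative names, not a vendor API.
class Srl32 {
public:
    // One clock edge: the new bit enters stage 0, older bits move up by one.
    void shift(bool d) {
        bits_ = (bits_ << 1) | (d ? 1u : 0u);
    }

    // Combinational read: the 5-bit address selects which stage drives Qn,
    // so a bit written now becomes visible at tap(addr) after addr + 1 shifts.
    bool tap(unsigned addr) const {
        return ((bits_ >> (addr & 31u)) & 1u) != 0;
    }

    // Q31 is the fixed last stage, typically used to cascade into another SRL.
    bool q31() const { return tap(31); }

private:
    std::uint32_t bits_ = 0;  // stage 0 is the least significant bit
};
```

Holding the address constant gives a fixed delay line; changing it at run time gives the elastic-buffer behavior listed on the slide.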
Arithmetic Carry
Two carry chains per CLB
– Running South to North for fast arithmetic addition and subtraction
– Each Slice in a column has an independent carry chain
Carry lookahead
– Combinatorial carry lookahead over the four LUTs in a slice
– Improves arithmetic performance
Carry chains are cascadable
– Carry-Out connects to adjacent Carry-In in the same column
Carry chain uses
– Carry chain used as large AND/OR gates and decoders to reduce logic and improve performance
[Figure: CLB with two Slices; each Slice has its own carry chain with Carry In at the bottom and Carry Out at the top.]
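To make the cascading concrete, here is a minimal C++ sketch of a 4-bit carry stage and an 8-bit adder built from two cascaded stages. It assumes the usual carry-mux arrangement (the LUT computes the propagate signal `a xor b`, and a mux either forwards the incoming carry or takes the operand bit); the names `Carry4Result`, `carry4_add` and `add8` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Result of one 4-bit carry stage: four sum bits plus the carry out.
struct Carry4Result {
    std::uint8_t sum;   // only the low 4 bits are used
    bool carry_out;
};

// One 4-bit stage. Per bit: s = a xor b is the propagate signal from the LUT,
// the sum bit is s xor carry, and the carry mux forwards the incoming carry
// when s is 1 and otherwise takes operand bit a (equal to b in that case).
Carry4Result carry4_add(std::uint8_t a, std::uint8_t b, bool carry_in) {
    Carry4Result r{0, carry_in};
    for (int i = 0; i < 4; ++i) {
        const bool ai = ((a >> i) & 1u) != 0;
        const bool bi = ((b >> i) & 1u) != 0;
        const bool s = (ai != bi);
        const bool sum_bit = (s != r.carry_out);
        r.carry_out = s ? r.carry_out : ai;
        if (sum_bit) {
            r.sum = static_cast<std::uint8_t>(r.sum | (1u << i));
        }
    }
    return r;
}

// 8-bit adder: the Carry-Out of the lower stage feeds the Carry-In of the
// stage above it, mirroring the cascade within a column.
std::uint8_t add8(std::uint8_t a, std::uint8_t b, bool& carry_out) {
    const Carry4Result lo = carry4_add(a & 0x0Fu, b & 0x0Fu, false);
    const Carry4Result hi = carry4_add(static_cast<std::uint8_t>(a >> 4),
                                       static_cast<std::uint8_t>(b >> 4),
                                       lo.carry_out);
    carry_out = hi.carry_out;
    return static_cast<std::uint8_t>((hi.sum << 4) | lo.sum);
}
```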
Fast Interconnect Between CLBs
[Figure: CLBs linked through the interconnect.]
Length of Routing Resources
Using Logic and Interconnect Resources
All logic resources can be inferred
– Xilinx and third party synthesis tools
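[Figure: design code mapped onto a LUT and a flip-flop inside an FPGA slice.]
...
-- Combinational logic: each expression fits in a single LUT
lutout1 <= not (input1 and input2 and input3 and input4 and common);
lutout2 <= (input5 or input6 or input7 or common);

-- Registered output: inferred as a flip-flop in the same slice
process (Clock)
begin
  if rising_edge(Clock) then
    flop1 <= lutout1;
  end if;
end process;
...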
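One way to see why a five-input expression such as lutout1 fits in a single LUT is to treat the LUT as a 32-entry truth table indexed by its inputs. The following C++ sketch builds that table; `make_lut5_init`, `lut5_eval` and `lutout1_init` are hypothetical helper names, and the bit ordering (input1 as address bit 0, common as address bit 4) is an assumption made for the example.

```cpp
#include <cstdint>

// Build the 32-entry truth table of a 5-input function.
// Bit 'addr' of the result holds the output for input combination 'addr'.
template <typename F>
std::uint32_t make_lut5_init(F f) {
    std::uint32_t init = 0;
    for (std::uint32_t addr = 0; addr < 32; ++addr) {
        bool in[5];
        for (int i = 0; i < 5; ++i) {
            in[i] = ((addr >> i) & 1u) != 0;
        }
        if (f(in)) {
            init |= (1u << addr);
        }
    }
    return init;
}

// Evaluating the LUT is a plain table lookup.
bool lut5_eval(std::uint32_t init, std::uint32_t addr) {
    return ((init >> (addr & 31u)) & 1u) != 0;
}

// Truth table for lutout1: NAND of input1..input4 and common.
// Assumed ordering: in[0..3] = input1..input4, in[4] = common.
std::uint32_t lutout1_init() {
    return make_lut5_init([](const bool* in) {
        return !(in[0] && in[1] && in[2] && in[3] && in[4]);
    });
}
```

Only the all-ones input combination produces 0, so the table is all ones except for its top bit.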
RAM/SRL 32: using LUTs as storage elements
• Very granular, localized memory
• Minimal impact on logic routing
• Great for small FIFOs
BRAM/FIFO: dedicated internal memory arrays
• Efficient, on-chip blocks
• Flexible + optional FIFO logic
• Ideal for mid-sized buffering
External memory: ability to interface to external memory
• Memory-controller cores
• Cost-effective bulk storage
• For large memory requirements
• DRAM: SDRAM, DDR SDRAM, FCRAM, RLDRAM
• SRAM: Sync SRAM, DDR SRAM, ZBT, QDR
• FLASH
• EEPROM
[Figure: 7 series FPGA with logic, RAM/SRL 32 and BRAM/FIFO on chip, connected to external DRAM, SRAM, FLASH and EEPROM; the three options are ordered along an axis from granularity to capacity.]
Distributed RAM
36-Kbit Block RAM
Block RAM Configurations
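The slide body did not survive extraction, so the following is only a sketch of how port width and depth trade off in a 36-Kbit block RAM, based on the commonly documented 7 series aspect ratios (32Kx1 down to 512x72, with the 72-bit width available in simple dual-port mode). The function name `bram36_depth` is hypothetical; check the device user guide before relying on a specific configuration.

```cpp
#include <cstdint>

// Depth of one 36-Kbit block RAM for a given port width.
// Returns 0 for widths that are not a native configuration.
std::uint32_t bram36_depth(unsigned width) {
    switch (width) {
        case 1:
        case 2:
        case 4:
            // Narrow ports use the 32 Kbit of data storage only.
            return 32768u / width;
        case 9:
        case 18:
        case 36:
        case 72:
            // Wider ports also use the parity bits: 36 Kbit in total.
            return 36864u / width;
        default:
            return 0;
    }
}
```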
Dual Port Block RAM Configurations
Many BRAM Configurations
FPGA clock management & generation
[Figure: clock routing diagram; surviving labels: 25 Rows (twice), and a clock element with inputs I0, I1 and output O.]
MMCM
– Functionally similar to Virtex-6 MMCM
– Seven output counters plus feedback
– Powerdown mode
– Input Clock Switching
– Fractional Divide on OUT0 and FBOUT
– Dynamic Phase Shift
– True and Complement outputs (O0-O3)
– Spread Spectrum Clock Generation
– Lock Detect
PLL
– Spartan-6 PLL/Virtex-6 MMCM features
– Six output counters plus feedback
– Powerdown mode
– Input Clock Switching
– Lock Detect
[Figure: block diagrams of both. CLKIN1/CLKIN2 feed input divider D, then PFD, Charge Pump, Loop Filter and VCO; feedback divider M drives CLKFBOUT and returns through CLKINFB; a Lock Monitor drives Lock. Output counters O0-O6 on the MMCM, O0-O5 on the PLL.]
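The dividers named in the block diagram (input divider D, feedback multiplier M, output counters O) determine the output frequencies in the standard PLL way: the VCO runs at CLKIN x M / D and each output is the VCO frequency divided by its counter. A minimal C++ sketch of that arithmetic follows; `ClockSynth` and `synthesize` are hypothetical names, and the code does not check the VCO range or counter limits, which are device-specific and must be taken from the data sheet.

```cpp
// Frequencies produced by one MMCM/PLL setting, in MHz.
struct ClockSynth {
    double vco_mhz;  // VCO frequency
    double out_mhz;  // frequency of one output counter
};

// clkin_mhz: input clock, m: feedback multiplier, d: input divider,
// o: divider of the selected output counter.
// m and o are doubles because the MMCM supports fractional divide on
// FBOUT and OUT0; for a PLL, pass integer values.
ClockSynth synthesize(double clkin_mhz, double m, unsigned d, double o) {
    ClockSynth r{};
    r.vco_mhz = clkin_mhz * m / static_cast<double>(d);
    r.out_mhz = r.vco_mhz / o;
    return r;
}
```

For example, a 100 MHz input with M = 10, D = 1 and O = 5 gives a 1000 MHz VCO and a 200 MHz output.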
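[Figure: 200-tap FIR filter implemented two ways. Data In passes through a chain of registers; each tap is multiplied by a coefficient (C0 … C199) and the products are summed to Data Out.]
– Single-MAC unit: 200 clock cycles needed
– Parallel implementation: 200 operations in 1 clock cycle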
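The contrast between one MAC unit and 200 parallel multipliers can be sketched as a plain C++ FIR filter. The inner loop below is exactly the work in question: executed sequentially it corresponds to the single-MAC case, while a synthesis tool that fully unrolls it would create one multiplier per tap. The class name `Fir200` and the constant `kTaps` are hypothetical, and the integer sample and coefficient types are an assumption made for the example.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kTaps = 200;

// 200-tap FIR filter: y[n] = sum over i of coeffs[i] * x[n - i].
class Fir200 {
public:
    explicit Fir200(const std::array<std::int32_t, kTaps>& coeffs)
        : coeffs_(coeffs) {}

    // Process one input sample and return one output sample.
    std::int64_t step(std::int32_t sample) {
        // Register chain: every stored sample moves one stage further.
        for (std::size_t i = kTaps - 1; i > 0; --i) {
            delay_[i] = delay_[i - 1];
        }
        delay_[0] = sample;

        // 200 multiply-accumulate operations per output sample.
        // Sequential execution models the single-MAC unit; full unrolling
        // in hardware models the parallel structure.
        std::int64_t acc = 0;
        for (std::size_t i = 0; i < kTaps; ++i) {
            acc += static_cast<std::int64_t>(coeffs_[i]) * delay_[i];
        }
        return acc;
    }

private:
    std::array<std::int32_t, kTaps> coeffs_;
    std::array<std::int32_t, kTaps> delay_{};  // zero-initialized delay line
};
```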
DSP48E1 Slice
[Figure: DSP48 tile with two DSP48E1 slices and interconnect. Each slice: inputs A, B, C and D; pre-adder; 25x18 multiplier; 48-bit accumulator (+/-); pattern detector (=); output P.]
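The data path in the figure (pre-adder, multiplier, 48-bit accumulator, pattern detector) can be approximated by a small functional model. This is a simplified sketch, not a bit-accurate model of the primitive: the names `wrap48`, `dsp_mac` and `pattern_detect` are hypothetical, operand widths are not enforced, and only the add form of the pre-adder and accumulator is shown.

```cpp
#include <cstdint>

// Wrap a value to a signed 48-bit result, as a 48-bit accumulator would.
std::int64_t wrap48(std::int64_t x) {
    const std::uint64_t mask = (1ULL << 48) - 1;
    std::uint64_t u = static_cast<std::uint64_t>(x) & mask;
    if (u & (1ULL << 47)) {
        u |= ~mask;  // sign-extend bit 47 into the upper bits
    }
    return static_cast<std::int64_t>(u);
}

// One multiply-accumulate step: P = acc + (A + D) * B.
// In hardware the pre-adder result is 25 bits wide and B is 18 bits wide;
// this model assumes the caller keeps operands within those ranges.
std::int64_t dsp_mac(std::int64_t acc, std::int32_t a, std::int32_t d,
                     std::int32_t b) {
    const std::int64_t pre = static_cast<std::int64_t>(a) + d;  // pre-adder
    const std::int64_t prod = pre * b;                          // multiplier
    return wrap48(acc + prod);                                  // accumulator
}

// Pattern detector: compares the 48-bit result against a pattern.
bool pattern_detect(std::int64_t p, std::int64_t pattern) {
    return wrap48(p) == wrap48(pattern);
}
```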
[Figure labels: High Speed Peripherals, Application Processor, Key Interfaces, 64-bit Quad-Core.]
Processor comparison (three device classes, listed left to right)
Application Processing Unit
– 32-bit Xilinx MicroBlaze: up to 220 MHz* @ 1.4 DMIPS/MHz
– 32-bit ARM® Cortex™-A9: up to 1 GHz @ 2.5 DMIPS/MHz
– 64-bit ARM Cortex-A53: up to 1.5 GHz @ 2.3 DMIPS/MHz
Real-Time Processing Unit
– 32-bit Xilinx MicroBlaze: up to 220 MHz* @ 1.3 DMIPS/MHz
– 32-bit Xilinx MicroBlaze: up to 220 MHz* @ 1.3 DMIPS/MHz
– Dual-core ARM Cortex-R5: up to 600 MHz @ 1.6 DMIPS/MHz
External DDR Interface
– Flexible
– DDR3, DDR3L, DDR2, LPDDR2
– DDR4, LPDDR4, DDR3, DDR3L, LPDDR3
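The figures above combine a clock rate with a DMIPS/MHz rating; multiplying the two gives the peak Dhrystone throughput of one core. A minimal C++ sketch of that arithmetic follows; `peak_dmips` is a hypothetical helper name, and the result is a per-core upper bound, not a measured benchmark.

```cpp
// Peak Dhrystone throughput of a single core:
// clock frequency in MHz multiplied by the DMIPS/MHz rating.
constexpr double peak_dmips(double clock_mhz, double dmips_per_mhz) {
    return clock_mhz * dmips_per_mhz;
}
```

With the numbers quoted on the slide, a Cortex-A9 at 1 GHz and 2.5 DMIPS/MHz works out to 2500 DMIPS per core.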
Migrate Next Gen Design With Same Tools & Code
[Figure: device portfolio ordered by processing capability. Processor options shown: MicroBlaze, single A9, dual A9, dual A53 and quad A53, the A53 devices with dual R5 and optional GPU and VCU. Fabric families shown: Spartan-6, Spartan-7, Artix-7, Kintex-7 and UltraScale+ FPGA logic.]
Performance/Watt relative to 28nm
[Figure: bar chart, vertical axis 0 to 5, 28nm baseline = 1 for each category; bar labels 1.5, 1.7, 1.7, 2.1, 2.4X, 3X and 4X.]
– Logic Fabric: enhanced fabric with FinFET performance
– Serial Bandwidth: up to 128 transceivers at up to 32.75 Gb/s
– DSP Bandwidth: ~12,000 DSP slices running at ~900 MHz
– On-Chip Memory: UltraRAM for SRAM device replacement
ARM Cortex-A9 for Application Processing + Programmable Logic for Deterministic Processing
– Linux / RTOS tasks on the Cortex™-A9 (Single or Dual-Core)
– Offloading compute-intensive tasks
– Housekeeping chores
– Access to On-Chip Memory (OCM)
– Peripherals: UART, SDIO
MicroBlaze
– IP Catalog: drag, drop, and customize
– MicroBlaze and MicroBlaze MCS
– Embedded Block RAM
– Multiple instantiations as needed: MicroBlaze1, MicroBlaze2, . . . MicroBlazeN
See it on (1)
1: “Zynq & MicroBlaze IOP Block, OCM & Memory Resource Sharing”
Xilinx IP
HLS IP C/C++
…
Custom IP
Xilinx SDK Ecosystem
…
SDSoC
Complete C/C++ Environment
Design entirely in C/C++ for Zynq
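As an illustration of what designing entirely in C/C++ can look like, here is a minimal sketch of a function written so that it could be selected for hardware acceleration. The function name `vadd_accel` and the size `kN` are hypothetical. The two pragmas follow the SDSoC and Vivado HLS pragma style as an assumption about how such a function would be annotated; they are hints for those tools, and a standard C++ compiler ignores them (possibly with a warning), so the code also runs unchanged in software.

```cpp
#include <cstddef>

constexpr std::size_t kN = 1024;

// Candidate hardware function: element-wise vector addition.
// Assumed annotation: tells the system compiler the arrays are streamed
// in order, so it can pick a streaming data mover.
#pragma SDS data access_pattern(a:SEQUENTIAL, b:SEQUENTIAL, out:SEQUENTIAL)
void vadd_accel(const int a[kN], const int b[kN], int out[kN]) {
    for (std::size_t i = 0; i < kN; ++i) {
// Assumed annotation: asks the HLS tool to start one iteration per clock.
#pragma HLS PIPELINE II=1
        out[i] = a[i] + b[i];
    }
}
```

The caller on the ARM core invokes `vadd_accel` like any other function; whether it runs in software or in programmable logic is a build-time decision.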
Windows or Linux hosted Eclipse-based IDE
FREE TOOL: WEBPACK DOWNLOAD
eSol – eT-Kernel
Green Hills – INTEGRITY
Lynx – LynxOS 7
QNX – Neutrino
Sysgo – PikeOS
Express Logic – ThreadX
Mentor Graphics
Hypervisor
Wind River
Open Source - Xen
Lynx - LynxSecure
ARM Cortex-A9 +
Zynq-7000S devices
Device Name              Z-7007S    Z-7012S    Z-7014S
7 Series PL Equivalent   Artix®-7   Artix-7    Artix-7
Logic Cells              23K        55K        65K

Spartan-7 devices
Part Number   XC7S6   XC7S15   XC7S25
Logic Cells   6,000   12,800   23,360
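[Figure: board block diagram; surviving labels: Vterm, x16, x4, SDIO, 24 MHz oscillator, 33.33 MHz clock, USB 2.0 ULPI PHY with Type A connector, Arduino-style headers, peripherals, and supply rails of 1.0 V, 1.35 V, 1.8 V and 3.3 V.]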
Application Development
Algorithm Development
– DNN, CNN: GoogLeNet, SSD, FCN, …
Platform Development
Removing the Barrier to Broad Adoption: reVISION Stack
– 20% Xilinx/80% User
– 80% Xilinx/20% User
[Figure: development time versus ease of use. Development stages: algorithm to RTL, system integration, bitstream generation. Entry points: SDSoC C/C++, OpenCV apps, ML apps.]
“A subsystem design used to take 3 weeks. I’ve done …
“reVISION will shorten our development cycle for new products and upgrades by up to 12 months.” - System Architect