FPGA Development for C/C++ Coders

using High-Level Synthesis


By Karl de Boois - EBV

Page 1 © Copyright 2016 Xilinx


Agenda

Introduction
EBV & Xilinx
Design flow: RTL -> HLS -> SDSoC
FPGA basics
– LUTs & Routing
– Clocking, Memory and DSP resources

Product portfolio & demo

Introduction - Karl de Boois

Hardware Design Engineer since 1991


Field Application Engineer for EBV Elektronik since 2002

Some personal facts:

Love to travel the world with my girlfriend.


Play video games in my mancave.
Mancave starts at the front door.

Design flow: RTL -> HLS -> SDSoC

Traditional FPGA design in Verilog
Not a programming language but a Hardware Description Language (HDL)
What about VHDL?

The battle of the HDLs – VHDL vs. Verilog

The (ideal) RTL design flow for FPGAs

Vivado High-Level Synthesis: Accelerated IP Development and Design Space Exploration
Comprehensive coverage
– C/C++/SystemC
– Arbitrary precision
– Floating-point
Accelerated verification
– 2 to 3 orders of magnitude faster than RTL for larger designs
Fast compilation and design exploration
– Algorithm feasibility
– Architecture iteration
Customer-proven results

Accelerating Design Productivity with HLx & SDx

Productivity
Separate platform design from differentiated logic
– Let application designers focus on the differentiated logic
Spend less time on the standard connectivity
– IPI: configure & generate a platform on a custom board
Spend more time on the differentiated logic
– HLS: enabling core technology: C/C++/OpenCL synthesis
– HLx: IP design (HLS/SysGen) + connectivity platform (IP Integrator)
[Diagram: SystemC, C++ and OpenCL sources feed the HLx hardware flow and the SDx software flow; blocks shown are the open source C++ library, Vivado HLS, AXI I/F, IP Integrator, platform awareness, RTL IP, SDK, and Vivado synthesis and implementation]
Vivado HLS Accelerates Verification Productivity

RTL-based approach: hours per iteration
– Functional verification with HDL simulation produces verified RTL
C-based approach: seconds per iteration
– Functional verification with a C compiler, then final validation in RTL, produces verified RTL
2 to 3 orders of magnitude faster than RTL for large designs
– RTL verification becomes final check
Video example – functional verification of 10 frames of video data
– RTL = ~2 days
– Vivado HLS C = 10 sec
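The C-based verification loop described above can be sketched as a small self-checking testbench. This is only an illustration under stated assumptions: `moving_sum3`, `golden` and `run_testbench` are hypothetical names, the 3-tap moving sum stands in for whatever function would actually be synthesized, and the frame sizes are arbitrary. In a Vivado HLS project the testbench body would normally live in `main()` and return 0 on success.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical function under test: 3-tap moving sum over a stream of pixels.
void moving_sum3(const uint8_t* in, uint16_t* out, std::size_t n) {
    uint16_t d1 = 0, d2 = 0;  // two-sample delay line
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = static_cast<uint16_t>(in[i] + d1 + d2);
        d2 = d1;
        d1 = in[i];
    }
}

// Golden reference, written the obvious way for comparison.
uint16_t golden(const std::vector<uint8_t>& in, std::size_t i) {
    uint16_t s = in[i];
    if (i >= 1) s += in[i - 1];
    if (i >= 2) s += in[i - 2];
    return s;
}

// Runs the function on synthetic frames and returns the number of mismatches.
int run_testbench(std::size_t frames, std::size_t pixels_per_frame) {
    std::vector<uint8_t> in(frames * pixels_per_frame);
    for (std::size_t i = 0; i < in.size(); ++i) {
        in[i] = static_cast<uint8_t>((i * 37 + 11) & 0xFF);  // synthetic test pattern
    }
    std::vector<uint16_t> out(in.size());
    moving_sum3(in.data(), out.data(), in.size());

    int mismatches = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (out[i] != golden(in, i)) ++mismatches;
    }
    return mismatches;
}
```

Because this compiles and runs as ordinary software, each iteration takes seconds, which is the point the slide makes about C-level verification.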

Accelerate Functions in Hardware with SDSoC™ (Zynq)
No RTL Design Required

1 System-level profiling
2 Toggle SW-HW partitioning
3 System optimizing compiler
– Takes C code and creates all hardware processing blocks and connections to make the software work
[Diagram: a chain of C/C++ functions between input and output; each function is toggled between software [S] on the ARM® core in the Processing System and hardware [H] in the Programmable Logic]

Using the “Hello World” of SDSoC to introduce
the FPGA: the Floating Point Matrix Multiplier.

Floating Point Matrix Multiplier example
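The slide's figure did not survive extraction, so here is a minimal sketch of what a floating-point matrix multiplier looks like in plain C++. The function name `mmult`, the matrix size `N = 32` and the flat row-major layout are assumptions for illustration, not the SDSoC example's actual source.

```cpp
#include <cstddef>

constexpr std::size_t N = 32;  // assumed matrix dimension

// out = a * b for N x N row-major matrices.
// In an HLS flow the inner loop is the natural candidate for pipelining
// or unrolling; this sketch deliberately contains no tool directives.
void mmult(const float a[N * N], const float b[N * N], float out[N * N]) {
    for (std::size_t row = 0; row < N; ++row) {
        for (std::size_t col = 0; col < N; ++col) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < N; ++k) {
                sum += a[row * N + k] * b[k * N + col];
            }
            out[row * N + col] = sum;
        }
    }
}
```

Each output element needs N multiply-accumulate operations, which is why the example is a convenient way to introduce the LUT, memory and DSP resources covered next.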

FPGA basics

Logic (FF and LUT) & Interconnect

Slices in 7 Series FPGAs

A Configurable Logic Block (CLB) contains two slices
– All 7 series FPGAs use the same CLB structure
Two types of slice
– SLICE_L: Logic-only
– SLICE_M: Memory-capable (Logic + RAM / Shift Register)
– No SLICE_X (present in Spartan-6 FPGAs)
Not every CLB needs memory
– CLB_LL: two SLICE_Ls
– CLB_LM: SLICE_L + SLICE_M
Total Power Reduction: ratio of logic to memory optimized for area and power

Configurable Logic Block

Architecture consistent with Virtex-6 and Spartan-6
– Two side-by-side slices per CLB
– Four 6-input Look Up Tables (LUTs) per slice
– Two flip-flops per LUT
– MUXF7 and MUXF8 for creating larger logic structures
Ease of design migration
– Designs can be migrated easily from Spartan-6 FPGAs and Virtex-6 FPGAs to 7 series FPGAs
– Designs can be migrated between 7 series families as the user’s requirements change

Logic Resource Uses

– LUT6: six-input logic function
– LUT5 + LUT5: shared five-input logic function
– LUT3 + LUT3: split logic function
– 64x1 RAM: distributed memory
– SRL32: shift register logic
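A LUT used as a six-input logic function is simply a 64-entry truth table addressed by its inputs. The sketch below models that idea in software; the names `lut6`, `make_init`, `and6_init` and `xor6_init` are illustrative, and the convention that bit i of the 64-bit word holds the output for input pattern i is an assumption of this model.

```cpp
#include <cstdint>

// Look up the output of a 6-input LUT: the six inputs form an address
// into a 64-bit truth table.
bool lut6(uint64_t init, unsigned inputs) {
    return ((init >> (inputs & 0x3Fu)) & 1u) != 0;
}

// Build the truth table for any 6-input Boolean function by enumerating
// all 64 input patterns.
template <typename F>
uint64_t make_init(F f) {
    uint64_t init = 0;
    for (unsigned i = 0; i < 64; ++i) {
        if (f(i)) init |= (uint64_t{1} << i);
    }
    return init;
}

// 6-input AND: only the all-ones pattern produces 1.
uint64_t and6_init() {
    return make_init([](unsigned x) { return x == 0x3Fu; });
}

// 6-input XOR (odd parity).
uint64_t xor6_init() {
    return make_init([](unsigned x) {
        unsigned parity = 0;
        for (; x != 0; x >>= 1) parity ^= (x & 1u);
        return parity == 1;
    });
}
```

Any function of six inputs costs the same single LUT regardless of how complicated the Boolean expression looks; the same 64 bits can alternatively be written at run time (distributed RAM) or shifted (SRL32).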

32-bit Shift Register in One LUT

Versatile SRL-type shift registers
– Variable-length shift register
– Synchronous FIFOs
– Content-addressable memory (CAM)
– Pattern generator
– Compensate for delay / latency
Shift register length is determined by the address
– Constant value giving fixed delay line
– Dynamic addressing for elastic buffer
Cascade up to 128x1 shift register in one slice
[Diagram: 32-bit shift register inside a LUT with data input D, clock CLK and cascade output Q31; the 5-bit address A drives a 32:1 MUX that selects output Qn]
SRL configurations in one slice (4 LUTs)
– 16x1, 16x2, 16x4, 16x6, 16x8
– 32x1, 32x2, 32x3, 32x4
– 64x1, 64x2
– 96x1
– 128x1
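The addressable shift register described above can be modelled in a few lines. This is a behavioural sketch only: the struct name `Srl32` and its methods are hypothetical, and the model assumes address 0 selects the most recently shifted-in bit (a delay of one clock).

```cpp
#include <cstdint>

// Behavioural model of a 32-bit addressable shift register in one LUT.
struct Srl32 {
    uint32_t bits = 0;  // bit 0 holds the most recently shifted-in value

    // One clock edge: shift everything by one position and take in d.
    void clock(bool d) { bits = (bits << 1) | (d ? 1u : 0u); }

    // Output Qn selected by the 5-bit address; address a gives a delay of a + 1 clocks.
    bool q(unsigned a) const { return ((bits >> (a & 31u)) & 1u) != 0; }

    // Cascade output for chaining into the next LUT.
    bool q31() const { return q(31); }
};
```

Holding the address constant gives a fixed delay line; changing it at run time gives the elastic buffer behaviour mentioned in the list.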

Arithmetic Carry
Two carry chains per CLB
– Running South to North for fast arithmetic addition and subtraction
– Each slice in a column has an independent carry chain
Carry lookahead
– Combinatorial carry lookahead over the four LUTs in a slice
– Improves arithmetic performance
Carry chains are cascadable
– Carry-Out connects to adjacent Carry-In in the same column
Carry chain uses
– Carry chain used as large AND/OR gates and decoders to reduce logic and improve performance
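How a carry chain turns LUTs into an adder can be sketched bit by bit: each LUT computes a propagate signal, a multiplexer in the chain passes or regenerates the carry, and an XOR forms the sum bit. The function below is a software model of that structure under those assumptions; `carry_chain_add` and `AddResult` are illustrative names, and `width` is assumed to be at most 32.

```cpp
#include <cstdint>

struct AddResult {
    uint32_t sum;
    bool carry_out;
};

// Bit-serial model of a carry chain: one LUT plus one carry mux per bit.
AddResult carry_chain_add(uint32_t a, uint32_t b, bool carry_in, unsigned width) {
    uint32_t sum = 0;
    bool carry = carry_in;
    for (unsigned i = 0; i < width; ++i) {
        const bool ai = ((a >> i) & 1u) != 0;
        const bool bi = ((b >> i) & 1u) != 0;
        const bool propagate = (ai != bi);          // LUT output: a XOR b
        if (propagate != carry) sum |= (1u << i);   // sum bit: propagate XOR carry
        carry = propagate ? carry : ai;             // carry mux: pass or regenerate
    }
    return {sum, carry};
}
```

Subtraction uses the same chain: feed the inverted second operand and set the carry-in, so that a - b is computed as a + ~b + 1.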

Fast Interconnect Between CLBs
CLBs are distributed in columns of tiles
– Separated by two back-to-back interconnect columns
Back-to-back interconnect results in metal area saving
– Reducing metal area lowers power consumption and overall device cost
– Fast connection between CLBs on either side of the interconnect pair
Only clock signals are shared between two columns
– Data signals feed to either the left or right CLB column

Length of Routing Resources

Different length interconnect resources
– Single lines connect adjacent tiles
– Double lines connect two tiles away
– Quad lines connect four tiles away
– Hex lines connect six tiles away
– Long lines connect between 12 and 18 tiles away
Quantity of routing resources increased over Virtex-6 FPGAs and Spartan-6 FPGAs
– To avoid encountering routing congestion

Using Logic and Interconnect Resources
All logic resources can be inferred
– Xilinx and third-party synthesis tools
Software intelligently packs logic
– Placer is aware of the lengths of available routing resources

Example design (VHDL) mapped into one FPGA slice: two LUT functions sharing the input "common", each feeding a flip-flop

...
-- Combinational LUT functions, written as concurrent assignments so that
-- they update whenever an input changes, not only on clock events
lutout1 <= not (input1 and input2 and input3 and input4 and common);
lutout2 <= (input5 or input6 or input7 or common);

process (Clock)
begin
  -- flop1 samples the first LUT on the rising clock edge
  if rising_edge (Clock) then
    flop1 <= lutout1;
  end if;

  -- flop2 samples the second LUT on the falling clock edge
  if falling_edge (Clock) then
    flop2 <= lutout2;
  end if;
end process;
...

Placer software automatically chooses the best routing resources
– The fewer resources used, the quicker the connection, the faster the design
– A single routing resource will be used if possible
– Otherwise, multiple routing resources will be interconnected

Embedded Memory (BRAM)

7 Series FPGAs Memory Hierarchy

Distributed RAM/SRL32: using LUTs as storage elements
• Very granular, localized memory
• Minimal impact on logic routing
• Great for small FIFOs
On-chip BRAM/FIFO: dedicated internal memory arrays
• Efficient, on-chip blocks
• Flexible + optional FIFO logic
• Ideal for mid-sized buffering
Fast memory interfaces: ability to interface to external memory
• Memory-controller cores
• Cost-effective bulk storage
• For large memory requirements
• DRAM: SDRAM, DDR SDRAM, FCRAM, RLDRAM
• SRAM: Sync SRAM, DDR SRAM, ZBT, QDR
• FLASH
• EEPROM
[Diagram axis: granularity at the distributed RAM end, capacity at the external memory end]

Distributed RAM

Distributed memory
– Each LUT can be 64-bit memory
– Inherently single-port, but can be made dual-port, multi-port
Ideal for small and fast memories
– coefficient storage,
– small data buffers,
– small state machines,
– small FIFOs
– shift registers
– etc.
Adjacent LUTs can be cascaded
– Up to 256x1-bit single-port memory or 64x1-bit quad-port memory in a single slice
Capability of a single SLICE_M
– Single port: 32x2, 32x4, 32x6, 32x8, 64x1, 64x2, 64x3, 64x4, 128x1, 128x2, 256x1
– Dual port: 32x2D, 32x4D, 64x1D, 64x2D, 128x1D
– Simple dual port: 32x6SDP, 64x3SDP
– Quad port: 32x2Q, 64x1Q
Programmable System Integration: flexible, efficient implementation of common functions

Quad-Port Memory in One SLICE_M

Write port:
– Four LUT6s can share the write address and data
– On the first LUT, read address = write address
Read ports:
– Three independent read operations
[Diagram: four LUTs with a common write address and write data; the first LUT reads at the write address, the other three each take an independent read address; every LUT outputs its associated data]
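The quad-port arrangement above works because all four LUTs receive every write, so each holds an identical copy of the data and can be read at its own address. A minimal behavioural sketch follows; the struct `QuadPortRam64x1` and its method names are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Behavioural model of a 64x1 quad-port memory built from four LUT6s.
struct QuadPortRam64x1 {
    // Four copies of the same 64x1 contents, one per LUT.
    std::array<std::array<bool, 64>, 4> lut{};

    // The shared write port updates all four copies at once.
    void write(unsigned addr, bool data) {
        for (auto& copy : lut) copy[addr & 63u] = data;
    }

    // The first LUT is read at the write address.
    bool read_at_write_addr(unsigned write_addr) const {
        return lut[0][write_addr & 63u];
    }

    // Three independent read ports (port 0, 1 or 2), each served by its own LUT.
    bool read(unsigned port, unsigned addr) const {
        return lut[1 + (port % 3u)][addr & 63u];
    }
};
```

The cost of the extra ports is replication: four LUTs store only 64 bits between them.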

36-Kbit Block RAM

36K/18K block RAM
– All Xilinx 7 series FPGA families use same block RAM as Virtex-6
Two independent ports address common data
– Individual address, clock, write enable, clock enable
– Independent widths for each port
Integrated control for fast and efficient FIFOs
Integrated 64 / 72-bit Hamming error correction
[Diagram: block usable as FIFO or dual-port BRAM]

Block RAM Configurations

36K/18K block RAM
– Single 36K block or two independent 18K blocks
– 32k x 1 to 512 x 72 in one 36K block RAM
Configurations similar to Virtex-6
– Single Port, Simple Dual Port and True Dual Port configurations
– Integrated cascade logic creates 64k x 1 from two 32k x 1 blocks
– Byte-write enable
– Software controlled power down of unused block RAM sites
[Diagram: 36Kb memory array with Port A (Addr A, 36-bit Wdata A, 36-bit Rdata A) and Port B (Addr B, 36-bit Wdata B, 36-bit Rdata B)]

Dual Port Block RAM Configurations

True dual port – unrestricted flexibility
– Simultaneous or independent read and write operations on port A and port B
– Each port has its own clock, enable, write enable
– Every write also performs a read operation
• Read before Write, Write before Read, or No Change
– Simultaneous read + write or write + write to the same location can cause data corruption; avoiding this is the user's responsibility
Simple dual port – allows widest implementation
– One read port and one write port
– Natural structure for FIFOs
– 72-bit data on one or both 36K ports
– 36-bit width for 18K BRAM
– This doubles the memory bandwidth per block
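The three write behaviours listed above (read before write, write before read, no change) differ only in what appears on the port's data output during a write cycle. The model below is a single-port behavioural sketch; the names `WriteMode`, `Bram` and `clock` are illustrative and do not correspond to any Xilinx primitive interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class WriteMode { ReadFirst, WriteFirst, NoChange };

// Behavioural model of one synchronous block RAM port.
struct Bram {
    std::vector<uint32_t> mem;
    uint32_t dout = 0;  // registered data output
    WriteMode mode;

    Bram(std::size_t depth, WriteMode m) : mem(depth, 0), mode(m) {}

    // One clock edge on this port.
    void clock(std::size_t addr, bool write_enable, uint32_t din) {
        if (write_enable) {
            if (mode == WriteMode::ReadFirst) {
                dout = mem[addr];       // old contents appear on the output
            } else if (mode == WriteMode::WriteFirst) {
                dout = din;             // new data appears on the output
            }                           // NoChange: output keeps its previous value
            mem[addr] = din;
        } else {
            dout = mem[addr];           // plain read
        }
    }
};
```

Read-before-write is handy for read-modify-write style buffers because one cycle returns the old value and stores the new one.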

Many BRAM Configurations

True dual-port: two fully independent read and write operations
– Each 18K: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18
– Each 36K: 32Kx1, 16Kx2, 8Kx4, 4Kx9, 2Kx18, 1Kx36
Simple dual-port: 1 read & 1 write port, read AND write in 1 cycle
– Each 18K: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
– Each 36K: 16Kx2, 8Kx4, 4Kx9, 2Kx18, 1Kx36, 512x72
Single-port: 1 read & 1 write port, read OR write in 1 cycle
– Each 18K: 16Kx1, 8Kx2, 4Kx4, 2Kx9, 1Kx18, 512x36
– Each 36K: 16Kx2, 8Kx4, 4Kx9, 2Kx18, 1Kx36, 512x72
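The depth-by-width pairs above follow from simple arithmetic: widths of 1, 2 and 4 use only the data bits (16K or 32K), while widths of 9 and above also use the parity bits (18K or 36K in total). The helper below reproduces the listed depths under that assumption; `bram_depth` is a hypothetical name, and the sketch does not model which port mode permits which width.

```cpp
#include <cstdint>

// Depth of one block RAM for a given data width.
// block_kbits must be 18 or 36; returns 0 for widths outside 1, 2, 4, 9, 18, 36, 72.
uint32_t bram_depth(uint32_t block_kbits, uint32_t width) {
    if (block_kbits != 18 && block_kbits != 36) return 0;
    const uint32_t total_bits = block_kbits * 1024u;   // 18432 or 36864, parity included
    const uint32_t data_bits = total_bits / 9u * 8u;   // 16384 or 32768, parity excluded
    switch (width) {
        case 1: case 2: case 4:
            return data_bits / width;                  // parity bits unused
        case 9: case 18: case 36:
            return total_bits / width;                 // one parity bit per byte is used
        case 72:
            return block_kbits == 36 ? total_bits / width : 0;  // only on the 36K block
        default:
            return 0;
    }
}
```

For instance, a 36K block configured 9 bits wide is 4K deep, matching the "4Kx9" entry in the list.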

FPGA clock management & generation

Xilinx 7 Series FPGA Layout
[Floorplan diagram: I/O columns; clock management columns; clock routing; CLB, BRAM and DSP columns; GT columns]
Similar floorplan to Virtex-6 FPGA
– Provides easy migration to 7 series FPGAs
CMT columns adjacent to I/O columns
– Support for high performance interfaces
One I/O column per half device
– Uniform skew from center of device
All resources optimized for low power
Increased System Performance: FPGA layout optimized for high performance memory interfaces

Clock Region

All 7 series FPGAs are split into uniform-height clock regions
– Each region has its own resources
– All regions can share the available global resources
Every clock region is 50 rows of CLBs tall
– 25 rows above and 25 rows below the central horizontal clocking row (HROW)
All clock regions span from the global vertical clock column to the left or right edge of the device

BUFG (Global Clock Buffer)
For Driving the Global Clock Spine

Global buffer for distributing clock signals across the height of the device
32 BUFG per device located in the center of the vertical clock spine
16 BUFG driven by resources in north, 16 driven by south
Same primitive as previous generations
Glitch-less switching between clock sources
Clock Enable for clock gating
[Diagram: BUFG(CTRL) primitive with inputs I0, I1, S0, S1, CE0, CE1, IGNORE0 and IGNORE1, and output O]

MMCM and PLL

MMCM
– Functionally similar to Virtex-6 MMCM
– Seven output counters plus feedback
– Powerdown mode
– Input Clock Switching
– Fractional Divide on OUT0 and FBOUT
– Dynamic Phase Shift
– True and Complement outputs (O0-O3)
– Spread Spectrum Clock Generation
– Lock Detect
PLL
– Spartan-6 PLL/Virtex-6 MMCM features
– Six output counters plus feedback
– Powerdown mode
– Input Clock Switching
– Lock Detect
[Block diagrams, MMCM and PLL: CLKIN1/CLKIN2 through the D counter into the PFD, charge pump, loop filter and VCO, which drives the output counters (O0 to O6 on the MMCM, O0 to O5 on the PLL); CLKFBOUT through the M counter returns to CLKINFB; a lock monitor drives the Lock output]

MMCM and PLL Features

Frequency Synthesis
– Fout = Fin * M / (D*O)
– One M and one D value per MMCM or PLL
– Each MMCM and PLL output can have its own O value
• M: 1…64; D: 1…80; O: 1…128
Fractional Divide – MMCM Only
– Ability to configure O0 and CLKFBOUT as a counter with 1/8th granularity (e.g. 2.125, 2.250, 2.375 etc.)
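The frequency synthesis formula above is easy to check numerically. The helpers below are an illustrative sketch: the function names are hypothetical, the counter ranges are the ones quoted on the slide, and real devices impose further limits (such as the allowed VCO frequency range) that this sketch does not model.

```cpp
#include <cmath>

// Fout = Fin * M / (D * O), all frequencies in MHz.
double mmcm_fout_mhz(double fin_mhz, double m, double d, double o) {
    return fin_mhz * m / (d * o);
}

// The VCO runs at Fin * M / D; each output then divides it by its own O.
double mmcm_vco_mhz(double fin_mhz, double m, double d) {
    return fin_mhz * m / d;
}

// Counter ranges as quoted on the slide: M 1..64, D 1..80, O 1..128.
bool counters_in_range(double m, double d, double o) {
    return m >= 1.0 && m <= 64.0 && d >= 1.0 && d <= 80.0 && o >= 1.0 && o <= 128.0;
}

// Fractional divide moves in steps of 1/8, e.g. 2.125, 2.250, 2.375.
bool is_eighth_step(double value) {
    const double scaled = value * 8.0;
    return scaled == std::floor(scaled);
}
```

As a worked example, a 100 MHz input with M = 10, D = 1 and O = 5 gives a 1000 MHz VCO and a 200 MHz output; changing O to the fractional value 2.5 gives 400 MHz.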

Dynamic Phase Shift – MMCM Only
– Phase Shift port to change the phase in real time in increments of 1/56 of the VCO period
[Diagram: phase shift settings of 0, 45, 90, 135, 180, 225, 270 and 315 degrees]
True and Complement outputs (O0-O3) – MMCM Only
– Negative polarity clock for easy generation of phase matched inverted clock and migration of DCM designs

FPGA DSP resources

Massively Parallel Signal Processing

Standard DSP processor – sequential (generic DSP)
– A single MAC unit applies coefficients C0 … C199 to the registered input data one at a time
– 200 clock cycles needed per output sample
– 1.2 GHz / 200 clock cycles = 6.0 MSPS
FPGA – fully parallel implementation (Virtex-7 FPGA)
– A chain of registers feeds 200 multipliers (C0 … C199) whose products are summed
– 200 operations in 1 clock cycle
– 741 MHz / 1 clock cycle = 741 MSPS
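The comparison above reduces to one division: sample rate equals clock frequency divided by clock cycles per output sample. The sketch below shows that arithmetic next to the 200-tap multiply-accumulate loop it refers to; `sample_rate_msps`, `fir_sample` and `TAPS` are illustrative names, and the 16-bit sample and coefficient types are an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Sample rate in MSPS for a given clock (MHz) and cycles spent per output sample.
double sample_rate_msps(double clock_mhz, unsigned cycles_per_sample) {
    return clock_mhz / cycles_per_sample;
}

constexpr std::size_t TAPS = 200;

// One FIR output sample: 200 multiply-accumulate operations.
// A processor with a single MAC unit runs this loop over 200 cycles;
// fully unrolled in fabric, all 200 products are formed in the same cycle.
int64_t fir_sample(const int16_t window[TAPS], const int16_t coeff[TAPS]) {
    int64_t acc = 0;
    for (std::size_t i = 0; i < TAPS; ++i) {
        acc += static_cast<int32_t>(window[i]) * coeff[i];
    }
    return acc;
}
```

With the slide's figures, the sequential case gives 1200 / 200 = 6 MSPS and the parallel case gives 741 / 1 = 741 MSPS, even though the FPGA clock is slower.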

DSP Slice Features

7 series FPGAs DSP slice 100% based on Virtex-6 FPGA DSP48E1


– 25x18 multiplier
– 25-bit pre-adder
– Flexible pipeline
– Cascade in and out
– Carry in and out
– 96-bit MACC
– SIMD support
– 48-bit ALU
– Pattern detect
– 17-bit shifter
– Dynamic operation (cycle by cycle)

7 Series FPGAs DSP Architecture
Based on proven Virtex-6 DSP48E1 design
Adder-chain implementation
– No performance degradation or slow-down when using pre- and post-adders
– Consumes zero logic, seamless cascading of DSP48E1 slices
– Filter speed is optimized if the number of taps fits within DSP column height
High-precision, high-bandwidth operation
– 25x18 input resolution, 48-bit output resolution
– Up to 5,335 GMAC/s (symmetrical filter implementation in XC7VX690T)
Cycle-by-cycle operation
– Time-sharing DSP slices with multiple data streams, processed and stored in SRL16s
Single-instruction, multiple-data operation and pattern detection
– Fully compatible with Virtex-6 IP

Pre-adder and Pipelines

Pre-adder and D pipeline with 2 new registers
– Doubles the efficiency of symmetrical filters and convolutions
Fine-grain access to the A and B pipelines
– Optimizes the implementation of certain algorithms, like short FFTs, sequential complex multiplications, etc.
Enhanced control of the paths to the post-adder and to the multiplier
– Easier pipeline balancing, higher operation frequency
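Why a pre-adder doubles the efficiency of symmetrical filters can be shown directly: when coefficient i equals coefficient n-1-i, the two matching samples can be added first and multiplied once. The sketch below compares the direct and folded forms; the function names are hypothetical, and `fir_folded` assumes the coefficient vector really is symmetric and that both vectors have the same length.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Direct form: one multiplier per tap.
int64_t fir_direct(const std::vector<int32_t>& x, const std::vector<int32_t>& c) {
    int64_t acc = 0;
    for (std::size_t i = 0; i < c.size(); ++i) {
        acc += static_cast<int64_t>(x[i]) * c[i];
    }
    return acc;
}

// Folded form for symmetric coefficients (c[i] == c[n-1-i]):
// a pre-adder sums the two mirrored samples, then one multiplier serves both taps.
int64_t fir_folded(const std::vector<int32_t>& x, const std::vector<int32_t>& c) {
    const std::size_t n = c.size();
    int64_t acc = 0;
    for (std::size_t i = 0; i < n / 2; ++i) {
        const int64_t pre_add = static_cast<int64_t>(x[i]) + x[n - 1 - i];
        acc += pre_add * c[i];
    }
    if (n % 2 == 1) {
        acc += static_cast<int64_t>(x[n / 2]) * c[n / 2];  // unpaired centre tap
    }
    return acc;
}
```

For a 6-tap symmetric filter the folded form needs 3 multiplications instead of 6, which is the halving of multiplier usage the slide describes.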

DSP Performance through the DSP48E1 Slice
Virtex-6, Spartan-7, Artix-7, Kintex-7, Virtex-7
– 2 DSP48E1 slices per tile
– Input flexibility through 5 shared interconnect
– 741 MHz Fmax
[Diagram: DSP48 tile with two DSP48E1 slices beside the interconnect; each slice takes inputs A, B, C and D, with a pre-adder, a 25x18 multiplier, a 48-bit accumulator (+/-) producing output P, and a pattern detector]

Zynq® UltraScale+ MPSoC System Features
– Application Processor: 64-bit Quad-Core
– Real-Time Processors: 32-bit Dual-Core
– Graphics Processor: ARM Mali-400MP2
– Memory Subsystem: High Bandwidth, Low Latency
– High Speed Peripherals: Key Interfaces
– Fabric Acceleration: Customizable Engines, High Speed Connectivity
– Video Codec: 8K4K (15fps), 4K2K (60fps)
– Platform & Power Management: Granular Power Control, Functional Safety
– Configuration & Security Unit: Anti-Tamper & Trust, Industry Standards

New & Enhanced UltraScale+™ Capabilities
New at 16nm New at 16nm

FinFET Perf/Watt SmartConnect


Optimized Process & Addressing IP & Fabric
Voltage Scaling Interconnect bottlenecks

Enhanced at 16nm Enhanced at 16nm New at 16nm (58G)


Security & Reliability Transceivers
Decrypt/Auth/Anti-Tamper, 16G, 32G, & 58G
Improved SEU Performance Fractional PLL

Enhanced at 16nm New at 16nm (HBM) Enhanced at 16nm

3rd Gen 3D IC & HBM Networking IP


Greater Inter-Die FMAX 100G EMAC w/RS-FEC
HBM for 20X bandwidth 150G Interlaken w/300GLL

Enhanced at 16nm Enhanced at 16nm

DSP External Memory


2400 Mb/s (20nm)
Floating/Fixed Pt Enhanced
2,666 Mb/s (16nm)
2.5X Bandwidth (vs. 28nm)
Enhanced at 16nm
Enhanced at 16nm
Block RAM
PCI Express® Hardened Cascading
Gen3 x16 Power-Optimized Silicon
Gen4 x8

New at 16nm New at 16nm New at 16nm

High Density I/O Packaging UltraRAM


Power-Optimized I/O For Signal Integrity, Massive Capacity
MIPI D-PHY Support PCB Area, Thermal SRAM Replacement

Processing System Summary: From Cost-Optimized to UltraScale+

Application Processing Unit
– FPGA-Based SoC: 32-bit Xilinx MicroBlaze, up to 220MHz* @ 1.4DMIPS/MHz
– Zynq-7000 SoC: 32-bit ARM® Cortex™-A9, up to 1GHz @ 2.5DMIPS/MHz
– Zynq UltraScale+ MPSoC: 64-bit ARM Cortex-A53, up to 1.5GHz @ 2.3DMIPS/MHz
Real-Time Processing Unit
– FPGA-Based SoC: 32-bit Xilinx MicroBlaze, up to 220MHz* @ 1.3DMIPS/MHz
– Zynq-7000 SoC: 32-bit Xilinx MicroBlaze, up to 220MHz* @ 1.3DMIPS/MHz
– Zynq UltraScale+ MPSoC: Dual-core ARM Cortex-R5, up to 600MHz @ 1.6DMIPS/MHz
Multimedia Processing
– FPGA-Based SoC: --
– Zynq-7000 SoC: --
– Zynq UltraScale+ MPSoC: ARM Mali™-400 MP2 GPU, up to 667MHz; Video Codec supporting H.264/H.265
External DDR Interface
– FPGA-Based SoC: Flexible
– Zynq-7000 SoC: DDR3, DDR3L, DDR2, LPDDR2
– Zynq UltraScale+ MPSoC: DDR4, LPDDR4, DDR3, DDR3L, LPDDR3
High-Speed Peripherals
– FPGA-Based SoC: Flexible
– Zynq-7000 SoC: USB 2.0, Gigabit Ethernet, SD/SDIO
– Zynq UltraScale+ MPSoC: PCIe® Gen2, USB 3.0, SATA 3.1, DisplayPort, Gigabit Ethernet, SD/SDIO
Max I/O Count
– FPGA-Based SoC: up to 500*
– Zynq-7000 SoC: up to 528
– Zynq UltraScale+ MPSoC: up to 668
Transceivers
– FPGA-Based SoC: up to 16 @ 6.6Gb/s*
– Zynq-7000 SoC: up to 16 @ 12.5Gb/s
– Zynq UltraScale+ MPSoC: up to 76 @ up to 32.75Gb/s
Package Size
– FPGA-Based SoC: 8x8 to 35x35*
– Zynq-7000 SoC: 13x13 to 35x35
– Zynq UltraScale+ MPSoC: 19x19 to 45x45

* Specifications based on Cost-Optimized devices, higher capabilities on UltraScale™ devices

A Broad Range of Processing Performance

[Diagram: processing capability rising across the portfolio, from MicroBlaze soft processors on Spartan-6, Spartan-7, Artix-7 and Kintex-7 FPGA logic, through single and dual Cortex-A9 devices, to UltraScale+ devices with dual or quad Cortex-A53, dual Cortex-R5, GPU and VCU. Callouts: "Migrate next gen design with same tools" and "Migrate next gen design with same tools & code"]

Full Scalable Software, Tools, and IP Ecosystem

Covering the Full Spectrum of Memory Solutions
– Block RAM (10s of megabits): shallow buffering; 14.9GB/s; 36Kb density
– UltraRAM (100s of megabits): deep video & packet buffering; 11.7GB/s; 288Kb density
– High Bandwidth Memory (multi-gigabyte): deeper buffering at highest performance/watt; 460GB/s; 64Kb density
– External memory, 2666-DDR4 (multi-gigabyte): large data storage; 21.3GB/s; for 16GB density
[Diagram callout: gap in memory hierarchy]

Unlocking Performance, Bandwidth, & Integration
[Bar chart: 28nm, 20nm and 16nm compared relative to 28nm for logic fabric performance/watt, serial bandwidth, DSP bandwidth and on-chip memory]
– Logic fabric performance/watt: enhanced fabric with FinFET performance
– Serial bandwidth: up to 128 transceivers at up to 32.75 Gb/s
– DSP bandwidth: ~12,000 DSP slices running at ~900 MHz
– On-chip memory: UltraRAM for SRAM device replacement

Hardware Determinism for Critical Tasks & Real-Time Response

ARM Cortex-A9: uncertain response time
– ARM Cortex-A9 processor for application processing
– Non-critical, compute-intensive tasks
– Running Linux
MicroBlaze: deterministic response time
– Soft processor (pipeline-configurable) in programmable logic for deterministic processing
– Critical tasks
– Running RTOS
Function accelerated in fabric: parallelized, highest performance
– Dedicated engines in programmable logic
– Critical, compute-intensive tasks
[Diagram: task0, task1, task2 … run one after another on the processors and side by side in the fabric]

MicroBlaze as a Co-Processor in a Zynq-7000
Maintaining Separate Threads for Greater Reliability and Performance

1 ARM processor for compute-intensive tasks
2 MicroBlaze for ARM offloading
• Housekeeping chores
• Network communication
• User interfaces
Shared resources for integrated data passing
– Shared access (via DMA), master or slave
– Access to hardened peripherals
– Access to DDR controller (DMA)
– Access to on-chip memory (OCM)
– ARM access to BRAM (as cache)
– Shared low latency interrupt (GIC)
MicroBlaze and MicroBlaze MCS in the IP Catalog: drag, drop, and customize
– Multiple instantiations as needed (MicroBlaze1, MicroBlaze2, … MicroBlazeN), each with embedded block RAM
[Diagram: ARM Cortex™-A9 (single or dual-core) with 512KB L2 cache and 256KB OCM; flash controller, DDR controller, SPI, I2C, CAN, UART, SDIO, GPIO, USB, GigE, timer, JTAG, config, GIC and DMA; AXI interconnect to the MicroBlaze instances]
See it on (1)
(1) “Zynq & MicroBlaze IOP Block, OCM & Memory Resource Sharing”

System-Wide Safety and Security Across the Portfolio

Hardware (FPGA Fabric)
– AES-256 encryption (1) → anti-cloning & reverse eng.
– SHA-256 authentication → ensures trusted source
– Temp/Volt. monitor → flags ‘out-of-spec’ condition
– Isolated Design Flow (IDF) → fault containment (2)
Software (Zynq SoC)
– Secure boot, protects from attack at startup
– ARM TrustZone (3) to isolate ‘main’ OS from secure OS
– Memory protection against malware injection
– Rich ecosystem of run-time Security IP
Hardware attacks: tamper, cloning, reverse engineering, spoofing
Software attacks: snooping, code modification, malware, denial of service attack

(1) NIST-Approved (National Institute of Standards and Technology)
(2) Physical separation of safety-critical regions of the design
(3) A compromised OS cannot access ‘secured’ data in the secure OS

Common Design Tools across the Portfolio

Vivado® HLx Design Suite: System- & IP-Centric Design
– IP integration and design for hardware design
– High level C synthesis for IP creation
– Scalable IP catalog (Xilinx IP, HLS IP, custom IP), integration automation
– Best-in-class implementation
Xilinx SDK: Processor Design and Debug
– MicroBlaze and ARM processors
– Familiar SW dev environments (C/C++, ecosystem)
– Leverage smallest, lowest cost device
SDSoC: Complete C/C++ Environment
– Design entirely in C/C++ for Zynq

Vivado® Design Suite: IP Integration and HW/SW Development
$1,000+ Value (Vivado + SDK): free tool, WebPACK download
Project Navigator, IP Catalog, Implementation & Verification
– Project navigator for a guided flow from design, to verification, to implementation
– Drag and drop hundreds of Xilinx and partner IP cores or design your own in RTL or C
– Fast HW and SW implementation and verification environments

Xilinx SDK for Ease-of-Use for Everything You Need
A Single Cockpit for Everything You Need

Free tool: WebPACK download
Windows or Linux hosted Eclipse-based IDE
Linaro GCC compiler


Built for
– Performance optimization
– Firmware & application development
– Linux and bare-metal development
– Code profiling
– Board bring-up

Supports all embedded configurations


– Simple single-processor applications
– Multiple-processor systems for both SMP and AMP configurations

Embedded Portfolio Run Time Software Support
OS Ecosystem MicroBlaze Cortex-A9 Cortex-A53 Cortex-R5
Baremetal    
OpenAMP 
Linux: Xilinx PetaLinux   
OS

Linux: Mentor, Wind River, MontaVista 


Linux: ArchLinux, Enea, Timesys  
Android  
Microsoft – Windows Emb Compact ’07/’13 
FreeRTOS    
Mentor Graphics - Nucleus    
Micrium - uC/OS-II & III    
Sciopta Systems – Sciopta   
Wind River - VxWorks7   
RTOS

eSol eT-kernel  
GreenHills – INTEGRITY  
Lynx - LynxOS7  
QNX - Neutrino  
Sysgo – PikeOS  
expresslogic - ThreadX  
Mentor Graphics  
Hypervisor

Wind River  
Open Source - Xen 
Lynx - LynxSecure 

Getting Started with Cost-Optimized and Zynq UltraScale+ Kits
– Microboard: Spartan-6 LX9
– ARTY S7: Spartan-7 S25 or Spartan-7 S50
– ARTY A7: Artix-7 A35T
– ARTY Z7: Zynq-7000 7Z010 or Zynq-7000 7Z020
– MiniZed: Zynq-7007S
– UltraZed: Zynq UltraScale+ ZU3EG
Prices shown: $109, $209

Zynq-7000S Single-Core on the Avnet MiniZed
Includes SDSoC License

Based on Single-Core Z-7007S Device (23K LCs)
Free SDSoC License with Evaluation Kit
Comes with Board Support Package
– Basic Software Stack
– Board-specific design rule checks for rapid bring-up
SDSoC flow
– C/C++ applications
– System-level profiling
– Specify functions for acceleration
– Full system generation

Low Cost Single Core Zynq-7007S: ARM A9 + 23K FPGA Logic Cells

ARM Cortex-A9 + 7 series programmable logic (PL)
– Z-7007S: Artix®-7 equivalent PL, 23K logic cells
– Z-7012S: Artix-7 equivalent PL, 55K logic cells
– Z-7014S: Artix-7 equivalent PL, 65K logic cells
Spartan-7 part numbers and logic cells
– XC7S6: 6,000
– XC7S15: 12,800
– XC7S25: 23,360
FPGA fabric equivalent to 3rd smallest Spartan-7 device

Featured in the MiniZed Evaluation Kit
– Integrated Wi-Fi and Bluetooth
– Arduino Shield headers
– 2 Pmod Headers: 100s of Pmods to choose from

MiniZed Platform Block Diagram

[Block diagram: Zynq-7000S XC7Z007S-1CLG225C surrounded by 512Mb DDR3L (x16, with Vterm), 128Mb QSPI (x4) and 8GB eMMC (SDIO); 24MHz and 33.33MHz oscillators; USB 2.0 ULPI PHY with Type A connector for USB peripherals; Wi-Fi & Bluetooth module (UART, SDIO); dual USB to serial (UART, GPIO) on a Micro AB connector to the computer; Arduino-style headers and two PMOD connectors; two PL bi-filament LEDs with LED drivers; PS and PL user buttons; reset button and reset supervisor; boot mode select and power level switches; Done indicator; MEMS microphone; motion & temp sensor; external 5V power supply on a second Micro AB connector feeding an integrated DC/DC solution (heat sink) with 1.0V, 1.35V, 1.8V and 3.3V rails]

Application Development

DNN
CNN
GoogLeNet
Algorithm Development
SSD
FCN …

Platform Development

Removing the Barrier to Broad Adoption: reVISION Stack

[Chart: development time versus ease of use, from the traditional RTL flow (algorithm to RTL, system integration, bitstream generation) through SDSoC (C/C++) to OpenCV apps and machine learning apps]

20% Xilinx/80% User
“A subsystem design used to take 3 weeks. I’ve done it in 4 days with SDSoC.”
- DSP Engineer

80% Xilinx/20% User
“reVISION will shorten our development cycle for new products and upgrades by up to 12 months.”
- System Architect
