Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
ISCA/CARL 2015
2
Exponential growth in mobile.
3
is driving Data Center growth
4
Leading to search for greater performance
efficiencies
5
Across Data Center Workloads
Diverse workloads:
Cloud Services: Search, Web Servers, ..
Analytics: Big Data, Machine Learning,
Scientific: Genomics, Security,
Communication: Packet Processing, Virtual Switching,
Storage: Compression, Deduplication,
Changing dynamics:
No single killer app
Emerging new apps drive changes in workloads
6
A homogenous compute platform for the Data Center?
7
Overview
8
Motivation for Accelerators
Enhanced Performance: Accelerators compliment CPU cores to
meet market needs for performance of diverse workloads in the
Data Center:
Enhance single thread performance with tightly coupled
accelerators or compliment multi-core performance with
loosely coupled accelerators via PCIe or QPI attach
Move to Heterogeneous Computing: Moores Law continues but
demands radical changes in architecture and software.
Architectures will go beyond homogeneous parallelism,
embrace heterogeneity, and exploit the bounty of transistors to
incorporate application-customized hardware.
.
Accelerator Architecture
General
Purpose Cores
Cost
(Area, Power)
Reconfigurable
Ease of Programming/
Development
Fixed-
function
Flexibility
PCIe
attach
Cost
(Latency,
Granularity) QPI attach
On-Package
On-Chip
On-core
13
IVB+FPGA Software Development Platform
Software Development for Accelerating Workloads using Xeon and coherently attached FPGA in-socket
PCIe* 3.0 x8
PCIe* 3.0 x8
PCIe* 3.0 x8
PCIe* 3.0 x8
PCIe* 3.0 x8
Features Agent,, (optional) Memory
DMI2
Controller
Accelerator Abstraction Layer
(AAL) runtime, drivers, sample
Software
applications
Service API
CCI
extended
Accelerator Virtual Memory API Addr Translation
Abstraction
CCI
Layer
standard
Physical Memory API
QPI/KTI Link, Protocol, &
Uncore PHY
QPI/KTI
Programming interfaces will be forward compatible from SDP to future MCP solutions
Simulation Environment available for development of SW and RTL
15
Programming Interfaces : OpenCL
CPU FPGA
C
OpenCL RunTime F OpenCL Kernels
G
Service API
Accelerator CCI
Abstraction Extended
Virtual Memory API VirtMem
Layer CCI
Physical Memory API Physical Memory API Standard
QPI/UPI/P
CIe
System Memory
16
Overview
17
XEON+FPGA in the Cloud : integration with SDI and RSA
Cloud Users
18
XEON+FPGA in the Cloud: IP Store Concept
IA Node
XEON+FPGA
19
Example Usage : Deep Learning Framework for
Visual Understanding
cluster
node
CCI Interface
SRAM Controller
device
Weights
Outputs
Inputs
Control
State
primitives
Machine
P P P
E E E
20
Example Usage: Accelerating Open VSwitch w/DPDK
VM0 VM1
Offload DMA Engine to FPGA :
VirtIO VirtIO
QEMU QEMU
Frees up CPU cycles to
Userspace perform more useful work
DPDK Libraries
ovs-vswitchd IVSHEM VHOST Reduce cache pollution.
Userspace forwarding
FPGA DMA ovsdv-server Add support for Packet
netdev
TAP
Classification, ACL, and other
Kernel space
functions including Direct I/O
in FPGA
Open vSwitch Kernel Module
Physical Switch/NIC
21
Example Usage: High Frequency Trading Accelerator
CPU
Feed Ethernet
Parser PHY &
MAC
Host
Application
Trading Order
Logic / Generation
Statistics
FPGA
22
Academic Research
23
Q&A
24