Sei sulla pagina 1di 24

Xeon+FPGA Platform for the Data Center

ISCA/CARL 2015

PK Gupta, Director of Cloud Platform Technology, DCG/CPG


Overview

Data Center and Workloads


Xeon+FPGA Accelerator Platform
Applications and Eco-system

2
Exponential growth in mobile.

3
is driving Data Center growth

4
Leading to search for greater performance
efficiencies

5
Across Data Center Workloads
Diverse workloads:
Cloud Services: Search, Web Servers, ..
Analytics: Big Data, Machine Learning,
Scientific: Genomics, Security,
Communication: Packet Processing, Virtual Switching,
Storage: Compression, Deduplication,
Changing dynamics:
No single killer app
Emerging new apps drive changes in workloads
6
A homogenous compute platform for the Data Center?

7
Overview

Data Center and Workloads


Xeon+FPGA Accelerator Platform
Applications and Eco-system

8
Motivation for Accelerators
Enhanced Performance: Accelerators compliment CPU cores to
meet market needs for performance of diverse workloads in the
Data Center:
Enhance single thread performance with tightly coupled
accelerators or compliment multi-core performance with
loosely coupled accelerators via PCIe or QPI attach
Move to Heterogeneous Computing: Moores Law continues but
demands radical changes in architecture and software.
Architectures will go beyond homogeneous parallelism,
embrace heterogeneity, and exploit the bounty of transistors to
incorporate application-customized hardware.
.
Accelerator Architecture

General
Purpose Cores

Cost
(Area, Power)
Reconfigurable
Ease of Programming/
Development

Fixed-
function

Flexibility

Performance Efficiency: Performance/Watt, Performance/$


Programming Complexity : Effort, Cost
Accelerator Attach

PCIe
attach

Cost
(Latency,
Granularity) QPI attach

On-Package
On-Chip
On-core

Distance from Core

Best attach technology might be application or even algorithm dependent


Coherency and Programming Model
Data Movement
In-line
Accelerator processes data fully or partially from direct I/O
Shared Virtual Memory :
Virtual addressing eliminates need for pinning memory buffers
Zero-copy data buffers

Interaction between Core and Accelerator


Off-load
Hybrid : algorithm implemented on host and accelerator
Proposed Platform for the Data Center
FPGA with coherent low-latency interconnect:
Simplified programming model
Support for virtual addressing
Data Caching
Enables new classes of algorithms for acceleration with:
Full access to system memory
Support for efficient irregular data pattern access
Remapping of algorithms from off-load model to hybrid processing model
Fine grained interactions

13
IVB+FPGA Software Development Platform
Software Development for Accelerating Workloads using Xeon and coherently attached FPGA in-socket

Intel Xeon E5-26xx v2


Processor
Processor
DDR3
FPGA Module Altera Stratix V
6.4 GT/s full width
DDR3 DDR3 QPI Speed
Intel Xeon (target 8.0 GT/s at full width)
E5-2600 v2 QPI FPGA Memory to FPGA 2 channels of DDR3
DDR3 Product Family Module (up to 64 GB)
DDR3
Expansion
PCIe 3.0 x8 lanes - maybe used
connector
DDR3 for direct I/O e.g. Ethernet
to FPGA Module
Configuration Agent, Caching
PCIe* 3.0 x8

PCIe* 3.0 x8

PCIe* 3.0 x8

PCIe* 3.0 x8
PCIe* 3.0 x8

PCIe* 3.0 x8
Features Agent,, (optional) Memory
DMI2

Controller
Accelerator Abstraction Layer
(AAL) runtime, drivers, sample
Software
applications

Heterogeneous architecture with homogenous platform support


14
Programming Interfaces
CPU FPGA

Accelerator Function Units


Host Application
(AFU)

Service API
CCI
extended
Accelerator Virtual Memory API Addr Translation
Abstraction
CCI
Layer
standard
Physical Memory API
QPI/KTI Link, Protocol, &
Uncore PHY

QPI/KTI

Programming interfaces will be forward compatible from SDP to future MCP solutions
Simulation Environment available for development of SW and RTL

15
Programming Interfaces : OpenCL
CPU FPGA

OpenCL OpenCL OpenCL


Host Code Application Kernel
Code

C
OpenCL RunTime F OpenCL Kernels
G

Service API
Accelerator CCI
Abstraction Extended
Virtual Memory API VirtMem
Layer CCI
Physical Memory API Physical Memory API Standard

QPI/UPI/P
CIe
System Memory

Unified application code abstracted from the hardware environment


Portable across generations and families of CPUs and FPGAs

16
Overview

Data Center and Workloads


Xeon+FPGA Accelerator Platform
Applications and Eco-system

17
XEON+FPGA in the Cloud : integration with SDI and RSA
Cloud Users

Workload IA Optimized Software

SDI (Exposing IA capabilities)

Cloud Controller Intel HAAS


Cloud
Service
Provider Placing dynamic node
workload composition

IA Logical Node with


Acceleration Intel
Rackscale

18
XEON+FPGA in the Cloud: IP Store Concept

Cloud Users Intel


3rd party IP
Accelerators
developers
3rd party
Workload IA Optimized Software 3rd party
Accelerators
Accelerators

SDI (Exposing IA capabilities)


-IP Catalogue

Cloud Controller Intel Store


Cloud Intel HAAS
Client
Service
Provider Placing
workload static/dynamic
FPGA programming

IA Node
XEON+FPGA

19
Example Usage : Deep Learning Framework for
Visual Understanding
cluster
node

CCI Interface

SRAM Controller
device

Read Write Reg IP


Access Regist Processing Tile n
DMA ers
Processing Tile 1
Processing Tile 0

Weights

Outputs
Inputs
Control
State
primitives

Machine
P P P
E E E

20
Example Usage: Accelerating Open VSwitch w/DPDK
VM0 VM1
Offload DMA Engine to FPGA :
VirtIO VirtIO
QEMU QEMU
Frees up CPU cycles to
Userspace perform more useful work
DPDK Libraries
ovs-vswitchd IVSHEM VHOST Reduce cache pollution.

Userspace forwarding
FPGA DMA ovsdv-server Add support for Packet
netdev

TAP
Classification, ACL, and other
Kernel space
functions including Direct I/O
in FPGA
Open vSwitch Kernel Module

Physical Switch/NIC

21
Example Usage: High Frequency Trading Accelerator

CPU
Feed Ethernet
Parser PHY &
MAC
Host
Application
Trading Order
Logic / Generation
Statistics

FPGA

22
Academic Research

Call for Proposals: Intel-Altera Heterogeneous Architecture


Research Platform Program
Submitted by Nicholas Carter

Intel-Altera Heterogeneous Architecture Research


Platform (HARP) Program

Intel Corporation and Altera Corporation are pleased to announce


the Heterogeneous Architecture Research Platform (HARP)
program, which will provide faculty with computer systems
containing Intel microprocessors and an Altera Stratix V FPGA
module that incorporates Intel QuickAssist Technology in order to
spur research in programming tools, operating systems, and
innovative applications for accelerator-based computing systems.

23
Q&A

24

Potrebbero piacerti anche