Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Processor
Hot Chips
August 18-20, 2019
Jeff Stuecheli
Scott Willenborg
William Starke
Proposed POWER Processor Technology and I/O Roadmap
Focus of 2018 talk
POWER7 Architecture POWER8 Architecture POWER9 Architecture POWER10
Up To Up To Up To Up To Up To Up To Up To Up To
Sustained Memory Bandwidth 65 GB/s 65 GB/s 210 GB/s 210 GB/s 150 GB/s 210 GB/s 650 GB/s 800 GB/s
Standard I/O Interconnect PCIe Gen2 PCIe Gen2 PCIe Gen3 PCIe Gen3 PCIe Gen4 x48 PCIe Gen4 x48 PCIe Gen4 x48 PCIe Gen5
Up To Up To Up To Up To Up To Up To Up To Up To
Sustained Memory Bandwidth 65 GB/s 65 GB/s 210 GB/s 210 GB/s 150 GB/s 210 GB/s 650 GB/s 800 GB/s
Standard I/O Interconnect PCIe Gen2 PCIe Gen2 PCIe Gen3 PCIe Gen3 PCIe Gen4 x48 PCIe Gen4 x48 PCIe Gen4 x48 PCIe Gen5
Up To Up To Up To Up To Up To Up To Up To Up To
Sustained Memory Bandwidth 65 GB/s 65 GB/s 210 GB/s 210 GB/s 150 GB/s 210 GB/s 650 GB/s 800 GB/s
Standard I/O Interconnect PCIe Gen2 PCIe Gen2 PCIe Gen3 PCIe Gen3 PCIe Gen4 x48 PCIe Gen4 x48 PCIe Gen4 x48 PCIe Gen5
Yesterday’s Plumbing
Tomorrow’s Differentiation
B B B B B B B B B B B B B B B B B B Emerging JEDEC Buffers
OMI Memory
DRAM/
– NVLink – Next generation of GPU/CPU bandwidth SCM/
OMI
etc
IBM Confidential
© 2019 IBM Corporation Same Open Memory Interface used for all Systems and Memory Technologies 10
Primary Tier Memory Options
8b ~25G diff
8b
DDR4 OMI DIMM
BUF
Capacity ~256GB→4 TB
OMI Strategy
8b
BW ~320 GB/sec
Same System
8b ~25G diff
BW Opt OMI DIMM
16b
Capacity ~128→512 GB
BUF
BW ~650 GB/sec
1024b
On module
On Module HBM
Si interposer Capacity ~16→32 GB Unique System
BW ~1 TB/sec
© 2019 IBM Corporation 11
Final Addition to the POWER9 Processor Family
Processor Chip Details
• 728 mm2 ( 25.3 x 28.8 mm)
The Bandwidth Beast Open Memory Interface (OMI)
Advanced I/O (AIO) • 16 channels x8 at 25 GT/s
• 8 Billion Transistors
• 650 GB/s peak 1:1 r/w bandwidth
• Up to 24 SMT4 Cores PowerAXON (x48) Memory Signaling (8x8 OMI)
• Technology Agnostic
• Up to 120 MB eDRAM L3 cache
Core Core Core Core Core Core Core Core
• Offered w/ Microchip DDR4 buffer
• 14nm finFET
L3 Region
L3 L3
PowerAXON 25 GT/s Attach
• Improved device performance L2 L2 L2 L2 • Up to 16 socket glue-less SMP
• Reduced energy (4x24 SMP added to 3x30 local)
Core Core Core Core Core Core Core Core
• eDRAM • Up to x48 NVIDIA NVLINK GPU
• 17 layer metal stack attach
L3 L3 • Up to x48 OpenCAPI 4.0 coherent
L2 L2 L2 L2 accelerator / memory attach
High Bandwidth Signaling
• 25 GT/s low energy differential Core Core Core Core Core Core Core Core
Industry Standard I/O Attach
• PowerAXON, OMI memory
PowerAXON (x48) Memory Signaling (8x8 OMI) • x48 PCIe Gen 4 at 16 GT/s
• 16 GT/s low energy differential
• Up to x16 CAPI 2.0 coherent
• Local SMP accelerator / storage attach
2 TB/s Raw Signaling Bandwidth
• 16 GT/s PCIe Gen4
Shared by 6 Attach Protocols
© 2019 IBM Corporation 12
Unleashing the Bandwidth: POWER9 AIO + NVIDIA GPU
Potential System
Offering NVLink
GPU GPU
GPU GPU
OpenCAPI OpenCAPI
IBM IBM
POWER9 AIO POWER9 AIO
PCIe Gen4 SMP PCIe Gen4
- OMI Bandwidth
- OpenCAPI function
- NVLINK function
3x bandwidth 6x bandwidth
per mm2 as per mm2 as
DDR signaling DDR signaling
x24 Accelerator 2 x Local SMP x48 PowerAXON 3 x Local SMP x48 PowerAXON 3 x Local SMP
4 x DDR4 memory 4 x DMI buffered memory 8 x OMI buffered memory
© 2019 IBM Corporation 15
DRAM DIMM Comparison
• Technology agnostic
• Low cost
JEDEC DDR • Ultra-scale system density
DIMM • Enterprise reliability
• Low-latency
• High bandwidth
SMC 1000
DRAM
DRAM
DRAM
DRAM
HSS HSS Frame Read DDR4 Data
Receiver
Receive Transmitter Formatting Dataflow Receive PHY
IBM
POWER9
AIO
Microchip’s SMC 1000 8x25G features an innovative low latency design that delivers
less than four ns incremental latency over a traditional integrated DDR controller
with LRDIMM. This results in OMI-based DDIMM products having virtually identical
bandwidth and latency performance to comparable LRDIMM products.
6X bandwidth / PHY area advantage gives POWER9 AIO bandwidth of 16 DDR ports
© 2019 IBM Corporation
18
Final Addition to POWER9: Stay Tuned for POWER10
GPU L2 L2 L2 L2
GPU
L3 L3
L2 L2 L2 L2