
Vendor Tutorial

InfiniBand-over-Distance Transport
using Low-Latency WDM Transponders
and IB Credit Buffering

Christian Illmer
ADVA Optical Networking
October 2008

[Chart: fiber link capacity [b/s] vs. year, 1980-2010, on a log scale from 100M
to 10T. Curves for WDM, InfiniBand (4x DDR, 12x QDR), Ethernet and FC; capacity
doubles roughly every 18 months (Moore's Law).]

• Bandwidth requirements follow Moore's Law (# of transistors on a chip)
• So far, InfiniBand outperforms both Fibre Channel and Ethernet in bandwidth
  and keeps pace with Moore's growth rate


Source: Ishida, O., "Toward Terabit LAN/WAN" panel, iGrid 2005

Connectivity performance

InfiniBand data rates


InfiniBand              IBx1         IBx4         IBx12
Single Data Rate, SDR   2.5 Gbit/s   10 Gbit/s    30 Gbit/s
Double Data Rate, DDR   5 Gbit/s     20 Gbit/s    60 Gbit/s
Quad Data Rate, QDR     10 Gbit/s    40 Gbit/s    120 Gbit/s

IB uses 8B/10B coding, so the usable data rate is 80% of the line rate; e.g., IBx1 DDR (5 Gbit/s) has 4 Gbit/s throughput
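As a quick arithmetic sketch of that coding overhead (illustration only, not part of the tutorial), the following Python snippet derives the usable data rates from the line rates in the table above:

```python
# Minimal sketch: effective InfiniBand data rates after 8B/10B coding,
# which carries 8 payload bits in every 10 bits on the line.
LINE_RATE_PER_LANE_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}
CODING_EFFICIENCY = 8 / 10  # 8B/10B overhead

def effective_rate_gbps(rate: str, lanes: int) -> float:
    """Usable data rate in Gbit/s for a given IB speed grade and lane count."""
    return LINE_RATE_PER_LANE_GBPS[rate] * lanes * CODING_EFFICIENCY

if __name__ == "__main__":
    # IBx1 DDR -> 4.0 Gbit/s, IBx4 DDR -> 16.0 Gbit/s, IBx12 QDR -> 96.0 Gbit/s
    for rate, lanes in [("DDR", 1), ("DDR", 4), ("QDR", 12)]:
        print(f"IBx{lanes} {rate}: {effective_rate_gbps(rate, lanes):.1f} Gbit/s")
```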


Copper
• Defined for all data rates and lane multipliers
• Serial for SDR x1, DDR x1, QDR x1
• Parallel copper cables (x4 or x12)

Fiber optic
• Defined for all data rates, up to x4
• Serial for SDR x1, DDR x1, QDR x1 and SDR x4 LX (serialized I/F)
• Parallel for SDR x4 SX


Protocols and bit rates

[Chart: protocol bit rates from 10M to 100G b/s. Ethernet (10bT, FE, GbE, 10GbE,
40GbE, 100GbE); synchronous hierarchy (STM-1, STM-4, STM-16, STM-64, OTU3);
Fibre Channel / FICON / ISC (1G, 2G, 4G, 8G, 10G-FC); ESCON, ETR/CLO, HDD,
Ultra160/Ultra320 SCSI; InfiniBand (IBx1 SDR, IBx4 SDR/DDR/QDR, IBx12 QDR).]

CPU connectivity market
• Market penetration of different CPU interconnect technologies
• InfiniBand clearly dominating new high-end DC implementations

[Chart: interconnect share of the TOP 100 supercomputers, 2006 vs. 2007, 0-50%.
Categories include GbE, InfiniBand, Myrinet, SP, Quadrics, proprietary,
crossbar, NUMAlink and mixed. InfiniBand: 37% in '07, 50% in '08.]

HPC networks today

[Diagram: typical HPC data center today. Each server in the cluster carries FC
and GbE HBAs plus IB HCAs, attaching to three separate networks: the IB
server-cluster interconnect, an FC SAN and an Ethernet LAN.]

Relevant parameters
• LAN HBAs based on GbE/10GbE
• SAN HBAs based on 4G/8G-FC
• HCAs based on IB(x4) DDR/QDR

• Dedicated networks / technologies for LAN, SAN and CPU (server) interconnect
• Consolidation required


Unified InfiniBand architecture

[Diagram: unified InfiniBand protocol stack. Network path: BSD Sockets over
TCP/IP via IPoIB, or SDP/TS drivers. HPC messaging: MPI and uDAPL (DAT) over
the TS API. Storage path: file system and SCSI over SRP, FCP and NFS-RDMA. All
paths run over VAPI and the InfiniBand HCA into the unified IB fabric, with an
Ethernet gateway towards the LAN/WAN and an FC gateway towards the SAN.]

API: Application Programming Interface
SDP: Sockets Direct Protocol
SRP: SCSI RDMA Protocol
DAT: Direct Access Transport
VAPI: Verbs API
TS API: Terminal Server API
uDAPL: User-level Direct-Access Programming Library
BSD Sockets: Berkeley Sockets API

HPC networks tomorrow?

[Diagram: server cluster equipped with IB HCAs only, attached to a unified IB
switch fabric (IB SF). Storage is reached via the SCSI RDMA protocol (SRP) and
FC gateways, the LAN via IPoIB and an Ethernet gateway. Annotation: "Unlikely
to be deployed on a broad scale".]

Consolidation step: unified IB switch fabric
• IB SF used for CPU cluster, LAN and storage (using IPoIB, SRP and gateways)
• LAN now based on IPoIB and an Ethernet gateway (see the IPoIB sketch below)
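Since IPoIB exposes the IB fabric as an ordinary IP interface, standard socket code runs over it unchanged. A minimal Python sketch of that idea (the address 10.0.0.1 on an assumed IPoIB interface ib0 and port 5001 are illustrative, not from the tutorial):

```python
# Minimal sketch, assuming an IPoIB interface "ib0" carries the address
# 10.0.0.1. To applications, IPoIB looks like any other IP interface,
# so plain TCP sockets work unchanged across the IB fabric.
import socket

IPOIB_ADDR = "10.0.0.1"   # hypothetical address assigned to ib0
PORT = 5001

def receiver() -> None:
    """Accept one connection over the IPoIB interface and count received bytes."""
    with socket.create_server((IPOIB_ADDR, PORT)) as srv:
        conn, _ = srv.accept()
        total = 0
        while chunk := conn.recv(1 << 20):
            total += len(chunk)
        print(f"received {total} bytes over IPoIB")

def sender(payload_mb: int = 64) -> None:
    """Stream payload_mb megabytes to the receiver across the IB fabric."""
    buf = b"\x00" * (1 << 20)
    with socket.create_connection((IPOIB_ADDR, PORT)) as conn:
        for _ in range(payload_mb):
            conn.sendall(buf)
```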


InfiniBand connections over distance

[Diagram: IB server cluster A and IB server cluster B, each behind its own IB
switch fabric, connected by IB-over-DWDM across more than 50 km of dark fiber.]

Why is it relevant?
• Data centers disperse geographically
  (GRID computing, virtualization, disaster recovery, ...)
• Native, low-latency IB-over-distance transport was still the missing part

Cluster connectivity via IB-over-WDM
• WAN protocol is IB, no conversion needed
• No additional latency
• Fully transparent transport


InfiniBand throughput over distance

[Chart: throughput vs. distance, with and without B2B credits. Without B2B
credits the throughput curve drops off after a short distance; with them it
stays flat.]

• Throughput drops significantly after several meters
• Only buffer credits (B2B credits) ensure maximum InfiniBand performance over
  distance
• Buffer credit size is directly related to distance (see the sketch below)
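A back-of-the-envelope sketch of that relation (Python; the 64-byte credit granularity and the data rate are illustrative assumptions, not figures from the tutorial): to keep a link saturated, the outstanding credits must cover the bandwidth-delay product of the round trip.

```python
import math

CREDIT_BLOCK_BYTES = 64        # assumed credit granularity (illustrative)
FIBER_SPEED_M_PER_S = 2.0e8    # roughly 2/3 of c in standard single-mode fiber

def credits_needed(distance_km: float, data_rate_gbps: float) -> int:
    """Credits required to keep a link of this length and data rate saturated."""
    rtt_s = 2 * distance_km * 1e3 / FIBER_SPEED_M_PER_S    # round-trip time
    bytes_in_flight = data_rate_gbps * 1e9 / 8 * rtt_s     # bandwidth-delay product
    return math.ceil(bytes_in_flight / CREDIT_BLOCK_BYTES)

if __name__ == "__main__":
    # e.g. 8 Gbit/s of payload (IBx4 SDR after 8B/10B coding)
    for km in (0.1, 10, 50, 100):
        print(f"{km:6.1f} km -> {credits_needed(km, 8.0):6d} credits")
```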

What is the solution?
• IB range extender: credit buffering, low latency, and conversion to 10G optics
• WDM: lowest latency, transparency, capacity, reach, fiber relief

What are the commercial requirements?
• Solution must be based on commercial products
• Interworking capabilities must be demonstrated


Demonstrator setup at HLRS

Voltaire ISR2012 Grid Director
• 288 x DDR IBx4 ports
• 11.5 Tb/s backplane
• <450 ns latency

ADVA FSP 2000 DWDM
• 4 x 10 Gbit/s transponders
• <100 ns link latency

Obsidian Longbow Campus
• 4 x SDR copper to 10G optical
• 2-port switch architecture
• 840 ns port-to-port latency
• 10/40 km reach (buffer credits)

[Diagram: IBM cluster at HLRS Nobelstrasse and Cell cluster at HLRS Allmandring,
each attached via DWDM and linked over 0.4...100.4 km of G.652 SSMF.]


Demonstrator results
• The Intel MPI benchmark SendRecV was used
• Constant performance up to 50 km
• Decreasing performance beyond 50 km

[Charts: SendRecV throughput [GB/s] vs. message length [kB] (up to 4096 kB) at
distances of 0.4, 25.4, 50.4, 75.4 and 100.4 km, and SendRecV throughput
[GB/s] vs. distance [km] (up to 100 km) for message lengths of 32, 128, 512
and 4096 kB; throughput axis 0-0.8 GB/s.]
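The demonstrator used the Intel MPI Benchmarks suite; as a rough, hypothetical stand-in, the mpi4py sketch below measures SendRecV-style throughput between two ranks for the message lengths shown above (script name, iteration count and all parameters are our assumptions, not from the setup):

```python
# Minimal sketch of a SendRecV-style throughput measurement with mpi4py;
# run with two ranks, e.g.: mpirun -np 2 python sendrecv_bench.py
from mpi4py import MPI
import numpy as np

def bench(msg_len_bytes: int, iters: int = 100) -> float:
    """Bytes sent per second (GB/s) per rank for a bidirectional Sendrecv loop."""
    comm = MPI.COMM_WORLD
    peer = 1 - comm.Get_rank()            # assumes exactly two ranks
    sendbuf = np.zeros(msg_len_bytes, dtype=np.uint8)
    recvbuf = np.empty_like(sendbuf)

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(iters):
        comm.Sendrecv(sendbuf, dest=peer, recvbuf=recvbuf, source=peer)
    elapsed = MPI.Wtime() - t0
    return msg_len_bytes * iters / elapsed / 1e9

if __name__ == "__main__":
    for kib in (32, 128, 512, 4096):
        rate = bench(kib * 1024)
        if MPI.COMM_WORLD.Get_rank() == 0:
            print(f"{kib:5d} kB message: {rate:.2f} GB/s")
```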

Full InfiniBand throughput over more than 50 km

Thank you

public-relations@advaoptical.com
