17th IEEE Symposium on High Performance Interconnects
Laxmi Bhuyan
I. INTRODUCTION
II. BACKGROUND
A. Integration of NICs
It is well known that TCP/IP processing over Ethernet is a dominant
overhead for commercial web and data servers [19, 20]. Researchers
gradually realized that a comprehensive solution spanning hardware
platforms and software stacks is necessary to eliminate this overhead
[1, 2]. Existing work on improving TCP/IP performance falls into two
categories: TCP Offload Engines (TOEs) [5] and the integration of NICs
[1, 2]. Although a TOE reduces the communication overhead between
processors and NICs, it lacks scalability due to its limited processing
and memory capacity, and it requires extensive modification of the OS
and development of firmware in the NIC. Recently, a counter-TOE
approach has emerged: integrating NICs onto CPUs. Envisioned as the
next-generation network infrastructure, an integrated NIC not only
reduces the latency of accessing I/O registers but also leverages the
extensive resources of multi-core CPUs.
Binkert et al. [1, 2] made a first attempt to couple a simple NIC
with a CPU for high-bandwidth networks. They argued that the device
driver is one of the dominant overheads in high-speed network
processing and that an integrated NIC can eliminate it. Going further,
they redesigned the integrated NIC to eliminate the overheads of
DMA descriptor management and data copying. Evaluation on
III. EXPERIMENT METHODOLOGY
A. Testbed Setup
Our experimental testbed consists of a Sun T5120 server
connected to an Intel Quad Core DP Xeon server, which
IV. PERFORMANCE EVALUATION
B. Server Software
The SUT runs the Solaris 10 Operating System while the
stressor runs Vanilla Linux kernel 2.6.22. In Solaris 10, a
STREAMS-based network stack is replaced by a new
architecture named FireEngine [6] which provided better
connection affinity to CPUs, greatly reducing the connection
setup cost and the cost of per-packet processing. It merges all
protocol layers into one STREAMS module that is fully
multithreaded.
[Figure: INIC vs. DNIC bandwidth (Gbps) and CPU utilization across I/O sizes from 64 B to 8 KB]

TABLE I.
[Figure: INIC vs. DNIC bandwidth (Gbps) and CPU utilization across I/O sizes from 64 B to 64 KB]
[Figure: INIC vs. DNIC bandwidth (Gbps) and CPU utilization versus number of connections (1 to 64)]
[Figure: INIC vs. DNIC latency (us) versus I/O size from 64 B to 1.5 KB]
[Figure: processing time breakdown (Iperf, socket, copy, stack, driver, buffer, other kernel) for DNIC and INIC]
[Figure: instruction breakdown (load, store, atomic, inst_sw, inst_other) versus number of connections, two panels]
[Figure: DNIC vs. INIC event counts versus number of connections]
3) Interrupt Rates
[Figure: DNIC vs. INIC interrupt rates versus number of connections]

[Figure: interrupt breakdown (NIC interrupt, cross call, others) for DNIC and INIC versus number of connections]
INIC
0.12
DNIC
0.1
INIC
0.08
Packet
DNIC
0.06
0.04
0.02
0
1
8
16
Connections
32
64
ACKNOWLEDGMENT
[Figure: DNIC vs. INIC memory reads and memory writes]
REFERENCES
[1]
[3]
[4]
[5]
[6]
[7] Ram Huggahalli, Ravi Iyer, Scott Tetrick, Direct Cache Access for High Bandwidth Network I/O, 32nd Annual International Symposium on Computer Architecture (ISCA'05), 2005.
[8] Intel Core 2 Extreme quad-core processor, http://www.intel.com/products/processor/core2XE/.
[9] Intel 10 Gigabit Ethernet Controllers, http://download.intel.com/design/network/prodbrf/317796.pdf
[10] Iperf, http://dast.nlanr.net/Projects/Iperf
[11] Amit Kumar, Ram Huggahalli, Impact of Cache Coherence Protocols on the Processing of Network Traffic, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-40), 2007.
[12] Srihari Makineni, Ravi Iyer, Architectural Characterization of TCP/IP Packet Processing on the Pentium M Microprocessor, 10th HPCA, 2004.
[13] Netpipe, http://www.scl.ameslab.gov/netpipe/
[14] Michael Schlansker, Nagabhushan Chitlur, Erwin Oertli et al., High-Performance Ethernet-Based Communications for Future Multi-Core
[15]
[16]
[17]
[18]
[19]
[20]