Mark Wagner
Senior Software Engineer, Red Hat
10 Gb Ethernet - Overview
Tuning steps
environment
Intel, AMD
Don't assume settings shown will work for you without some tweaks
Take Aways
Do not assume all settings will work for you without some tweaks
/usr/src/redhat/BUILD/kernel-2.6.18/linux-2.6.18.x86_64/Documentation/networking
A Quick Example
An Internet search for "linux tcp_window_scaling performance" will show that some sites say to set it to 0, others say to set it to 1
[root@perf12 np2.4]# sysctl -w net.ipv4.tcp_window_scaling=0
[root@perf12 np2.4]# ./netperf -P1 -l30 -H 192.168.10.100
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
87380  16384  16384   30.00   2106.40
[root@perf12 np2.4]# sysctl -w net.ipv4.tcp_window_scaling=1
[root@perf12 np2.4]# ./netperf -P1 -l30 -H 192.168.10.100
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
87380  16384  16384   30.01   5054.68
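One likely reason the two settings differ so much: without window scaling the TCP window field is 16 bits, so at most 65535 bytes can be in flight per round trip, which caps throughput at window / RTT. The sketch below computes that ceiling. The function name max_tcp_mbits and the round-trip times are illustrative assumptions, not measurements from the systems above.

```shell
# Upper bound on TCP throughput, in 10^6 bits/sec, for one connection.
# $1 = window in bytes, $2 = round-trip time in microseconds (assumed values).
max_tcp_mbits() {
    window_bytes=$1
    rtt_us=$2
    # bits per microsecond is the same as 10^6 bits per second
    echo $(( window_bytes * 8 / rtt_us ))
}

# 65535 bytes is the largest window without scaling.
max_tcp_mbits 65535 100
max_tcp_mbits 65535 250
```

The shorter the round trip, the higher the ceiling, so the same setting can look harmless on one network and costly on another.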
Platform Features
Multiple processors
Fast memory
Server
Driver
Tools
mpstat
vmstat
netstat
lspci
oprofile
sar
Tools (cont)
Tuning tools
ethtool
sysctl
ifconfig
setpci
netperf
/proc
ethtool
ethtool -c - interrupt coalesce
[root@perf10 ~]# ethtool -c eth2
Coalesce parameters for eth2:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 5
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
tx-usecs: 0
tx-frames: 0
tx-usecs-irq: 0
tx-frames-irq: 0
<truncated>
ethtool -g - HW Ring Buffers
[root@perf10 ~]# ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX:        16384
RX Mini:   0
RX Jumbo:  16384
TX:        16384
Current hardware settings:
RX:        1024
RX Mini:   1024
RX Jumbo:  512
TX:        1024
ethtool -k - HW Offload Settings
[root@perf10 ~]# ethtool -k eth2
Offload parameters for eth2:
Cannot get device udp large send offload settings: Operation not supported
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
These offloads let the NIC, rather than the CPU, calculate checksums and do similar per-packet work.
ethtool -i - driver information
[root@perf10 ~]# ethtool -i eth2
driver: cxgb3
version: 1.0-ko
firmware-version: T 5.0.0 TP 1.1.0
bus-info: 0000:06:00.0
[root@perf10 ~]# ethtool -i eth3
driver: ixgbe
version: 1.1.18
<truncated>
[root@dhcp47154 ~]# ethtool -i eth2
driver: Neterion (ed. note: s2io)
version: 2.0.25.1
<truncated>
sysctl
sysctl -a
sysctl variable
- queries a variable (-q only suppresses output when setting values)
sysctl -w
- writes a variable
by core.rmem_max
by core/wmem_max
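Each sysctl variable is also a file under /proc/sys, which is why both the dotted and the slashed spelling name the same setting. A minimal sketch of the mapping follows. The function name sysctl_to_proc is hypothetical, and the sketch assumes the name contains no literal dots of its own (for example a VLAN interface name).

```shell
# Map a sysctl variable name to its file under /proc/sys.
sysctl_to_proc() {
    # Dots in the name become path separators; slashes pass through unchanged.
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

sysctl_to_proc net.core.rmem_max
sysctl_to_proc net/core/wmem_max
sysctl_to_proc net.ipv4.tcp_window_scaling
```

Reading the resulting file with cat shows the same value that sysctl reports for the variable.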
netperf
http://netperf.org
Feature Rich
Read documentation
UDP_STREAM
Many others
Check
1GbE
10GbE
lspci
lspci - validate your slot setting for each NIC
lspci -vv -s 09:00.0
09:00.0 Ethernet controller: 10GbE Single Port Protocol Engine Ethernet Adapter
<truncated>
Capabilities: [58] Express Endpoint IRQ 0
Device: Supported: MaxPayload 4096 bytes, PhantFunc 0, ExtTag+
Device: Latency L0s <64ns, L1 <1us
Device: AtnBtn- AtnInd- PwrInd-
Device: Errors: Correctable- Non-Fatal- Fatal- Unsupported-
Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
Device: MaxPayload 128 bytes, MaxReadReq 512 bytes
Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s L1, Port 0
Link: Latency L0s unlimited, L1 unlimited
Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
Link: Speed 2.5Gb/s, Width x4
Vector table: BAR=4 offset=00000000
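Why the negotiated width matters: a 2.5 Gb/s PCIe lane uses 8b/10b encoding, so only 8 of every 10 bits on the wire are data. The sketch below works out the usable bandwidth per direction for a given lane count. The function name pcie_2g5_mbits is hypothetical, and packet and protocol overhead on the bus is ignored, so real throughput is lower still.

```shell
# Usable bandwidth, in 10^6 bits/sec per direction, of a 2.5 Gb/s PCIe link.
# $1 = number of lanes (the "Width" reported by lspci).
pcie_2g5_mbits() {
    lanes=$1
    # 2500 Mbit/s raw per lane, 8 data bits for every 10 bits transferred
    echo $(( lanes * 2500 * 8 / 10 ))
}

pcie_2g5_mbits 4    # negotiated width in the listing above
pcie_2g5_mbits 8    # supported width in the listing above
```

By this estimate a card that trains at x4 instead of x8 cannot reach 10GbE line rate, whatever else is tuned.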
Disable irqbalance
service irqbalance stop
chkconfig irqbalance off
Disable cpuspeed
Process affinity
Use taskset or MRG's Tuna
Interrupt affinity
Interrupt coalescing
sysctl.conf
Driver Setting
HW ring buffers
You just got those new 10GbE cards that you told the CIO
would greatly improve performance
You plug them in and run a quick netperf to verify your choice
New Boards, first Run
# ./netperf -P1 -l60 -H 192.168.10.10
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
87380  16384  16384   60.00   5012.24
New Boards, first run - mpstat -P ALL 5
Transmit
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    2.17     0.00   0.35    1.17    0.00   96.23  12182.00
0     17.40     0.00   2.80    9.20    0.00   70.00  12182.00
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5      0.00     0.00   0.00    0.00    0.00  100.00      0.00
6      0.00     0.00   0.00    0.00    0.00  100.00      0.00
7      0.00     0.00   0.00    0.00    0.00  100.00      0.00
Receive
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    4.86     0.00   0.07    7.56    0.00   87.49  10680.90
0     38.90     0.00   0.60   60.40    0.00    0.00  10680.90
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5      0.00     0.00   0.00    0.00    0.00  100.00      0.00
6      0.00     0.00   0.00    0.00    0.00  100.00      0.00
7      0.00     0.00   0.00    0.00    0.00  100.00      0.00
echo 80 > /proc/irq/192/smp_affinity
Use TUNA
Know Your CPU core layout
# cat /proc/cpuinfo
processor   : 0
physical id : 0
core id     : 0
processor   : 1
physical id : 1
core id     : 0
processor   : 2
physical id : 0
core id     : 1
processor   : 3
physical id : 1
core id     : 1
processor   : 4
physical id : 0
core id     : 2
processor   : 5
physical id : 1
core id     : 2
processor   : 6
physical id : 0
core id     : 3
processor   : 7
physical id : 1
core id     : 3
Socket 0: processors 0, 2, 4, 6
Socket 1: processors 1, 3, 5, 7
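Reading the socket layout out of /proc/cpuinfo by eye gets tedious on larger systems. A minimal sketch that condenses it to one line per logical processor follows. The function name cpu_layout is hypothetical, and the sketch assumes the fields appear in the order processor, physical id, core id within each entry, as in the listing above.

```shell
# Print "cpuN socketN coreN" for each logical processor read from stdin.
cpu_layout() {
    awk -F: '
        # Trim the field name and strip whitespace from the value.
        { gsub(/^[ \t]+|[ \t]+$/, "", $1); gsub(/[ \t]/, "", $2) }
        $1 == "processor"   { cpu = $2 }
        $1 == "physical id" { socket = $2 }
        $1 == "core id"     { print "cpu" cpu, "socket" socket, "core" $2 }
    '
}

# On a live system: cpu_layout < /proc/cpuinfo
# Sample input modelled on the first two entries of the listing above:
printf 'processor\t: 0\nphysical id\t: 0\ncore id\t\t: 0\nprocessor\t: 1\nphysical id\t: 1\ncore id\t\t: 0\n' | cpu_layout
```

Processors that share a socket value sit on the same physical package, which is what matters when deciding where to place an interrupt and the process it feeds.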
Setting IRQ Affinity
Now let's move the interrupts; remember, your core mapping is important
Note the separate IRQs for TX and RX
Transmit
# grep eth2 /proc/interrupts
        CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7
 74:  360345     0     0     0     0     0     0     0  PCI-MSI-X  eth2-tx0
 82:  647960     0     0     0     0     0     0     0  PCI-MSI-X  eth2-rx0
 90:       0     0     0     0     0     0     0     0  PCI-MSI-X  eth2-lsc
# echo 40 > /proc/irq/74/smp_affinity
# echo 80 > /proc/irq/82/smp_affinity
Receive
# grep eth2 /proc/interrupts
        CPU0  CPU1  CPU2  CPU3  CPU4  CPU5  CPU6  CPU7
194:    6477     0     0     0     0     0     0     0  PCI-MSI-X  eth2
202: 5795405     0     0     0     0     0     0     0  PCI-MSI-X  eth2 (queue 0)
# echo 40 > /proc/irq/194/smp_affinity
# echo 80 > /proc/irq/202/smp_affinity
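The values written to smp_affinity above are hexadecimal CPU bitmasks, with CPU 0 as the lowest bit. A minimal sketch for deriving them follows. The function names cpu_mask and cpus_mask are hypothetical, and the sketch covers only CPU numbers below 32.

```shell
# Hex smp_affinity mask selecting a single CPU (CPU 0 is bit 0).
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# Hex mask selecting several CPUs, given as separate arguments.
cpus_mask() {
    mask=0
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 6            # 40
cpu_mask 7            # 80
cpus_mask 0 2 4 6     # 55
```

So 40 and 80 steer the two interrupts to CPU 6 and CPU 7, and a mask such as 55 would allow any of CPUs 0, 2, 4 and 6.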
Tuning - Run 2, IRQ Affinity
# ./netperf -P1 -l30 -H 192.168.10.10
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
87380  16384  16384   30.00   5149.89
Run 2 - mpstat -P ALL 5 outputs
Transmit
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    2.35     0.00   0.43    1.43    0.00   95.75  12387.80
0      0.00     0.00   0.00    0.00    0.00  100.00   1018.00
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5      0.00     0.00   2.80    2.00    0.00   95.20   4003.80
6     18.60     0.00   0.60    9.40    0.00   70.80   7366.40
7      0.00     0.00   0.00    0.00    0.00  100.00      0.00
Receive
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    4.67     0.00   0.07    7.75    0.00   87.49  10989.79
0      0.00     0.00   0.00    0.00    0.00  100.00   1018.52
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5      0.00     0.00   0.00    0.00    0.00  100.00      0.00
6      0.00     0.00   0.00    0.00    0.00  100.00      0.00
7     37.36     0.00   0.60   61.94    0.00    0.00   9971.17
Run 3 - Add Process Affinity
# ./netperf -P1 -l30 -H 192.168.10.10 -T5,5
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
87380  16384  16384   30.00   4927.19
Run 3 - Bottlenecks
Transmit
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    4.35     0.00   0.35   16.00    0.00   79.25  11272.55
0      0.00     0.00   0.00    0.00    0.00  100.00   1020.44
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5     34.73     0.00   2.59   62.08    0.00    0.00   4011.82
6      0.00     0.00   0.40   65.60    0.00   34.00   6240.28
7      0.00     0.00   0.00    0.00    0.00  100.00      0.00
Receive
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    3.85     0.00   0.10   17.62    0.00   78.39  17679.00
0      0.00     0.00   0.00    0.00    0.00  100.00   1016.80
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5     30.90     0.00   0.00   63.70    0.00    5.10      0.00
6      0.00     0.00   0.00    0.00    0.00  100.00      0.00
7      0.00     0.00   0.80   77.30    0.00   21.90  16661.90
Currently at 1500
Run 4 - Change send() to sendfile()
# ./netperf -P1 -l30 -H 192.168.10.10 -T5,5 -t TCP_SENDFILE -F /data.file
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
87380  16384  16384   30.00   6689.77
Run 4 - mpstat output, sendfile option
Transmit
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    1.55     0.00   0.38    7.30    0.00   90.70  12645.00
0      0.00     0.00   0.00    0.00    0.00  100.00   1018.20
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5     12.38     0.00   2.20   14.17    0.00   70.66   3973.40
6      0.00     0.00   0.80   44.20    0.00   55.00   7653.20
7      0.00     0.00   0.00    0.00    0.00  100.00      0.00
Receive
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    5.73     0.31   0.04   18.49    0.00   75.00   6050.75
0      0.20     2.50   0.10    0.10    0.00   95.20   1051.65
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.10     0.00   0.00    0.00    0.00   99.90      0.00
3      0.30     0.00   0.00    0.00    0.00   98.20      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5     45.25     0.00   0.00   52.75    0.00    1.90      0.00
6      0.00     0.00   0.00    0.00    0.00  100.00      2.70
7      0.00     0.00   0.40   95.01    0.00    4.59   4996.60
ifconfig eth2 mtu 9000 up
Note
Run 5 - Kick up MTU = 9000
# ./netperf -P1 -l30 -H 192.168.10.10 -T5,5 -t TCP_SENDFILE -F /data.file
Recv   Send   Send
Socket Socket Message Elapsed
Size   Size   Size    Time    Throughput
bytes  bytes  bytes   secs.   10^6bits/sec
87380  16384  16384   30.00   9888.66
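A rough way to see why the larger MTU helps: the same throughput needs far fewer frames per second, so there is less per-packet work for the CPUs. The sketch below estimates the frame rate. The function name frames_per_sec is hypothetical, and the estimate ignores Ethernet, IP and TCP header overhead.

```shell
# Approximate full-size frames per second needed for a given throughput.
# $1 = throughput in 10^6 bits/sec, $2 = MTU in bytes.
frames_per_sec() {
    mbits=$1
    mtu=$2
    # 10^6 bits is 125000 bytes; divide bytes per second by bytes per frame
    echo $(( mbits * 125000 / mtu ))
}

frames_per_sec 9888 1500
frames_per_sec 9888 9000
```

At the Run 5 throughput this works out to roughly six times fewer frames with an MTU of 9000 than with 1500.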
Run 5 - Kick up MTU = 9000
TX
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    1.40     0.00   0.32    4.27    0.00   93.90  13025.80
0      0.00     0.00   0.00    0.00    0.00  100.00   1015.00
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5     11.00     0.00   2.20    7.40    0.00   78.80   4003.80
6      0.00     0.00   0.40   26.60    0.00   73.00   8007.20
7      0.00     0.00   0.00    0.00    0.00  100.00      0.00
RX
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    6.63     0.00   0.39    5.40    0.00   87.10  69932.10
0      0.00     0.00   0.00    0.00    0.00  100.00   1017.80
1      0.00     0.00   0.00    0.00    0.00  100.00      0.00
2      0.00     0.00   0.00    0.00    0.00  100.00      0.00
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5     53.00     0.00   0.00   20.20    0.00   22.90      0.00
6      0.00     0.00   0.00    0.00    0.00  100.00      0.00
7      0.00     0.00   3.10   23.02    0.00   73.87  68914.20
Features - Multi-queue
Single RX Queue - Multiple netperf
1016.95  192.168.10.37
 819.41  192.168.10.12
 898.93  192.168.10.17
 961.87  192.168.10.16
3696
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    3.90     0.00   0.00   23.63    0.00   72.44   1054.80
0      0.00     0.00   0.00  100.00    0.00    0.00   1054.80
1      0.20     0.00   0.00    0.20    0.00   99.60      0.00
2      8.20     0.00   0.00   19.00    0.00   72.80      0.00
3      8.22     0.00   0.00   24.05    0.00   67.74      0.00
4      0.00     0.00   0.00    0.00    0.00  100.00      0.00
5      7.40     0.00   0.00   24.80    0.00   67.80      0.00
6      7.21     0.00   0.00   21.24    0.00   71.54      0.00
7      0.00     0.00   0.00    0.00    0.00  100.00      0.00
Multiple RX Queues, multiple netperfs
1382.25  192.168.10.37
2127.18  192.168.10.17
1726.71  192.168.10.16
1986.31  192.168.10.12
7171
CPU    %sys  %iowait   %irq   %soft  %steal   %idle    intr/s
all    6.55     0.00   0.18   42.84    0.00   50.44  28648.00
0      2.40     0.00   0.00    6.60    0.00   91.00   1015.40
1     11.45     0.00   0.40   83.13    0.00    5.02   9825.00
2      2.00     0.00   0.20   97.80    0.00    0.00   2851.80
3      0.00     0.00   0.00    0.00    0.00  100.00      0.00
4     10.60     0.00   0.00   36.20    0.00   53.20      0.00
5      0.00     0.00   0.00    0.00    0.00  100.00      0.00
6     11.22     0.00   0.00   35.87    0.00   52.91      0.00
7     14.77     0.00   0.80   83.03    0.00    1.20  14955.80
Interrupt distribution w/ MultiQueue
[]# grep eth2 /proc/interrupts
        CPU0    CPU1    CPU2    CPU3    CPU4    CPU5    CPU6    CPU7
130:       5       0       0       0       0       0       0       0  eth2
138:     241 1032798       0       0       0       0       0       0  eth2q0
146:       1       0 1821803       0       0       0       0       0  eth2q1
154:       1       0       0 5242518       0       0       0       0  eth2q2
162:       1       0       0       0 1849812       0       0       0  eth2q3
170:       1       0       0       0       0 7301950       0       0  eth2q4
178:       1       0       0       0       0       0 8426940       0  eth2q5
186:       1       0       0       0       0       0       0 1809018  eth2q6
A different ballgame
Tuning Network Apps - Messages/sec
10 Gbit NICs, Stoakley 2.67 GHz to Bensley 3.0 GHz
Tuning Ethernet gains +25% in average latency
RT kernel reduced peak latency and was smoother, but by how much?
[Figure: Red Hat MRG Performance, AMQP Mess/s. Intel 8 CPU / 16 GB, 10 Gb Ethernet. Y axis: Messages/sec (32 byte size), 0 to 600000. X axis: Samples (Million Message/sample), 1 to 54. Series: rhel52_base, rhel52_tuned, rhelrealtime_tune.]
Latency
Lower RX Latency with ethtool -C
# ethtool -c eth6
Coalesce parameters for eth6:
<truncate>
rx-usecs: 125
rx-frames: 0
rx-usecs-irq: 0
rx-frames-irq: 0
# ./netperf -H 192.168.10.12 -t TCP_RR
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size     Size    Time     Rate
bytes  Bytes  bytes    bytes   secs.    per sec
16384  87380  1        1       10.00    8000.27
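A TCP_RR transaction rate converts directly into an average round-trip time, which makes the effect of the coalescing setting easier to see. A minimal sketch follows; the function name rr_latency_us is hypothetical.

```shell
# Average round-trip time in microseconds from a netperf TCP_RR rate.
# $1 = transactions per second (the "Trans. Rate" column).
rr_latency_us() {
    awk -v rate="$1" 'BEGIN { printf "%.1f\n", 1000000 / rate }'
}

rr_latency_us 8000.27
```

For the run above this gives about 125 microseconds per transaction, which is close to the rx-usecs value of 125 and suggests the coalescing timer accounts for most of the round trip.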
10GbE Scaling w/ Xen on RHEL5.2
Going forward:
Wrap up
There are lots of knobs to use; the trick is finding them and learning how to use them
Questions ?
Credits
Andy Gospo
Don Dutile
D John Shakshober
JanMark Holzer