
Performance Analysis and System Tuning

Larry Woodman
D John Shakshober

Agenda

Red Hat Enterprise Linux (RHEL) Performance and Tuning

References: valuable tuning guides/books

Part 1: Memory Management / File System Caching

Part 2: Disk and File System IO

Part 3: Performance Monitoring Tools

Part 4: Performance Tuning / Analysis

Part 5: Case Studies

Linux Performance Tuning References

Alikins, "System Tuning Info for Linux Servers",
http://people.redhat.com/alikins/system_tuning.html

Axboe, J., "Deadline IO Scheduler Tunables", SuSE, EDF R&D, 2003.

Braswell, B., Ciliendo, E., "Tuning Red Hat Enterprise Linux on
IBM eServer xSeries Servers", http://www.ibm.com/redbooks

Corbet, J., "The Continuing Development of IO Scheduling",
http://lwn.net/Articles/21274.

Ezolt, P., Optimizing Linux Performance, www.hp.com/hpbooks, Mar 2005.

Heger, D., Pratt, S., "Workload Dependent Performance Evaluation of the
Linux 2.6 IO Schedulers", Linux Symposium, Ottawa, Canada, July 2004.

Red Hat Enterprise Linux Performance Tuning Guide,
http://people.redhat.com/dshaks/rhel3_perf_tuning.pdf

Network and NFS performance are covered in separate talks.

Memory Management

Physical Memory (RAM) Management

Virtual Address Space Maps

32-bit: x86 up, smp, hugemem, 1G/3G vs 4G/4G

64-bit: x86_64, IA64

Kernel Wired Memory

Static boot-time

Slab cache

Page tables

HugeTLBfs

Reclaimable User Memory

NUMA versus UMA

Pagecache / Anonymous split

Page Reclaim Dynamics

kswapd, bdflush/pdflush, kupdated

Physical Memory (RAM) Management

Physical memory layout

NUMA nodes

Zones

mem_map array

Page lists:

Free list

Active

Inactive

Memory Zones

32-bit (up to 64GB with PAE):

HighMem Zone: 896MB (or 3968MB with 4G/4G) up to end of RAM

Normal Zone: 16MB to 896MB (or 3968MB)

DMA Zone: 0 to 16MB

64-bit:

Normal Zone: 16MB (or 4GB) up to end of RAM

DMA Zone: 0 to 16MB (or 4GB)

Per-NUMA-Node Resources

Memory zones (DMA & Normal zones)

CPUs

IO/DMA capacity

Page reclamation daemon (kswapd#)

NUMA Nodes and Zones

64-bit example with two nodes:

Node 0: DMA Zone from 0 to 16MB (or 4GB), then a Normal Zone

Node 1: Normal Zone only, up to end of RAM

Memory Zone Utilization

DMA: 24-bit I/O

Normal: kernel static, kernel dynamic (slab cache, bounce buffers, driver allocations), user overflow

Highmem (x86): user memory (anonymous, pagecache, page tables)

Per-Zone Resources

mem_map

Free lists

Active and inactive page lists

Page reclamation

Page reclamation watermarks

mem_map

The kernel maintains a page struct for each 4KB (16KB on IA64) page of RAM.

The mem_map array consumes a significant amount of lowmem at boot time.

Page struct sizes:

RHEL3 32-bit = 60 bytes

RHEL3 64-bit = 112 bytes

RHEL4 32-bit = 32 bytes

RHEL4 64-bit = 56 bytes

16GB x86 running RHEL3:

17179869184 / 4096 * 60 = ~250MB mem_map array!!!

The RHEL4 mem_map is only about 50% of the RHEL3 mem_map.
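The same arithmetic can be scripted as a quick sanity check; this is only a sketch, with the 16GB size and the RHEL3 32-bit 60-byte page struct taken from the example above:

#!/bin/sh
# Estimate mem_map overhead: (RAM bytes / page size) * sizeof(struct page)
RAM_BYTES=$((16 * 1024 * 1024 * 1024))   # 16GB example
PAGE_SIZE=4096                           # 4KB pages (16KB on IA64)
PAGE_STRUCT=60                           # RHEL3 32-bit page struct size
PAGES=$((RAM_BYTES / PAGE_SIZE))
echo "pages:        $PAGES"
echo "mem_map (MB): $((PAGES * PAGE_STRUCT / 1024 / 1024))"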

Per-zone free list / buddy allocator lists

The kernel maintains per-zone free lists.

The buddy allocator coalesces free pages into larger physically contiguous pieces:

DMA:
1*4kB 4*8kB 6*16kB 4*32kB 3*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11588kB

Normal:
217*4kB 207*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 3468kB

HighMem:
847*4kB 409*8kB 17*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7924kB

Memory allocation failures:

Free list exhaustion.

Free list fragmentation.
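On RHEL4 (2.6 kernel) the same per-zone free-list breakdown can be inspected at any time without Alt-Sysrq; each column is the number of free blocks of the next power-of-two order (4kB, 8kB, 16kB, ...):

# Show per-node, per-zone free-page counts by order
cat /proc/buddyinfo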

Per-zone page lists

Active: most recently referenced

Anonymous: stack, heap, bss

Pagecache: file system data

Inactive: least recently referenced

Dirty: modified

Laundry: writeback in progress

Clean: ready to free

Free

Coalesced by the buddy allocator

Virtual Address Space Maps

32-bit

3G/1G address space

4G/4G address space

64-bit

x86_64

IA64

Linux 32-bit Address Spaces

3G/1G kernel (SMP): user virtual addresses 0GB-3GB, kernel 3GB-4GB; RAM mapped as DMA, Normal, HighMem.

4G/4G kernel (Hugemem): separate 4GB user address space(s) and a 4GB kernel address space; Normal zone extends to 3968MB, HighMem above that.

Linux 64-bit Address Space

(Diagram: virtual address layouts for x86_64, user space 0 to 1TB (2^40) plus kernel space, and for IA64, with RAM mapped by the kernel.)

Memory Pressure

32-bit: kernel allocations pressure the DMA and Normal zones; user allocations pressure Highmem.

64-bit: kernel and user allocations share the DMA and Normal zones.

Kernel Memory Pressure

Static: boot time (DMA and Normal zones)

Kernel text, data, BSS

Bootmem allocator

Tables and hashes (mem_map)

Slab cache (Normal zone)

Kernel data structs

Inode cache, dentry cache and buffer header dynamics

Page tables (Highmem/Normal zone)

32-bit versus 64-bit

HugeTLBfs (Highmem/Normal zone)

i.e. 4K pages w/ 4GB memory = 1 million TLB entries

4M pages w/ 4GB memory = 1000 TLB entries

User Memory Pressure

Anonymous / pagecache split

Pagecache allocations (file access) and page faults (anonymous memory) compete for the same user memory.

Page Cache / Anonymous memory split

Pagecache memory is global and grows when file system data is accessed, until memory is exhausted.

Pagecache is freed:

When underlying files are deleted.

On unmount of the file system.

By kswapd reclaiming pagecache pages when memory is exhausted.

Anonymous memory is private and grows on user demand:

Allocation followed by page fault.

Swap in.

Anonymous memory is freed:

When a process unmaps an anonymous region or exits.

By kswapd reclaiming anonymous pages (swap out) when memory is exhausted.

The balance between pagecache and anonymous memory is dynamic.
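A small sketch for watching this balance on a running RHEL3/4 system; the field names are those shown in the /proc/meminfo example later in this talk:

#!/bin/sh
# Print pagecache vs. anonymous related counters every 5 seconds
while true; do
    date
    grep -E '^(MemFree|Buffers|Cached|Active|Inactive|SwapFree)' /proc/meminfo
    echo
    sleep 5
done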

32-bit Memory Reclamation

Kernel allocations (DMA, Normal) are reclaimed by kswapd:

slab cache reaping

inode cache pruning

buffer head freeing

dentry cache pruning

User allocations (Highmem) are reclaimed by kswapd and bdflush/pdflush:

page aging

pagecache shrinking

swapping

64-bit Memory Reclamation

Kernel and user allocations come from the same RAM, so kernel and user reclamation are combined.

Anonymous / pagecache reclaiming

Pagecache pages are freed by kswapd page reclaim (with bdflush/kupdated writing back dirty data), by deletion of a file, or by unmounting the file system.

Anonymous pages are freed by kswapd page reclaim (swap out), by unmap, or by exit.

Per Node/Zone Paging Dynamics

User allocations take pages from the FREE list.

Page aging moves pages from ACTIVE to INACTIVE; referenced pages are reactivated.

Inactive pages go from dirty to clean via swapout and bdflush writeback, then are reclaimed to the FREE list.

User deletions free pages directly.

Part 2: Performance Monitoring Tools

Standard Unix OS tools

Monitoring cpu, memory, process, disk

oprofile

Kernel tools

/proc, info (cpu, mem, slab), dmesg, Alt-Sysrq

Profiling: nmi_watchdog=1, profile=2

Tracing (separate summit talk)

strace, ltrace

dprobe, kprobe

3rd-party profiling / capacity monitoring

Perfmon, Caliper, VTune

Red Hat Top Tools

CPU Tools:
1. top
2. vmstat
3. ps aux
4. mpstat -P all
5. sar -u
6. iostat
7. oprofile
8. gnome-system-monitor
9. KDE monitor
10. /proc

Memory Tools:
1. top
2. vmstat -s
3. ps aur
4. ipcs
5. sar -r -B -W
6. free
7. oprofile
8. gnome-system-monitor
9. KDE monitor
10. /proc

Process Tools:
1. top
2. ps -o pmem
3. gprof
4. strace, ltrace
5. sar

Disk Tools:
1. iostat -x
2. vmstat -D
3. sar -DEV #
4. nfsstat
5. NEED MORE!

top: press h (help), m (memory), t (threads), > (column sort)

top - 09:01:04 up 8 days, 15:22, 2 users, load average: 1.71, 0.39, 0.12
Tasks: 114 total, 1 running, 113 sleeping, 0 stopped, 0 zombie
Cpu0: 5.3% us, 2.3% sy, 0.0% ni, 0.0% id, 92.0% wa, 0.0% hi, 0.3% si
Cpu1: 0.3% us, 0.3% sy, 0.0% ni, 89.7% id, 9.7% wa, 0.0% hi, 0.0% si
Mem:  2053860k total, 2036840k used, 17020k free, 99556k buffers
Swap: 2031608k total, 160k used, 2031448k free, 417720k cached

  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
27830 oracle 16  0 1315m 1.2g 1.2g D  1.3 60.9 0:00.09 oracle
27802 oracle 16  0 1315m 1.2g 1.2g D  1.0 61.0 0:00.10 oracle
27811 oracle 16  0 1315m 1.2g 1.2g D  1.0 60.8 0:00.08 oracle
27827 oracle 16  0 1315m 1.2g 1.2g D  1.0 61.0 0:00.11 oracle
27805 oracle 17  0 1315m 1.2g 1.2g D  0.7 61.0 0:00.10 oracle
27828 oracle 15  0 27584 6648 4620 S  0.3  0.3 0:00.17 tpcc.exe
    1 root   16  0  4744  580  480 S  0.0  0.0 0:00.50 init
    2 root   RT  0     0    0    0 S  0.0  0.0 0:00.11 migration/0
    3 root   34 19     0    0    0 S  0.0  0.0 0:00.00 ksoftirqd/0

vmstat of IOzone to EXT3 fs, 6GB mem
#! deplete memory until pdflush turns on
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
200448352420052423457600546315251303096
020169784020052429314400057850482108539941221463
3001537884200524384109200193589463243144307321842
02052812020052462281720047888810177133921322246
01046140200524671373600179110719144718251303535
22050972200524670574400232119698131619710253144
....
#! now transition from writes to reads
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy wa id
14051040200524670554400213351912658390265618
1103506420052467127240040118911136720210354223
01068264234372664702000767445420484032072073
01034468234372667801600773913416202834091872
01047320234372669035600810507717832916072073
10038756234372669834400761364420273705191972
01031472234372670653200767253316012807081973

iostat -x of same IOzone EXT3 file system

iostat metrics (rates per second, sizes and response time):
r|w rqm/s   requests merged/s      avgrq-sz  average request size
r|w sec/s   512-byte sectors/s     avgqu-sz  average queue size
r|w kB/s    kilobytes/s            await     average wait time (ms)
r|w /s      operations/s           svctm     average service time (ms)

Linux 2.4.21-27.0.2.ELsmp (node1)    05/09/2005

avg-cpu:  %user  %nice  %sys  %iowait  %idle
           0.40   0.00  2.63     0.91  96.06

Device:  rrqm/s  wrqm/s    r/s  w/s    rsec/s wsec/s    rkB/s wkB/s avgrq-sz avgqu-sz await svctm  %util
sdi    16164.60    0.00 523.40 0.00 133504.00   0.00 66752.00  0.00   255.07     1.00  1.91  1.88  98.40
sdi    17110.10    0.00 553.90 0.00 141312.00   0.00 70656.00  0.00   255.12     0.99  1.80  1.78  98.40
sdi    16153.50    0.00 522.50 0.00 133408.00   0.00 66704.00  0.00   255.33     0.98  1.88  1.86  97.00
sdi    17561.90    0.00 568.10 0.00 145040.00   0.00 72520.00  0.00   255.31     1.01  1.78  1.76 100.00

SAR

[root@localhost redhat]# sar -u 3 3
Linux 2.4.21-20.EL (localhost.localdomain)    05/16/2005

10:32:28 PM  CPU  %user  %nice  %system   %idle
10:32:31 PM  all   0.00   0.00     0.00  100.00
10:32:34 PM  all   1.33   0.00     0.33   98.33
10:32:37 PM  all   1.34   0.00     0.00   98.66
Average:     all   0.89   0.00     0.11   99.00

[root]# sar -n DEV
Linux 2.4.21-20.EL (localhost.localdomain)    03/16/2005

01:10:01 PM  IFACE  rxpck/s  txpck/s  rxbyt/s  txbyt/s  rxcmp/s  txcmp/s  rxmcst/s
01:20:00 PM  lo        3.49     3.49   306.16   306.16     0.00     0.00      0.00
01:20:00 PM  eth0      3.89     3.53  2395.34   484.70     0.00     0.00      0.00
01:20:00 PM  eth1      0.00     0.00     0.00     0.00     0.00     0.00      0.00

free / numastat: memory allocation

[root@localhost redhat]# free -l
             total     used     free   shared  buffers   cached
Mem:        511368   342336   169032        0    29712   167408
Low:        511368   342336   169032
High:            0        0        0
-/+ buffers/cache:   145216   366152
Swap:      1043240        0  1043240

numastat (on 2-cpu x86_64 based system)
                     node1      node0
numa_hit           9803332   10905630
numa_miss          2049018    1609361
numa_foreign       1609361    2049018
interleave_hit       58689      54749
local_node         9770927   10880901
other_node         2081423    1634090

ps, mpstat

[root@localhost root]# ps aux
[root@localhost root]# ps aux | more
USER  PID %CPU %MEM  VSZ  RSS TTY STAT START TIME COMMAND
root    1  0.1  0.1 1528  516 ?   S    23:18 0:04 init
root    2  0.0  0.0    0    0 ?   SW   23:18 0:00 [keventd]
root    3  0.0  0.0    0    0 ?   SW   23:18 0:00 [kapmd]
root    4  0.0  0.0    0    0 ?   SWN  23:18 0:00 [ksoftirqd/0]
root    7  0.0  0.0    0    0 ?   SW   23:18 0:00 [bdflush]
root    5  0.0  0.0    0    0 ?   SW   23:18 0:00 [kswapd]
root    6  0.0  0.0    0    0 ?   SW   23:18 0:00 [kscand]

[root@localhost redhat]# mpstat 3 3
Linux 2.4.21-20.EL (localhost.localdomain)    05/16/2005

10:40:34 PM  CPU  %user  %nice  %system  %idle  intr/s
10:40:37 PM  all   3.00   0.00     0.00  97.00  193.67
10:40:40 PM  all   1.33   0.00     0.00  98.67  208.00
10:40:43 PM  all   1.67   0.00     0.00  98.33  196.00
Average:     all   2.00   0.00     0.00  98.00  199.22

pstree

[root@dhcp83-36 proc]# pstree
init-+-atd
     |-auditd
     |-2*[automount]
     |-bdflush
     |-2*[bonobo-activati]
     |-cannaserver
     |-crond
     |-cupsd
     |-dhclient
     |-eggcups
     |-gconfd-2
     |-gdm-binary---gdm-binary---X
     |-gnome-session---ssh-agent
     |-2*[gnome-calculato]
     |-gnome-panel
     |-gnome-settings-
     |-gnome-terminal-+-bash---xchat
     |                |-bash---cscope---bash---cscope---bash---cscope---bash---cscope---bash---cscope---bash
     |                |-bash---cscope---bash---cscope---bash---cscope---bash---cscope---vi
     |                `-gnome-pty-helpe
     |-gnome-terminal-+-bash---su---bash---pstree
     |                |-bash---cscope---vi
     |                `-gnome-pty-helpe

The /proc file system

/proc

acpi

bus

irq

net

scsi

sys

tty

pid#

32-bit /proc/<pid>/maps
[root@dhcp83-36 proc]# cat 5808/maps
0022e000-0023b000 r-xp 00000000 03:03 4137068  /lib/tls/libpthread-0.60.so
0023b000-0023c000 rw-p 0000c000 03:03 4137068  /lib/tls/libpthread-0.60.so
0023c000-0023e000 rw-p 00000000 00:00 0
0037f000-00391000 r-xp 00000000 03:03 523285   /lib/libnsl-2.3.2.so
00391000-00392000 rw-p 00011000 03:03 523285   /lib/libnsl-2.3.2.so
00392000-00394000 rw-p 00000000 00:00 0
00c45000-00c5a000 r-xp 00000000 03:03 523268   /lib/ld-2.3.2.so
00c5a000-00c5b000 rw-p 00015000 03:03 523268   /lib/ld-2.3.2.so
00e5c000-00f8e000 r-xp 00000000 03:03 4137064  /lib/tls/libc-2.3.2.so
00f8e000-00f91000 rw-p 00131000 03:03 4137064  /lib/tls/libc-2.3.2.so
00f91000-00f94000 rw-p 00000000 00:00 0
08048000-0804f000 r-xp 00000000 03:03 1046791  /sbin/ypbind
0804f000-08050000 rw-p 00007000 03:03 1046791  /sbin/ypbind
09794000-097b5000 rw-p 00000000 00:00 0
b5fdd000-b5fde000 ---p 00000000 00:00 0
b5fde000-b69de000 rw-p 00001000 00:00 0
b69de000-b69df000 ---p 00000000 00:00 0
b69df000-b73df000 rw-p 00001000 00:00 0
b73df000-b75df000 r--p 00000000 03:03 3270410  /usr/lib/locale/locale-archive
b75df000-b75e1000 rw-p 00000000 00:00 0
bfff6000-c0000000 rw-p ffff8000 00:00 0

64-bit /proc/<pid>/maps
# cat /proc/2345/maps
00400000-0100b000 r-xp 00000000 fd:00 1933328  /usr/sybase/ASE12_5/bin/dataserver.esd3
0110b000-01433000 rw-p 00c0b000 fd:00 1933328  /usr/sybase/ASE12_5/bin/dataserver.esd3
01433000-014eb000 rwxp 01433000 00:00 0
40000000-40001000 ---p 40000000 00:00 0
40001000-40a01000 rwxp 40001000 00:00 0
2a95f73000-2a96073000 ---p 0012b000 fd:00 819273  /lib64/tls/libc-2.3.4.so
2a96073000-2a96075000 r--p 0012b000 fd:00 819273  /lib64/tls/libc-2.3.4.so
2a96075000-2a96078000 rw-p 0012d000 fd:00 819273  /lib64/tls/libc-2.3.4.so
2a96078000-2a9607e000 rw-p 2a96078000 00:00 0
2a9607e000-2a98c3e000 rw-s 00000000 00:06 360450  /SYSV0100401e (deleted)
2a98c3e000-2a98c47000 rw-p 2a98c3e000 00:00 0
2a98c47000-2a98c51000 r-xp 00000000 fd:00 819227  /lib64/libnss_files-2.3.4.so
2a98c51000-2a98d51000 ---p 0000a000 fd:00 819227  /lib64/libnss_files-2.3.4.so
2a98d51000-2a98d53000 rw-p 0000a000 fd:00 819227  /lib64/libnss_files-2.3.4.so
2a98d53000-2a98d57000 r-xp 00000000 fd:00 819225  /lib64/libnss_dns-2.3.4.so
2a98d57000-2a98e56000 ---p 00004000 fd:00 819225  /lib64/libnss_dns-2.3.4.so
2a98e56000-2a98e58000 rw-p 00003000 fd:00 819225  /lib64/libnss_dns-2.3.4.so
2a98e58000-2a98e69000 r-xp 00000000 fd:00 819237  /lib64/libresolv-2.3.4.so
2a98e69000-2a98f69000 ---p 00011000 fd:00 819237  /lib64/libresolv-2.3.4.so
2a98f69000-2a98f6b000 rw-p 00011000 fd:00 819237  /lib64/libresolv-2.3.4.so
2a98f6b000-2a98f6d000 rw-p 2a98f6b000 00:00 0
35c7e00000-35c7e08000 r-xp 00000000 fd:00 819469  /lib64/libpam.so.0.77
35c7e08000-35c7f08000 ---p 00008000 fd:00 819469  /lib64/libpam.so.0.77
35c7f08000-35c7f09000 rw-p 00008000 fd:00 819469  /lib64/libpam.so.0.77
35c8000000-35c8011000 r-xp 00000000 fd:00 819468  /lib64/libaudit.so.0.0.0
35c8011000-35c8110000 ---p 00011000 fd:00 819468  /lib64/libaudit.so.0.0.0
35c8110000-35c8118000 rw-p 00010000 fd:00 819468  /lib64/libaudit.so.0.0.0
35c9000000-35c900b000 r-xp 00000000 fd:00 819457  /lib64/libgcc_s-3.4.4-20050721.so.1
35c900b000-35c910a000 ---p 0000b000 fd:00 819457  /lib64/libgcc_s-3.4.4-20050721.so.1
35c910a000-35c910b000 rw-p 0000a000 fd:00 819457  /lib64/libgcc_s-3.4.4-20050721.so.1
7fbfff1000-7fc0000000 rwxp 7fbfff1000 00:00 0

/proc/meminfo

# cat /proc/meminfo
MemTotal:       514060 kB
MemFree:         23656 kB
Buffers:         53076 kB
Cached:         198344 kB
SwapCached:          0 kB
Active:         322964 kB
Inactive:        60620 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       514060 kB
LowFree:         23656 kB
SwapTotal:     1044216 kB
SwapFree:      1044056 kB
Dirty:              40 kB
Writeback:           0 kB
Mapped:         168048 kB
Slab:            88956 kB
Committed_AS:   372800 kB
PageTables:       3876 kB
VmallocTotal:   499704 kB
VmallocUsed:      6848 kB
VmallocChunk:   491508 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB

/proc/slabinfo
slabinfo - version: 2.0

biovec-128            256     260   1536    5    2 : tunables   24   12    8 : slabdata     52     52      0
biovec-64             256     260    768    5    1 : tunables   54   27    8 : slabdata     52     52      0
biovec-16             256     270    256   15    1 : tunables  120   60    8 : slabdata     18     18      0
biovec-4              256     305     64   61    1 : tunables  120   60    8 : slabdata      5      5      0
biovec-1          5906938 5907188     16  226    1 : tunables  120   60    8 : slabdata  26138  26138      0
bio               5906946 5907143    128   31    1 : tunables  120   60    8 : slabdata 190553 190553      0
file_lock_cache         7     123     96   41    1 : tunables  120   60    8 : slabdata      3      3      0
sock_inode_cache       29      63    512    7    1 : tunables   54   27    8 : slabdata      9      9      0
skbuff_head_cache      20     540    256   15    1 : tunables  120   60    8 : slabdata     36     36      0
sock                    6      10    384   10    1 : tunables   54   27    8 : slabdata      1      1      0
proc_inode_cache      139     209    360   11    1 : tunables   54   27    8 : slabdata     19     19      0
sigqueue                2      27    148   27    1 : tunables  120   60    8 : slabdata      1      1      0
idr_layer_cache        82     116    136   29    1 : tunables  120   60    8 : slabdata      4      4      0
buffer_head         66027  133800     52   75    1 : tunables  120   60    8 : slabdata   1784   1784      0
mm_struct              44      70    768    5    1 : tunables   54   27    8 : slabdata     14     14      0
kmem_cache            150     150    256   15    1 : tunables  120   60    8 : slabdata     10     10      0

Alt-Sysrq-M: RHEL3 / UMA
SysRq: Show Memory
Mem-info:
Zone:DMA freepages:  2929 min:   0 low:    0 high:    0
Zone:Normal freepages:  1941 min: 510 low: 2235 high: 3225
Zone:HighMem freepages:     0 min:   0 low:    0 high:    0
Free pages:        4870 (     0 HighMem)
( Active: 72404/13523, inactive_laundry: 2429, inactive_clean: 1730, free: 4870 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:2929
  aa:46140 ac:26264 id:13523 il:2429 ic:1730 fr:1941
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
1*4kB 4*8kB 2*16kB 2*32kB 1*64kB 2*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11716kB
1255*4kB 89*8kB 5*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 7764kB
Swap cache: add 958119, delete 918749, find 4611302/5276354, race 0+1
27234 pages of slabcache
244 pages of kernel stacks
1303 lowmem pagetables, 0 highmem pagetables
0 bounce buffer pages, 0 are on the emergency list
Free swap:       598960kB
130933 pages of RAM
0 pages of HIGHMEM
3497 reserved pages
34028 pages shared
39370 pages swap cached

Alt-Sysrq-M: RHEL3 / NUMA
SysRq: Show Memory
Mem-info:
Zone:DMA freepages:     0 min:    0 low:    0 high:     0
Zone:Normal freepages: 369423 min: 1022 low: 6909 high:  9980
Zone:HighMem freepages:     0 min:    0 low:    0 high:     0
Zone:DMA freepages:  2557 min:    0 low:    0 high:     0
Zone:Normal freepages: 494164 min: 1278 low: 9149 high: 13212
Zone:HighMem freepages:     0 min:    0 low:    0 high:     0
Free pages:      866144 (     0 HighMem)
( Active: 9690/714, inactive_laundry: 764, inactive_clean: 35, free: 866144 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:746 ac:2811 id:188 il:220 ic:0 fr:369423
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:2557
  aa:1719 ac:4414 id:526 il:544 ic:35 fr:494164
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
2497*4kB 1575*8kB 902*16kB 515*32kB 305*64kB 166*128kB 96*256kB 56*512kB 39*1024kB 30*2048kB 300*4096kB = 1477692kB
Swap cache: add 288168, delete 285993, find 726/2075, race 0+0
4059 pages of slabcache
146 pages of kernel stacks
388 lowmem pagetables, 638 highmem pagetables
Free swap:       1947848kB
917496 pages of RAM
869386 free pages
30921 reserved pages
21927 pages shared
2175 pages swap cached
Buffer memory:     9752kB
Cache memory:     34192kB
    CLEAN: 696 buffers, 2772 kbyte, 51 used (last=696), 0 locked, 0 dirty, 0 delay
    DIRTY: 4 buffers, 16 kbyte, 4 used (last=4), 0 locked, 3 dirty, 0 delay

Alt-Sysrq-M: RHEL4 / UMA
SysRq: Show Memory
Mem-info:
Free pages:       20128kB (0kB HighMem)
Active:72109 inactive:27657 dirty:1 writeback:0 unstable:0 free:5032 slab:19306 mapped:41755 pagetables:945
DMA free:12640kB min:20kB low:40kB high:60kB active:0kB inactive:0kB present:16384kB pages_scanned:847 all_unreclaimable? yes
protections[]: 0 0 0
Normal free:7488kB min:688kB low:1376kB high:2064kB active:288436kB inactive:110628kB present:507348kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
DMA: 4*4kB 4*8kB 3*16kB 4*32kB 4*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 2*4096kB = 12640kB
Normal: 1052*4kB 240*8kB 39*16kB 3*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 7488kB
HighMem: empty
Swap cache: add 52, delete 52, find 3/5, race 0+0
Free swap:       1044056kB
130933 pages of RAM
0 pages of HIGHMEM
2499 reserved pages
71122 pages shared
0 pages swap cached

Alt-Sysrq-M: RHEL4 / NUMA
Free pages:       16724kB (0kB HighMem)
Active:236461 inactive:254776 dirty:11 writeback:0 unstable:0 free:4181 slab:13679 mapped:34073 pagetables:853
Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 Normal free:2784kB min:1016kB low:2032kB high:3048kB active:477596kB inactive:508444kB present:1048548kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 DMA free:11956kB min:12kB low:24kB high:36kB active:0kB inactive:0kB present:16384kB pages_scanned:1050 all_unreclaimable? yes
protections[]: 0 0 0
Node 0 Normal free:1984kB min:1000kB low:2000kB high:3000kB active:468248kB inactive:510660kB present:1032188kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 0 HighMem free:0kB min:128kB low:256kB high:384kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
protections[]: 0 0 0
Node 1 DMA: empty
Node 1 Normal: 0*4kB 0*8kB 30*16kB 10*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2784kB
Node 1 HighMem: empty
Node 0 DMA: 5*4kB 4*8kB 4*16kB 2*32kB 2*64kB 3*128kB 2*256kB 1*512kB 0*1024kB 1*2048kB 2*4096kB = 11956kB
Node 0 Normal: 0*4kB 0*8kB 0*16kB 0*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 1984kB
Node 0 HighMem: empty
Swap cache: add 44, delete 44, find 0/0, race 0+0
Free swap:       2031432kB
524280 pages of RAM
10951 reserved pages
363446 pages shared
0 pages swap cached

Alt-Sysrq-T
bash          R current      0  1609   1606                     (NOTLB)
Call Trace:   [<c02a1897>] snprintf [kernel] 0x27 (0xdb3c5e90)
[<c01294b3>] call_console_drivers [kernel] 0x63 (0xdb3c5eb4)
[<c01297e3>] printk [kernel] 0x153 (0xdb3c5eec)
[<c01297e3>] printk [kernel] 0x153 (0xdb3c5f00)
[<c010c289>] show_trace [kernel] 0xd9 (0xdb3c5f0c)
[<c010c289>] show_trace [kernel] 0xd9 (0xdb3c5f14)
[<c0125992>] show_state [kernel] 0x62 (0xdb3c5f24)
[<c01cfb1a>] __handle_sysrq_nolock [kernel] 0x7a (0xdb3c5f38)
[<c01cfa7d>] handle_sysrq [kernel] 0x5d (0xdb3c5f58)
[<c0198f43>] write_sysrq_trigger [kernel] 0x53 (0xdb3c5f7c)
[<c01645b7>] sys_write [kernel] 0x97 (0xdb3c5f94)

* this can get BIG; logged in /var/log/messages

Kernel profiling

1. Enable kernel profiling.
   On the kernel boot line add: profile=2 nmi_watchdog=1
   i.e. kernel /vmlinuz-2.6.9-28.EL.smp ro profile=2 nmi_watchdog=1 root=0805
   then reboot.

2. Create and run a shell script containing the following lines:
#!/bin/sh
while /bin/true; do
    echo; date
    /usr/sbin/readprofile -v | sort -nr +2 | head -15
    /usr/sbin/readprofile -r
    sleep 5
done

Kernel profiling
/usr/sbin/readprofile -v | sort -nr +2 | head -15
[root tiobench]# more rhel4_read_64k_prof.log
Fri Jan 28 08:59:19 EST 2005
Linux 2.6.9-5.ELsmp (perf1.lab.boston.redhat.com) 01/28/2005
0000000000000000 total                   239423   0.1291
ffffffff8010e3a0 do_arch_prctl           238564 213.0036
ffffffff80130540 del_timer                   95   0.5398
ffffffff80115940 read_ldt                    50   0.6250
ffffffff8015d21c .text.lock.shmem            44   0.1048
ffffffff8023e480 md_do_sync                  40   0.0329
ffffffff801202f0 scheduler_tick              38   0.0279
ffffffff80191cf0 dma_read_proc               30   0.2679
ffffffff801633b0 get_unused_buffer_head      25   0.0919
ffffffff801565d0 rw_swap_page_nolock         25   0.0822
ffffffff8023d850 status_unused               24   0.1500
ffffffff80153450 scan_active_list            24   0.0106
ffffffff801590a0 try_to_unuse                23   0.0288
ffffffff80192070 read_profile                22   0.0809
ffffffff80191f80 swaps_read_proc             18   0.1607

oprofile: built into RHEL4 (smp)

opcontrol: turn data collection on/off
  --start              start collection
  --stop               stop collection
  --dump               output to disk
  --event=:name:count

opreport: analyze profile
  -r   reverse-order sort
  -t [percentage]  threshold to view
  -f   /path/filename
  -d   details

opannotate
  -s   /path/source
  -a   /path/assembly

Example:
# opcontrol --start
# /bin/time test1 &
# sleep 60
# opcontrol --stop
# opcontrol --dump

oprofile: opcontrol and opreport of CPU_CYCLES
# vmlinux-2.6.9-prep
CPU: Itanium 2, speed 1300 MHz (estimated)
Counted CPU_CYCLES events (CPU Cycles) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name   app name   symbol name
9093689  68.9674  vmlinux      vmlinux    default_idle
 969885   7.3557  vmlinux      reread     _spin_unlock_irq
 744445   5.6459  vmlinux      reread     _spin_unlock_irqrestore
 420103   3.1861  vmlinux      vmlinux    _spin_unlock_irqrestore
 146413   1.1104  vmlinux      reread     __blockdev_direct_IO
  74918   0.5682  vmlinux      vmlinux    _spin_unlock_irq
  65213   0.4946  vmlinux      reread     kmem_cache_alloc
  59453   0.4509  vmlinux      vmlinux    dio_bio_complete
  58636   0.4447  vmlinux      reread     mempool_alloc
  56675   0.4298  scsi_mod.ko  reread     scsi_decide_disposition
  53965   0.4093  vmlinux      reread     dio_bio_complete
  53079   0.4026  vmlinux      reread     bio_check_pages_dirty
  53035   0.4022  vmlinux      vmlinux    bio_check_pages_dirty
  47430   0.3597  vmlinux      vmlinux    __end_that_request_first
  47263   0.3584  vmlinux      reread     get_request
  43383   0.3290  vmlinux      reread     __end_that_request_first
  40251   0.3053  qla2xxx.ko   reread     qla2xxx_get_port_name
  35919   0.2724  scsi_mod.ko  reread     __scsi_device_lookup
  35564   0.2697  vmlinux      reread     aio_read_evt
  32830   0.2490  vmlinux      reread     kmem_cache_free
  32738   0.2483  scsi_mod.ko  scsi_mod   scsi_remove_host

Profiling Tools: OProfile

Open source project: http://oprofile.sourceforge.net

Upstream; Red Hat contributes

Originally modeled after DEC Continuous Profiling Infrastructure (DCPI)

System-wide profiler (both kernel and user code)

Sample-based profiler with SMP machine support

Performance monitoring hardware support

Relatively low overhead, typically <10%

Designed to run for long times

Included in the base Red Hat Enterprise Linux product

Events to measure with OProfile:

Initially, time-based samples are most useful:

PPro/PII/PIII/AMD: CPU_CLK_UNHALTED

P4: GLOBAL_POWER_EVENTS

IA64: CPU_CYCLES

TIMER_INT (fall-back profiling mechanism, the default)

Processor-specific performance monitoring hardware can provide additional kinds of sampling. Many events to choose from:

Branch mispredictions

Cache misses, TLB misses

Pipeline stalls / serializing instructions

Profiling Tools: SystemTap

Open source project (started 01/05)

Collaboration between Red Hat, Intel, and IBM

Linux answer to Solaris DTrace*

A tool to take a deeper look into a running system:

Provides insight into system operation

Assists in identifying causes of performance problems

Simplifies building instrumentation

Processing flow: probe script -> parse -> elaborate (probe set library) -> translate to C, compile* -> load module, start probe -> probe kernel object -> extract output, unload -> probe output

Current snapshots available from: http://sources.redhat.com/systemtap

Scheduled for inclusion in Red Hat Enterprise Linux 4 Update 2 (Fall 2005)

X86, X86-64, PPC64, Itanium2

* Solaris DTrace is interpretive

Part 3: General System Tuning

How to tune Linux

Capacity tuning

Fixed by adding resources:

CPU, memory, disk, network

Performance tuning

Methodology:
1) Document config
2) Baseline results
3) While results are non-optimal:
   a) Monitor/instrument system/workload
   b) Apply tuning, 1 change at a time
   c) Analyze results, exit or loop
4) Document final config

Tuning: how to set kernel parameters

/proc
[root@hairball fs]# cat /proc/sys/kernel/sysrq
0
[root@hairball fs]# echo 1 > /proc/sys/kernel/sysrq
[root@hairball fs]# cat /proc/sys/kernel/sysrq
1

The sysctl command
[root@hairball fs]# sysctl kernel.sysrq
kernel.sysrq = 0
[root@hairball fs]# sysctl -w kernel.sysrq=1
kernel.sysrq = 1
[root@hairball fs]# sysctl kernel.sysrq
kernel.sysrq = 1

Edit the /etc/sysctl.conf file
# Kernel sysctl configuration file for Red Hat Linux
# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1

Capacity Tuning

Memory

/proc/sys/vm/overcommit_memory

/proc/sys/vm/overcommit_ratio

/proc/sys/vm/max_map_count

/proc/sys/vm/nr_hugepages

Kernel

/proc/sys/kernel/msgmax

/proc/sys/kernel/msgmnb

/proc/sys/kernel/msgmni

/proc/sys/kernel/shmall

/proc/sys/kernel/shmmax

/proc/sys/kernel/shmmni

/proc/sys/kernel/threads-max

File systems

/proc/sys/fs/aio-max-nr

/proc/sys/fs/file-max

(A sample sysctl.conf sketch for a few of these follows this list.)
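A minimal /etc/sysctl.conf sketch raising some of the capacity limits above; the values are illustrative assumptions only, not recommendations:

# Shared memory: allow a single 2GB segment, 4GB total (shmall is in pages)
kernel.shmmax = 2147483648
kernel.shmall = 1048576
# Raise the system-wide open-file limit
fs.file-max = 131072
# Strict overcommit: commit limit = swap + 50% of RAM
vm.overcommit_memory = 2
vm.overcommit_ratio = 50

Apply the file with sysctl -p after editing.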

OOM kills: swap space exhaustion
Mem-info:
Zone:DMA freepages:   975 min: 1039 low: 1071 high: 1103
Zone:Normal freepages:   126 min:  255 low: 1950 high: 2925
Zone:HighMem freepages:     0 min:    0 low:    0 high:    0
Free pages:        1101 (     0 HighMem)
( Active: 118821/401, inactive_laundry: 0, inactive_clean: 0, free: 1101 )
  aa:1938 ac:18 id:44 il:0 ic:0 fr:974
  aa:115717 ac:1148 id:357 il:0 ic:0 fr:126
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
6*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3896kB
0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 504kB
Swap cache: add 620870, delete 620870, find 762437/910181, race 0+200
2454 pages of slabcache
484 pages of kernel stacks
2008 lowmem pagetables, 0 highmem pagetables
Free swap:            0kB
129008 pages of RAM
0 pages of HIGHMEM
3045 reserved pages
4009 pages shared
0 pages swap cached

OOM kills: lowmem consumption
Mem-info:
Zone:DMA freepages:  2029 min:    0 low:     0 high:     0
Zone:Normal freepages:  1249 min: 1279 low:  4544 high:  6304
Zone:HighMem freepages:   746 min:  255 low: 29184 high: 43776
Free pages:        4024 (   746 HighMem)
( Active: 703448/665000, inactive_laundry: 99878, inactive_clean: 99730, free: 4024 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:2029
  aa:128 ac:3346 id:113 il:240 ic:0 fr:1249
  aa:545577 ac:154397 id:664813 il:99713 ic:99730 fr:746
1*4kB 0*8kB 1*16kB 1*32kB 0*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 1*4096kB = 8116kB
543*4kB 35*8kB 77*16kB 1*32kB 0*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 4996kB
490*4kB 2*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 2984kB
Swap cache: add 4327, delete 4173, find 190/1057, race 0+0
178558 pages of slabcache
1078 pages of kernel stacks
0 lowmem pagetables, 233961 highmem pagetables
Free swap:       8189016kB
2097152 pages of RAM
1801952 pages of HIGHMEM
103982 reserved pages
115582774 pages shared
154 pages swap cached
Out of Memory: Killed process 27100 (oracle).

Performance Tuning: VM (RHEL3)

/proc/sys/vm/bdflush

/proc/sys/vm/pagecache

/proc/sys/vm/inactive_clean_percent

/proc/sys/vm/page-cluster

/proc/sys/vm/kscand_work_percent

Swap device location

Kernel selection

x86 smp

x86 hugemem

x86_64 numa

RHEL3 /proc/sys/vm/bdflush

int nfract;              /* Percentage of buffer cache dirty to activate bdflush */
int ndirty;              /* Maximum number of dirty blocks to write out per wake-cycle */
int dummy2;              /* old "nrefill" */
int dummy3;              /* unused */
int interval;            /* jiffies delay between kupdate flushes */
int age_buffer;          /* Time for normal buffer to age before we flush it */
int nfract_sync;         /* Percentage of buffer cache dirty to activate bdflush synchronously */
int nfract_stop_bdflush; /* Percentage of buffer cache dirty to stop bdflush */
int dummy5;              /* unused */

Example:
Settings for a server with an ample IO config (the RHEL3 default is geared for a workstation):
sysctl -w vm.bdflush="50 5000 0 0 200 5000 3000 60 20 0"

RHEL3 /proc/sys/vm/pagecache

pagecache.minpercent

Lower limit for pagecache page reclaiming.

kswapd will stop reclaiming pagecache pages below this percent of RAM.

pagecache.borrowpercent

kswapd attempts to keep the pagecache at this percent of RAM.

pagecache.maxpercent

Upper limit for pagecache page reclaiming.

RHEL2.1: hard limit, pagecache will not grow above this percent of RAM.

RHEL3: kswapd only reclaims pagecache pages above this percent of RAM.

Increasing maxpercent will increase swapping.

Example: echo 1 10 50 > /proc/sys/vm/pagecache

Performance Tuning: VM (RHEL4)

/proc/sys/vm/swappiness

/proc/sys/vm/dirty_ratio

/proc/sys/vm/dirty_background_ratio

/proc/sys/vm/vfs_cache_pressure

/proc/sys/vm/lower_zone_protection

Swap device location

Kernel selection

x86 smp

x86 hugemem

x86_64 numa

(A sample sysctl sketch for the first three tunables follows this list.)
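A short sketch of setting the first three RHEL4 tunables for a write-heavy file server; the numbers are example assumptions, not recommendations:

# Prefer reclaiming pagecache over swapping out anonymous pages
sysctl -w vm.swappiness=20
# Start background writeback earlier and cap dirty memory lower
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=20
# Verify the current values
sysctl vm.swappiness vm.dirty_background_ratio vm.dirty_ratio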

Kernel selection

x86 standard kernel (no PAE, 3G/1G)

UP systems with <= 4GB RAM

PAE costs ~5% in performance

x86 SMP kernel (PAE, 3G/1G)

SMP systems with <~ 12GB RAM

Highmem/Lowmem ratio <= 10:1

4G/4G costs ~5%

x86 Hugemem kernel (PAE, 4G/4G)

SMP systems > ~12GB RAM

x86_64, IA64

If a single application is using > 1 NUMA zone of RAM:

numa=off boot-time option

/proc/sys/vm/numa_memory_allocator

Kernel selection (16GB x86 running the SMP kernel)
Zone:DMA freepages:  2207 min:    0 low:     0 high:     0
Zone:Normal freepages:   484 min: 1279 low:  4544 high:  6304
Zone:HighMem freepages:   266 min:  255 low: 61952 high: 92928
Free pages:        2957 (   266 HighMem)
( Active: 245828/1297300, inactive_laundry: 194673, inactive_clean: 194668, free: 2957 )
  aa:0 ac:0 id:0 il:0 ic:0 fr:2207
  aa:630 ac:1009 id:189 il:233 ic:0 fr:484
  aa:195237 ac:48952 id:1297057 il:194493 ic:194668 fr:266
1*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 2*4096kB = 8828kB
48*4kB 8*8kB 97*16kB 4*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1936kB
12*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1064kB
Swap cache: add 3838024, delete 3808901, find 107105/1540587, race 0+2
138138 pages of slabcache
1100 pages of kernel stacks
0 lowmem pagetables, 37046 highmem pagetables
Free swap:       3986092kB
4194304 pages of RAM
3833824 pages of HIGHMEM

x86_64 NUMA (per-zone page counts from Alt-Sysrq-M):
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:901913 ac:1558 id:61553 il:11534 ic:6896 fr:10539
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:867678 ac:879 id:100296 il:19880 ic:10183 fr:17178
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:869084 ac:1449 id:100926 il:18792 ic:11396 fr:14445
  aa:0 ac:0 id:0 il:0 ic:0 fr:0
  aa:0 ac:0 id:0 il:0 ic:0 fr:2617
  aa:769 ac:2295 id:256 il:2 ic:825 fr:861136
  aa:0 ac:0 id:0 il:0 ic:0 fr:0

CPU Scheduler

Recognizes differences between logical and physical processors

I.e. multi-core, hyperthreaded chips/sockets

Optimizes process scheduling to take advantage of shared on-chip cache and NUMA memory nodes

Implements multilevel run queues for sockets and cores (as opposed to one run queue per processor or per system)

Strong CPU affinity avoids task bouncing

Requires system BIOS to report CPU topology correctly

(Diagram: scheduler compute queues of processes feeding sockets 0-2, each socket with cores and hyperthreads.)

NUMA considerations

Red Hat Enterprise Linux 4 provides improved NUMA support over version 3

Goal: locate application pages in low-latency memory (local to the CPU)

AMD64, Itanium2

Enabled by default (or boot command line numa=[on,off])

numactl to set up NUMA behavior (see the sketch below)

Used by latest TPC-H benchmark (>5% gain)


(Chart: RHEL4 U2 on an HP 4-socket dual-core AMD64 server; McCalpin Stream Copy b(x)=a(x) bandwidth with numa=on vs numa=off, plus % gain of numa over non-numa.)

Disk IO

IO stack lun limits

RHEL3: 255 in SCSI stack

RHEL4: 2**20; 18k useful with Fibre Channel

/proc/scsi tuning

Queue depth tuning per lun

IRQ distribution: default, smp_affinity mask (see the sketch after this list)

Edit RHEL3 /etc/modules.conf, RHEL4 modprobe.conf
echo 03 > /proc/irq/<irq#>/smp_affinity

Scalability

Luns: tested up to 64 luns

FC adaptors: 12 HBAs @ 2.2GB/s, 74k IO/sec

Nodes: tested up to 20 nodes w/ DIO
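A small sketch of the IRQ affinity steps referenced above; IRQ 77 and the device names are placeholders for whatever /proc/interrupts reports on your system:

# Find the IRQ assigned to the adapter of interest
grep -E 'eth0|qla2xxx' /proc/interrupts
# Current CPU mask for (hypothetical) IRQ 77
cat /proc/irq/77/smp_affinity
# Restrict IRQ 77 to CPUs 0 and 1 (bitmask 0x3)
echo 03 > /proc/irq/77/smp_affinity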

Asynchronous I/O to File Systems

Eliminates the synchronous I/O stall: with synchronous I/O the application issues an I/O request to the device driver and stalls until the request completes; with asynchronous I/O there is no stall for completion, and the application continues processing while the I/O is in progress.

Critical for I/O-intensive server applications

Red Hat Enterprise Linux feature since 2002

Previously supported for RAW devices only

With Red Hat Enterprise Linux 4, significant improvement:

Support for Ext3, NFS, GFS file system access

Supports Direct I/O (e.g. database applications)

Makes benchmark results more appropriate for real-world comparisons

(Charts: R4 U2 FC AIO read and write performance, MB/sec vs. number of AIOs (16, 32, 64) for 4k, 8k, 16k, 32k and 64k transfer sizes.)

Performance Tuning: DISK, RHEL3
[root@dhcp83-36 sysctl]# /sbin/elvtune /dev/hda

/dev/hda elevator ID 0
        read_latency:      2048
        write_latency:     8192
        max_bomb_segments: 6

[root@dhcp83-36 sysctl]# /sbin/elvtune -r 1024 -w 2048 /dev/hda

/dev/hda elevator ID 0
        read_latency:      1024
        write_latency:     2048
        max_bomb_segments: 6

Disk IO tuning: RHEL4

RHEL4: 4 tunable I/O schedulers

CFQ (elevator=cfq): Completely Fair Queuing; the default, balanced, fair for multiple luns, adaptors, SMP servers

NOOP (elevator=noop): no operation in kernel; simple, low CPU overhead, leaves optimization to ramdisk, RAID controller etc.

Deadline (elevator=deadline): optimizes for run-time-like behavior, low latency per IO, balances issues with large IO to luns/controllers

Anticipatory (elevator=as): inserts delays to help the stack aggregate IO; best on systems with limited physical IO (SATA)

Set at boot time on the kernel command line (see the example below)
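For example, a grub.conf entry selecting the deadline elevator at boot; the kernel version and root device shown are placeholders:

title Red Hat Enterprise Linux 4 (deadline elevator)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-22.ELsmp ro root=/dev/VolGroup00/LogVol00 elevator=deadline
        initrd /initrd-2.6.9-22.ELsmp.img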

File Systems

Separate swap and busy partitions, etc.

EXT2/EXT3: separate talk
http://www.redhat.com/support/wpapers/redhat/ext3/*.html

tune2fs or mount options (see the example after this list)

data=ordered: only metadata journaled

data=journal: both metadata and data journaled

data=writeback: use with care!

Set up the default block size at mkfs time with -b XX

RHEL4 EXT3 improves performance

Scalability up to 5M files/file system

Sequential write improved by using block reservations

Increased file system size, up to 8TB

GFS: global file system (cluster file system)
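A brief sketch of the mkfs/mount knobs mentioned in the list above; /dev/sdb1 and the mount point are placeholders:

# Build an ext3 file system with an explicit 4KB block size
mkfs -t ext3 -b 4096 /dev/sdb1
# Mount with full data journaling (or data=ordered / data=writeback)
mount -t ext3 -o data=journal /dev/sdb1 /perf1
# The writeback mode can also be made the default via tune2fs:
# tune2fs -o journal_data_writeback /dev/sdb1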

Part 4: RHEL3 vs RHEL4 Performance Case Study

Scheduler: O(1), taskset

IOzone RHEL3/4

EXT3

GFS

NFS

OLTP: Oracle 10G

o_direct, async IO, hugemem/hugepage

RHEL IO elevators

IOzone Benchmark
http://www.iozone.org/

IOzone is a filesystem benchmark tool. The benchmark tests file I/O performance for the following operations:

Write, re-write, random write

Read, re-read, random read, read backwards, read strided, pread

fread, fwrite, mmap, aio_read, aio_write

IOzone Sample Output

(Chart: RHEL4 EXT3 sequential write of a 100MB file; bandwidth in KB/sec (up to ~60000) as a function of file size (k) and transfer size (bytes).)

Understanding IOzone Results

GeoMean per category is statistically meaningful.

Understand the HW setup: disk, RAID, HBA, PCI

Layout file systems: LVM or MD devices; partitions w/ fdisk

Baseline raw IO with dd/dt

EXT3 perf w/ IOzone:

In-cache file sizes (which fit in memory): goal -> 90% of memory BW

Out-of-cache file sizes: more than 2x memory size

O_DIRECT: 95% of raw

Global File System (GFS) goal -> 90-95% of local EXT3

Use the raw command:
fdisk /dev/sdX
raw /dev/raw/rawX /dev/sdX1
dd if=/dev/raw/rawX bs=64k

Mount the file system:
mkfs -t ext3 /dev/sdX1
mount -t ext3 /dev/sdX1 /perf1

IOzone commands:
iozone -a -f /perf1/t1          (in cache)
iozone -a -I -f /perf1/t1       (w/ DIO)
iozone -s <2x mem> -f /perf1/t1 (big)

NFS vs EXT3 Comparison

(Chart: IOzone cached, R4 U2 EXT3 vs NFS, GeoMean of 1MB-4GB files and 1k-1m transfers; fwrite, re-fwrite, fread, re-fread and overall GeoMean for R4_U2 EXT3, R4_U2 NFS, and % difference.)

GFS vs EXT3 IOzone Comparison

(Chart: IOzone cached, R4 U2 EXT3 vs GFS, GeoMean of 1MB-4GB files and 1k-1m transfers; fwrite, re-fwrite, fread, re-fread and overall GeoMean for R4_U2, R4_U2 GFS, and % difference in the 88-98% range.)

Using IOzone w/ o_direct: mimic a database

Problem:

File systems use memory for file cache

Databases use memory for database cache

Users want a file system for management outside database access (copy, backup etc.)

You DON'T want BOTH to cache.

Solution:

File systems that support Direct IO

Open files with the o_direct option

Databases which support Direct IO (ORACLE)

NO DOUBLE CACHING!

NFS vs EXT3 DIO IOzone Comparison

(Chart: IOzone with DIO, R4 U2 EXT3 vs NFS, GeoMean of 1MB-4GB files and 1k-1m transfers; writer, re-writer, reader, re-reader, random read/write, backward read, record rewrite, stride read and overall GeoMean for R4_U2 EXT3, R4_U2 NFS, and % difference.)

GFS: Global Cluster File System

GFS: separate summit talk

V6.0 shipping in RHEL3

V6.1 ships w/ RHEL4 U1

Hint at GFS performance in RHEL3

Data from a different server/setup:

HP AMD64, 4 cpu, 2.4GHz, 8GB memory

1 QLA2300 Fibre Channel HBA, 1 EVA5000

Compared GFS IOzone to EXT3

GFS vs EXT3 DIO IOzone Comparison

(Chart: IOzone with DIO, R4 U2 EXT3 vs GFS, GeoMean of 1MB-4GB files and 1k-1m transfers; writer, re-writer, reader, re-reader, random read/write, backward read, record rewrite, stride read and overall GeoMean for R4_U2, R4_U2 GFS, and % difference in the 85-115% range.)

Evaluating Oracle Performance

Use an OLTP workload based on TPC-C

Results with various Oracle tuning options:

RAW vs EXT3 w/ o_direct (i.e. direct IO, as in IOzone)

ASYNC IO options w/ Oracle, supported in RHEL4/EXT3

HUGEMEM kernels on x86

Results comparing RHEL4 IO schedulers:

CFQ

DEADLINE

NOOP

AS

RHEL3 baseline

Oracle 10G OLTP: ext3, gfs/nfs, sync/aio/dio

AIO in Oracle 10G:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk async_on
make -f ins_rdbms.mk ioracle

Add to init.ora (usually in $ORACLE_HOME/dbs):
disk_asynch_io=true            # for raw
filesystemio_options=asynch
filesystemio_options=directio
filesystemio_options=setall

Oracle OLTP Filesystem Performance

(Chart: RHEL4 U2 Oracle 10G OLTP performance, transactions/minute (TPM, 0-10000) for EXT3, NFS and GFS under OLTP sync IO, DIO, AIO, and AIO+DIO.)

Disk IO elevators

RHEL3: general-purpose I/O elevator parameters

RHEL4: 4 tunable I/O elevators

CFQ: Completely Fair Queuing

NOOP: no operation in kernel

Deadline: optimize for run time

Anticipatory: optimize for interactive response

2 Oracle 10G workloads:

OLTP: 4k random, 50% read / 50% write

DSS: 32k-256k sequential read

RHEL4 IO schedulers vs RHEL3 for Database

(Chart: Oracle 10G OLTP/DSS relative performance, % trans/min and % queries/hour, for CFQ, Deadline, NOOP and AS, with RHEL3 as the 100% baseline.)

HugeTLBFS

The Translation Lookaside Buffer (TLB) is a small CPU cache of recently used virtual-to-physical address mappings (e.g. 128 data + 128 instruction entries).

TLB misses are extremely expensive on today's very fast, pipelined CPUs.

Large-memory applications can incur high TLB miss rates.

HugeTLBs permit memory to be managed in very large segments. E.g. Itanium:

Standard page: 16KB

Default huge page: 256MB

16000:1 difference

File system mapping interface (see the example after this slide)

Ideal for databases

E.g. the TLB can fully map a 32GB Oracle SGA.

(Diagram: virtual address space mapped to physical memory through the TLB.)
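A hedged sketch of reserving huge pages and exposing them through hugetlbfs on RHEL3/4; the page count and mount point are arbitrary examples:

# Reserve 2048 huge pages (4GB with 2MB pages on x86);
# persist via vm.nr_hugepages in /etc/sysctl.conf
echo 2048 > /proc/sys/vm/nr_hugepages
# Check what was actually reserved
grep Huge /proc/meminfo
# Mount the file system mapping interface so applications (e.g. Oracle) can use it
mkdir -p /mnt/hugepages
mount -t hugetlbfs none /mnt/hugepages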

RHEL3 U6 with Oracle 10g TPC-C results comparing performance of the Hugemem kernel with and without Hugepages enabled

(Chart: tpmC, 0-40000, for EXT3 and RAW, each with and without hugepages, on the hugemem (4G/4G) kernel.)

Tests performed on a 2-Xeon EM64T cpu HT system with 6G RAM and 14 spindles using mdadm raid0.

Linux Performance Tuning Summary

Linux performance monitoring tools:

*stat, /proc/*, top, sar, ps, oprofile

Determine capacity vs. tunable performance issue

Tune OS parameters and repeat

RHEL4 vs RHEL3 performance comparison:

Have-it-your-way IO with 4 IO schedulers

EXT3 improved block reservations, up to 3x!

GFS within 95% of EXT3; NFS improves with EXT3

Oracle w/ FS o_direct, aio, hugepages: 95% of raw

Questions?

top: 2 streams running on 2 dual-core AMD cpus

1) Sometimes the scheduler chooses the cpu pair on one memory interface, depending on OS state:
Tasks: 101 total, 3 running, 96 sleeping, 0 stopped, 0 zombie
Cpu0: 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1: 0.1% us, 0.1% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2: 100.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu3: 100.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si

2) Scheduler w/ taskset -c <cpu#> ./stream, round robin odd then even cpus:
Tasks: 101 total, 2 running, 96 sleeping, 0 stopped, 0 zombie
Cpu0: 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1: 100.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2: 0.0% us, 0.3% sy, 0.0% ni, 99.7% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu3: 100.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si

McCalpin Stream on 2-cpu dual-core (4 CPUs), binding via taskset

(Charts: RHEL4 U1, 2-cpu dual-core AMD64; McCalpin Stream Copy b(x)=a(x) and Triad c(x)=a(x)+b(x)*c(x); bandwidth in MB/sec (0-7000) vs. number of CPUs, SMP scheduling vs. with affinity via taskset.)
