
Performance Analysis and Tuning Part 2

John Shakshober, Director of Performance Engineering, Red Hat
Jeremy Eder, Principal Performance Engineer, Red Hat
June 13, 2013

Agenda: Performance Analysis Tuning Part II

Part I

RHEL "tuned" profiles, top benchmark results
Scalability: CFS Scheduler tunables / Cgroups
Hugepages: Transparent Hugepages, 2MB/1GB
Non-Uniform Memory Access (NUMA) and NUMAD

Part II

Network Performance and Latency-performance
Disk and Filesystem IO: Throughput-performance
System Performance/Tools: perf, tuna, systemtap
Q&A

RHEL 6 Networking Performance: System Setup

Disable unnecessary services, runlevel 3
Follow vendor guidelines for BIOS Tuning
  Logical cores? Power Management? Turbo?
In the OS, consider:
  Disabling filesystem journal
  Ensure mount using relatime
  SSD/Memory Storage
  Running swapless
  Reducing writeback thresholds if your app does disk I/O


RHEL6 tuned package


# yum install tune*
# tuned-adm profile latency-performance
# tuned-adm list
Available profiles:
- latency-performance
- default
- enterprise-storage
- virtual-guest
- throughput-performance
- virtual-host
Current active profile: latency-performance
# tuned-adm profile default (to disable)

tuned Profile Comparison Matrix

Tunable | default | enterprise-storage | virtual-host | virtual-guest | latency-performance | throughput-performance
kernel.sched_min_granularity_ns | 4ms | 10ms | 10ms | 10ms | - | 10ms
kernel.sched_wakeup_granularity_ns | 4ms | 15ms | 15ms | 15ms | - | 15ms
vm.dirty_ratio | 20% RAM | 40% | 10% | 40% | - | 40%
vm.dirty_background_ratio | 10% RAM | - | 5% | - | - | -
vm.swappiness | 60 | - | 10 | 30 | - | -
I/O Scheduler (Elevator) | CFQ | deadline | deadline | deadline | deadline | deadline
Filesystem Barriers | On | Off | Off | Off | - | -
CPU Governor | ondemand | performance | - | - | performance | performance
Disk Read-ahead | - | 4x | - | - | - | -
Disable THP | - | - | - | - | Yes | -
CPU C-States | - | - | - | - | Locked @ 1 | -

https://access.redhat.com/site/solutions/369093

Locality of Packets

Stream from Customer 1 -> Socket1/Core1
Stream from Customer 2 -> Socket1/Core2
Stream from Customer 3 -> Socket1/Core3
Stream from Customer 4 -> Socket1/Core4

Network Tuning: IRQ affinity

Use irqbalance for the common case
New irqbalance automates NUMA affinity for IRQs
Flow-Steering Technologies
Move 'p1p1*' IRQs to Socket 1:
# tuna -q p1p1* -S1 -m -x
# tuna -Q | grep p1p1
Manual IRQ pinning for the last X percent/determinism
  Guide on Red Hat Customer Portal

NUMA Affinity CLI Reference

# numactl -N1 -m1 ./command

Sets CPU affinity for 'command' to CPU node 1
Allocates memory out of Memory node 1
Chose node 1 because of PCI-bus wiring

Upstream kernel community working on automatic NUMA balancing.
Test numad in RHEL6

Network Tuning: NIC Offloads favor Throughput

Reduce the # of packets/IRQs the kernel processes
Throughput vs Latency trade-off

Offload | Summary | Protocol | Direction
TSO (tcp segment offload) | MTU-chunking offloaded to NIC | TCP | TX
UFO (udp fragment offload) | MTU-chunking offloaded to NIC | UDP | TX
GSO (generic segment offload) | MTU-chunking done in-kernel | TCP, UDP | TX
GRO (generic receive offload) | NIC/driver batches certain RX packets | TCP, UDP | RX
LRO (large receive offload) | NIC/driver batches all RX packets | TCP | RX
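To see which of these offloads are enabled on an interface, the output of "ethtool -k <interface>" can be inspected. The following is a minimal Python sketch that turns such output into a dictionary. The sample text is illustrative and not taken from the slides, and feature names differ between ethtool versions, so the keys shown here are assumptions.

```python
def parse_offloads(ethtool_k_output):
    """Map each offload feature name to True (on) or False (off)."""
    settings = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        # Skip the header line and anything that is not "<name>: on|off".
        if not sep or not words or words[0] not in ("on", "off"):
            continue
        settings[name.strip()] = words[0] == "on"
    return settings


# Illustrative sample (assumed); real output lists more features.
SAMPLE = """Offload parameters for p1p1:
tcp-segmentation-offload: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
"""

offloads = parse_offloads(SAMPLE)
for feature, enabled in offloads.items():
    print(f"{feature}: {'on' if enabled else 'off'}")
```

For latency-sensitive workloads, a dictionary like this makes it easy to flag batching offloads (GRO, LRO) that are still on.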

Network Tuning: Buffer Bloat

Kernel buffers:
# ss |grep -v ssh
State  Recv-Q  Send-Q   Local Address:Port   Peer Address:Port
ESTAB  0       0        172.17.1.36:38462    172.17.1.34:1286
ESTAB  0       3723128  172.17.1.36:58856    172.17.1.34:53491

NIC ring buffers:
# ethtool -g p1p1
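A deep Send-Q is the symptom to look for in the ss output. As a rough Python sketch, the function below scans ss-style text and reports sockets whose Send-Q exceeds a threshold. The five-column layout (State, Recv-Q, Send-Q, local, peer) is assumed from the slide, and the 1 MB threshold is an arbitrary illustrative value, not a recommendation from the talk.

```python
def large_send_queues(ss_output, threshold_bytes=1_000_000):
    """Return (peer, send_q) pairs for sockets with Send-Q above the threshold."""
    results = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or not fields[2].isdigit():
            continue  # header row or malformed line
        send_q = int(fields[2])
        if send_q > threshold_bytes:
            results.append((fields[4], send_q))
    return results


# Sample text modeled on the ss output above.
SAMPLE = """State Recv-Q Send-Q Local Address:Port Peer Address:Port
ESTAB 0 0 172.17.1.36:38462 172.17.1.34:1286
ESTAB 0 3723128 172.17.1.36:58856 172.17.1.34:53491
"""

for peer, depth in large_send_queues(SAMPLE):
    print(f"{peer} has {depth} bytes queued")
```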


Kernel Buffer Queue Depth: 10Gbit TCP_STREAM

[Chart: Send-Q depth. Y axis: queue depth (MB); X axis: time (1-sec intervals). 10G line-rate, ~4MB queue depth, matching servers.]

SR-IOV: RHEL 6.4

[Chart: Round-trip Latencies Into Guest (lower is better). Y axis: latency (microseconds). Series: Min, Mean, 99.9%, StdDev. Configurations: RHEL6.4 (untuned), RHEL6.4 (tuned), RHEL6.4 (SR-IOV tuned), RHEL6.4 (Bridge tuned). Groups: Bare metal, KVM + SR-IOV, KVM + Bridge.]

CPU Tuning: P-states (frequency)

Variable frequencies for each core

[Chart: P-state Impact on Latency (lower is better). Y axis: latency (microseconds). Series: Average, Std Dev, Max. Settings: Powersave, Ondemand, Performance, Performance + C0.]

CPU Tuning: C-states (idle states)

Variable idle states for each core
C-state lock disables turbo, but increases determinism

[Chart: C-state Impact on Jitter. Y axis: max latency (microseconds); X axis: time (1-sec intervals). Series: C6, C3, C1, C0.]

Turbostat shows P/C-states on Intel CPUs

turbostat begins shipping in RHEL6.4, cpupowerutils package

[Table: turbostat output (columns pk, cor, CPU, %c0, GHz, TSC, %c1, %c3, %c6, %c7) for CPUs 0-3 under the Default and latency-performance profiles. Under Default, the CPUs spend most of their time in %c7. Under latency-performance, every CPU reports 3.30 GHz, TSC 2.90, %c1 100.00, and 0.00 for %c0, %c3, %c6, and %c7.]

Power Consumption RHEL6.4 vs RHEL6.4@C0

C-state lock increases power draw over "out of the box"

[Table: Efficiency [Wh], % Diff, for the tests Kernel Build, Disk Read, Disk Write, Unpack tar.gz, and Active Idle; every test shows an increase.]

Use cron to set latency-performance tuned profile when necessary
Set tuned profile in application init script

Memory Tuning: Transparent Hugepages

Introduced in RHEL 6
Added counters in RHEL 6.2
Enhanced again to reduce overhead in 6.4

# egrep 'trans|thp' /proc/vmstat
nr_anon_transparent_hugepages 2018
thp_fault_alloc 7302
thp_fault_fallback 0
thp_collapse_alloc 401
thp_collapse_alloc_failed 0
thp_split 21
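The same counters can be collected programmatically. The Python sketch below, under the assumption that the text follows the "name value" layout of /proc/vmstat, extracts the THP counters and computes the share of THP faults that fell back to small pages. The helper names are hypothetical, and the sample reuses the numbers from the slide.

```python
def thp_counters(vmstat_text):
    """Return the THP-related counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Same filter as: egrep 'trans|thp' /proc/vmstat
        if len(parts) == 2 and ("trans" in parts[0] or "thp" in parts[0]):
            counters[parts[0]] = int(parts[1])
    return counters


def fallback_ratio(counters):
    """Fraction of THP faults that fell back to small pages (0.0 if none)."""
    allocated = counters.get("thp_fault_alloc", 0)
    fallback = counters.get("thp_fault_fallback", 0)
    total = allocated + fallback
    return fallback / total if total else 0.0


SAMPLE = """nr_anon_transparent_hugepages 2018
thp_fault_alloc 7302
thp_fault_fallback 0
thp_collapse_alloc 401
thp_collapse_alloc_failed 0
thp_split 21
"""

counters = thp_counters(SAMPLE)
print(counters["thp_fault_alloc"], fallback_ratio(counters))
```

On a live system the text would come from reading /proc/vmstat; a rising fallback ratio suggests memory is too fragmented for huge page allocation.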

Transparent Hugepages

Transparent Hugepages Disabled

Tuna (new in RHEL6.4)

Tuna IRQ/CPU affinity context menus

CPU affinity for IRQs
CPU affinity for PIDs
Scheduler Policy
Scheduler Priority

Tuna for processes


# tuna -t netserver -P
                      thread       ctxt_switches
    pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
  13488  OTHER     0    0xfff         1            0       netserver

# tuna -c2 -t netserver -m
# tuna -t netserver -P
                      thread       ctxt_switches
    pid SCHED_ rtpri affinity voluntary nonvoluntary             cmd
  13488  OTHER     0        2         1            0       netserver

Tuna for IRQs

Move 'p1p1*' IRQs to Socket 0:
# tuna -q p1p1* -S0 -m -x
# tuna -Q | grep p1p1
                Core
 78 p1p1-0         0  sfc
 79 p1p1-1         1  sfc
 80 p1p1-2         2  sfc
 81 p1p1-3         3  sfc
 82 p1p1-4         4  sfc
...

Tuna for core/socket isolation

# grep Cpus_allowed_list /proc/`pgrep rsyslogd`/status
Cpus_allowed_list: 0-15

# tuna -S1 -i (tuna sets affinity of 'init' task as well)

# grep Cpus_allowed_list /proc/`pgrep rsyslogd`/status
Cpus_allowed_list: 0,1,2,3,4,5,6,7

NUMA Topology and PCI Bus

Servers may have more than 1 PCI bus.
Install adapters "close" to the CPU that will run the performance critical application.
When BIOS reports locality, irqbalance handles NUMA/IRQ affinity automatically.

82:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
# cat /sys/devices/pci0000:80/0000:80:03.0/0000:82:00.0/local_cpulist
1,3,5,7,9,11,13,15

# dmesg | grep "NUMA node"
pci_bus 0000:00: on NUMA node 0 (pxm 1)
pci_bus 0000:80: on NUMA node 1 (pxm 2)
pci_bus 0000:3f: on NUMA node 0 (pxm 1)
pci_bus 0000:7f: on NUMA node 1 (pxm 2)
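Files such as local_cpulist and fields such as Cpus_allowed_list use the kernel's CPU list notation, which mixes single CPUs and ranges. The following Python sketch expands that notation so a script can check whether a given CPU is local to an adapter. The function names are hypothetical helpers, not part of any tool mentioned in the talk.

```python
def parse_cpulist(text):
    """Expand a kernel CPU list such as "0-3,8,10-11" into a list of ints."""
    cpus = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            first, last = chunk.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(chunk))
    return cpus


def is_local(cpu, cpulist_text):
    """True if the CPU appears in the adapter's local CPU list."""
    return cpu in parse_cpulist(cpulist_text)


print(parse_cpulist("1,3,5,7,9,11,13,15"))
print(is_local(2, "1,3,5,7,9,11,13,15"))
```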

Know Your Hardware (hwloc)

PCI Bus "local" to this NUMA node

PCI Device Affinity

# lstopo-no-graphics |egrep 'NUMA|eth4'
NUMANode L#0 (P#0 144GB)
NUMANode L#1 (P#1 144GB)
Net L#10 "eth4"

Performance Monitoring Tool - perf

Userspace tool to read CPU counters and kernel tracepoints

RHEL 6.4 includes perf from upstream kernel 3.6
https://perf.wiki.kernel.org
perf top (dynamic)
perf record / report (save and replay)
perf stat <command> (analyze a particular workload)

Tracing with 'perf': perf top

System-wide 'top' view of active processes

Tracing with 'perf': perf stat

Attach to existing PID and report all kmem tracepoints:

# perf stat -a ./my_cmd

Tracing with 'perf': perf diff

Show differences between two perf.data recordings
Run perf record twice, each with different tuning

# perf diff -M perf.data.old perf.data

Tracing with 'perf': perf script + gprof2dot

# perf script -i perf.data | \
  gprof2dot -f perf | \
  dot -Tsvg -o output.svg

Interesting new Network/Perf things in RHEL6.4

tuna included
latency-performance "tuned" profile beefed up
  Lock C-states
  Disable Transparent Hugepages
turbostat included in cpupowerutils package
hwloc now reports PCI bus topology
PTP Tech Preview
Mellanox Infiniband SR-IOV Tech Preview

RHEL vs Windows Server 2012 Comparison Network

• In both OOB and Optimized cases Red Hat Enterprise Linux delivers better throughput and lower latency to critical network-heavy applications
• http://www.principledtechnologies.com/RedHat/RHEL6_network_0613.pdf

Principled Technologies, Inc. & Red Hat, Inc. Confidential

05/29/13

I/O Tuning - Understanding I/O Elevators

Deadline
  Two queues per device, one for read and one for writes
  I/Os dispatched based on time spent in queue
CFQ
  Per process queue
  Each process queue gets fixed time slice (based on process priority)
Noop
  FIFO
  Simple I/O Merging
  Lowest CPU Cost
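The elevator in use for a block device is exposed in /sys/block/<dev>/queue/scheduler, where the active choice is shown in square brackets. Here is a small Python sketch of how a script might read that value; the function name is a hypothetical helper and the sample strings are illustrative.

```python
import re


def active_elevator(scheduler_text):
    """Return the bracketed (active) elevator name, or None if none is marked."""
    match = re.search(r"\[([^\]]+)\]", scheduler_text)
    return match.group(1) if match else None


def available_elevators(scheduler_text):
    """Return every elevator listed, with the brackets stripped."""
    return [name.strip("[]") for name in scheduler_text.split()]


sample = "noop anticipatory deadline [cfq]"
print(active_elevator(sample))
print(available_elevators(sample))
```

On a live system the text would be read from the scheduler file of the device in question, for example /sys/block/sda/queue/scheduler.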

Iozone Performance Comparison EXT4/XFS/GFS

[Chart: Iozone results for ext3, ext4, xfs, and gfs on RHEL 6.3 and RHEL 6.4.]

SAS Application on Standalone Systems

Picking a RHEL File System

xfs: most recommended
  Max file system size 100TB
  Max file size 100TB
  Best performing
ext4: recommended
  Max file system size 16TB
  Max file size 16TB
ext3: not recommended
  Max file system size 16TB
  Max file size 2TB

[Chart: SAS Mixed Analytics 9.3 running RHEL6.3, comparing Total time and System CPU usage (TOTALtime, SystemTime) for xfs, ext4, and ext3. Axis: time in seconds (lower is better).]

RHEL6 tuned package

# yum install tune*
# tuned-adm profile enterprise-storage
# tuned-adm list
Available profiles:
- latency-performance
- default
- enterprise-storage
- virtual-guest
- throughput-performance
- virtual-host
Current active profile: enterprise-storage
# tuned-adm profile default (to disable)

tuned Profile Comparison Matrix

Tunable | default | enterprise-storage | virtual-host | virtual-guest | latency-performance | throughput-performance
kernel.sched_min_granularity_ns | 4ms | 10ms | 10ms | 10ms | - | 10ms
kernel.sched_wakeup_granularity_ns | 4ms | 15ms | 15ms | 15ms | - | 15ms
vm.dirty_ratio | 20% RAM | 40% | 10% | 40% | - | 40%
vm.dirty_background_ratio | 10% RAM | - | 5% | - | - | -
vm.swappiness | 60 | - | 10 | 30 | - | -
I/O Scheduler (Elevator) | CFQ | deadline | deadline | deadline | deadline | deadline
Filesystem Barriers | On | Off | Off | Off | - | -
CPU Governor | ondemand | performance | - | - | performance | performance
Disk Read-ahead | - | 4x | - | - | - | -
Disable THP | - | - | - | - | Yes | -
Disable C-States | - | - | - | - | Yes | -

https://access.redhat.com/site/solutions/369093

Tuning Memory: Flushing Caches

Drop unused Cache - to control pagecache dynamically
  Frees most pagecache memory
  File cache
  If the DB uses cache, may notice slowdown
NOTE: Use for benchmark environments.

Free pagecache
# sync; echo 1 > /proc/sys/vm/drop_caches

Free slabcache
# sync; echo 2 > /proc/sys/vm/drop_caches

Free pagecache and slabcache
# sync; echo 3 > /proc/sys/vm/drop_caches

Iozone Performance Effect of TUNED EXT4/XFS/GFS

[Charts: RHEL6.4 File System In Cache Performance and RHEL6.4 File System Out of Cache Performance, Intel Large File I/O (iozone). Y axis: throughput in MB/sec. Series: not tuned, tuned. File systems: ext3, ext4, xfs, gfs.]

RHEL BIOS and Tuned profiles

BIOS to OS controlled
# tuned-adm profile enterprise-storage
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
vm.dirty_ratio = 40
ELEVATOR="deadline"
If /sys/block/sdX/device/scsi_disk/X:X:X:X/cache_type:write back
then BARRIERS=off (for mounts other than root/boot vols)
# set cpuspeed governors=performance

RHEL vs Windows Server 2012 Comparison

File system: in-cache file-access method

http://www.principledtechnologies.com/RedHat/RHEL6_IO_0613.pdf

RHEL vs Windows Server 2012 Comparison

File system: Direct I/O file-access method

http://www.principledtechnologies.com/RedHat/RHEL6_IO_0613.pdf

Per device/file/LUN page flush daemon

Each file system or block device has its own flush daemon
Allows different flushing thresholds and resources for each daemon/device/file system.
Prevents some devices from not getting flushed because a shared daemon blocked or used up all resources
Replaces pdflushd where a pool of threads flushed all devices.

Per file system flush daemon

[Diagram: read()/write() performs a memory copy between the user-space buffer and a pagecache page in the kernel; the flush daemon moves pagecache pages to the file system.]

High End HP DL980 AIM7 results w/ "ktune" (r5) and "tuned-adm" (r6)

[Chart: AIM7 results w/ tuned on HP DL980 (64-core/256GB/30 FC/480 lun). Series: RHEL 5.5, RHEL 6, RHEL 6 "tuned".]

Virtual Memory Manager (VM) Tunables

Reclaim Ratios
  /proc/sys/vm/swappiness
  /proc/sys/vm/vfs_cache_pressure
  /proc/sys/vm/min_free_kbytes
Writeback Parameters
  /proc/sys/vm/dirty_background_ratio
  /proc/sys/vm/dirty_ratio
Readahead parameters
  /sys/block/<bdev>/queue/read_ahead_kb
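Before changing any of these tunables it helps to record their current values. A minimal Python sketch for that follows. It assumes the reclaim and writeback files listed above hold a single integer each, and the function and constant names are hypothetical. A file that cannot be read is reported as None rather than raising.

```python
from pathlib import Path

# Reclaim and writeback tunables named on the slide.
VM_TUNABLES = (
    "swappiness",
    "vfs_cache_pressure",
    "min_free_kbytes",
    "dirty_background_ratio",
    "dirty_ratio",
)


def read_vm_tunables(root="/proc/sys/vm"):
    """Return {tunable: int value}, or None for entries that cannot be read."""
    values = {}
    for name in VM_TUNABLES:
        try:
            values[name] = int((Path(root) / name).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values


for name, value in read_vm_tunables().items():
    print(f"vm.{name} = {value}")
```

Saving this output before and after applying a tuned profile gives a simple record of what the profile changed.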

dirty_ratio and dirty_background_ratio

pagecache
100% of pagecache RAM dirty
  flushd and write()'ing processes write dirty buffers
dirty_ratio (20% of RAM dirty) - processes start synchronous writes
  flushd writes dirty buffers in background
dirty_background_ratio (10% of RAM dirty) - wakeup flushd
  do_nothing
0% of pagecache RAM dirty
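The three zones in this diagram can be expressed as a small decision function. The Python sketch below is a simplification for illustration only: it treats the thresholds as plain percentages of RAM, whereas the kernel computes them against the memory available for dirtying. The function name and the state labels are hypothetical; the default values of 10 and 20 follow the slide.

```python
def writeback_state(dirty_pct, background_ratio=10, dirty_ratio=20):
    """Classify writeback behaviour for a given percentage of dirty memory."""
    if dirty_pct >= dirty_ratio:
        # Writing processes are made to write dirty buffers themselves.
        return "synchronous"
    if dirty_pct >= background_ratio:
        # The flush daemon is woken and writes in the background.
        return "background"
    # Below both thresholds nothing is flushed on this account.
    return "idle"


for pct in (5, 12, 25):
    print(pct, writeback_state(pct))
```

Passing the enterprise-storage value (dirty_ratio=40) shows how a higher threshold lets more dirty data accumulate before writers are throttled.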

KVM / RHS Tuning

gluster volume set <volume> group virt
XFS mkfs -n size=8192, mount inode64, noatime
RHS server: tuned-adm profile rhs-virtualization
  Increase in readahead, lower dirty ratios
KVM host: tuned-adm profile virtual-host
Better response time: shrink guest block device queue
  /sys/block/vda/queue/nr_requests (16 or 8)
Best sequential read throughput: raise VM read-ahead
  /sys/block/vda/queue/read_ahead_kb (4096/8192)

Iozone Performance Comparison RHS2.1/XFS w/ RHEV

[Chart: out-of-the-box vs tuned rhs-virtualization for rnd-write, rnd-read, seq-write, and seq-read.]

Summary / Questions

Red Hat Enterprise Linux 6 Performance Features

"TUNED" tool - adjusts system parameters to match environments (throughput/latency).
Transparent Huge Pages - auto select large pages for anonymous memory, static hugepages for shared mem
Non-uniform Memory Access (NUMA)
  NUMAstat enhancements
  NUMActl for manual control
  NUMAD daemon for auto placement
TUNA - integration w/ RHEL6.4

Helpful Links

Performance Tuning Guide
Low Latency Performance Tuning Guide
Optimizing RHEL Performance by Tuning IRQ Affinity
KVM Performance Guide
STAC Network I/O SIG
Blog: http://www.breakage.org/ or @jeremyeder

RHEL Benchmark 10-year History in TPC-C

[Chart: tpmC and $/tpmC for TPC-C results across Red Hat releases, RHAS through RHEL6.]

Comparison of TPC-C results using the Red Hat operating system. For more information about the TPC and the benchmark results referenced here see www.tpc.org.

RHEL6 Benchmark TPC-C - 2 socket improvements

[Chart: tpmC, $/tpmC, and cpus for 2-socket TPC-C results on RHEL3 through RHEL6.]

RHEL6 Benchmark 10-year History in TPC-H

[Chart: QphH and $/QphH for TPC-H results on RHEL3 through RHEL6.]

RHEL6 Benchmark 10-year History in TPC-H

[Chart: QphH and $/QphH for TPC-H results on RHEL4 through RHEL6.]
