
29/3/2017 Learn redis the hard way (in production) trivago tech blog


Learn Redis the hard way (in production)
Posted on 25 January 2017 by Andy Grunwald

For our products, like the trivago hotel search, we use Redis a lot. The use
cases vary: caching, temporary storage of data before moving it into another
storage, or a typical database for hotel metadata, including persistence.

The main parts of the hotel search are built with PHP and the Symfony Framework for
the frontend (web) and Java for the backend part. In this article, we will focus on the
collaboration between our PHP application and Redis. Both are running fine, but it
was a long and hard road to get to the current situation. This is the story of how we
learned to use Redis, including our failures and experience.

The beginning
This story began on Friday, September 3, 2010. At 10:33 am the first classes of Predis,
a Redis client library for PHP, were committed into our code base.

http://tech.trivago.com/2017/01/25/learnredisthehardwayinproduction/? 1/15

This moment can be marked as the introduction of Redis into our PHP stack. Fast
forward to February 2013. We replaced the library Predis (PHP implementation) with
the PHP extension phpredis (C implementation). The reason was simple:
performance. The replacement went well. Everything with Redis was fine and we
enjoyed the summer that year.

The real fun started exactly one year later in February 2014. Around this time, we
launched new features and added new languages to our platform. The result:
HTTP traffic doubled in a short time. Thanks to good capacity planning on the
hardware side, we were able to handle the growth. On the software side, however, we
were confronted with 40% of the incoming requests resulting in HTTP 500: Internal
Server Error.

After investigating our logs, we saw errors related to the PHP <-> Redis connection
handling. Most of them were "read error on connection" and "Redis server went away".
Our logging was verbose and gave us enough detail to start our debugging session:

www7.trivago.com 20140202102259941|WARN|...Redis\ConnectException: Unable to connect:
#0 /.../vendor/.../Redis/RedisPool.php(106): ...\Redis\RedisPool->connect(Object(Redis), O
#1 /.../vendor/.../Redis/RedisClient.php(130): ...\Redis\RedisPool->get('default', true)
#2 /.../vendor/.../Redis/RedisClient.php(94): ...\Redis\RedisClient->setMode(false)
...
#17 /.../app/bootstrap.php.cache(551): Symfony\Bundle\FrameworkBundle\HttpKernel->handle(O
#18 /.../web/app.php(15): Symfony\Component\HttpKernel\Kernel->handle(Object(Symfony\Compo
#19 {main}
|12.34.56.78|www.trivago.de|/?aDateRange%5Barr%5D=20140520&aDateRange%5Bdep%5D=2014

A quick Google search showed that we were not alone with this issue. See
"debug read error on connection #70" and "read error on connection #492".

Debugging and fixing the issue
Based on the discussion in the tickets, we thought: Congratulations, we got ourselves
a nasty bug there.

We had no clue what the root cause was. Redis had been working fine for more than
3.5 years and had never caused us any trouble before. This left us with the question:
How do we continue from here? Our first attempt was to try everything that was
mentioned in the GitHub issue:

Raising PHP connection and command timeouts from 500 ms to 2.5 seconds
Disabling the PHP setting default_socket_timeout
Disabling SYN cookies on the host systems
Checking the number of file descriptors on Redis and web servers
Raising the mbuffer of the host systems
Controlling and adjusting the TCP backlog sizes
and much more

Nothing helped to solve this issue in a reliable way. So back to traditional debugging!
We tried to reproduce the issue in our pre-production environments. Sadly, we were
not successful. We thought that those issues only appeared with much higher traffic.
So we continued to dive deep into our applications.

Closing Redis connections at the end of a web request

PHP applications are usually stateless. Everything you allocate in a request is gone
once the request finishes. At this time, we were not using php-fpm and persistent
connections. This means that every HTTP request would create a new Redis
connection. While checking our connection handling to Redis, we found that we opened a
connection, but never closed it.

This should not make a difference in newer PHP versions. PHP will automatically
close the connection when your script ends. In older versions, however, this could
lead to problems like stale connections or memory leaks. Besides that, it is good
practice to close your connections, so we fixed it. But that did not help us with our
initial problem.

A/B testing of connection libraries

We kept on searching and asked ourselves if we had hit a bug in phpredis (the PHP
extension). To verify this hypothesis, we implemented an A/B test. The necessary
infrastructure to run A/B tests was already there. So we used it and switched from
the C extension back to the Predis library.

We were pleased to find that Predis was still being maintained and had received a
lot of development love.

Thanks to a good code structure, this change was done quickly. We implemented one
interface, replaced the phpredis connection implementation with Predis and
reconfigured our dependency injection container.

We deployed the test to 20% of our users in one data center. The errors occurred
again. In both libraries! Was this another failure on our road to fix this bug? No! We
considered it a partial success. We were able to exclude the extension (phpredis) as
a possible root cause.

Upgrade Redis

Our next step was to have a deeper look at the Redis side. Common steps in an
investigation of a bug are checking out the issue tracker of the project or getting in
contact with one of the maintainers. The first thing they will ask is "What version are
you running?". Once you mention a version that is not the latest upstream, they
will answer "Please upgrade and check if it still occurs". This is what we did.

At this time we were running Redis v2.6. The latest upstream version was v2.8.9. We
thought that maybe we had hit a bug that was already resolved. Unfortunately, this was
not the case. We made no progress on our problem, but at least our Redis
servers were up to date. :)

Debugging latency problems

After reading a lot of documentation, we came across a feature for debugging latency
problems. It's called Redis Software Watchdog. It was (and still is) marked as
experimental in the official documentation, but we wanted to give it a try. The idea was
to identify long-running and blocking commands.

Tip
Redis is single-threaded. Every command may block other commands. Keep this
in mind when you start thinking about problems or use cases. This seems obvious, but
it is often overlooked and is the root cause of many issues.
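To make the tip concrete, here is a minimal sketch of a single-threaded command queue. It is a toy model and not Redis code: the command names and durations are made up, and all commands are assumed to arrive at the same moment.

```python
def completion_times(commands):
    """Simulate a single-threaded server that runs queued commands one by one.

    `commands` is a list of (name, duration_ms) tuples, all assumed to arrive
    at time 0. Returns a list of (name, finished_at_ms) in execution order.
    """
    clock_ms = 0
    finished = []
    for name, duration_ms in commands:
        clock_ms += duration_ms  # nothing else runs while this command executes
        finished.append((name, clock_ms))
    return finished


# Hypothetical workload: one slow command queued in front of two fast ones.
queue = [("SLOW_COMMAND", 400), ("GET a", 1), ("GET b", 1)]
for name, done_at in completion_times(queue):
    print(f"{name} finished after {done_at} ms")
```

In this model the two fast commands each need 1 ms of work, yet they finish after 401 ms and 402 ms because they have to wait for the slow command in front of them.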

So we activated Watchdog, waited a few seconds and Murphy's law kicked in. We
hit a bug and our Redis server crashed. In production! See
"Software watchdog crashes redis during rdb save point #1771". Again: related to our
main problem, we made no significant progress at this time. Instead, we had a new
problem: an offline Redis database (which was immediately restarted, of course). So
we kept going.

We started the next attempt by measuring the latency baseline of our Redis setup. The
numbers for the intrinsic latency looked pretty good. The base latency looked
horrifying.

$ redis-cli --latency -p 6380 -h 1.2.3.4
min: 0, max: 463, avg: 2.03 (19443 samples)

We checked the Redis logs and discovered that Redis was saving data to disk every
few minutes:

...
[20398] 22 May 09:20:55.351 * 10000 changes in 60 seconds. Saving...
[20398] 22 May 09:20:55.759 * Background saving started by pid 41941
[41941] 22 May 09:22:48.197 * DB saved on disk
[20398] 22 May 09:22:49.321 * Background saving terminated with success
[20398] 22 May 09:25:23.299 * 10000 changes in 60 seconds. Saving...
[20398] 22 May 09:25:23.644 * Background saving started by pid 42027
[20398] 22 May 09:26:50.646 # Accepting client connection: accept: Software caused connect
[20398] 22 May 09:26:50.900 # Accepting client connection: accept: Software caused connect
...

We run Redis on bare-metal servers with a (nearly) default configuration, because it is
shipped with sane defaults. Our first question was "Why did the fork of a background
saving process take ~400 ms?" (have a look at the first two log lines). After reading a
few mailing list posts and the implementation of BGSAVE, we understood why. Redis
forks a background process and needs to copy the page table. So, if you have a
big Redis instance with many keys, it will take time. Even on bare metal, without
virtualization. By now, this behaviour has been added to the official documentation.
See "Fork time in different systems".
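A rough back-of-the-envelope estimate shows why the fork gets slower as the instance grows. The sketch below assumes 4 KiB memory pages and 8-byte page table entries, a common layout on x86-64 systems; the real numbers depend on the operating system and hardware, and the dataset sizes are only examples.

```python
def page_table_bytes(dataset_bytes, page_size=4096, entry_size=8):
    """Estimate the page table size for a process of the given memory size.

    Assumes 4 KiB pages and 8-byte page table entries; other systems
    (or huge pages) will give different results.
    """
    pages = -(-dataset_bytes // page_size)  # ceiling division
    return pages * entry_size


GIB = 1024 ** 3
MIB = 1024 ** 2
for size_gib in (1, 8, 24):
    table_mib = page_table_bytes(size_gib * GIB) / MIB
    print(f"{size_gib} GiB dataset -> about {table_mib:.0f} MiB of page table to copy")
```

Under these assumptions every GiB of data adds about 2 MiB of page table that has to be allocated and copied during the fork, and the parent process cannot serve clients while that happens.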

As a follow-up we deactivated Redis snapshots for services where persistence was
not needed. This reduced the number of "read error on connection" errors by more than
30%.

For instances where persistence is needed, the usage of snapshot points can be
tricky. If you have a lot of traffic on your instances and your application is doing write
operations per request, you will have more key modifications. This leads to more
BGSAVE triggers and (possibly) a higher number of rejected connections. The reasons
are higher process fork times and blocked Redis instances.

Tip
Review your persistence requirements and Redis configuration. Does your app
modify more keys when you get more traffic and do you need persistence?
Consider AOF or rolling BGSAVE as an alternative to standard snapshotting. This
may avoid connection / command execution timeouts and blocking Redis
instances.

This was the case in our situation. Our application is reading and writing keys, but we
didn't want to deactivate persistence globally. We deactivated the snapshot points in
those Redis instances and activated cron jobs that call the BGSAVE command at a
specific time (rolling BGSAVE). With this we know when a dump is triggered and can
avoid dumps during high-traffic times. An alternative to rolling BGSAVE operations
would be a separate slave instance for persistence. This slave will not handle real
traffic and its only purpose is to take care of persistence. In use cases with higher
persistence requirements we prefer to use AOF. If you want to know more about
rolling BGSAVE, we started a small thread on the old Redis mailing list to discuss this
topic. See "Rolling BGSAVE instead of (pre)configured save points".
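As an illustration of the rolling BGSAVE idea, the following sketch generates staggered crontab entries so that only one instance dumps at a time. The ports, host, start hour and gap are hypothetical values, not our production configuration, and the sketch assumes that redis-cli is available on the machine running the cron jobs.

```python
def rolling_bgsave_crontab(ports, start_hour=3, gap_minutes=10, host="127.0.0.1"):
    """Return crontab lines that trigger BGSAVE on one instance at a time.

    Each instance gets its own slot, `gap_minutes` after the previous one,
    starting at `start_hour`:00. All values here are illustrative.
    """
    lines = []
    for index, port in enumerate(ports):
        offset = index * gap_minutes
        hour = (start_hour + offset // 60) % 24
        minute = offset % 60
        lines.append(f"{minute} {hour} * * * redis-cli -h {host} -p {port} BGSAVE")
    return lines


# Hypothetical instances listening on three ports of the same machine.
for line in rolling_bgsave_crontab([6379, 6380, 6381]):
    print(line)
```

The gap should be longer than the slowest dump takes, and the start hour should fall into a low-traffic window. The automatic save points themselves are typically switched off with an empty `save ""` directive in redis.conf.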

This change was considered a success. We reduced our error / timeout problems
immediately (even in non-peak traffic times). But we still saw errors popping up here
and there, so we were not done yet.

One instance per data context

Since Redis was introduced into our web stack, it has been adopted by more and
more teams for various use cases.

They used the existing Redis instances and stored their data in a different database.
This way, they could start right away. This was great for several reasons, but led us
to our next challenge.

One team had a cron job running every 15 minutes that dumped data from a MySQL
database into a shared Redis instance via the Pipelining feature. Due to the
single-threaded nature of Redis, the shared Redis instance was blocked every 15 minutes for
several seconds.

Tip
Be aware of Redis instances that are shared between teams and data contexts.
Due to different use cases and long-running commands they may block each
other. Remember: Redis is single-threaded.

We moved the cron job to its own Redis instance. As a result, the (former) shared
instance threw far fewer connection and timeout errors than before. Due to the split of
data contexts, the number of commands per instance was reduced. Furthermore,
starting several Redis instances per server leads to a better utilization of computing
resources. Why? Again: Redis is single-threaded! And modern servers have a lot of
cores.

Side note: The usage of SELECT and multiple databases inside one Redis instance
was mentioned as an anti-pattern by Salvatore.

O(n) can kill you

Okay, we found several causes and reduced the number of connection and timeout
errors by an order of magnitude. Everything went well for a long time and our Redis
setup was healthy. Time went by, teams implemented new features in our
application and our traffic continued to grow. The traffic grew fast and the
connection and command timeout errors came back.

Our first thought: Really? Murphy? Are you there?

Luckily we saw a pattern in the occurrence of the message. It occurred periodically
every 5 minutes. Based on our knowledge from the last investigation we started right
away. We measured the base latency, enabled watchdog and read the SLOWLOG
documentation.

In a very short time, compared to previous investigations, we identified a cron job that
fires the KEYS * command against a big Redis instance. Luckily the Redis
documentation describes the time complexity of each command with the help of
Big O notation. The time complexity of the KEYS command is defined as:

O(N) with N being the number of keys in the database

In big databases, and depending on the pattern you apply to the KEYS command, this
operation can block a Redis instance for a long time.

Tip
Take a dedicated look at your Redis commands and their time complexity. A few
commands with a high Big O estimation can slow down the performance of your
Redis instance. Often there are alternative commands that serve nearly the same
purpose and are a better choice (e.g. the SCAN family as a replacement for KEYS).
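The difference between one big O(N) call and cursor-based iteration can be sketched without a Redis server. The following toy model walks an in-memory list of key names in small steps; the key names are made up, the cursor is a plain list index, and this is not how Redis implements SCAN internally.

```python
from fnmatch import fnmatchcase


def scan_step(keyspace, cursor, pattern="*", count=10):
    """Inspect at most `count` keys starting at `cursor`.

    Returns (next_cursor, matching_keys); next_cursor is 0 once the whole
    keyspace has been visited. A toy stand-in for cursor-based iteration.
    """
    window = keyspace[cursor:cursor + count]
    matches = [key for key in window if fnmatchcase(key, pattern)]
    next_cursor = cursor + count
    if next_cursor >= len(keyspace):
        next_cursor = 0
    return next_cursor, matches


def scan_all(keyspace, pattern="*", count=10):
    """Collect all matches in small steps; returns (matches, steps)."""
    cursor, matches, steps = 0, [], 0
    while True:
        cursor, batch = scan_step(keyspace, cursor, pattern, count)
        matches.extend(batch)
        steps += 1
        if cursor == 0:
            break
    return matches, steps


# Hypothetical keyspace: 95 hotel keys and 5 session keys.
keyspace = [f"hotel:{i}" for i in range(95)] + [f"session:{i}" for i in range(5)]
found, steps = scan_all(keyspace, pattern="session:*", count=10)
print(f"{len(found)} matches in {steps} steps of at most 10 keys each")
```

The total work is still proportional to the number of keys, but it is split into short steps. A single-threaded server can handle other clients between two steps, which is not possible during one long KEYS call.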

Back in 2014 there was only a small note in the
"Latency generated by slow commands" documentation, which said the KEYS
command should only be used for debugging purposes. In the meantime the
command reference was extended and a warning related to this was added.

Based on this new experience we had a look at our application code again. We
checked all Redis commands with special attention to the use case, the data
structure used and their time complexity. This was a lot of work, but it paid off. We
optimized over 40% of the executed commands, which led to less time spent in the
communication with Redis. In the end this led to an overall faster response time of our
web stack. We were fine again.

One connection per request

We accepted the challenge of the ever-growing amount of traffic in the following
months and further optimized our application and stack in several ways. Some of our
goals were tackling the consumption of memory per request, optimizing our database
queries (slow query log), tuning our caching layers and adding more hardware (web
servers) to our data centers. Especially the last change, adding more web servers to
our stack, created yet another challenge.

As mentioned earlier, we were dealing with stateless applications. Without the usage
of php-fpm and persistent connections this means:

1. An HTTP request comes in
2. The application creates connections to various services
3. Operations are executed, queries are run, requests are made, etc.
4. The application closes connections created in 2.
5. The response is delivered to the client

This worked great so far and this is the way many applications work. But if you scale
the number of servers that can accept incoming requests, your traffic grows and you
don't pay special attention to your 3rd party components, this can go wrong. Very
wrong.

Depending on the request, the application creates third-party connections, executes
one or two commands and disconnects again. 50% up to 75% of the commands we
execute are used for connection handling. Remember: Redis is single-threaded. If you
have a lot of clients that try to connect to your Redis instance continuously, you will
keep your instance busy with connection handling instead of executing the
commands you run your business logic on. This may lead to a slowdown or blocking of
your Redis instance. The (simplified) image above visualizes the problem. Every
arrow represents one HTTP client request.
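The share of connection handling can be approximated with simple arithmetic. The sketch below is an illustrative model, not a measurement: it assumes two or three connection-related operations per connection (for example connect, an optional SELECT, and disconnect) and one or two business commands per request.

```python
def connection_overhead_share(commands_per_request, connection_operations=2,
                              requests_per_connection=1):
    """Fraction of all operations that only exist to manage the connection.

    `connection_operations` counts steps such as connect and disconnect;
    the default values are assumptions chosen for illustration.
    """
    useful = commands_per_request * requests_per_connection
    return connection_operations / (connection_operations + useful)


# One connection per request, as in the scenario described above.
print(f"{connection_overhead_share(2, connection_operations=2):.0%}")
print(f"{connection_overhead_share(1, connection_operations=3):.0%}")

# A connection that is kept open and reused for 1000 requests.
print(f"{connection_overhead_share(2, requests_per_connection=1000):.2%}")
```

With one connection per request the overhead lands in the 50% to 75% range under these assumptions. Once a connection is reused for many requests, the share becomes negligible.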

Tip
Consider a proxy between your application and your 3rd party components. If you
have a high connection / command ratio or Redis is used as a platform across
many different teams, a proxy can be very beneficial. It can reduce the connection
overhead or act as a firewall for expensive and unwanted commands.

This problem sounds like a typical proxy problem. Subsequent research showed that
we were not alone and that this problem had been solved before. One solution is
twemproxy by Twitter. twemproxy was created specifically for this use case. You install
this proxy on every web server and twemproxy holds a persistent connection to your
Redis instance(s). Your application will only connect to the local proxy, which should
be a lot faster, because it connects to a unix domain socket instead of an external
service via the network. And even better: it supports memcached as well. This was good
news for us, because memcached is part of our stack and might face this problem in
the future.

So we introduced twemproxy into our stack. It was not a "put it in and everything is
working" project. We had several small adjustments to make before it was a success,
like

getting twemproxy up and running on FreeBSD (the OS we use for web and Redis servers)
adding Redis SELECT on connect / support for multiple databases in Redis
configuring our mbuf values correctly
checking for usage of Redis commands not supported by twemproxy

As described earlier, the usage of multiple databases was marked as an anti-pattern.
At this time we were not able to move all our applications away from connecting to
different databases on one Redis instance, so the capability to SELECT a database
other than the default one was still a requirement for us.

Another advantage of twemproxy is its ability to block expensive commands. So it can
act like a circuit breaker for commands like KEYS and dangerous operations like
FLUSHALL and FLUSHDB.
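Conceptually, such a command firewall is a deny list that is checked before a command is forwarded. The sketch below only illustrates the idea; it is not twemproxy code or configuration, and the function name and the list of blocked commands are assumptions.

```python
# Hypothetical deny list of commands a proxy refuses to forward.
BLOCKED_COMMANDS = {"KEYS", "FLUSHALL", "FLUSHDB"}


def is_allowed(command_line, blocked=BLOCKED_COMMANDS):
    """Return True if the command may be forwarded to the backend.

    Only the command name (the first token) is inspected, and the
    comparison is case-insensitive. Empty input is rejected.
    """
    parts = command_line.split()
    if not parts:
        return False
    return parts[0].upper() not in blocked


for command in ("GET hotel:1", "keys *", "FLUSHALL"):
    verdict = "forward" if is_allowed(command) else "reject"
    print(f"{command!r}: {verdict}")
```

Rejecting a command at the proxy means the expensive operation never reaches the single-threaded server, so a mistake in one application cannot block everyone else.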

The downside of this: Every new command of future Redis versions needs to be
supported by twemproxy as well. If you upgrade your Redis installation to use new
features, like GEO commands, twemproxy support needs to be added and deployed
as well.


The deployment of twemproxy was a great success. We eliminated all timeout and
connection errors that were left (without buying new hardware).

Bonus round: Shard your data

All known root causes for the connection and command timeouts were solved. The
growth and traffic of our platform still continued and we continued to optimize our
Redis usage.

Two of our Redis use cases were caching of calculated data and short-term (~1 min.)
storage. At some point in time one machine will not be able to handle those use cases
alone anymore, because

the data won't fit into RAM
the number of read/write requests is too high for one machine

As a follow-up we started to shard our data over several machines using
consistent hashing. Luckily this way of sharding is natively supported by twemproxy.
This resulted in:

Reduction of traffic / load / requests per machine
Improved reliability of this caching infrastructure during node failures

Both points are a big win, especially the second one. If a machine fails (e.g. hardware
failure) our service is able to operate normally. Only a small percentage of compute
and storage power is sacrificed. We applied this pattern to every use case where it
made sense or was applicable. We didn't regret it. Most prominently, this optimization
had its debut during a node failure in this component last year.
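The following is a minimal sketch of consistent hashing with virtual nodes, to show why a node failure only affects a part of the keys. It is not twemproxy's implementation: the node names, the number of virtual nodes and the choice of SHA-256 as the hash function are assumptions made for this example.

```python
import bisect
import hashlib


class HashRing:
    """A small consistent hash ring with virtual nodes."""

    def __init__(self, nodes, replicas=100):
        if not nodes:
            raise ValueError("at least one node is required")
        # Every node is placed on the ring `replicas` times to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._hashes = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        index = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[index][1]


# Hypothetical cache nodes; "cache-c" fails in the second ring.
keys = [f"hotel:{i}" for i in range(1000)]
before = HashRing(["cache-a", "cache-b", "cache-c"])
after = HashRing(["cache-a", "cache-b"])

moved = sum(1 for key in keys if before.node_for(key) != after.node_for(key))
lost = sum(1 for key in keys if before.node_for(key) == "cache-c")
print(f"{moved} of {len(keys)} keys moved, {lost} of them lived on the failed node")
```

Only the keys that lived on the failed node are reassigned; every other key stays where it was. With a naive `hash(key) % number_of_nodes` scheme, most keys would move to a different node as soon as the number of nodes changes.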

Conclusion
In this post, we told you our painful story of how we learned to use and benefit from
Redis. During the time of identifying and fixing these issues, we faced multiple
challenges like the constantly increasing HTTP traffic to our platform and
understanding the implications of operating such a database.

Looking back, this was not only a technical issue, and now this seems obvious.
The root causes of these errors were more of a conceptual issue. The way we used
Redis was not ideal for this kind of traffic and growth.

After understanding the issues more and more, it was clear that there is no silver
bullet to solve this problem. There were several important lessons, like

understanding how commands are executed (single-threaded)
properly configuring how Redis persists data (BGSAVE)
splitting data storage per service and avoiding shared Redis instances (multiple databases)
understanding the time complexity of Redis commands (KEYS / O(n))
controlling the number of TCP/IP connections to your Redis instances (twemproxy)
sharding your data once it doesn't fit onto one machine anymore and accepting machine failures (consistent hashing)

We had a hard but exciting time and all of us learned a lot. Ever since we applied the
changes described here to our setup, we haven't faced any bigger issue. But be aware:
There are many other things you have to take care of when you run Redis in
production, like

replication and required RAM
automatic failover via Sentinel
correct use of eviction policies
basic understanding of the Redis security model (also possible attack patterns by insufficient configuration)
proper monitoring and alerting

Did you experience similar issues / problems? Or were you able to use any of the tips
mentioned here? Let us know in the comment section.


Comments


Peeyush Chandel, 2 months ago
Nice article with detailed explanation.

We also faced similar challenges and solved them after fixing various parts as you
did.

One of the instances I remember is:

All of a sudden, one fine night, one of our Redis instances stopped accepting writes.
I checked the logs and found that Linux was not able to fork the process for
background save, and by default Redis stops accepting writes if it is not able to
save them to disk (which is okay for persistence).
Debugging further, we figured out that we don't even need persistence on
that instance.

And the solution was:

1. Turned off persistence on that instance as it is not needed.
2. Set vm.overcommit_memory = 1 in sysctl.conf
3. As our Redis was backed by a database, I changed the memory policy to
allkeys-lru, so that in the case of memory overflow Redis can remove any key;
otherwise, by default, it will only remove the keys which have an expiry.

Also, pay attention to the Redis connection timeout and max client connection settings
in the Redis config file. When you use persistent connections without any
proxy, you need to maintain the pool of connections yourself and keep
recycling the connections which are already dead.

Andy Grunwald (Mod), in reply to Peeyush Chandel, 2 months ago

Hey Peeyush,

thanks for writing down your outage and how you solved it. Really valuable,
and maybe some other readers have the exact same issue.

Mangat Rai, in reply to Peeyush Chandel, 13 hours ago
We used to have the exact same issue: BGSAVE would fail (vm overcommit is
0 by default), which would kick Redis into read-only mode. For our design,
persistence was required as the database was built over time and cannot
be rebuilt. Also we can't afford to lose any keys (other than by expiry).
At that time rolling BGSAVE didn't occur to me. We ended up using Aerospike
after using Redis for 1 year.

hughdbrown, 3 hours ago
Just some typos. Feel free to delete this comment when you have fixed them.
s/This was great for several reasons, but lead us to our next challenge./This was great for several reasons, but led us to our next challenge./
s/We optimized over 40% of the executed commands which lead to less time spent in the communication with Redis./We optimized over 40% of the executed commands which led to less time spent in the communication with Redis./
s/In the end this lead to an overall faster response time of our web stack./In the end this led to an overall faster response time of our web stack./

Andy Grunwald (Mod), in reply to hughdbrown, 3 hours ago

Thank you @hughdbrown.
I fixed them and deployed a new version. I am fine with leaving this
comment here :)
