
myYearbook.com Architecture
Lessons Learned from the Trials of Scaling a High Traffic Website
• Founded in 2005
• 3rd Largest Social Network in the United States
• Teenage Demographic
• 60+ Employees
January 2007
• 100M Pageviews
• 1 Database Server
• 1 Web Application Server
• Daily issues with load and site availability
September 2008
• 2.5B Pageviews
• 30 Database Servers
• 120 Web Application Servers
• 99.94% Uptime as measured by pingdom.com
Key Architecture Components
• PHP5, APC
• Apache httpd
• PostgreSQL
• Memcached
• Apache ActiveMQ
• Lighttpd
• Isilon IQ Clustered NAS
• Message Systems eCelerity
• Subversion
Web Application Architecture
• 2005-2007: Monolithic Code Base
• 2008: Migrating to a Service-Oriented Architecture
– Applications get their own resources
– Loosely Coupled Architecture
• MVC Application using XSLT
Web Application Architecture
• Why SOA?
– Monolithic app wastes hardware
– Cross Data-Center Operations
– Selective Maintenance
Scaling Postgres
Rules for Scaling
1. Plan for Growth
2. Know the Internals
3. Bigger Hardware is Better
Our Postgres Scaling History
• Quarter 1, 2007
– Monolithic database with one schema, many complex joins and poor optimization
– No plan for growth
– No DBA
Our Postgres Scaling History
• Quarter 3, 2008
– Horizontal “Sharded” Data
– Vertical Partitioning
– 5000 Connections/sec Avg
Scaling Postgres: Lessons Learned
• Scaling web servers means many database connections, so we needed pooling
– Started with pgPool, moved to pgBouncer
• Started with Slony replicating to read-only slaves
– High IO/CPU overhead
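pgBouncer sits between the web servers and PostgreSQL as a separate proxy process, so many client connections share a small number of server connections. The Python below is only an in-process sketch of that principle, not pgBouncer's implementation; `fake_connect` is a hypothetical stand-in for opening a real PostgreSQL connection.

```python
import queue
import threading
from contextlib import contextmanager


class ConnectionPool:
    """Let any number of callers share at most `size` backend connections."""

    def __init__(self, connect, size):
        # `connect` is a caller-supplied factory; here it stands in for
        # opening a PostgreSQL connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Wait for an idle connection instead of opening a new one.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next caller


opened = []


def fake_connect():
    """Hypothetical stand-in for a real database connection."""
    conn = object()
    opened.append(conn)
    return conn


pool = ConnectionPool(fake_connect, size=5)
used = set()
lock = threading.Lock()


def handle_request():
    # Each simulated web request borrows a pooled connection briefly.
    with pool.connection() as conn:
        with lock:
            used.add(conn)


threads = [threading.Thread(target=handle_request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Fifty concurrent requests are served while only five backend connections are ever opened. pgBouncer applies the same idea across processes and machines, with its pool modes (session, transaction, statement) deciding when a server connection is released back to the pool.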
Scaling Postgres: Lessons Learned
• Began scaling vertically by separating application data onto dedicated database servers, and removed the read-only slaves
• Needed a few small tables replicated; these could be slightly inaccurate and eventually consistent (BASE)
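Separating application data by database server means each application must know which server owns its tables. A minimal sketch, assuming a static lookup table; the application names and hostnames are hypothetical, and the deck does not say how myYearbook stored this mapping.

```python
# Hypothetical application -> database server map (illustrative names only).
DATABASE_FOR_APP = {
    "messaging": "db-messaging.internal",
    "photos": "db-photos.internal",
    "profiles": "db-profiles.internal",
}


def dsn_for(app, dbname="appdb", port=5432):
    """Return a libpq-style connection string for an application's server."""
    try:
        host = DATABASE_FOR_APP[app]
    except KeyError:
        raise ValueError(f"no database server assigned to {app!r}") from None
    return f"host={host} port={port} dbname={dbname}"
```

For example, `dsn_for("photos")` returns `"host=db-photos.internal port=5432 dbname=appdb"`. Because each application talks only to its own server, cross-application joins go away; the few small tables several applications still need are the ones that have to be replicated with eventual consistency.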
Scaling Postgres: Lessons Learned
• Enter plProxy
– Database partitioning language from Skype, built on PostgreSQL functions
– Trigger-based plProxy functions replicate the needed tables without the queue overhead
– NOT TRANSACTION SAFE
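Why trigger-based replication through plProxy is not transaction-safe: this sketch assumes the trigger's remote call runs on a separate connection and commits there on its own, so a rollback on the origin cannot undo it. Two in-memory SQLite databases simulate the two servers; `insert_user` is a hypothetical stand-in for a table trigger calling a plProxy function, not plProxy itself.

```python
import sqlite3

# Two in-memory SQLite databases stand in for two PostgreSQL servers.
# isolation_level=None means autocommit, so transactions are controlled
# explicitly with BEGIN / ROLLBACK.
primary = sqlite3.connect(":memory:", isolation_level=None)
replica = sqlite3.connect(":memory:", isolation_level=None)
for db in (primary, replica):
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")


def insert_user(user_id, name):
    """Insert locally, then push the row to the replica like a trigger would."""
    primary.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))
    # The remote write commits immediately on its own connection;
    # it is not part of the primary's open transaction.
    replica.execute("INSERT INTO users VALUES (?, ?)", (user_id, name))


primary.execute("BEGIN")
insert_user(1, "alice")
primary.execute("ROLLBACK")  # the local transaction is abandoned

primary_rows = primary.execute("SELECT COUNT(*) FROM users").fetchone()[0]
replica_rows = replica.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(primary_rows, replica_rows)  # prints: 0 1
```

The replica keeps a row the origin never committed, which is tolerable only for data that may be slightly inaccurate and eventually consistent.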
Scaling Postgres: Lessons Learned
• Standard Use of plProxy
– Horizontal partitioning of data by ID across multiple servers
– Example: Messaging System
• 8 servers store the actual partitioned message data
• Rule #1 – Plan for Growth
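Horizontal partitioning by ID routes each row to one server based on a hash of its key. PL/Proxy requires a power-of-two partition count and picks the partition from the low bits of the hash; the sketch below mirrors that rule, with `zlib.crc32` as a stand-in for PostgreSQL's `hashtext()`. It also shows why a power-of-two layout supports Rule #1: doubling from 8 to 16 partitions sends every key either to its old partition or to exactly one new one, so each server's data splits in two instead of reshuffling.

```python
import zlib


def partition_for(key, partitions):
    """Map a key to a partition by masking the low bits of its hash.

    Requires a power-of-two partition count, as PL/Proxy does.
    """
    if partitions <= 0 or partitions & (partitions - 1):
        raise ValueError("partition count must be a power of two")
    h = zlib.crc32(str(key).encode())  # deterministic 32-bit hash (stand-in)
    return h & (partitions - 1)


# Growing from 8 to 16 partitions: the low 3 bits are unchanged, so each
# key stays on its partition p or moves to p + 8.
for user_id in range(1, 1001):
    old = partition_for(user_id, 8)
    new = partition_for(user_id, 16)
    assert new in (old, old + 8)
```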
Scaling Postgres: Lessons Learned
• Knowing Internals
– pg_catalog
• pg_stat_user_tables
• pg_stat_user_indexes
Scaling Postgres: Knowing Internals
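The statistics views listed above are ordinary SQL views, so a script can poll them and flag trouble. A minimal sketch, assuming the rows have already been fetched with a PostgreSQL driver as dicts keyed by column name; the column names are those of `pg_stat_user_tables` and `pg_stat_user_indexes`, while the thresholds are illustrative assumptions, not tuned values.

```python
TABLE_STATS_SQL = """
SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
"""
INDEX_STATS_SQL = """
SELECT relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
"""


def review(table_rows, index_rows, dead_ratio=0.2, min_rows=10_000):
    """Flag likely problems in rows returned by the two queries."""
    findings = []
    for t in table_rows:
        # idx_scan is NULL (None) for tables that have no indexes.
        if t["n_live_tup"] >= min_rows and t["seq_scan"] > (t["idx_scan"] or 0):
            findings.append(f"{t['relname']}: more sequential than index scans")
        live, dead = t["n_live_tup"], t["n_dead_tup"]
        if live + dead and dead / (live + dead) > dead_ratio:
            findings.append(f"{t['relname']}: {dead} dead tuples, check vacuum")
    for i in index_rows:
        if i["idx_scan"] == 0:
            findings.append(f"{i['indexrelname']} on {i['relname']}: never used")
    return findings
```

Unused indexes still cost write overhead and cache space, and a rising dead-tuple share is an early sign of bloat, so both are worth trending over time rather than checking once.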
Scaling Postgres: Lessons Learned
• Database Ecosystem
– Performance Factors
• Index bloat
• Usage changes
– Abuse
• Cache utilization contention
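Cache utilization can be tracked from `pg_statio_user_tables`, whose `heap_blks_hit` and `heap_blks_read` columns count table-block requests served from shared buffers and from outside them. A sketch of a per-table hit ratio; note that a "read" may still come from the operating system's page cache, so this measures PostgreSQL's buffer cache only, and the ranking helper `worst_tables` is a hypothetical convenience.

```python
CACHE_SQL = """
SELECT relname, heap_blks_hit, heap_blks_read
FROM pg_statio_user_tables
"""


def hit_ratio(heap_blks_hit, heap_blks_read):
    """Fraction of heap block requests served from shared buffers."""
    total = heap_blks_hit + heap_blks_read
    return heap_blks_hit / total if total else None


def worst_tables(rows, n=5):
    """Names of the tables with the lowest hit ratio; rows come from CACHE_SQL."""
    scored = []
    for row in rows:
        ratio = hit_ratio(row["heap_blks_hit"], row["heap_blks_read"])
        if ratio is not None:  # skip tables that were never read
            scored.append((ratio, row["relname"]))
    scored.sort()
    return [name for _, name in scored[:n]]
```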
Scaling Postgres: Lessons Learned
• Bigger is Better
– More RAM
– More Disks
– Faster and More CPUs
Scaling Postgres: Lessons Learned
[Chart: Scaling Across CPU Cores, Before and After Upgrade]
• PostgreSQL Scales to 32 Cores
• Extensive Benchmarking @ MYB
Scaling Postgres: Future Plans
• More Partitioning
• SOA Data Distribution
– Golconde
• Python Based
• Apache ActiveMQ
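Queue-based data distribution replaces trigger-to-trigger calls with messages: the origin publishes each change, and each consuming server applies the changes in order. A minimal sketch, assuming a hypothetical JSON change format; `queue.Queue` stands in for an ActiveMQ destination, and nothing here reflects Golconde's actual message format or code.

```python
import json
import queue


def change_message(table, op, row):
    """Encode one row change (hypothetical format)."""
    return json.dumps({"table": table, "op": op, "row": row})


def apply_change(replica, message):
    """Apply one change to a replica modeled as {table: {id: row}}."""
    change = json.loads(message)
    rows = replica.setdefault(change["table"], {})
    row = change["row"]
    if change["op"] in ("insert", "update"):
        rows[row["id"]] = row
    elif change["op"] == "delete":
        rows.pop(row["id"], None)


# Producer side: the origin database publishes its changes.
bus = queue.Queue()
bus.put(change_message("users", "insert", {"id": 1, "name": "alice"}))
bus.put(change_message("users", "update", {"id": 1, "name": "alicia"}))
bus.put(change_message("users", "delete", {"id": 1}))
bus.put(change_message("users", "insert", {"id": 2, "name": "bob"}))

# Consumer side: a subscribing server applies the changes in order.
replica = {}
while not bus.empty():
    apply_change(replica, bus.get())
```

Because the broker buffers changes, a slow or offline consumer catches up later instead of blocking the origin's writes, which is the queue's advantage over synchronous trigger calls; the cost is that replicas are only eventually consistent.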
Apache ActiveMQ
• Java-based message broker software
• Client language neutral
• Implements JMS 1.1, STOMP, XMPP, REST and others
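STOMP is one reason ActiveMQ is client-language neutral: a frame is just a command line, header lines, a blank line, the body, and a NUL byte. A sketch of building a SEND frame; the destination `/queue/image.resize` is hypothetical, and a real client must first send a CONNECT frame and should apply the header escaping that STOMP 1.1 and later define.

```python
def stomp_frame(command, headers, body=b""):
    """Serialize a STOMP frame: command, headers, blank line, body, NUL."""
    lines = [command]
    for name, value in headers.items():
        if any(c in f"{name}{value}" for c in "\r\n"):
            raise ValueError("header names and values must not contain line breaks")
        lines.append(f"{name}:{value}")
    head = ("\n".join(lines) + "\n\n").encode("utf-8")
    return head + body + b"\x00"


def send_frame(destination, body):
    """Build a SEND frame; content-length lets the body contain NUL bytes."""
    return stomp_frame(
        "SEND",
        {"destination": destination, "content-length": str(len(body))},
        body,
    )


frame = send_frame("/queue/image.resize", b'{"photo_id": 42}')
```

The resulting bytes would be written to a socket connected to the broker's STOMP connector, which ActiveMQ commonly exposes on port 61613.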
ActiveMQ @ myYearbook.com
Out-of-band Processing
• Uploaded content processing
– Image Resize
– Content analysis (R&D)
– Anti-Virus Scans
• Comment and Message processing
– Spam Processing
• Email spooling from the web application
• Anywhere else it makes sense
Targeted Workload
• Message queues let the right server handle each job
• Better distribution of CPU-intensive tasks without negatively impacting the user experience
• Clusterable, Scalable
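Out-of-band processing means the web request only enqueues the slow work and returns, while separate workers consume the queue. A single-process sketch with `queue.Queue` and one worker thread; in production the queue would be an ActiveMQ destination and the workers would run on the servers best suited to the job. The task name and `photo_id` field are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
processed = []


def handle_upload(photo_id):
    """Web request path: enqueue the slow work and respond immediately."""
    jobs.put({"task": "resize", "photo_id": photo_id})
    return "upload accepted"


def worker():
    """Background consumer; a real one would resize, scan, and so on."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut down
            jobs.task_done()
            break
        processed.append(job["photo_id"])
        jobs.task_done()


t = threading.Thread(target=worker)
t.start()
responses = [handle_upload(pid) for pid in (101, 102, 103)]
jobs.put(None)
t.join()
```

The user's response time no longer depends on how long resizing or virus scanning takes, and adding capacity means adding consumers rather than web servers.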
Memcached: Key for Success
• Valuable Scaling Tool
– Over 250k get requests per second during peak
– Over 750GB of cached data
– Easy to Deploy
– The more distributed the cache becomes, the less impact cache failures have: more boxes are better than fewer
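Why more cache boxes soften failures: with N servers, losing one costs roughly 1/N of the cached keys, provided the client maps keys with consistent hashing rather than `hash(key) % N`, which remaps almost every key when N changes. A sketch of a consistent-hash ring; the deck does not say which key distribution myYearbook's memcached clients used, and the server names are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping cache keys to servers."""

    def __init__(self, servers, replicas=100):
        # Each server owns many points on the ring to even out its share.
        self._ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def server_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


servers = [f"cache{n}" for n in range(10)]
before = HashRing(servers)
after = HashRing(servers[:-1])  # cache9 has failed
keys = [f"user:{i}" for i in range(10_000)]
moved = sum(before.server_for(k) != after.server_for(k) for k in keys)
```

Only the keys that lived on the failed server move; with ten servers that is about a tenth of the cache, and the fraction shrinks as servers are added.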
Memcached: Potential Problems
• Large-scale implementations can have some hidden problems
– Lots of network traffic
– Data that is not partitioned or evenly distributed
• What to do for data that is not evenly distributed?
– Implemented a round-robin cluster of memcache servers that contain the same data
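A minimal sketch of such a cluster, assuming the client writes every key to all servers and rotates reads among them; how myYearbook kept the copies in sync is not stated in the deck, and the key name is hypothetical. This suits small, hot data that key-based distribution would otherwise pin to a single server.

```python
import itertools


class ReplicatedCachePool:
    """Servers that all hold the same data: write to every one, read round-robin.

    Each server is a stand-in with dict-style access; a real client would
    wrap memcached connections instead.
    """

    def __init__(self, servers):
        self._servers = list(servers)
        self._order = itertools.cycle(range(len(self._servers)))
        self.reads = [0] * len(self._servers)  # reads served per server

    def set(self, key, value):
        # memcached does not replicate on its own, so the client writes to all.
        for server in self._servers:
            server[key] = value

    def get(self, key):
        # Rotate reads so one hot key does not overload a single server.
        index = next(self._order)
        self.reads[index] += 1
        return self._servers[index].get(key)


servers = [{}, {}, {}]
cache = ReplicatedCachePool(servers)
cache.set("top_users", ["alice", "bob"])
values = [cache.get("top_users") for _ in range(6)]
```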
Research and Development
• Copyr
– Copy-on-Write Filesystem Replication
• Framewerk
– PHP5 OO Development Framework
• Golconde
– Queue-Based Data Distribution for PostgreSQL
• Lightr
– PHP5 XMPP Class Library
• mod_xsltd
– Lighttpd XSL Transformation module
• Playr
– PostgreSQL Log Replay
• Staplr
– STAtistical Package Logically engineered Right
Tools for Success
• Operations Portal
– Executive-Level Overview of Operational Status and Production Change Log
• Staplr
– Trending & Analytics System
Operations Portal
Trending and Analysis: Staplr
• Version 0.6
– PHP Based
– Process forking
– Shelled RRD Commands
• Version 2.0
– Python Based
– Threaded
– Python wrappers to librrd
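Threads suit a poller because polling is mostly waiting on the network, and Python threads release the interpreter lock while blocked on I/O, which avoids the per-poll cost of forking processes and shelling out. A sketch of a threaded poll loop using `concurrent.futures.ThreadPoolExecutor`; `poll_all`, `fake_poll`, and the target names are hypothetical, and this is not Staplr's code.

```python
from concurrent.futures import ThreadPoolExecutor


def poll_all(targets, poll, max_workers=8):
    """Poll every target concurrently; return target -> metrics or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(poll, target): target for target in targets}
        for future, target in futures.items():
            try:
                results[target] = future.result()
            except Exception as exc:  # one failing host must not stop the run
                results[target] = exc
    return results


def fake_poll(target):
    """Hypothetical poller; a real one would query the service over the network."""
    if target == "memcached-3":
        raise ConnectionError("timed out")
    return {"uptime": 100}


results = poll_all(["memcached-1", "memcached-2", "memcached-3"], fake_poll)
```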
Trending and Analysis: Staplr
• Polls for:
– Apache httpd
– Apache ActiveMQ
– lighttpd
– memcached
– MySQL
– pgBouncer
– PostgreSQL
– SNMP Data
• APC, Isilon, F5, Xiotech, Others
– SysStat
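Many of the polled values (requests served, bytes sent, memcached's `cmd_get`) are ever-increasing counters that a trending system turns into per-second rates. A sketch of that conversion in the spirit of an RRD COUNTER data source, which treats a decrease as a 32-bit or 64-bit counter wrap; handling of counter resets after a service restart is left out.

```python
def counter_rate(prev, curr, seconds):
    """Per-second rate between two samples of an ever-increasing counter.

    A decrease is assumed to be a wrap: first at 32 bits, then at 64 bits
    if the 32-bit correction still leaves the delta negative.
    """
    if seconds <= 0:
        raise ValueError("samples must be taken at increasing times")
    delta = curr - prev
    if delta < 0:
        delta += 2**32
        if delta < 0:
            delta += 2**64 - 2**32
    return delta / seconds
```

For example, two `cmd_get` samples of 1000 and 4000 taken 60 seconds apart give 50 gets per second.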
Questions?
