
SHORTCUTS AROUND THE MISTAKES I'VE MADE SCALING MONGODB

Theo, Chief Architect at Burt


Wednesday, 21 September 2011

What we do
We want to revolutionize the digital advertising industry by showing that there is more to ad analytics than click through rates.


Ads


Data


Assembling sessions
[Diagram: an exposure followed by ping and event fragments, assembled into a session]

Crunching
[Diagram: many sessions being crunched down into a number (42)]

Reports


What we do
Track ads, make pretty reports.


That doesn't sound so hard


We don't know when sessions end
There's a lot of data
It's all done in (close to) real time

Numbers
40 GB of data
50 million documents
per day

How we use MongoDB


Virtual memory to offload data while we wait for sessions to finish
Short-term storage (<48 hours) for batch jobs
Metrics storage

Why we use MongoDB


Schemalessness makes things so much easier; the data we collect changes as we come up with new ideas
Sharding makes it possible to scale writes
Secondary indexes and a rich query language are great features (for the metrics store)
It's just nice

Btw.
We use JRuby, it's awesome

A story in 7 iterations


1st iteration
secondary indexes and updates
One document per session, update as new data comes along
Outcome: 1000% write lock
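The first approach can be sketched roughly like this, assuming a hash-based, driver-style update document. `build_session_update` is a hypothetical helper; the actual driver call is shown as a comment.

```ruby
# One document per session, upserted as fragments arrive.
# build_session_update is a hypothetical helper; only the update
# document it returns is shown here.
def build_session_update(fragment)
  {
    '$inc'  => { 'ping_count' => 1 },
    '$push' => { 'events' => fragment[:event] }
  }
end

# With a Ruby driver this would be roughly:
#   sessions.update({ _id: fragment[:session_id] },
#                   build_session_update(fragment),
#                   upsert: true)
```

Every fragment becomes an in-place update to the same document, which is exactly what pushed the write lock through the roof.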

#1
Everything is about working around the

GLOBAL WRITE LOCK


MongoDB 2.0.0

db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: ...}}, true)



MongoDB 1.8.1

db.coll.update({_id: "xyz"}, {$inc: {x: 1}}, true)

db.coll.update({_id: "abc"}, {$push: {x: ...}}, true)



2nd iteration
using scans for two-step assembling
Instead of updating, save each fragment, then scan over _id to assemble sessions
Outcome: not as much lock, but still not great performance. We also realised we couldn't remove data fast enough
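A minimal sketch of the two-step idea, under the assumption that fragment _ids are prefixed with the session id so a scan sorted by _id sees all of a session's fragments next to each other:

```ruby
# Build an _id that sorts fragments of the same session together
# (session id prefix, zero-padded sequence number).
def fragment_id(session_id, sequence_no)
  format('%s:%010d', session_id, sequence_no)
end

# Group a sorted scan's results back into sessions by the _id prefix:
def assemble(sorted_fragments)
  sorted_fragments.group_by { |f| f[:_id].split(':').first }
end
```

Writes become plain inserts (no in-place updates, so less lock contention), and assembly moves into a separate scanning step.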

#2
Everything is about working around the

GLOBAL WRITE LOCK


#3
Give a lot of thought to your

PRIMARY KEY


3rd iteration
partitioning
We came up with the idea of partitioning the data by writing to a new collection every hour
Outcome: lots of complicated code, lots of bugs, but we didn't have to care about removing data
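A sketch of the hourly partitioning scheme, assuming collections are named after the hour they cover, so expiring old data becomes a cheap collection drop instead of a slow remove:

```ruby
# Name the target collection after the hour of the write.
def collection_for(time)
  "sessions_#{time.utc.strftime('%Y%m%d%H')}"
end

# Collections older than the retention window can simply be dropped:
def expired_collections(existing, now, keep_hours)
  cutoff = collection_for(now - keep_hours * 3600)
  existing.select { |name| name < cutoff }
end
```

The naming and routing logic is exactly the kind of "complicated code" the slide warns about; it has to live in every reader and writer.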

#4
Make sure you can

REMOVE OLD DATA


4th iteration
sharding
To get around the global write lock and get higher write performance we moved to a sharded cluster.
Outcome: higher write performance, lots of problems, lots of ops time spent debugging

#5
Everything is about working around the

GLOBAL WRITE LOCK


#6 SHARDING IS NOT A SILVER BULLET


and it's buggy; if you can, avoid it


#7 IT WILL FAIL
design for it


5th iteration
moving things to separate clusters
We saw very different loads on the shards and realised we had databases with very different usage patterns, some that made autosharding not work. We moved these off the cluster.
Outcome: a more balanced and stable cluster

#8
Everything is about working around the

GLOBAL WRITE LOCK


#9 ONE DATABASE
with one usage pattern

PER CLUSTER


#10 MONITOR EVERYTHING


look at your health graphs daily


6th iteration
monster machines
We got new problems removing data and needed some room to breathe and think
Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).

#11
Don't try to scale up

SCALE OUT


#12
When you're out of ideas

CALL THE EXPERTS


7th iteration
partitioning (again) and pre-chunking
We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot.
Outcome: no more problems removing data.
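Pre-chunking can be sketched like this, assuming the shard key starts with a two-hex-character prefix with a known, even distribution, so every chunk boundary can be created before any writes arrive:

```ruby
# Generate one split point per two-hex-character shard key prefix.
def split_points
  (0x00..0xff).map { |b| format('%02x', b) }
end

# Each point would then go to the split admin command, roughly:
#   admin.command(split: 'metrics.sessions', middle: { _id: point })
```

Creating the chunks up front means the balancer never has to split and migrate under load, and each day's database can be dropped whole when it expires.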

#13
Smaller objects mean a smaller database, and a smaller database means

LESS RAM NEEDED

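The talk doesn't say how the documents were shrunk; one common trick of the era (an assumption here) was shortening field names, since BSON stores every key in every document:

```ruby
# Hypothetical mapping from readable field names to short stored keys.
SHORT_KEYS = { 'exposure_time' => 't', 'click_count' => 'c' }.freeze

# Rewrite a document's keys before insertion; unknown keys pass through.
def shrink(doc)
  doc.map { |k, v| [SHORT_KEYS.fetch(k, k), v] }.to_h
end
```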

#14
Give a lot of thought to your

PRIMARY KEY


#15
Everything is about working around the

GLOBAL WRITE LOCK


#16
Everything is about working around the

GLOBAL WRITE LOCK


KTHXBAI

@iconara architecturalatrocities.com burtcorp.com


Since we got time


Tips
Safe mode
Run every Nth insert in safe mode
This will give you warnings when bad things happen, like failovers
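A sketch of the sampling idea, assuming a driver where safety is a per-operation option. Acknowledging every Nth insert surfaces errors (for example during a failover) without paying for safe mode on every write:

```ruby
# Hand out per-insert options, flipping safe mode on for every Nth write.
class SampledSafeMode
  def initialize(n)
    @n = n
    @count = 0
  end

  # Returns the options hash to pass with the next insert.
  def options_for_next_insert
    @count += 1
    (@count % @n).zero? ? { safe: true } : { safe: false }
  end
end
```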

Tips
Avoid bulk inserts
Very dangerous if there's a possibility of duplicate key errors
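A sketch of why bulk inserts bite: a batch insert that hits a duplicate key error can abort partway through, silently dropping the rest of the batch. Inserting one by one and rescuing duplicates avoids that. `DuplicateKey` and the `insert_one` block are stand-ins for the driver's error class and insert call:

```ruby
# Hypothetical error class standing in for the driver's duplicate key error.
class DuplicateKey < StandardError; end

# Insert documents one at a time, skipping duplicates instead of
# aborting the rest of the batch. Returns the number inserted.
def insert_each(docs, &insert_one)
  inserted = 0
  docs.each do |doc|
    begin
      insert_one.call(doc)
      inserted += 1
    rescue DuplicateKey
      # skip the duplicate and keep going with the rest of the batch
    end
  end
  inserted
end
```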

Tips
EC2
You have three copies of your data, do you really need EBS?
Instance store disks are included in the price and they have predictable performance.
m1.xlarge comes with 1.7 TB of storage.
