What we do
We want to revolutionize the digital advertising industry by showing that there is more to ad analytics than click-through rates.
Wednesday 21 September 11
Ads
Data
Assembling sessions
[Diagram: exposure, ping, ping, ping → session]
Crunching
[Diagram: many sessions → 42]
Reports
What we do
Track ads, make pretty reports.
Numbers
40 GB of data and 50 million documents per day
Btw.
We use JRuby, it's awesome
A story in 7 iterations
1st iteration
secondary indexes and updates
One document per session, update as new data comes along.
Outcome: 1000% write lock
#1
Everything is about working around the
[Figure labels: MongoDB 2.0.0, MongoDB 1.8.1]
2nd iteration
using scans for two step assembling
Instead of updating, save each fragment, then scan over _id to assemble sessions
Outcome: not as much lock, but still not great performance. We also realised we couldn't remove data fast enough.
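The two-step approach described in this iteration can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the code from the talk: the `_id` layout (session id, then a zero-padded timestamp, then the fragment kind) and the helper names `fragment_id` and `assemble_sessions` are assumptions made up for the example, and sorting a list stands in for an index scan over `_id`.

```python
from itertools import groupby

def fragment_id(session_id, timestamp, kind):
    # Hypothetical _id layout: the session id comes first, so scanning in
    # _id order visits all fragments of one session next to each other.
    return f"{session_id}:{timestamp:013d}:{kind}"

def session_of(fragment):
    # Recover the session id from the front of the _id.
    return fragment["_id"].split(":", 1)[0]

def assemble_sessions(fragments):
    """Step two: scan fragments in _id order and group them into sessions."""
    ordered = sorted(fragments, key=lambda f: f["_id"])  # stand-in for a scan over _id
    sessions = []
    for session_id, group in groupby(ordered, key=session_of):
        sessions.append({"session_id": session_id,
                         "events": [f["kind"] for f in group]})
    return sessions

# Step one: every fragment is saved as its own small document, never updated.
fragments = [
    {"_id": fragment_id("s2", 1000, "exposure"), "kind": "exposure"},
    {"_id": fragment_id("s1", 1000, "exposure"), "kind": "exposure"},
    {"_id": fragment_id("s1", 3000, "ping"), "kind": "ping"},
    {"_id": fragment_id("s1", 2000, "ping"), "kind": "ping"},
]
sessions = assemble_sessions(fragments)
```

The point of the layout is that writes become plain inserts, while assembly becomes a sequential read over the primary key.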
#2
Everything is about working around the
#3
Give a lot of thought to your
PRIMARY KEY
3rd iteration
partitioning
We came up with the idea of partitioning the data by writing to a new collection every hour.
Outcome: lots of complicated code, lots of bugs, but we didn't have to care about removing data.
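A rough sketch of what hourly partitioning could look like, in Python. The naming scheme (`sessions_YYYYMMDDHH`), the retention window, and the helper names are assumptions for illustration only; the talk does not say how the collections were named. The idea is that old data is removed by dropping whole collections instead of deleting documents one by one.

```python
from datetime import datetime, timedelta, timezone

def collection_for(ts, prefix="sessions"):
    # Hypothetical naming scheme: one collection per hour, e.g. sessions_2011092113.
    return f"{prefix}_{ts:%Y%m%d%H}"

def expired_collections(names, now, keep_hours, prefix="sessions"):
    """Return the hourly collections that are older than the retention window."""
    cutoff = collection_for(now - timedelta(hours=keep_hours), prefix)
    # The zero-padded timestamp makes string order equal to time order.
    return sorted(n for n in names
                  if n.startswith(prefix + "_") and n < cutoff)

now = datetime(2011, 9, 21, 13, 30, tzinfo=timezone.utc)
existing = ["sessions_2011092110", "sessions_2011092111",
            "sessions_2011092112", "sessions_2011092113"]
current = collection_for(now)
to_drop = expired_collections(existing, now, keep_hours=2)
```

Each name in `to_drop` would then be dropped as a whole collection, which is why removal stops being a concern with this layout.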
#4
Make sure you can
4th iteration
sharding
To get around the global write lock and get higher write performance we moved to a sharded cluster.
Outcome: higher write performance, lots of problems, lots of ops time spent debugging.
#5
Everything is about working around the
#7 IT WILL FAIL
design for it
5th iteration
moving things to separate clusters
We saw very different loads on the shards and realised we had databases with very different usage patterns, some of which made autosharding not work. We moved these off the cluster.
Outcome: a more balanced and stable cluster.
#8
Everything is about working around the
#9 ONE DATABASE
with one usage pattern
PER CLUSTER
6th iteration
monster machines
We got new problems removing data and needed some room to breathe and think.
Solution: upgraded the servers to High-Memory Quadruple Extra Large (with cheese).
#11
Don't try to scale up
SCALE OUT
#12
When you're out of ideas
7th iteration
partitioning (again) and pre-chunking
We rewrote the database layer to write to a new database each day, and we created all chunks in advance. We also decreased the size of our documents by a lot.
Outcome: no more problems removing data.
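One way to picture the daily databases and pre-chunking is the sketch below, in Python. Everything named here is an assumption for illustration: the database naming scheme, the integer key space, and the round-robin placement are not taken from the talk. The sketch only computes a plan; on a real cluster the split points and placements would be applied up front with MongoDB's `split` and `moveChunk` admin commands, before any data is written.

```python
from datetime import date

def database_for(day, prefix="sessions"):
    # Hypothetical naming scheme: one database per day, dropped whole when it expires.
    return f"{prefix}_{day:%Y_%m_%d}"

def presplit_plan(shards, chunks_per_shard, key_space=2**32):
    """Plan evenly sized chunks over an integer key space, spread round-robin.

    Returns the split points and a mapping from each chunk's lower bound
    to the shard that should own it.
    """
    total = len(shards) * chunks_per_shard
    step = key_space // total
    split_points = [i * step for i in range(1, total)]
    lower_bounds = [0] + split_points
    assignment = {lb: shards[i % len(shards)]
                  for i, lb in enumerate(lower_bounds)}
    return split_points, assignment

db_name = database_for(date(2011, 9, 21))
split_points, assignment = presplit_plan(["shard0", "shard1"],
                                         chunks_per_shard=2, key_space=1000)
```

Creating the chunks in advance means writes are spread over all shards from the first insert, rather than waiting for the balancer to split and migrate chunks under load.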
#13
Smaller objects means a smaller database, and a smaller database means
#14
Give a lot of thought to your
PRIMARY KEY
#15
Everything is about working around the
#16
Everything is about working around the
KTHXBAI
Tips
Safe mode
Run every Nth insert in safe mode.
This will give you warnings when bad things happen, like failovers.
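The "every Nth insert" tip could be wired up along these lines. This is a sketch in Python with made-up names: `SampledSafeWriter`, `fast_insert`, and `safe_insert` are hypothetical, and the two callables stand in for an unacknowledged and an acknowledged driver insert respectively. No real driver API is shown.

```python
class SampledSafeWriter:
    """Route every Nth insert through a slower, acknowledged write path."""

    def __init__(self, fast_insert, safe_insert, every=100):
        if every < 1:
            raise ValueError("every must be at least 1")
        self._fast_insert = fast_insert  # hypothetical fire-and-forget insert
        self._safe_insert = safe_insert  # hypothetical insert that waits for an ack
        self._every = every
        self._count = 0

    def insert(self, doc):
        self._count += 1
        if self._count % self._every == 0:
            # An error raised here is the early warning that something is wrong.
            return self._safe_insert(doc)
        return self._fast_insert(doc)

# Lists stand in for the two write paths so the routing is visible.
fast, safe = [], []
writer = SampledSafeWriter(fast.append, safe.append, every=3)
for n in range(7):
    writer.insert({"n": n})
```

Most writes keep the speed of the unacknowledged path, while the sampled ones surface errors such as a failover shortly after they start happening.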
Tips
Avoid bulk inserts
Very dangerous if there's a possibility of duplicate key errors
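A toy model in Python of why this can hurt, assuming the batch stops at the first duplicate key (which is how an ordered batch insert behaves). The in-memory `store` dictionary and the function names are made up for the example; the slide itself does not spell out the failure mode.

```python
class DuplicateKeyError(Exception):
    pass

def bulk_insert(store, docs):
    """Toy ordered batch insert: stops at the first duplicate _id."""
    for i, doc in enumerate(docs):
        if doc["_id"] in store:
            skipped = len(docs) - i - 1
            raise DuplicateKeyError(
                f"duplicate _id {doc['_id']!r}; {skipped} later documents not inserted")
        store[doc["_id"]] = doc

def insert_one_by_one(store, docs):
    """Insert each document separately; a duplicate only affects itself."""
    duplicates = []
    for doc in docs:
        if doc["_id"] in store:
            duplicates.append(doc["_id"])
            continue
        store[doc["_id"]] = doc
    return duplicates

batch = [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]

store_bulk = {"b": {"_id": "b"}}   # "b" already exists
try:
    bulk_insert(store_bulk, batch)
    bulk_failed = False
except DuplicateKeyError:
    bulk_failed = True             # "c" was never written

store_single = {"b": {"_id": "b"}}
duplicates = insert_one_by_one(store_single, batch)
```

In this model the batch loses document "c" because of an unrelated duplicate, and without an acknowledged write the loss would go unnoticed.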
Tips
EC2
You have three copies of your data, do you really need EBS?
Instance store disks are included in the price and they have predictable performance.
m1.xlarge comes with 1.7 TB of storage.