#MDBlocal

Advanced Schema Design Patterns
FEBRUARY 15, 2018 | BELL HARBOR
Who Am I?
{ "name": "Daniel Coupal",
"jobs_at_MongoDB": [
{ "job": "Senior Curriculum Engineer",
"from": new Date("2016-11") },
{ "job": "Senior Technical Service Engineer",
"from": new Date("2013-11") }
],
"previous_jobs": [
"Consultant",
"Developer",
"Manager Quality & Tools Team",
"Manager Software Team",
"Tools Developer"
],
"likes": [ "food", "beers", "movies", "MongoDB" ],
"email": "daniel.coupal@mongodb.com"
}

Pattern

The "Gang of Four":
A design pattern systematically names, explains, and evaluates an important and recurring design in object-oriented systems

MongoDB systems can also be built using its own patterns

Why this Talk?
• 10 years with the document model
• Use of a common methodology and vocabulary when designing schemas for MongoDB
• Ability to model schemas using building blocks
• Less art and more methodology

Why do we Create Models?

Ensure:
• Good performance
• Scalability
despite constraints:
• Hardware
  • RAM faster than Disk
  • Disk cheaper than RAM
  • Network latency
  • Reduce costs $$$
• Database Server
  • Maximum size for a document
  • Atomicity of a write
• Data set
  • Size of data

However, don't Over Design!

WMDB - World Movie Database

Any events, characters and entities depicted in this presentation are fictional.

Any resemblance or similarity to reality is entirely coincidental.

WMDB - World Movie Database

First iteration
3 collections:

A. movies
B. moviegoers
C. screenings

Mission Possible

Our mission, should we decide to accept it, is to fix this solution, so it can perform well and scale.
As always, should I or anyone in the audience do it without training, WMDB will disavow any knowledge of our actions.
This tape will self-destruct in five seconds. Good luck!

Patterns by Category

• Representation
  • Attribute ✔️
  • Schema Versioning ✔️
  • Document Versioning
  • Tree
  • Polymorphism
  • Pre-Allocation
• Frequency of Access
  • Subset ✔️
  • Approximation ✔️
  • Extended Reference
• Grouping
  • Computed ✔️
  • Bucket
  • Outlier

Issue #1: Big Documents, Many Fields and Many Indexes

{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}

Would need the following indexes:
{ release_USA: 1 }
{ release_Mexico: 1 }
{ release_France: 1 }
...
{ release_Festival_San_Jose: 1 }

Pattern #1: Attribute

{
  title: "Dunkirk",
  ...
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
}

Attribute Pattern

Problem:
• Lots of similar fields
• Common characteristic to search across those fields together
• Fields present in only a small subset of documents
Use cases:
• Product attributes like ‘color’, ‘size’, ‘dimensions’, ...
• Release dates of a movie in different countries, festivals

Attribute Pattern - Solution

Solution:
• Field pairs in an array
Benefits:
• Allows for a non-deterministic list of attributes
• Easy to index
  { "releases.location": 1, "releases.date": 1 }
• Easy to extend with a qualifier, for example:
  { descriptor: "price", qualifier: "euros", value: NumberDecimal("100.00") }
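
The reshaping behind the Attribute Pattern can be sketched in plain JavaScript. This is only an illustration: the helper name `toAttributePattern` and the `location` / `date` / `releases` field names are assumptions, chosen to line up with the index shown above.

```javascript
// Hypothetical helper (not from the talk): move every `release_<location>`
// field into a `releases` array of { location, date } pairs.
function toAttributePattern(movie) {
  const prefix = "release_";
  const rest = {};
  const releases = [];
  for (const [key, value] of Object.entries(movie)) {
    if (key.startsWith(prefix)) {
      // "release_USA" becomes { location: "USA", date: ... }
      releases.push({ location: key.slice(prefix.length), date: value });
    } else {
      rest[key] = value;
    }
  }
  return { ...rest, releases };
}

const dunkirk = {
  title: "Dunkirk",
  release_USA: "2017/07/23",
  release_Mexico: "2017/08/01",
  release_France: "2017/08/01",
  release_Festival_San_Jose: "2017/07/22"
};

const reshaped = toAttributePattern(dunkirk);
console.log(JSON.stringify(reshaped, null, 2));
```

With this shape, a single compound index on the array fields can replace the one-index-per-field layout, and a new location only adds an array element rather than a new field and index.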

Issue #2: Working Set doesn’t fit in RAM

Possible solutions:
A. Reduce the size of your working set
B. Add more RAM per machine
C. Start sharding or add more shards

Pattern #2: Subset

In this example, we can:
• Limit the list of actors and crew to 20
• Limit the embedded reviews to the top 20
•…

Subset Pattern

Problem:
• There is a 1-N or N-N relationship, and only a few documents always need to be shown
• Only infrequently do you need to pull all of the dependent documents
Use cases:
• Main actors of a movie
• List of reviews or comments

Subset Pattern - Solution

Solution:
• Keep duplicates of a small subset of fields in the main collection
Benefits:
• Allows for fast data retrieval and a reduced working set size
• One query brings all the information needed for the "main page"
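
A minimal sketch of the split, assuming a movie that starts out with all of its reviews embedded. The helper `applySubsetPattern` and the fields `_id`, `rating`, `top_reviews` and `movie_id` are hypothetical names for illustration, and "top" is taken here to mean highest rating.

```javascript
// Hypothetical helper (not from the talk): split a movie that embeds all
// of its reviews into
//   movieDoc   - the lean document that keeps only the top `limit` reviews
//   reviewDocs - every review, destined for a separate collection
function applySubsetPattern(movie, limit) {
  const { reviews, ...movieFields } = movie;
  // Copy before sorting so the caller's array is left untouched.
  const byRating = [...reviews].sort((a, b) => b.rating - a.rating);
  return {
    movieDoc: { ...movieFields, top_reviews: byRating.slice(0, limit) },
    reviewDocs: reviews.map((review) => ({ movie_id: movie._id, ...review }))
  };
}

const movie = {
  _id: 1,
  title: "Dunkirk",
  reviews: [
    { user: "a", rating: 3 },
    { user: "b", rating: 5 },
    { user: "c", rating: 4 }
  ]
};

const { movieDoc, reviewDocs } = applySubsetPattern(movie, 2);
console.log(movieDoc.top_reviews.map((r) => r.user)); // [ 'b', 'c' ]
console.log(reviewDocs.length); // 3
```

The embedded reviews are duplicates of documents in the full collection, so they need a refresh policy whenever the underlying reviews change.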

Quiz A
Subset Pattern

Question:
• Which new MongoDB 3.6 feature will allow me to notify an
application if the name of an actor is changed?

Issue #3: Lots of CPU Usage

• CPU is on fire!

Issue #3: ... caused by repeated calculations

{
  title: "The Shape of Water",
  ...
  viewings: 5000,
  viewers: 385000,
  revenues: 5074800
}

Pattern #3: Computed

For example:
• Apply a sum, count, ...
• Roll up data by minute, hour, day
• As long as you don’t mess with your source, you can recreate the rollups

Computed Pattern

Problem:
• There is data that needs to be computed
• The same calculations would happen over and over
• Reads outnumber writes:
  • example: 1K writes per hour vs 1M reads per hour
Use cases:
• Have revenues per movie showing, want to display sums
• Time series data, Event Sourcing

Computed Pattern - Solution

Solution:
• Apply a computation or operation on data and store the result
Benefits:
• Avoid re-computing the same thing over and over
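
As a rough sketch, the totals can be computed once from the source screenings and stored on the movie document, so each read is a plain lookup. The screening records below are made-up sample data, and `computeTotals` and the per-screening field names `movie`, `viewers` and `revenue` are assumptions; only `viewings`, `viewers` and `revenues` come from the example document.

```javascript
// Made-up sample source data: one record per screening.
const screenings = [
  { movie: "The Shape of Water", viewers: 100, revenue: 1300 },
  { movie: "The Shape of Water", viewers: 80, revenue: 1040 },
  { movie: "Dunkirk", viewers: 50, revenue: 650 }
];

// Hypothetical helper: roll the screenings for one title up into the
// totals that would be stored on the movie document.
function computeTotals(title, allScreenings) {
  const totals = { title, viewings: 0, viewers: 0, revenues: 0 };
  for (const screening of allScreenings) {
    if (screening.movie !== title) continue;
    totals.viewings += 1;
    totals.viewers += screening.viewers;
    totals.revenues += screening.revenue;
  }
  return totals;
}

// Run on write (or on a schedule) and store the result; reads then skip
// the aggregation entirely.
const shapeOfWater = computeTotals("The Shape of Water", screenings);
console.log(shapeOfWater);
// { title: 'The Shape of Water', viewings: 2, viewers: 180, revenues: 2340 }
```

Because the source screenings are left untouched, the stored totals can be rebuilt at any time by running the same rollup again.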

Quiz B
Computed Pattern

Question:
• Which Relational Database feature is typically used to mimic the
computed pattern?

Issue #4: Lots of Writes

Issue #4: … for non-critical data

Pattern #4: Approximation

• Only increment once in X iterations
• Increment by X

Approximation Pattern

Problem:
• Data is difficult to calculate correctly
• May be too expensive to update the document every time to keep
an exact count
• No one gives a damn if the number is exact
Use cases:
• Population of a country
• Web site visits

Approximation Pattern - Solution

Solution:
• Fewer, stronger writes
Benefits:
• Fewer writes, reducing contention on some documents
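
One way to read "increment once in X iterations, by X" is a counter that buffers hits in the application and only writes every X-th time. The sketch below assumes that reading; `makeApproximateCounter` and the `write` callback, which stands in for the database update, are hypothetical.

```javascript
// Hypothetical helper (not from the talk): count hits in memory and call
// `write(x)` only once every `x` hits, adding `x` in a single write.
function makeApproximateCounter(x, write) {
  let pending = 0;
  return function hit() {
    pending += 1;
    if (pending === x) {
      write(x);
      pending = 0;
    }
  };
}

// Stand-ins for the stored field and for the number of database writes.
let storedVisits = 0;
let writeCount = 0;

const recordVisit = makeApproximateCounter(100, (amount) => {
  storedVisits += amount;
  writeCount += 1;
});

for (let i = 0; i < 1050; i++) {
  recordVisit();
}

console.log(storedVisits, writeCount); // 1000 10
```

Here 1050 visits cost 10 writes instead of 1050, and the stored value trails the true count by at most X - 1. A randomized variant (write X with probability 1/X) avoids keeping per-process state, at the cost of a count that is only correct on average.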

Issue #5: Need to change the list of fields in the documents
• Keeping track of the schema version of a document

Pattern #5: Schema Versioning

Add a field to track the schema version number, per document

Does not have to exist for version 1

Schema Versioning Pattern

Problem:
• Updating the schema of a database is:
• Not atomic
• Long operation
• May not want to update all documents, only do it on updates
Use cases:
• Practically any database that will go to production

Schema Versioning Pattern - Solution
Solution:
• Have a field keeping track of the schema version
Benefits:
• Don't need to update all the documents at once
• May not have to update documents until their next modification
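
A sketch of how an application might upgrade documents lazily, assuming a hypothetical change in the moviegoers collection: version 1 stores a single `email` field, version 2 stores a `contacts` array. The function `upgradeMoviegoer` and the field names `schema_version`, `email` and `contacts` are illustrative assumptions.

```javascript
// Hypothetical upgrade step (not from the talk). A document without a
// `schema_version` field is treated as version 1.
function upgradeMoviegoer(doc) {
  const version = doc.schema_version === undefined ? 1 : doc.schema_version;
  if (version >= 2) {
    return doc; // already in the current shape, nothing to do
  }
  const { email, ...rest } = doc;
  return {
    ...rest,
    schema_version: 2,
    contacts: email ? [{ type: "email", value: email }] : []
  };
}

const oldDoc = { name: "Jane Doe", email: "jane@example.com" };
const newDoc = {
  name: "John Doe",
  schema_version: 2,
  contacts: [{ type: "email", value: "john@example.com" }]
};

console.log(upgradeMoviegoer(oldDoc));
console.log(upgradeMoviegoer(newDoc) === newDoc); // true
```

Calling such a function when a document is read or about to be modified lets old and new shapes coexist, so the collection never needs a single all-at-once migration.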

BACK to reality

Aspect of Patterns: Consistency

• How duplication is handled


A. Update both source and target in real time
B. Update target from source at regular intervals. Examples:
• Most popular items => update nightly
• Revenues from a movie => update every hour
• Last 10 reviews => update hourly? daily?

What our Patterns did for us

Problem                          Pattern
Messy and Large Documents        Attribute
Too much RAM                     Subset
Too much CPU                     Computed
Too many disk accesses           Approximation
No downtime to upgrade schema    Schema Versioning

Other Patterns

• Bucket
  • grouping documents together, to have fewer documents
• Document Versioning
  • tracking of content changes in a document
• Outlier
  • avoid letting a few documents drive the design and impact performance for all
• Extended Reference
• Tree(s)
• Polymorphism
• Pre-allocation

Take Aways

A. Simple grouping from tables to collections is not optimal

B. Learn a common vocabulary for designing schemas with MongoDB

C. Use patterns as "plug-and-play" to improve performance

References for complete Solutions

A full design example for a given problem:
• E-commerce site
• Content Management System
• Social Networking
• Single view
•…

How Can I Learn More About Schema Design?

• More patterns in a follow-up to this presentation
• MongoDB in-person training courses on Schema Design
• Upcoming Online course at MongoDB University:
  • https://university.mongodb.com
  • Data Modeling

Quiz C
Which Pattern is used

Question:
• Which Pattern is used in the following document?

{ "name": "Daniel Coupal",
"jobs_at_MongoDB": [
{ "job": "Senior Curriculum Engineer",
"from": new Date("2016-11") },
{ "job": "Senior Technical Service Engineer",
"from": new Date("2013-11") }
],
"previous_jobs": [
"Consultant",
"Developer",
"Manager Quality & Tools Team",
"Manager Software Team",
"Tools Developer"
],
"likes": [ "food", "beers", "movies", "MongoDB" ],
"email": "daniel.coupal@mongodb.com"
}

Thank You for using MongoDB!
