
Schema Design

Ranga Sarvabhouman
Senior Solutions Architect, MongoDB
@MongoDB
All application development is schema design

Success comes from proper data structure
What is a Record?
Key Value
One-dimensional storage
Single value is a blob
Query on key only
No schema
Value cannot be updated, only replaced
(Diagram: Key → Blob)
Relational
Two-dimensional storage (tuples)
Each field contains a single value
Query on any field
Very structured schema (table)
In-place updates
Normalization requires many tables, joins, and indexes, and yields poor data locality

(Diagram: a row identified by its primary key)
Document
N-dimensional storage
Each field can contain 0, 1, many, or embedded values
Query on any field & level
Flexible schema
Inline updates
Embedding related data gives optimal data locality, requires fewer indexes, and improves performance

(Diagram: a document identified by _id)
Core Concepts
Traditional Schema Design
Focus on data storage
Document Schema Design
Focus on data use

Another way to think about it
What answers do I have?
What questions do I have?

Three Building Blocks of Document Schema Design
1. Flexibility
Choices for schema design
Each record can have different fields
Field names consistent for programming
Common structure can be enforced by application
Easy to evolve as needed
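A minimal sketch of "common structure can be enforced by application": plain Python dicts stand in for documents in one collection, and each record carries different fields. The field names, sample data, and the REQUIRED_FIELDS rule are hypothetical; only the application-side check is illustrated here.

```python
# Hypothetical in-memory "collection": records may carry different fields.
contacts = [
    {"name": "Ann", "title": "Engineer", "phone": "555-0100"},
    {"name": "Bob", "url": "http://example.com", "email": "bob@example.com", "fax": "555-0199"},
]

# Hypothetical application-level rule: every contact needs a string "name".
REQUIRED_FIELDS = {"name": str}

def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems

for c in contacts:
    print(c["name"], validate(c))
```

Both records pass even though they share only the name field; evolving the schema means changing the rule in application code rather than altering a table.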

2. Arrays
Multiple Values per Field
Each field can be:
Absent
Set to null
Set to a single value
Set to an array of many values
Query for any matching value
Can be indexed, and each value in the array is in the index
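The array bullets above can be illustrated with a small in-memory sketch: an equality test that matches a scalar or any element of an array, and an index in which every array element gets its own entry. This is a simplification under stated assumptions, not MongoDB's implementation: the sample data is hypothetical, and absent or null fields are simply skipped rather than handled the way the server does.

```python
from collections import defaultdict

# Hypothetical sample documents: "tags" is an array, a single value, null, or absent.
docs = [
    {"_id": 1, "tags": ["work", "golf"]},
    {"_id": 2, "tags": "work"},
    {"_id": 3, "tags": None},
    {"_id": 4},
]

def matches(doc: dict, field: str, value) -> bool:
    """Simplified equality match: a scalar must equal value; an array matches
    if any of its elements equals value. Absent fields never match here."""
    if field not in doc:
        return False
    current = doc[field]
    if isinstance(current, list):
        return value in current
    return current == value

def build_index(docs: list, field: str) -> dict:
    """Map each indexed value to the _ids containing it. Each array element
    gets its own entry, so one document can appear under several keys.
    Absent and null fields are skipped in this simplified sketch."""
    index = defaultdict(set)
    for doc in docs:
        current = doc.get(field)
        values = current if isinstance(current, list) else [current]
        for item in values:
            if item is not None:
                index[item].add(doc["_id"])
    return index

index = build_index(docs, "tags")
print(sorted(d["_id"] for d in docs if matches(d, "tags", "work")))  # [1, 2]
print(sorted(index["work"]))  # [1, 2]
```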


3. Embedded Documents
A field's value can itself be a document
Nested documents provide structure
Query any field at any level
Can be indexed
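Querying "any field at any level" relies on dotted paths such as address.city. The sketch below resolves such a path through nested dicts; the function name and sample contact are hypothetical, and arrays along the path are deliberately not traversed here.

```python
def get_path(doc: dict, path: str):
    """Follow a dotted path such as "address.city" through nested dicts.
    Returns None if any step is missing. Arrays are not traversed in this sketch."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Hypothetical contact with an embedded address document.
contact = {
    "name": "Ann",
    "address": {"street": "1 Main St", "city": "Springfield", "state": "NJ"},
}

print(get_path(contact, "address.city"))      # Springfield
print(get_path(contact, "address.zip_code"))  # None
```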

What is an Entity?
An Entity
Object in your model
Associations with other entities
Referencing (Relational)     Embedding (Document)
has_one                      embeds_one
belongs_to                   embedded_in
has_many                     embeds_many
has_and_belongs_to_many
MongoDB has both referencing and embedding for universal coverage
Let's model something together

How about a business card?


Business Card

Referencing
Addresses

{
_id: ,
street: ,
city: ,
state: ,
zip_code: ,
country:
}
Contacts

{
_id: ,
name: ,
title: ,
company: ,
phone: ,
address_id:
}

Embedding
Contacts

{
_id: ,
name: ,
title: ,
company: ,
address: {
street: ,
city: ,
state: ,
zip_code: ,
country:
},
phone:
}

Relational Schema
Contact: name, company, title, phone
Address: street, city, state, zip_code

Document Schema
Contact: name, company, title, phone, address { street, city, state, zip_code }

How are they different? Why?

Schema Flexibility

{
name: ,
title: ,
company: ,
address: {
street: ,
city: ,
state: ,
zip_code:
},
phone:
}

{
name: ,
url: ,
title: ,
company: ,
email: ,
address: {
street: ,
city: ,
state: ,
zip_code:
},
phone: ,
fax:
}
Example
Let's Look at an Address Book

Address Book
What questions do I have?
What are my entities?
What are my associations?

Address Book Entity-Relationship
Contacts (name, company, title)
  1:N Addresses (type, street, city, state, zip_code)
  1:N Phones (type, number)
  1:N Emails (type, address)
  1:1 Thumbnails (mime_type, data)
  1:1 Portraits (mime_type, data)
  1:1 Twitters (name, location, web, bio)
  N:N Groups (name)
Associating Entities
One to One

One to One
Schema Design Choices
Contact references Twitter: contact.twitter_id → twitter (1:1)
Twitter references Contact: twitter.contact_id → contact (1:1)
Tracking the relationship on both sides is redundant: both references must be updated for consistency
Contact embeds Twitter: contact.twitter (1)
May save a fetch?
One to One
General Recommendation
Full contact info all at once
Contact embeds twitter
Parent-child relationship: "contains"
No additional data duplication
Can query or index on an embedded field, e.g., twitter.name
Exceptional cases
Reference the portrait, which has very large data
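Moving a one-to-one association from referencing (contact.twitter_id) to embedding (contact.twitter) can be sketched as a pure transformation on documents. The Twitter values come from the contact example in this deck; the function name, the twitter _id of 7, and the dict-based collections are hypothetical.

```python
# Hypothetical referenced layout: the contact points at a separate twitter document.
twitters = {7: {"_id": 7, "name": "Gary Murakami", "location": "New Providence, NJ"}}
contact = {"_id": 1, "name": "Gary J. Murakami, Ph.D.", "twitter_id": 7}

def embed_one_to_one(parent: dict, ref_field: str, children: dict, embed_as: str) -> dict:
    """Return a copy of parent with the referenced child embedded under embed_as.
    The reference field and the child's own _id are dropped, since the parent
    now contains the child. The input documents are not modified."""
    result = {k: v for k, v in parent.items() if k != ref_field}
    child = children[parent[ref_field]]
    result[embed_as] = {k: v for k, v in child.items() if k != "_id"}
    return result

embedded = embed_one_to_one(contact, "twitter_id", twitters, "twitter")
print(embedded["twitter"]["name"])  # Gary Murakami
```

After this step, a query on twitter.name reads a field of the contact itself, with no second lookup.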
One to Many

One to Many
Schema Design Choices
Contact references phones: contact.phone_ids: [ ] → phone (1:N)
Phone references Contact: phone.contact_id → contact (1:N)
Tracking the relationship on both sides is redundant: both references must be updated for consistency
Contact embeds phones: contact.phones (N)
Not possible in relational DBs
Save a fetch?
One to Many
General Recommendation
Full contact info all at once
Contact embeds multiple phones
Parent-children relationship: "contains"
No additional data duplication
Can query or index on any field, e.g., { "phones.type": "mobile" }
Exceptional cases
Scaling: maximum document size is 16MB
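A query such as { "phones.type": "mobile" } matches a contact when any embedded phone has that type. The sketch below approximates that behaviour by collecting every value reachable along a dotted path, descending into arrays of embedded documents on the way. It is a simplified, hypothetical model (positional paths such as phones.0 are not handled) with made-up sample data.

```python
def path_values(value, path: list) -> list:
    """Collect every value reachable by a dotted path, descending into
    arrays of embedded documents along the way (simplified sketch)."""
    if not path:
        return [value]
    if isinstance(value, list):
        out = []
        for item in value:
            out.extend(path_values(item, path))
        return out
    if isinstance(value, dict) and path[0] in value:
        return path_values(value[path[0]], path[1:])
    return []

def matches_path(doc: dict, dotted: str, wanted) -> bool:
    """True if any value at the dotted path equals wanted, or is an array containing it."""
    for v in path_values(doc, dotted.split(".")):
        if v == wanted or (isinstance(v, list) and wanted in v):
            return True
    return False

# Hypothetical contacts, each embedding its phones.
contacts = [
    {"name": "Ann", "phones": [{"type": "work", "number": "555-0100"},
                               {"type": "mobile", "number": "555-0101"}]},
    {"name": "Bob", "phones": [{"type": "work", "number": "555-0200"}]},
]

print([c["name"] for c in contacts if matches_path(c, "phones.type", "mobile")])  # ['Ann']
```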
Many to Many

Many to Many
Traditional Relational Association
Join table GroupContacts (group_id, contact_id) links Contacts (name, company, title, phone) and Groups (name)
Use arrays instead of a join table
Many to Many
Schema Design Choices
Group references contacts: group.contact_ids: [ ] → contact (N:N)
Contact references groups: contact.group_ids: [ ] → group (N:N)
Tracking the relationship on both sides is redundant: both references must be updated for consistency
Group embeds contacts: group.contacts (N)
Contact embeds groups: contact.groups (N)
Embedding on both sides is redundant: duplicated data must be updated for consistency
Many to Many
General Recommendation
Depends on use case
1. Simple address book: contact references groups (contact.group_ids: [ ])
2. Corporate email groups: group embeds contacts for performance
Exceptional cases
Scaling: maximum document size is 16MB
Scaling may affect performance and working set
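For the simple address book case, where the contact references its groups, the relationship is stored once, in contact.group_ids, and both directions can be answered without a join table. In MongoDB the "members of a group" direction would be an equality query on group_ids, which matches array elements; here dicts and list comprehensions stand in for collections and queries, and all names and data are hypothetical.

```python
# Hypothetical groups collection, keyed by _id.
groups = {
    "g1": {"_id": "g1", "name": "Family"},
    "g2": {"_id": "g2", "name": "Golf Club"},
}

# Each contact stores only the ids of its groups.
contacts = [
    {"_id": 1, "name": "Ann", "group_ids": ["g1", "g2"]},
    {"_id": 2, "name": "Bob", "group_ids": ["g2"]},
]

def groups_of(contact: dict) -> list:
    """Contact -> groups: follow the stored references."""
    return [groups[g]["name"] for g in contact["group_ids"]]

def members_of(group_id: str) -> list:
    """Group -> contacts: find contacts whose group_ids array contains the id."""
    return [c["name"] for c in contacts if group_id in c["group_ids"]]

print(groups_of(contacts[0]))  # ['Family', 'Golf Club']
print(members_of("g2"))        # ['Ann', 'Bob']
```

Because groups do not also list their members, adding a contact to a group touches one document only.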
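Contacts (name, company, title)
  addresses (type, street, city, state, zip_code): embedded, N
  phones (type, number): embedded, N
  emails (type, address): embedded, N
  thumbnail (mime_type, data): embedded, 1
  twitter (name, location, web, bio): embedded, 1
Portraits (mime_type, data): referenced from the contact, 1
Groups (name): referenced from the contact, N:N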
Document model - holistic and efficient representation

Contact document example
{
  name : "Gary J. Murakami, Ph.D.",
  company : "MongoDB, Inc.",
  title : "Lead Engineer",
  twitter : {
    name : "Gary Murakami",
    location : "New Providence, NJ",
    web : "http://www.nobell.org"
  },
  portrait_id : 1,
  addresses : [ ... ],
  phones : [ ... ],
  emails : [ ... ]
}

Working Set
To reduce the working set, consider:
Referencing bulk data, e.g., the portrait
Referencing less-used data instead of embedding it
Extracting it into a referenced child document

The same techniques help with performance issues caused by large documents
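Extracting bulk data into a referenced child document can be sketched as splitting one dict into two: the parent keeps the frequently used fields plus a reference, and the child holds the bulky data. The portrait_id field mirrors the contact example above; the function name, the child id, and the sample bytes are hypothetical.

```python
def extract_field(doc: dict, field: str, child_id, ref_field: str):
    """Split doc into (parent, child): the embedded document under `field`
    becomes its own child document with _id = child_id, and the parent keeps
    a reference to it under ref_field. The input document is not modified."""
    parent = {k: v for k, v in doc.items() if k != field}
    parent[ref_field] = child_id
    child = {"_id": child_id, **doc[field]}
    return parent, child

# Hypothetical contact embedding a large portrait.
contact = {"_id": 1, "name": "Ann", "title": "Engineer",
           "portrait": {"mime_type": "image/jpeg", "data": bytes(1000)}}

parent, child = extract_field(contact, "portrait", child_id=1, ref_field="portrait_id")
print(sorted(parent))  # ['_id', 'name', 'portrait_id', 'title']
```

Reads that only need the name and title now touch a small document, and the portrait is fetched only when it is displayed.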
General Recommendations
Legacy Migration
1. Copy the existing schema & some data to MongoDB
2. Iterate on the schema design
   Measure performance, find bottlenecks, and embed:
   a. one to one associations first
   b. one to many associations next
   c. many to many associations
3. Migrate the full dataset to the new schema
New Software Application? Embed by default


Embedding over Referencing
Embedding is a bit like pre-joined data
BSON (Binary JSON) document operations are easy for the server
Embed (following the 90/10 rule of thumb)
When the one or many objects are viewed in the context of their parent
For performance
For atomicity
Reference
When you need more scaling
For easy consistency with many to many associations without duplicated data

It's All About Your Application
Programs+Databases = (Big) Data Applications
Your schema is the impedance matcher
Design choices: normalize/denormalize,
reference/embed
Melds programming with MongoDB for best of both
Flexible for development and change
Programs + MongoDB = Great Big Data Applications
