Models
Data
Think Before You Model
Or how to keep doing what you’re already doing
Some of the Entities and Relationships in KillrVideo
[Diagram: entity-relationship model of KillrVideo]
• User: id, firstname, lastname, email, password
• Video: id, name, description, location, preview_image, tags
• Comment: id, comment
• Relationships: adds (timestamp), rates (rating), posts (timestamp), features
Modeling Queries
Some Application Workflows in KillrVideo
• User logs into site
• Show basic information about user
• Show videos added by a user
• Show comments posted by a user
• Search for a video by tag
• Show latest videos added to the site
Some Queries in KillrVideo to Support Workflows
Users
• User logs into site → Find user by email address
• Show basic information about user → Find user by id
Comments
• Show comments for a video → Find comments by video (latest first)
• Show comments posted by a user → Find comments by user (latest first)
Ratings
• Show ratings for a video → Find ratings by video
Some Queries in KillrVideo to Support Workflows
Videos
• Search for a video by tag → Find video by tag
• Show latest videos added to the site → Find videos by date (latest first)
Data Modeling Refresher
Users – The Cassandra Way
• User logs into site → Find user by email address
• Show basic information about user → Find user by id
[Diagram: ring of nodes labeled 10, 20, 30, 40, 50, 60, 70, 80]
Views or indexes?
• Show video and its details → Find video by id
• Show videos added by a user → Find videos by user (latest first)
Denormalized data
Videos Everywhere!
• Search for a video by tag → Find video by tag
• Show latest videos added to the site → Find videos by date (latest first)
Single Nodes Have Limits Too
• Show latest videos added to the site → Find videos by date (latest first)
• Mitigate by adding data to the Partition Key to spread load
• Arbitrary data, like a bucket number
– Round robin at the app level
    added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Hot spot
Partition key: yyyymmdd
Partition key with bucket: (yyyymmdd, bucket_number)
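A minimal sketch of what "round robin at the app level" could look like, assuming the application picks the bucket number itself before each write. The names `NUM_BUCKETS`, `partition_key_for_write`, and `partition_keys_for_read` are hypothetical and do not come from the slides; the bucket count of 4 is an arbitrary illustration.

```python
import itertools
from datetime import date

NUM_BUCKETS = 4  # assumed bucket count; tune to the cluster and write rate

# Round-robin counter shared by all writes from this application instance.
_bucket_cycle = itertools.cycle(range(NUM_BUCKETS))


def partition_key_for_write(day: date) -> tuple:
    """Return (yyyymmdd, bucket_number) for the next write on `day`."""
    return day.strftime("%Y%m%d"), next(_bucket_cycle)


def partition_keys_for_read(day: date) -> list:
    """List every partition a reader must query for `day` before merging."""
    yyyymmdd = day.strftime("%Y%m%d")
    return [(yyyymmdd, bucket) for bucket in range(NUM_BUCKETS)]


# Eight writes on the same day cycle through the four buckets twice.
writes = [partition_key_for_write(date(2013, 5, 2)) for _ in range(8)]
reads = partition_keys_for_read(date(2013, 5, 2))
```

The trade-off in this sketch: writes for one day land on several partitions instead of one, but a "latest videos" read has to query each bucket and merge the results by added_date in the application.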
Game API
Upload API
It’s all about the model
• Use case
• User creates an account
• User uploads image
• Image is distributed worldwide
• User can check access patterns
UPDATE videos
SET name = 'The data model is dead. Long live the data model.'
WHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed;
Primary Key
The race is on
Process 1 Process 2
(0 rows)
INSERT INTO videos (videoid, name, userid, description, location, location_type, preview_thumbnails, tags, added_date)
VALUES (06049cbb-dfed-421f-b889-5f649a0de1ed,'The data model is dead. Long live the data model.',
9761d3d7-7fbd-4269-9988-6cfd4e188678,
'First in a three part series for Cassandra Data Modeling','http://www.youtube.com/watch?v=px6U2n74q3g',1,
{'YouTube':'http://www.youtube.com/watch?v=px6U2n74q3g'},{'cassandra','data model','relational','instruction'},
'2013-05-02 12:30:29')
IF NOT EXISTS;
Don’t overwrite!
Lightweight Transactions
UPDATE videos
SET name = 'The data model is dead. Long live the data model.'
WHERE videoid = 06049cbb-dfed-421f-b889-5f649a0de1ed
IF userid = 9761d3d7-7fbd-4269-9988-6cfd4e188678;
Don’t overwrite!
Solution LWT
Process 1 (T1)
 [applied]
-----------
      True
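The race between Process 1 and Process 2 can be modeled in a few lines. This is only an in-memory simulation of the "apply only if the row does not exist" semantics, not a Cassandra client and not how Cassandra implements lightweight transactions internally. The class `VideoTable` and its methods are hypothetical names introduced for illustration.

```python
import threading


class VideoTable:
    """In-memory stand-in for a table; NOT a Cassandra client."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert_if_not_exists(self, videoid, row):
        """Mimic INSERT ... IF NOT EXISTS: True only if the row was applied."""
        with self._lock:  # the existence check and the write are one atomic step
            if videoid in self._rows:
                return False
            self._rows[videoid] = row
            return True

    def get(self, videoid):
        with self._lock:
            return self._rows.get(videoid)


table = VideoTable()
videoid = "06049cbb-dfed-421f-b889-5f649a0de1ed"
results = {}


def process(name):
    # Both processes try to create the same video row.
    results[name] = table.insert_if_not_exists(videoid, {"name": "written by " + name})


threads = [threading.Thread(target=process, args=("Process %d" % i,)) for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one process sees applied == True; the other must not overwrite.
applied = [name for name, ok in results.items() if ok]
```

Which process wins depends on scheduling, so the sketch only guarantees that one of the two is applied and the loser learns about it from the return value, much like the `[applied]` column in the result above.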
• No multi-gets (multi-partitions)
• Nesting!
CREATE TYPE address (
   street text,
   city text,
   zip_code int,
   country text,
   cross_streets set<text>
);
Before
CREATE TABLE videos (
   videoid uuid,
   userid uuid,
   name varchar,
   description varchar,
   location text,
   location_type int,
   preview_thumbnails map<text,text>,
   tags set<varchar>,
   added_date timestamp,
   PRIMARY KEY (videoid)
);
CREATE TABLE video_metadata (
   video_id uuid PRIMARY KEY,
   height int,
   width int,
   video_bit_rate set<text>,
   encoding text
);
SELECT *
FROM videos
WHERE videoId = 2;
SELECT *
FROM video_metadata
WHERE video_id = 2;
In-application join
Title: Introduction to Apache Cassandra
Description: A one hour talk on everything you need to know about a totally amazing database.
Playback rate: 480 720
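The in-application join on this slide can be sketched as follows, assuming each SELECT returns at most one row and that rows arrive as plain dictionaries. The function `join_video` is a hypothetical helper, and treating "Playback rate: 480 720" as the contents of video_bit_rate is an assumption made for the example.

```python
def join_video(video_row, metadata_row):
    """Merge the rows from the two SELECTs into one record for display."""
    if video_row is None:
        return None  # no video row, nothing to show
    merged = dict(video_row)
    # Nothing guarantees the metadata row exists, so handle its absence.
    merged["metadata"] = dict(metadata_row) if metadata_row is not None else None
    return merged


# Stand-ins for the results of the two queries (illustrative values only).
video_row = {
    "videoid": 2,
    "name": "Introduction to Apache Cassandra",
    "description": "A one hour talk on everything you need to know "
                   "about a totally amazing database.",
}
metadata_row = {"video_id": 2, "video_bit_rate": {"480", "720"}}

page = join_video(video_row, metadata_row)
```

The cost this sketch makes visible is two round trips plus merge logic in the application for every page view, which is what embedding the metadata in the videos table avoids.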
After
• Now video_metadata is
embedded in videos
}
);
Retrieving fields
Aggregates
Follow me on twitter
@PatrickMcFadin