NoSQL Introduction
• Understand what NoSQL is and what it is not.
• Why would you want to use NoSQL within your project, and which NoSQL database would you use?
• Explore the relationships between NoSQL and RDBMS.
• Understand how to select between an RDBMS (MySQL and PostgreSQL), a Document Database (MongoDB), a Key-Value Store, a Graph Database, and Columnar databases, or combinations of the above.
NoSQL Solves Some Problems
• Identify Problems first…
• Don’t implement Solutions just because they are Awesome
NoSQL
• History
• Popular NoSQL Databases
• NoSQL Implementation CRUD Operations
NoSQL History
• June 11, 2009 Meetup
– Open Source, Distributed, Non-Relational DB
– Eric Evans (Rackspace)
– Johan Oskarsson (Last.fm)
NoSQL History
http://www.w3resource.com/mongodb/nosql.php
NoSQL Origination
• Problems not solved by RDBMSs
• Limitations of RDBMSs, not of SQL
NoSQL History
• Bad name, but it stuck!
• Not a definitive term
• Generally, newer databases solving new and different problems
• Not Only SQL
Most Popular Databases
http://db-engines.com/en/ranking
Ranked by: Web Content, Web Searches, Technical Discussions, Jobs, Resumes
Most Popular NoSQL
• MongoDB – Document Store
• Cassandra – Wide Column Store
• Redis – Key-value Store
• Solr – Search Engine
• HBase – Wide Column Store
• Neo4j – Graph Database
• Memcached – Key-value Store
• CouchDB – Document Store
• Riak – Key-value Store
• SimpleDB – Key-value Store within the Amazon Cloud
Download NoSQL v95.141.3
Released 5/17/2017
http://www.nosql.org/downloads/laeRtoN.zip
Reading Recommendations
2016 NoSQL vs RDBMS
NoSQL Database Types
• Key-Value – Redis, Riak
Key-Value Stores
Bucket: code
  Key               Value
  code:java         17.316%, Lowest rank on Feb 2014
  code:C            18.334%, Lowest rank on August 2013
  code:Objective-C  11.341%, Lowest rank on Dec 2007
  code:C++          {"score": "6.892%", "low rank": "Feb 2008"}
Bucket: drink
  Key               Value
  drink:java        coffee
  drink:punch       Sprite + pineapple juice
  drink:pop         Carbonated Soda
Source: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html
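An ordinary Python dictionary can imitate the buckets above, which shows what a key-value store offers and what it leaves out. This is a minimal sketch, not Redis or Riak. The class `KeyValueStore` and its methods are hypothetical, and a bucket is modeled simply as a `bucket:` prefix on the key, matching keys such as `code:java`.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store; keys are 'bucket:name' strings."""

    def __init__(self):
        self._data = {}

    def put(self, bucket, name, value):
        # Create or overwrite; the store never inspects the value's structure.
        self._data[f"{bucket}:{name}"] = value

    def get(self, bucket, name, default=None):
        # Lookup is by exact key only; there is no query on values.
        return self._data.get(f"{bucket}:{name}", default)

    def delete(self, bucket, name):
        # Remove the key if present; True means something was deleted.
        return self._data.pop(f"{bucket}:{name}", None) is not None

    def keys(self, bucket):
        # List one bucket's keys by scanning for the prefix.
        prefix = f"{bucket}:"
        return sorted(k for k in self._data if k.startswith(prefix))


store = KeyValueStore()
store.put("code", "java", "17.316%, Lowest rank on Feb 2014")
store.put("code", "C++", {"score": "6.892%", "low rank": "Feb 2008"})
store.put("drink", "java", "coffee")

print(store.get("drink", "java"))  # coffee
print(store.keys("code"))          # ['code:C++', 'code:java']
```

The same name `java` lives in both buckets without colliding, because the bucket is part of the key. The values are opaque, so a question such as "which languages scored above 10%?" requires reading every value.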
Document Oriented Database
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
Column Oriented Database
Neo4j
NoSQL Characteristics
No Predefined Schemas (except for Columnar)
• May insert data without creating a table
• Schema evolves through versions (v1.5, v1.6, v1.7, …)
Rarely Foreign Keys (except for Graph Databases)
• No JOIN operations
• Relationships are not automatically maintained
Eventual Consistency
• Replicas holding old data are eventually updated with the new data
• Data may be inconsistent across replicas until all updates are complete
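A toy model makes the eventual-consistency bullets concrete. All of its assumptions are hypothetical: replicas are plain dictionaries, every write lands on replica 0 first, and `propagate()` stands in for background replication to the other replicas. It is not the API of any real database.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model of asynchronous replication (hypothetical, not a real database API)."""

    def __init__(self, replica_count):
        # Each replica is a plain dict; replica 0 receives writes first.
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value):
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, replica_index, key):
        # Any replica may answer a read, even one that is still stale.
        return self.replicas[replica_index].get(key)

    def propagate(self, steps=None):
        # Apply queued updates in order; steps=None drains the whole queue.
        applied = 0
        while self.pending and (steps is None or applied < steps):
            i, key, value = self.pending.popleft()
            self.replicas[i][key] = value
            applied += 1

    def is_consistent(self, key):
        return len({replica.get(key) for replica in self.replicas}) == 1


store = EventuallyConsistentStore(3)
store.write("zip", "11111")
store.propagate()
store.write("zip", "84321")
print(store.read(0, "zip"), store.read(2, "zip"))        # 84321 11111
print(store.is_consistent("zip"))                         # False
store.propagate()
print(store.read(2, "zip"), store.is_consistent("zip"))  # 84321 True
```

Between the second write and the final `propagate()`, a reader of replica 2 still sees the old zip code. That window is exactly the "inconsistent until all updates are complete" period.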
NoSQL Database Types
• Document (JSON)
– Schema is continually growing
– Can pre-JOIN records for speed
• Column Oriented Databases (Columnar)
– Quick aggregates: COUNT, AVG, MIN, MAX, SUM
– Sparse data (lots of NULL values) is stored efficiently
• Graph
– Representation of Complex Relationships/JOINs
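Plain Python dictionaries can stand in for on-disk column files to contrast row and column layouts. In this sketch, each column is a map from row position to value. A missing (NULL) value therefore costs nothing, and an aggregate reads only the column it needs. The function names are hypothetical, and real columnar engines add compression and vectorized scans that this omits.

```python
# Row layout: one record per row, as a row-oriented RDBMS would store it.
rows = [
    {"city": "ACMAR", "state": "AL", "pop": 6055},
    {"city": "ADGER", "state": "AL", "pop": 3205},
    {"city": "ALPINE", "state": "AL", "pop": 3062},
    {"city": "Logan, UT", "state": "UT"},  # no "pop" value (a NULL)
]


def to_columns(records):
    """Pivot rows into columns; each column maps row position -> value."""
    columns = {}
    for position, record in enumerate(records):
        for name, value in record.items():
            # Only values that exist are stored, so NULLs take no space.
            columns.setdefault(name, {})[position] = value
    return columns


def aggregate(columns, name):
    """COUNT/SUM/MIN/MAX/AVG that read a single column and nothing else."""
    values = list(columns.get(name, {}).values())
    if not values:
        return {"COUNT": 0, "SUM": None, "MIN": None, "MAX": None, "AVG": None}
    return {
        "COUNT": len(values),
        "SUM": sum(values),
        "MIN": min(values),
        "MAX": max(values),
        "AVG": sum(values) / len(values),
    }


columns = to_columns(rows)
print(sorted(columns["pop"]))  # [0, 1, 2]  (row 3 has no entry)
print(aggregate(columns, "pop"))
```

Like SQL's `COUNT(pop)`, the count ignores the row with no population, and the `city` and `state` columns are never read.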
CRUD Operations
Create
Read
Update
Delete
SQL CRUD
Create
INSERT INTO table (column1, column2) VALUES (9, 'string');
Read
SELECT column1, column2 FROM table;
Update
UPDATE table SET column2 = 'text' WHERE column1 = 9;
Delete
DELETE FROM table WHERE column2 = 'text';
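Python's built-in `sqlite3` module can run the four statements end to end. This sketch assumes an in-memory SQLite database. It renames the table to the hypothetical `my_table`, because `table` is a reserved word and the slide's statements would not run verbatim. Values are passed through `?` placeholders rather than pasted into the SQL string, which is the usual practice in application code.

```python
import sqlite3

# In-memory SQLite database; my_table is a hypothetical name ("table" is reserved).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (column1 INTEGER, column2 TEXT)")

# Create
conn.execute("INSERT INTO my_table (column1, column2) VALUES (?, ?)", (9, "string"))

# Read
after_insert = conn.execute("SELECT column1, column2 FROM my_table").fetchall()

# Update
conn.execute("UPDATE my_table SET column2 = ? WHERE column1 = ?", ("text", 9))
after_update = conn.execute("SELECT column1, column2 FROM my_table").fetchall()

# Delete
conn.execute("DELETE FROM my_table WHERE column2 = ?", ("text",))
remaining = conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0]

conn.commit()
conn.close()

print(after_insert)  # [(9, 'string')]
print(after_update)  # [(9, 'text')]
print(remaining)     # 0
```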
Database SELECT Statements
Oracle
SELECT * FROM table
MongoDB
db.table.find()
Cassandra (CQL)
SELECT * FROM table
Neo4j (Cypher)
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
Document Oriented Database
{ "faculty" :
  [
    {
      "_id" : 1,
      "name" : { "first" : "John", "last" : "Backus" },
      "contribs" : [ "Fortran", "ALGOL" ],
      "awards" : [
        { "award" : "W.W. McDowell Award",
          "year" : 1967,
          "by" : "IEEE Computer Society" },
        { "award" : "Draper Prize",
          "year" : 1993,
          "by" : "National Academy of Engineering" }
      ]
    },
    {
      "_id" : 2,
      "name" : { "first" : "David", "last" : "Williams" },
      "contribs" : [ "C#", "Java", "PHP" ],
      "awards" : [
        { "award" : "Sherman Peabody Award II",
          "year" : 2095,
          "location" : "Paris",
          "by" : "Intergalactic Continuum" },
        { "award" : "Sherman Peabody Award IX",
          "year" : 2090,
          "location" : "San Francisco",
          "by" : "Intergalactic Continuum" },
        { "award" : "Sherman Peabody Award IV",
          "year" : 2093,
          "location" : "London",
          "by" : "Intergalactic Continuum" }
      ]
    }
  ]
}
Document Oriented Database
http://chris.photobooks.com/json/
MongoDB Sample Database
{"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"}
{"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"}
{"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"}
{"city": "KEYSTONE", "loc": [-86.812861, 33.236868], "pop": 14218, "state": "AL", "_id": "35007"}
{"city": "NEW SITE", "loc": [-85.951086, 32.941445], "pop": 19942, "state": "AL", "_id": "35010"}
{"city": "ALPINE", "loc": [-86.208934, 33.331165], "pop": 3062, "state": "AL", "_id": "35014"}
{"city": "ARAB", "loc": [-86.489638, 34.328339], "pop": 13650, "state": "AL", "_id": "35016"}
{"city": "BAILEYTON", "loc": [-86.621299, 34.268298], "pop": 1781, "state": "AL", "_id": "35019"}
{"city": "BESSEMER", "loc": [-86.947547, 33.409002], "pop": 40549, "state": "AL", "_id": "35020"}
{"city": "HUEYTOWN", "loc": [-86.999607, 33.414625], "pop": 39677, "state": "AL", "_id": "35023"}
{"city": "BLOUNTSVILLE", "loc": [-86.568628, 34.092937], "pop": 9058, "state": "AL", "_id": "35031"}
{"city": "BREMEN", "loc": [-87.004281, 33.973664], "pop": 3448, "state": "AL", "_id": "35033"}
{"city": "BRENT", "loc": [-87.211387, 32.93567], "pop": 3791, "state": "AL", "_id": "35034"}
{"city": "BRIERFIELD", "loc": [-86.951672, 33.042747], "pop": 1282, "state": "AL", "_id": "35035"}
{"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"], "state": "UT", "version": "2.1", "_id": "84321"}
{"city": "Olivehurst, CA", "additionally": ["Arboga, CA", "Plumas Lake, CA", "West Linda, CA"], "state": "CA", "version": "2.1", "_id": "95961"}
Source: http://media.mongodb.org/zips.json
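Documents in one collection need not share fields, so application code has to tolerate missing keys. This minimal sketch assumes the collection is already loaded as a list of Python dictionaries: three Alabama records plus the version 2.1 Logan record from the sample above. It totals population per state. The function name is hypothetical. The result is roughly what a MongoDB `$group` on `"$state"` with `$sum: "$pop"` would produce, since `$sum` ignores missing values.

```python
from collections import defaultdict

zips = [
    {"city": "ACMAR", "loc": [-86.51557, 33.584132], "pop": 6055, "state": "AL", "_id": "35004"},
    {"city": "ADAMSVILLE", "loc": [-86.959727, 33.588437], "pop": 10616, "state": "AL", "_id": "35005"},
    {"city": "ADGER", "loc": [-87.167455, 33.434277], "pop": 3205, "state": "AL", "_id": "35006"},
    {"city": "Logan, UT", "additionally": ["Nibley, UT", "River Heights, UT"],
     "state": "UT", "version": "2.1", "_id": "84321"},
]


def population_by_state(docs):
    # Documents without a "pop" field contribute 0 instead of raising KeyError.
    totals = defaultdict(int)
    for doc in docs:
        totals[doc["state"]] += doc.get("pop", 0)
    return dict(totals)


print(population_by_state(zips))  # {'AL': 19876, 'UT': 0}
```

The same pattern (`doc.get(...)` with a default, or a check on a `version` field) is how code usually copes with records written under different schema versions.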
Document Database
Advantages
• Add new columns of data very easily
• Commonly JOINed data is pre-collected in a single document
• NULL fields can be skipped
• Many-to-many relationships without a helper table
Disadvantages
• Must track schema definitions in the application
• Data integrity enforcement is limited
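Here is a sketch of the many-to-many point using hypothetical course and student ids (`CIS3050`, `s1` to `s3` are invented for illustration). Each course document embeds an array of student ids, so no separate enrollment table exists, and both directions of the relationship can still be queried.

```python
# Hypothetical documents: each course embeds the ids of its enrolled students,
# so no separate enrollment (helper) table is needed.
courses = [
    {"_id": "CIS2120", "students": ["s1", "s2"]},
    {"_id": "CIS3050", "students": ["s2", "s3"]},
]


def students_in(course_id):
    # One document read answers "who is in this course?"
    for course in courses:
        if course["_id"] == course_id:
            return course["students"]
    return []


def courses_for(student_id):
    # The reverse direction searches the embedded arrays; a database would
    # normally index the array field (a multikey index in MongoDB).
    return [c["_id"] for c in courses if student_id in c["students"]]


print(students_in("CIS2120"))  # ['s1', 's2']
print(courses_for("s2"))       # ['CIS2120', 'CIS3050']
```

Nothing here checks that student `s3` actually exists. This is the "data integrity is limited" disadvantage in practice.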
MongoDB vs SQL
http://docs.mongodb.org/manual/reference/sql-comparison/
Terminology:
MongoDB <-> RDBMS
Collection <-> Table
Document <-> Row
Field <-> Column
MongoDB vs SQL CRUD
SELECT (Read)
db.courses.find() = SELECT * FROM courses
db.courses.find({name: "CIS2120"}) = SELECT * FROM courses WHERE name = 'CIS2120'
db.courses.countDocuments({}) = SELECT COUNT(*) FROM courses
UPDATE
db.courses.updateOne({name: "Lehi"}, { $set: { "zip": "11111" } })
DELETE
db.courses.deleteMany({name: "CIS2120"})
MongoDB CRUD
INSERT
db.courses.insertOne({
  name: "CIS2120",
  description: "Database Coding",
  instructor: {
    name: "David Williams",
    email: "david.williams@usu.edu"
  },
  subjects: ["Python", "MongoDB", "3NF", "ETL", "Star Schema"]
})
MongoDB JOIN
• All fields for a document are pre-joined for speedy retrieval
• JOINs were not natively supported before MongoDB 3.2
• $lookup (an aggregation pipeline stage, MongoDB 3.2+) provides lightweight JOIN capability as a left outer join
https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/
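In MongoDB, a stage such as `{ $lookup: { from: "instructors", localField: "instructor_id", foreignField: "_id", as: "instructor" } }` adds an array of matching documents to each input document. The sketch below imitates only that simple equality form in plain Python, to show the shape of the result. The collections, the `instructor_id` field and the course `CIS9999` are hypothetical. Real `$lookup` also handles array-valued fields and pipeline sub-queries, which this omits.

```python
def lookup(local_docs, foreign_docs, local_field, foreign_field, as_field):
    """Plain-Python imitation of $lookup's equality match (a left outer join)."""
    # Group the foreign documents by join key so each lookup is a dict access.
    by_key = {}
    for doc in foreign_docs:
        by_key.setdefault(doc.get(foreign_field), []).append(doc)

    joined = []
    for doc in local_docs:
        out = dict(doc)  # copy so the input documents are left unchanged
        # Unmatched documents still appear, with an empty array.
        out[as_field] = list(by_key.get(doc.get(local_field), []))
        joined.append(out)
    return joined


# Hypothetical collections; instructor_id is an illustrative field name.
courses = [
    {"name": "CIS2120", "instructor_id": 1},
    {"name": "CIS9999", "instructor_id": 7},
]
instructors = [{"_id": 1, "name": "David Williams"}]

for doc in lookup(courses, instructors, "instructor_id", "_id", "instructor"):
    print(doc["name"], [i["name"] for i in doc["instructor"]])
# CIS2120 ['David Williams']
# CIS9999 []
```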
Column Oriented Database
May only see notable improvements with over 1 million records
Cassandra CRUD
CREATE TABLE course (
name text PRIMARY KEY,
instructor text,
maxstudents int
);
Cassandra CRUD
CREATE TABLE people (
name text PRIMARY KEY,
email text,
phones map<text, text>
);
Neo4j
Neo4j – Graph Database
http://www.neo4j.org/learn/try
http://docs.neo4j.org/refcard/2.0/
MATCH (n)-[r:LIKES]->(m) RETURN n,r,m
Matches a person “n” that likes person “m”
Neo4j – Graph Database
Game of Thrones:
http://neo4j.com/graphgist/6029850
http://neo4j.com/graphgist/c4eab62c-7f5e-4e17-8f75-811d65d83127
http://neo4j.com/graphgist/886c572c-509e-41be-91b5-d74a5ef6d16d
http://neo4j.com/graphgists/?category=health-care-and-science
Neo4j
(LUKE {name:"Luke Skywalker"}),
(HAN {name:"Han Solo"}),
(LEIA {name:"Princess Leia Organa"}),
(OBI_WAN {name:"Obi Wan Kenobi"}),
(YODA {name:"Yoda"}),
(VADER {name:"Darth Vader"}),
(C3PO {name:"C3PO", droid:true}),
(R2D2 {name:"R2D2", droid:true}),
(CHEWBACCA {name:"Chewbacca"}),
(TATOOINE {name:"Tatooine", distance:13184}),
(DAGOBAH {name:"Dagobah", distance:15407}),
(JEDI {name:"Jedi"}),
(SITH {name:"Sith"}),
(REBELLION {name:"Rebellion"}),
(EMPIRE {name:"Empire"}),
(DARK_SIDE {name:"Dark Side"}),
(LIGHT_SIDE {name:"Light Side"}),
…
(LUKE)-[:FRIENDS_WITH]->(HAN),
(LUKE)-[:FRIENDS_WITH]->(LEIA),
(HAN)-[:FRIENDS_WITH]->(CHEWBACCA),
(YODA)-[:TEACHES]->(OBI_WAN),
(YODA)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:TEACHES]->(LUKE),
(OBI_WAN)-[:KNOWS]->(VADER),
(LUKE)-[:KNOWS]->(R2D2),
(R2D2)-[:KNOWS]->(C3PO),
(LUKE)-[:LIVED_ON]->(TATOOINE),
(HAN)-[:LIVED_ON]->(CORELLIA),
(LEIA)-[:LIVED_ON]->(ALDERAAN),
(YODA)-[:LIVED_ON]->(DAGOBAH),
(LUKE)-[:DEVOTED_TO]->(JEDI),
(LUKE)-[:DEVOTED_TO]->(REBELLION),
(LUKE)-[:DEVOTED_TO]->(LIGHT_SIDE),
(VADER)-[:DEVOTED_TO]->(SITH),
(VADER)-[:DEVOTED_TO]->(EMPIRE),
(VADER)-[:DEVOTED_TO]->(DARK_SIDE),
(LEIA)-[:DEVOTED_TO]->(REBELLION),
(HAN)-[:DEVOTED_TO]->(REBELLION)
…
https://gist.github.com/peterneubauer/6019125
http://gist.neo4j.org/?6019125
MATCH (y)-[r]-(other)
WHERE y.name = 'Yoda'
RETURN y.name, type(r), other.name
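The Yoda query and a multi-hop traversal can be imitated with an adjacency list, the structure a graph database uses to follow relationships without JOINs. This sketch uses a hypothetical subset of the relationships listed above and plain Python in place of Neo4j. Cypher's relationship-uniqueness rules are not modeled, so the second function is only a rough analogue of a variable-length pattern such as `[:FRIENDS_WITH*2]`.

```python
from collections import defaultdict

# A subset of the gist's relationships as (start, type, end) triples.
edges = [
    ("LUKE", "FRIENDS_WITH", "HAN"),
    ("LUKE", "FRIENDS_WITH", "LEIA"),
    ("HAN", "FRIENDS_WITH", "CHEWBACCA"),
    ("YODA", "TEACHES", "OBI_WAN"),
    ("YODA", "TEACHES", "LUKE"),
    ("OBI_WAN", "TEACHES", "LUKE"),
    ("YODA", "LIVED_ON", "DAGOBAH"),
    ("LUKE", "LIVED_ON", "TATOOINE"),
]
names = {
    "LUKE": "Luke Skywalker", "HAN": "Han Solo", "LEIA": "Princess Leia Organa",
    "CHEWBACCA": "Chewbacca", "YODA": "Yoda", "OBI_WAN": "Obi Wan Kenobi",
    "DAGOBAH": "Dagobah", "TATOOINE": "Tatooine",
}

# Adjacency list: node -> [(relationship type, neighbor)]. Both directions are
# stored because the pattern (y)-[r]-(other) ignores relationship direction.
adjacency = defaultdict(list)
for start, rel, end in edges:
    adjacency[start].append((rel, end))
    adjacency[end].append((rel, start))


def relationships_of(node):
    """Rough analogue of MATCH (y)-[r]-(other) ... RETURN y.name, type(r), other.name."""
    return [(names[node], rel, names[other]) for rel, other in adjacency[node]]


def reachable(start, rel_type, hops):
    """Nodes reached by following exactly `hops` outgoing rel_type relationships."""
    frontier = {start}
    for _ in range(hops):
        frontier = {end for s, r, end in edges if s in frontier and r == rel_type}
    return frontier


print(relationships_of("YODA"))
# [('Yoda', 'TEACHES', 'Obi Wan Kenobi'), ('Yoda', 'TEACHES', 'Luke Skywalker'), ('Yoda', 'LIVED_ON', 'Dagobah')]
print(reachable("LUKE", "FRIENDS_WITH", 2))  # {'CHEWBACCA'}
```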
Neo4j CRUD
MATCH (user {name:"Bill"})-[:KNOWS]->(colleague)
WHERE colleague.employer = "LinkedIn"
RETURN user, colleague
ORDER BY colleague.name LIMIT 10
Neo4j CRUD
UPDATE
SET edge.weight = 87 (Cypher) or edge.setProperty("weight", 87) (Java API)
DELETE
REMOVE edge.weight (Cypher) or edge.removeProperty("weight") (Java API)
http://docs.neo4j.org/refcard/2.0/
http://www.neo4j.org/learn/cypher
NoSQL Challenges
• Identify a Problem First, not a Solution
– Define the business value before spending
– Yet another solution to maintain
• Define Standards and Best Practices
• Concept Education and Technical Training
• MongoDB Schema Change Tracking
• Heterogeneous Interoperability
• Security is often an add-on rather than native
Supplemental Slides
OpenWest 2014 NoSQL Presentation Recording
https://www.youtube.com/watch?v=057ddu0Xsqk&noredirect=1
2012 NoSQL vs RDBMS
Sharding
Partitions
Data distributed across disks
Sharding
Data distributed across servers
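Here is a minimal sketch of hash-based sharding. It assumes a fixed number of servers identified by index and string keys such as the zip-code `_id`s from the sample data. The function names are hypothetical. An MD5 digest is used only as a stable hash, because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index with a stable hash (same answer in every process)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(keys, num_shards):
    # Each shard stands in for one server holding part of the data.
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


zip_ids = ["35004", "35005", "35006", "35007", "35010", "84321", "95961"]
for shard, keys in distribute(zip_ids, 3).items():
    print(shard, keys)
```

With plain modulo placement, changing the number of servers moves most keys to a different shard. Systems such as Cassandra use a token ring (consistent hashing) instead, so that adding a node moves only part of the data.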
Map Reduce
Divides work across distributed systems
Parallel processing of large data sets
Divide – Conquer – Consolidate
[Figure: 1+2+3+6+7+8+9 = ? Divide into {2, 6, 8} and {1, 7, 3, 9}; Conquer: 16 and 20; Consolidate: 16 + 20 = 36]
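The divide, conquer, consolidate steps from the figure can be sketched in a few lines, using the same two chunks. Threads on a single machine stand in for distributed servers here. The sketch shows only the shape of the computation, not real distribution or speedup. The function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def map_reduce_sum(chunks):
    """Divide: chunks are given; conquer: sum each chunk in parallel; consolidate."""
    with ThreadPoolExecutor(max_workers=max(1, len(chunks))) as pool:
        partials = list(pool.map(sum, chunks))  # map step, results in chunk order
    return partials, sum(partials)              # reduce step


partials, total = map_reduce_sum([[2, 6, 8], [1, 7, 3, 9]])
print(partials, total)  # [16, 20] 36
```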
JSON Example
{
"_id" : 1,
"name" : { "first" : "John", "last" : "Backus" },
"contribs" : [ "Fortran", "ALGOL", "FP" ],
"awards" : [
{ "award" : "W.W. McDowell Award",
"year" : 1967,
"by" : "IEEE Computer Society" },
{ "award" : "Draper Prize",
"year" : 1993,
"by" : "National Academy of Engineering" }
]
}
http://www.mongodb.com/json-and-bson
ACID, BASE, CAP, CPR
1979 Gray, 1983 Reuter & Härder - ACID
Atomic, Consistent, Isolated, Durable
Atomic: all or nothing (rollback); Consistent: follows the rules; Isolated: simultaneous transactions do not interfere; Durable: committed data is not dropped
1997 Brewer - BASE
Basically Available, Soft-state, Eventually consistent
2000 Brewer – CAP (Pick Two)
Consistency, Availability, Partition Tolerance
CPR (Pick Two)
Consistency, Performance, Replication/Redundancy
[Figure: CPR triangle, pick two of Consistency, Performance, Redundancy/Replication]
[Figure: CPR with Consistency and Performance; data A B C D]
[Figure: Consistency, Performance, Redundancy; updates may be inconsistent across devices]
[Figure: data ABCD on four devices; Performance and Redundancy]
Redis CRUD
http://redis.io/commands
http://redis.io/topics/data-types-intro
http://openmymind.net/2011/11/8/Redis-Zero-To-Master-In-30-Minutes-Part-1/
Google BigTable
• White Paper published in 2006
• Many databases are based upon BigTable (e.g., HBase)
• 13 pages, readable for many non-techies
• Offers insight into the early days of NoSQL
http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
HBase
Large-scale, column-oriented database
Consistency, Performance, Fault Tolerance, ACID via Locking
Tables are created before initial data is added
Tables have:
• row keys: indexed row identifier strings
• column families: each contains one or more columns
• timestamps for version control
HBase
Row key is a unifier for column families.
If a row does not insert values in a column family, no disk space is used within that column family.
Write-Ahead Logging (WAL) is similar to file system journaling
HBase CRUD
create 'wiki_table', 'text_column_family', 'revision_column_family'
create 'wiki', 'text', 'revision'
put 'wiki', 'first page', 'text:', '…'
put 'wiki', 'first page', 'revision:author', '…'
get 'wiki', 'first page', ['revision:author', 'revision:comment']
delete 'wiki', 'first page', 'revision:author'
scan 'wiki' = SELECT * FROM wiki
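The shell commands above act on a layout of row key, then `family:qualifier` column, then timestamped versions. This is a toy in-memory model of that layout, not the HBase API. The class `ToyHBaseTable` is hypothetical. The cell values "Hello", "alice" and "bob" are invented, because the slide elides them, and a simple counter stands in for real cell timestamps.

```python
import itertools


class ToyHBaseTable:
    """Hypothetical in-memory model of HBase's data layout, not the HBase API."""

    def __init__(self, name, *families):
        self.name = name
        self.families = set(families)     # column families are fixed at creation
        self.rows = {}                    # row key -> column -> {timestamp: value}
        self._clock = itertools.count(1)  # stand-in for cell timestamps

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        cells = self.rows.setdefault(row, {}).setdefault(column, {})
        cells[next(self._clock)] = value  # older versions are kept

    def get(self, row, columns=None):
        # Return the newest version of each requested column.
        result = {}
        for column, cells in self.rows.get(row, {}).items():
            if columns is None or column in columns:
                result[column] = cells[max(cells)]
        return result

    def delete(self, row, column):
        # Drop every stored version of one column in one row.
        self.rows.get(row, {}).pop(column, None)

    def scan(self):
        # Rows come back sorted by row key.
        return [(row, self.get(row)) for row in sorted(self.rows)]


wiki = ToyHBaseTable("wiki", "text", "revision")
wiki.put("first page", "text:", "Hello")
wiki.put("first page", "revision:author", "alice")
wiki.put("first page", "revision:author", "bob")
print(wiki.get("first page", ["revision:author"]))     # {'revision:author': 'bob'}
print(len(wiki.rows["first page"]["revision:author"]))  # 2
wiki.delete("first page", "revision:author")
print(wiki.scan())  # [('first page', {'text:': 'Hello'})]
```

A row that never writes to the `revision` family simply has no entries for it, which mirrors the point that unused column families take no space.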
Additional JSON Visualizer
http://visualizer.json2html.com/
NoSQL Introduction
• NoSQL is a commonly adopted misnomer
• Typically does not use ANSI SQL
– SQL = Structured Query Language
– Structure exists but is more flexible
– Queries are still performed
– Language is closer to programming languages
Slides and Feedback at: http://joind.in/11012
NoSQL “Bleeding Edge”
• Several solutions are mature and stable enough to run large-scale production environments
• Not all permutations have been considered
• Several (but not all) optimization strategies have been published
• Crucial elements such as security may be a secondary add-on in favor of performance
NoSQL “Bleeding Edge”
Sun Microsystems csh man page:
“Although robust enough for general use, adventures into the esoteric periphery of the C shell may reveal unexpected quirks.”