
CouchDB: the last RESTful JSON service you’ll ever need


Carlo Cabanilla
CouchDB is a RESTful JSON store
Create

> PUT /mydb/stuff
> Content-type: application/json
>
> {"thing_count":1}

< HTTP/1.1 201 Created
<
< {"ok":true,"id":"stuff","rev":"1-e86a94c1..."}
Read

> GET /mydb/stuff

< HTTP/1.1 200 OK
<
< {"_id":"stuff",
<  "_rev":"1-e86a94c1...",
<  "thing_count":1}
Update

> PUT /mydb/stuff
> Content-type: application/json
>
> {"_id":"stuff",
>  "_rev":"1-e86a94c1...",
>  "thing_count":13}

< HTTP/1.1 201 Created
<
< {"ok":true,"id":"stuff","rev":"2-39a9bf12..."}
Delete

> DELETE /mydb/stuff?rev=2-39a9bf12...

< HTTP/1.1 200 OK
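
The exchanges above all hinge on the revision: an update or delete has to name the current _rev. Here is a minimal Python sketch of that rule, assuming a toy in-memory store. TinyStore, its method names, the 404/409 failure codes and the digest used for the revision suffix are all inventions of this sketch, not CouchDB's implementation (and a real delete leaves a tombstone rather than forgetting the document).

```python
import hashlib
import json


class TinyStore:
    """Toy in-memory stand-in for one database (illustration only)."""

    def __init__(self):
        self.docs = {}  # doc id -> stored document, including _id and _rev

    def _next_rev(self, current_rev, fields):
        # Revisions look like "N-hash": N counts the edits so far.
        # The hash here is an arbitrary digest chosen for the sketch.
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        payload = json.dumps(fields, sort_keys=True).encode()
        return f"{generation}-{hashlib.sha256(payload).hexdigest()[:8]}"

    def put(self, doc_id, body):
        current = self.docs.get(doc_id)
        current_rev = current["_rev"] if current else None
        # Accept the write only if the client names the latest revision.
        if body.get("_rev") != current_rev:
            return 409, {"error": "conflict"}
        fields = {k: v for k, v in body.items() if not k.startswith("_")}
        rev = self._next_rev(current_rev, fields)
        self.docs[doc_id] = {"_id": doc_id, "_rev": rev, **fields}
        return 201, {"ok": True, "id": doc_id, "rev": rev}

    def get(self, doc_id):
        if doc_id not in self.docs:
            return 404, {"error": "not_found"}
        return 200, dict(self.docs[doc_id])

    def delete(self, doc_id, rev):
        current = self.docs.get(doc_id)
        if current is None:
            return 404, {"error": "not_found"}
        if rev != current["_rev"]:
            return 409, {"error": "conflict"}
        del self.docs[doc_id]  # simplification: no tombstone is kept
        return 200, {"ok": True, "id": doc_id}


store = TinyStore()
status, created = store.put("stuff", {"thing_count": 1})
update = {"_id": "stuff", "_rev": created["rev"], "thing_count": 13}
status2, updated = store.put("stuff", update)   # accepted: rev is current
status3, rejected = store.put("stuff", update)  # rejected: rev is now stale
```

Sending the same update twice shows the point: the second attempt carries a revision that is no longer the latest, so the store refuses it instead of silently overwriting.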


a RESTful JSON store
PUT /look/ma/im/doing/rest
Do you support Conditional GET?
Conditional GET is the holy grail of REST

> GET /that/money

(wait for a few ms)

< HTTP/1.1 200 OK
<
< {"that_money": "$5"}

x 1,000,000 = slow!

> GET /that/money

(Uncached: slow)

< HTTP/1.1 200 OK
< Etag: "1-24c87859"
<
< {"that_money": "$5"}

> GET /that/money
> If-None-Match: "1-24c87859"

(Cache hit: fast)

< HTTP/1.1 304 Not Modified
< Etag: "1-24c87859"
< Content-Length: 0

> GET /that/money
> If-None-Match: "1-24c87859"

(Data changed, so cache miss)

< HTTP/1.1 200 OK
< Etag: "2-535d9fb2"
<
< {"that_money": "$1,000,000"}
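
The three exchanges above (uncached, cache hit, data changed) boil down to one comparison on the server side. A minimal sketch in Python, assuming the ETag is simply the quoted document revision as in the headers shown; conditional_get and etag_for are hypothetical helper names, not part of any CouchDB API.

```python
def etag_for(doc):
    """Quote the document revision, as in the Etag headers above."""
    return '"%s"' % doc["_rev"]


def conditional_get(doc, if_none_match=None):
    """Return (status, headers, body) for a GET on doc."""
    etag = etag_for(doc)
    if if_none_match == etag:
        # The client's copy is current: send headers only, skip the body.
        return 304, {"Etag": etag}, None
    return 200, {"Etag": etag}, {"that_money": doc["that_money"]}


doc = {"_rev": "1-24c87859", "that_money": "$5"}
first = conditional_get(doc)                     # uncached: full response
second = conditional_get(doc, first[1]["Etag"])  # cache hit: 304, no body
doc = {"_rev": "2-535d9fb2", "that_money": "$1,000,000"}
third = conditional_get(doc, first[1]["Etag"])   # data changed: full response
```

The cheap part is that the 304 branch never has to serialize or ship the document, which is what makes a million polls tolerable.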
a RESTful JSON store
GET /my_db/some_id

{
  "_id": "some_id",
  "_rev": "1-24c8785964763c21d...",
  "my_field1": "json does strings",
  "numbers too": 123.12,
  "and arrays": ["ain't", "it", "cool?"],
  "dicts too!": { "don't": "try",
                  "this": "in",
                  "Oracle!": null }
}
Schema-less
How do I know what a document means?
Duck-type your database

Duck-typing

def document_factory(doc):
    # Treat the document as a Duck only if it looks, swims and quacks like one.
    if (doc['look'] == Duck.look()
            and doc['swim'] == Duck.swim()
            and doc['quack'] == Duck.quack()):
        return Duck(doc)

Python

def document_factory(doc):
    if doc['type'] == 'Duck':
        return Duck(doc)
    elif doc['type'] == 'Dog':
        return Dog(doc)
    elif ...:
        ...
    else:
        return GenericDocument(doc)

Python

def document_factory(doc):
    import document_types
    # Look the class up on the module (not on the doc) by its 'type' field,
    # falling back to the GenericDocument class itself rather than a string.
    doc_class = getattr(document_types, doc['type'],
                        document_types.GenericDocument)
    return doc_class(doc)

Java
public static CouchObj documentFactory(CouchDoc doc)
        throws ReflectiveOperationException {
    ClassLoader loader = CouchObj.class.getClassLoader();
    // Load the class named by the document's "type" field.
    Class<?> dispatcherClass = loader.loadClass(doc.get("type"));
    // Class.newInstance() takes no arguments, so go through the constructor.
    return (CouchObj) dispatcherClass.getConstructor(CouchDoc.class)
            .newInstance(doc);
}

That’s silly.
How do I query something that has no schema?
What the heck is map reduce (in CouchDB)?

You write a map reduce function in JavaScript!

map reduce ≈ programmatically building an index

In CouchDB, a map/reduce pair is called a view.
(different from an RDBMS view!)

On first query => full database scan, then cache
After db writes => incrementally update the cache on the next query
(different from Hadoop map reduce!)

You can tell CouchDB to store any data in the view
(RDBMSs only store indexed values and row ids)

CouchDB view ≈ RDBMS index + materialized view

Yo, data structures nerds,

a CouchDB view is a B+ tree

map function determines how keys map to values
keys are stored in sorted order
O(log n) for reads, writes, deletes and range queries (plus the rows returned)
keys are stored close to values
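
To make the sorted-keys idea concrete, here is a small Python sketch in which a sorted list stands in for the B+ tree. SortedView and map_person are hypothetical names, and the list is only an approximation: lookups use binary search, but inserting into a Python list is O(n), which a real B+ tree avoids.

```python
import bisect


def map_person(doc):
    """Python rendering of a map function: yield (key, value) rows."""
    if doc.get("type") == "person":
        yield doc["name"], doc["_id"]


class SortedView:
    """Rows kept sorted by key; a sorted list stands in for the B+ tree."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.keys = []    # keys in sorted order
        self.values = []  # values[i] is stored right next to keys[i]

    def index(self, doc):
        # Incremental update: only the new document is mapped and inserted.
        for key, value in self.map_fn(doc):
            pos = bisect.bisect_right(self.keys, key)
            self.keys.insert(pos, key)
            self.values.insert(pos, value)

    def range(self, startkey, endkey):
        # Two binary searches find the slice; the rows in it are contiguous.
        lo = bisect.bisect_left(self.keys, startkey)
        hi = bisect.bisect_right(self.keys, endkey)
        return list(zip(self.keys[lo:hi], self.values[lo:hi]))


view = SortedView(map_person)
for d in [
    {"_id": "p3", "type": "person", "name": "carol"},
    {"_id": "x1", "type": "invoice", "total": 12},
    {"_id": "p1", "type": "person", "name": "alice"},
    {"_id": "p2", "type": "person", "name": "bob"},
]:
    view.index(d)
```

Because the keys stay sorted, a range query is just "find the start, find the end, read what lies between", with no scan over documents that did not emit a row.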

For example:
Create a view called people on the names of all person documents.

function map(doc) {
  if (doc['type'] == 'person') {
    emit(doc['name'], doc);
  }
}

function reduce(keys, values, rereduce) {
  // On rereduce, values holds partial counts, so add them up.
  return rereduce ? sum(values) : values.length;
}
SELECT count(*) FROM people
GET /people

SELECT name, count(*) FROM people GROUP BY name ORDER BY name ASC
GET /people?group=true

SELECT * FROM people ORDER BY name ASC
GET /people?reduce=false

SELECT * FROM people WHERE name = 'bjorn'
GET /people?reduce=false&key="bjorn"

SELECT * FROM (
  SELECT p.*, rownum rn
  FROM (SELECT * FROM people ORDER BY name ASC) p
) WHERE rn BETWEEN 51 AND 60
GET /people?reduce=false&limit=10&skip=50
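
The SQL-to-view translations above can be imitated in plain Python to see what each query parameter does. This is a sketch under stated assumptions: query_people is a hypothetical function that re-maps all documents on every call and reduces them in a single pass, whereas CouchDB reads from its stored index.

```python
from itertools import groupby


def map_people(doc):
    """Python rendering of the people map function."""
    if doc.get("type") == "person":
        yield doc["name"], doc


def reduce_count(keys, values):
    """Python rendering of the people reduce function: count the rows."""
    return len(values)


def query_people(docs, reduce=True, group=False, key=None, skip=0, limit=None):
    # Map every document, then sort the emitted rows by key.
    rows = sorted((row for doc in docs for row in map_people(doc)),
                  key=lambda row: row[0])
    if key is not None:
        rows = [row for row in rows if row[0] == key]
    if not reduce:
        # reduce=false returns the mapped rows, optionally paged.
        end = None if limit is None else skip + limit
        return rows[skip:end]
    if group:
        # group=true reduces each distinct key separately.
        result = []
        for name, same_key in groupby(rows, key=lambda row: row[0]):
            values = [value for _, value in same_key]
            result.append((name, reduce_count([name] * len(values), values)))
        return result
    # Default: one reduced value over all rows.
    return [(None, reduce_count([k for k, _ in rows], [v for _, v in rows]))]


docs = [
    {"_id": "1", "type": "person", "name": "bjorn"},
    {"_id": "2", "type": "person", "name": "anna"},
    {"_id": "3", "type": "person", "name": "bjorn"},
    {"_id": "4", "type": "pet", "name": "rex"},
]
total = query_people(docs)                      # like SELECT count(*)
per_name = query_people(docs, group=True)       # like GROUP BY name
bjorns = query_people(docs, reduce=False, key="bjorn")
```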

What about joins??????

j/k.
Joins will be in version 0.11 (trunk)

function map(doc) {
  for (var i in doc['comment_ids']) {
    var comment_id = doc['comment_ids'][i];
    emit(doc['_id'], {'_id': comment_id});
  }
}

Documents in my blog db
[
  {'_id': 1, 'comment_ids': [3, 4]},
  {'_id': 2, 'comment_ids': [5]},
  {'_id': 3, 'text': 'whoa'},
  {'_id': 4, 'text': 'omg'},
  {'_id': 5, 'text': 'ponies'}
]

GET /myblog/comments?reduce=false&include_docs=true
[
  {'key': 1, 'value': {'_id': 3, 'text': 'whoa'}},
  {'key': 1, 'value': {'_id': 4, 'text': 'omg'}},
  {'key': 2, 'value': {'_id': 5, 'text': 'ponies'}}
]
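
The join above is really "emit a pointer, then let include_docs follow it". A minimal Python sketch of that idea, keyed by the post id that the map function emits. query_comments is a hypothetical helper, and it folds the fetched comment into value the way the simplified output above does; as far as I know, real responses carry the linked document in a separate doc field.

```python
def map_comments(doc):
    """Python rendering of the comments map function."""
    for comment_id in doc.get("comment_ids", []):
        yield doc["_id"], {"_id": comment_id}


def query_comments(docs, include_docs=False):
    by_id = {doc["_id"]: doc for doc in docs}
    rows = sorted((row for doc in docs for row in map_comments(doc)),
                  key=lambda row: (row[0], row[1]["_id"]))
    result = []
    for key, value in rows:
        if include_docs:
            # Swap the {'_id': ...} stub for the document it points at.
            value = by_id[value["_id"]]
        result.append({"key": key, "value": value})
    return result


blog = [
    {"_id": 1, "comment_ids": [3, 4]},
    {"_id": 2, "comment_ids": [5]},
    {"_id": 3, "text": "whoa"},
    {"_id": 4, "text": "omg"},
    {"_id": 5, "text": "ponies"},
]
stubs = query_comments(blog)
joined = query_comments(blog, include_docs=True)
```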

Ok great, so it can pretty much do what an RDBMS can do.

Why not just use an RDBMS?


The real question you should be asking yourself is:

Why keep rewriting HTTP protocol semantics and data object accessors to front your RDBMS?
Middle tier-less web apps!
(well, kind of)
No, forreals,

Why not just use an RDBMS?


O R A C L E

OMG
Really
Asininely
Costly
Licensing
Expenses
Cluster
Of
Unreliable
Commodity
Hardware
DataBase

Cluster?
Replication!
Incremental,
Fault Tolerant,
Over HTTP
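
"Incremental" and "fault tolerant" can be pictured as a checkpoint into a change log: copy only what changed since the last checkpoint, and pick up from that checkpoint if a run dies. A toy Python sketch, assuming an in-memory Db class and a replicate function that are both made up for illustration; the HTTP transport and CouchDB's actual replication protocol are not modeled here.

```python
class Db:
    """Toy database: documents plus an append-only log of changed ids."""

    def __init__(self):
        self.docs = {}
        self.changes = []  # changes[i] is the doc id written at sequence i + 1

    def write(self, doc):
        self.docs[doc["_id"]] = doc
        self.changes.append(doc["_id"])


def replicate(source, target, since=0):
    """Copy everything written to source after sequence `since`."""
    copied = 0
    # dict.fromkeys drops repeated ids while keeping their first-seen order.
    for doc_id in dict.fromkeys(source.changes[since:]):
        target.write(source.docs[doc_id])
        copied += 1
    return len(source.changes), copied  # new checkpoint, docs copied


a, b = Db(), Db()
a.write({"_id": "stuff", "thing_count": 1})
a.write({"_id": "other", "thing_count": 2})
checkpoint, first_run = replicate(a, b)
a.write({"_id": "stuff", "thing_count": 13})
checkpoint, second_run = replicate(a, b, since=checkpoint)
```

The second run copies one document instead of two, because the checkpoint tells it where the first run stopped.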

replication is expensive

Mmm, yeah how much did we pay for the LSB?
Seems to work well tho
(fingers crossed)
Postgres replication is . . . ?

No native replication support yet (coming in v9.0)

3rd party solutions are trigger-based (yuck)


Managing replication conflicts (abends)
Deterministically pick a version to “win”
Add a “_conflicts” property to the document
App must resolve the conflict

Much better explanation here:
http://books.couchdb.org/relax/reference/conflict-management
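
"Deterministically pick a winner" means every replica, seeing the same set of conflicting revisions, lands on the same one without talking to the others. A small Python sketch of one such rule, assuming a simplified ordering (higher edit count first, then the larger hash string). The exact rule CouchDB applies is described at the link above; pick_winner and with_conflicts are hypothetical names, and the losing revisions are stored under _conflicts, the field name CouchDB uses as far as I know.

```python
def rev_sort_key(rev):
    """Split "N-hash" into (N, hash) so revisions compare consistently."""
    generation, _, digest = rev.partition("-")
    return int(generation), digest


def pick_winner(revisions):
    """Same input set gives the same winner, whatever the arrival order."""
    return max(revisions, key=rev_sort_key)


def with_conflicts(doc_by_rev):
    """Return the winning document, listing the losing revisions on it."""
    winner = pick_winner(doc_by_rev)
    losers = sorted((rev for rev in doc_by_rev if rev != winner),
                    key=rev_sort_key, reverse=True)
    doc = dict(doc_by_rev[winner])
    if losers:
        doc["_conflicts"] = losers  # left for the application to resolve
    return doc


versions = {
    "2-aaa": {"_id": "stuff", "_rev": "2-aaa", "thing_count": 13},
    "2-bbb": {"_id": "stuff", "_rev": "2-bbb", "thing_count": 7},
}
merged = with_conflicts(versions)
```

Nothing is thrown away in this sketch: the losing revision stays reachable by its id, which is what lets the application resolve the conflict later.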
Let’s see some appz!
Not enough time :(

• Design philosophy
• Transforming views with list functions
• Lucene integration
• Performance benchmarks
• Erlang under the hood
• Many to many relationships
• Changes feed
• Potential applications at WGen
• Elaborate on middle tier-less web apps


http://couchdb.apache.org/
