
Parallel and distributed databases II

Some interesting recent systems

MapReduce
Dynamo
Peer-to-peer

Then and now

A modern search engine

MapReduce

How do I write a massively parallel, data-intensive program?

Develop the algorithm
Write the code to distribute work to machines
Write the code to distribute data among machines
Write the code to retry failed work units
Write the code to redistribute data for a second stage of processing
Write the code to start the second stage after the first finishes
Write the code to store intermediate result data
Write the code to reliably store final result data

MapReduce

Two phases

Map: take input data and map it to zero or more key/value pairs
Reduce: take key/value pairs with the same key and reduce them to a result

MapReduce framework takes care of the rest

Partitioning data, repartitioning data, handling failures, tracking completion

MapReduce

[Diagram: input data flows through the Map stage, then the Reduce stage]

Example

Count the number of times each word appears on the web

[Diagram: the input words (apple, banana, apple, grape, apple, apple, grape, apple) are split across Map tasks; Map emits a (word, 1) pair per occurrence, the pairs are grouped by key, and Reduce sums each group, e.g. apple,5]
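The word-count pipeline above can be sketched in a few lines of Python. This is an illustrative single-machine model of the interface (the names map_fn, reduce_fn, and run_mapreduce are made up for the sketch, not any framework's real API); the real framework runs the same two phases across many machines.

from collections import defaultdict

def map_fn(document):
    # Map: emit a (word, 1) pair for every occurrence of a word.
    for word in document.split():
        yield (word, 1)

def reduce_fn(key, values):
    # Reduce: combine all values that share a key into one result.
    return (key, sum(values))

def run_mapreduce(documents):
    # What the framework does for you: run the maps, group the
    # emitted pairs by key (the shuffle), then run one reduce per key.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return [reduce_fn(k, vs) for k, vs in groups.items()]

print(run_mapreduce(["apple banana apple grape",
                     "apple apple grape apple"]))
# [('apple', 5), ('banana', 1), ('grape', 2)]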

Other MapReduce uses

Grep
Sort
Analyze web graph
Build inverted indexes
Analyze access logs
Document clustering
Machine learning

Dynamo

Always-writable data store

Do I need ACID for this?

Eventual consistency

Weak consistency guarantee for replicated data

Updates initiated at any replica
Updates eventually reach every replica
If updates cease, eventually all replicas will have the same state

Tentative versus stable writes

Tentative writes applied in per-server partial order
Stable writes applied in global commit order

Bayou system at PARC

Eventual consistency

[Diagram: a master storage unit and two other storage units all start with the record (Joe, 22, Arizona). Writes arrive at different replicas: age 22 -> 32, Arizona -> Montana, Joe -> Bob. Each replica applies its own writes in order (local write order preserved), but the replicas temporarily diverge (inconsistent copies visible). The master determines the commit order, and once every update propagates, all replicas converge to (Bob, 32, Montana)]

All replicas end up in the same state

Mechanisms

Epidemics/rumor mongering

Updates are gossiped to random sites
Gossip slows when (almost) every site has heard it

Anti-entropy

Pairwise sync of whole replicas, via log-shipping and occasional DB snapshots
Ensures everyone has heard all updates

Primary copy

One replica determines the final commit order of updates
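A rough sketch of anti-entropy in Python, assuming each replica keeps its updates as a dict from update ID to update (the representation is illustrative; Bayou actually ships logs plus occasional DB snapshots):

def anti_entropy(replica_a, replica_b):
    # Pairwise sync: ship each side the log entries it has not seen,
    # so afterwards both replicas have heard every update either had.
    for uid, update in list(replica_a.items()):
        replica_b.setdefault(uid, update)
    for uid, update in list(replica_b.items()):
        replica_a.setdefault(uid, update)

a = {1: "age := 32", 2: "state := Montana"}
b = {1: "age := 32", 3: "name := Bob"}
anti_entropy(a, b)
assert a == b  # both now hold updates 1, 2, and 3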

Epidemic replication

[Diagram: an update spreads by gossip; a node that is already infected ignores a repeated update]

Susceptible (no update)
Infective (spreading update)
Removed (updated, not spreading)

Anti-entropy

[Diagram: a node that gossip alone would never reach ("will not receive update") is brought up to date by a pairwise sync]

Susceptible (no update)
Infective (spreading update)
Removed (updated, not spreading)

What if I get a conflict?

[Diagram: replica A writes Brian=GoodProfessor while replica B writes Brian=BadProfessor]

How to detect?

Version vector: (A's count, B's count)

What if I get a conflict?

[Diagram: only replica A writes Brian=GoodProfessor]

How to detect?

Version vector: (A's count, B's count)

Initially, (0,0) at both
A writes, sets version vector to (1,0)
(1,0) dominates B's version (0,0)
No conflict

What if I get a conflict?

[Diagram: replica A writes Brian=GoodProfessor while replica B writes Brian=BadProfessor]

How to detect?

Version vector: (A's count, B's count)

Initially, (0,0) at both
A writes, sets version vector to (1,0)
B writes, sets version vector to (0,1)
Neither vector dominates the other
Conflict!!
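The dominance test is simple to state in code. A minimal Python sketch, assuming a fixed pair of replicas so a vector is just (A's count, B's count):

def dominates(v1, v2):
    # v1 dominates v2 if it is at least as large everywhere
    # and strictly larger somewhere.
    return all(a >= b for a, b in zip(v1, v2)) and \
           any(a > b for a, b in zip(v1, v2))

def in_conflict(v1, v2):
    # Concurrent writes: neither version dominates the other.
    return not dominates(v1, v2) and not dominates(v2, v1) and v1 != v2

print(in_conflict((1, 0), (0, 0)))  # False: A's write simply wins
print(in_conflict((1, 0), (0, 1)))  # True: independent writes at A and B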

How to resolve conflicts?

Commutative operations: allow both

Add Fight Club to shopping cart
Add Legends of the Fall to shopping cart
Doesn't matter what order they occur in

Thomas write rule: take the last update

That's the one we meant to have stick

Let the application cope with it

Expose possible alternatives to the application
Application must write back one answer

Peer-to-peer

Great technology
Shady business model
Focus on the technology for now

Peer-to-peer origins

Where can I find songs for download?

[Diagrams: three generations of answers to the query Q?. A web interface answers it from a central server; Napster keeps a central index but peers exchange files directly; Gnutella has no central component and floods the query from peer to peer]

Characteristics

Peers both generate and process messages

Server + client = servent

Massively parallel

Distributed

Data-centric

Route queries and data, not packets

Gnutella

Joining the network

[Diagram: a new node connects to a known peer and floods ping messages through the network; every reachable node replies with a pong announcing itself, which tells the joining node about peers it can connect to]

Search

[Diagram: a query Q? floods outward from the requesting peer with TTL = 4; peers holding a match reply, and the file is then downloaded directly from the responding peer]
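A minimal sketch of TTL-limited flooding in Python (the network and file tables are illustrative stand-ins for real peers):

def flood_search(neighbors, files, start, key, ttl):
    # neighbors: peer -> list of connected peers
    # files: peer -> set of file names held by that peer
    # The query spreads one hop per round until the TTL runs out;
    # peers remember seen queries so they do not re-forward them.
    seen, hits, frontier = {start}, [], [start]
    for _ in range(ttl + 1):
        nxt = []
        for node in frontier:
            if key in files[node]:
                hits.append(node)          # query hit: reply to requester
            for peer in neighbors[node]:
                if peer not in seen:
                    seen.add(peer)
                    nxt.append(peer)
        frontier = nxt
    return hits

net = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
files = {"a": set(), "b": {"song.mp3"}, "c": set(), "d": {"song.mp3"}}
print(flood_search(net, files, "a", "song.mp3", ttl=4))  # ['b', 'd']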

Failures

Scalability!

Messages flood the network
Example: the Gnutella meltdown of 2000

How to make it more scalable?

Search more intelligently
Replicate information
Reorganize the topology

Iterative deepening
Yang and Garcia-Molina 2002; Lv et al 2002

Directed breadth-first search
Yang and Garcia-Molina 2002

Random walk (sketched below)
Adamic et al 2001

Random walk with replication
Cohen and Shenker 2002; Lv et al 2002

Supernodes
Kazaa; Yang and Garcia-Molina 2003
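As one example of searching more intelligently, a random walk forwards the query along a single random edge per step instead of flooding every link. The sketch reuses the neighbors/files shapes from the flooding example above; the step budget plays the role TTL played there:

import random

def random_walk_search(neighbors, files, start, key, max_steps):
    # One walker wanders the network, so the message cost is at most
    # max_steps rather than growing with the size of the neighborhood.
    node = start
    for _ in range(max_steps):
        if key in files[node]:
            return node
        node = random.choice(neighbors[node])
    return None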

Some interesting observations

Most peers are short-lived

Average up-time: 60 minutes
For a 100K network, this implies a churn rate of 1,600 nodes per minute
Saroiu et al 2002

Most peers are freeloaders

70 percent of peers share no files
Most results come from 1 percent of peers
Adar and Huberman 2000

Network tends toward a power-law topology

Power-law: the nth most connected peer has k/n connections
A few peers have many connections; most peers have few
Ripeanu and Foster 2002

Structured networks

Idea: form a structured topology that gives certain performance guarantees

Number of hops needed for searches
Amount of state required by nodes
Maintenance cost
Tolerance to churn

Part of a larger application

Peer-to-peer substrate for data management

Distributed Hash Tables

Basic operation

Given key K, return associated value V

Examples of DHTs

Chord
CAN
Tapestry
Pastry
Koorde
Kelips
Kademlia
Viceroy
Freenet

Chord

Node ID = Hash(IP)
Object ID = Hash(Key)

[Diagram: nodes and objects placed on an identifier ring running from 0 to 2^m - 1]

Stoica et al 2001

Searching

[Diagram: with only successor pointers, a query K? is forwarded around the ring node by node: O(N) hops]

Better searching

Finger table: the ith entry is the node that succeeds me by at least 2^(i-1); m entries in total

[Diagram: a query K? halves the remaining ring distance at each hop: O(log N) hops]
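A compact Python sketch of finger-table routing on the identifier ring. Global knowledge of all node IDs stands in for the real join protocol, which is how nodes actually learn their fingers:

M = 8  # IDs live on a ring of size 2**M

def successor(ids, target):
    # First node at or clockwise after target, wrapping around the ring.
    cands = sorted(ids)
    target %= 2**M
    return next((n for n in cands if n >= target), cands[0])

def finger_table(ids, node):
    # ith entry: the node that succeeds this one by at least 2**i
    # (0-indexed here, hence 2**i rather than 2**(i-1)).
    return [successor(ids, node + 2**i) for i in range(M)]

def lookup(ids, start, key):
    # Hop to the farthest finger that does not pass the key; each hop
    # at least halves the remaining distance, giving O(log N) hops.
    node, hops = start, 0
    while successor(ids, node + 1) != successor(ids, key):
        dist = (key - node) % 2**M
        node = max((f for f in finger_table(ids, node)
                    if 0 < (f - node) % 2**M < dist),
                   key=lambda f: (f - node) % 2**M)
        hops += 1
    return successor(ids, key), hops

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(lookup(nodes, start=8, key=54))  # (56, 2): node 56 stores key 54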

Joining

[Diagrams: the new node looks up its own ID to find its successor, builds its finger table, and existing nodes update fingers that should now point to it: O(log^2 N) messages]

Inserting

[Diagram: an object is inserted by routing it to the successor of Hash(Key)]

What is actually stored?

Store the objects themselves

Requires moving the object
Load balancing for downloads, unless some objects are hot

Store pointers to the original objects

Object can stay in its original location
Nodes with many objects can cause load imbalance

Chord allows either option

Good properties

Limited state

Finger table size: m = O(log n)

Bounded hops

O(log n) for search and insert (w.h.p.)

Bounded maintenance

O(log^2 n)

Robust

What if the finger table is incomplete?

[Diagram: a lookup for K? routes via the closest preceding node it does know; progress is slower, but the search still succeeds]
Issues

Partitions

Malicious nodes

Network awareness
