
Peer-to-Peer Systems

Simarleen Kaur
CSE Department

Peer-to-Peer

 An alternative to the client/server model of distributed
computing is the peer-to-peer model.
 Client/server is inherently hierarchical, with resources
centralized on a limited number of servers.
 In peer-to-peer networks, both resources and control are
widely distributed among nodes that are theoretically
equals. (A node with more information, better information, or
more power may be “more equal,” but that is a function of
the node, not the network controllers.)

Decentralization

 A key feature of peer-to-peer networks is decentralization.

This has many implications. Robustness, availability of
information, and fault tolerance tend to come from
redundancy and shared responsibility instead of planning,
organization, and the investment of a controlling authority.
 On the Web both content providers and gateways try to
profit by controlling information access. Access control is
more difficult in peer-to-peer, although Napster depended on
a central index.

Technology Transition

The Client/Server Model

The Peer-to-Peer Model


Applications outside Computer
Science

 Bioinformatics
 Education and academic
 Military
 Business
 Television
 Telecommunication

Why Peer-to-Peer Networking?

 The Internet has three valuable fundamental assets:
information, bandwidth, and computing resources, all of
which are vastly underutilized, partly due to the traditional
client-server computing model.
 Information - Hard to find, impossible to catalog and
index
 Bandwidth - Hot links get hotter, cold ones stay cold
 Computing resources - Heavily loaded nodes get
overloaded, idle nodes remain idle

Information Gathering

 The world produces two exabytes of information
(2×10^18 bytes) every year, out of which
 The world publishes 300 terabytes of information
(3×10^14 bytes) every year
 Google searches 1.3×10^9 pages of data
 Data beyond web servers
 Transient information
Hence, finding useful information in real time is
increasingly difficult.

Bandwidth Utilization

 A single fiber’s bandwidth has increased by a factor of
10^6, doubling every 16 months, since 1975
 Traffic is still congested
 More devices and people on the net
 More data moving to and from the same
destinations (eBay, Yahoo, etc.)

Computing Resources
 Moore’s Law: processor speed doubles
every 18 months
 Computing devices (server, PC, PDA, cellphone)
are more powerful than ever
 Storage capacity has increased dramatically
 Computation still accumulates around
data centers

Benefits from P2P

 Theory
 Dynamic discovery of information
 Better utilization of bandwidth, processor, storage, and
other resources
 Each user contributes resources to network
 Examples in practice
 Sharing browser cache over 100Mbps lines
 Disk mirroring using spare capacity
 Deep search beyond the web

Three Generations of P2P
 First generation: the Napster music exchange
service
 Second generation: file sharing with greater
scalability, anonymity, and fault tolerance
 Freenet, Gnutella, Kazaa, BitTorrent
 Third generation characterized by emergence
of middleware for application independence
 Pastry, Tapestry, …

Page 11 B.Ramamurthy 2019-03-20


Napster
 The first large scale peer-to-peer network was Napster, set
up in 1999 to share digital music files over the Internet.
While Napster maintained centralized (and replicated)
indices, the music files were created and made available by
individuals, usually with music copied from CDs to
computer files. Music content owners sued Napster for
copyright violations and succeeded in shutting down the
service. Figure 10.2 documents the process of requesting a
music file from Napster.

Napster: peer-to-peer file sharing with a
centralized, replicated index
(Figure: peers interact with Napster servers holding replicated indices.
Steps: 1. file location request; 2. list of peers offering the file;
3. file request; 4. file delivered; 5. index update.)

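The lookup sequence in the figure can be sketched in a few lines of Python (a hypothetical toy index, not Napster's actual protocol): peers register the files they hold, a requester asks the central index for sources, downloads directly from a chosen peer, and then reports its new copy back to the index (step 5).

```python
# Toy sketch of Napster-style centralized indexing (hypothetical API):
# a central index maps filenames to the set of peers holding them.

class CentralIndex:
    def __init__(self):
        self._index = {}          # filename -> set of peer ids holding it

    def register(self, peer, filename):
        self._index.setdefault(filename, set()).add(peer)

    def lookup(self, filename):
        # Steps 1-2: file location request answered with a list of peers.
        return sorted(self._index.get(filename, set()))

index = CentralIndex()
index.register("peerA", "song.mp3")
index.register("peerB", "song.mp3")

# Steps 3-4: peerC picks a source from the list and downloads directly.
sources = index.lookup("song.mp3")
source = sources[0]

# Step 5: after the transfer, peerC registers its new copy with the index.
index.register("peerC", "song.mp3")
```

Note that only the index is centralized; the file transfer itself is peer-to-peer, which is what made Napster scale.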


Napster (contd.)
 Napster demonstrated the feasibility of building a
useful large-scale service which depends wholly
on individual computers.
 Music files are not updated; the state of the
resource is stable
 No guarantees about availability; this fit well for
music files



Impact of Napster
 First application in which a demand for globally
scalable information storage and retrieval
emerged: downloading music files
 Napster demonstrated the use of peer-peer
system in sharing files
 Several million users were registered at its peak
 Involved centralized indices
 Shared resources at the edges of the Internet
 It was shut down over legal issues related to
copyright law



Middleware for Peer-to-Peer

 A key problem in peer-to-peer applications is to provide a
way for clients to access data resources efficiently. Similar
needs in client/server technology led to solutions like NFS.
However, NFS relies on pre-configuration and is not
scalable enough for peer-to-peer.
 Peer clients need to locate and communicate with any
available resource, even though resources may be widely
distributed and configuration may be dynamic, constantly
adding and removing resources and connections.

Peer-to-Peer Middleware
 Functional requirements:
 Enable clients to locate and communicate with any
individual resource
 Add and remove nodes
 Add and remove resources
 Simple API
 Non-functional requirements: global
scalability, load balancing, accommodating
highly dynamic host availability, trust,
anonymity, deniability…
 Active research issue: routing overlay



Non-Functional Requirements for
Peer-to-Peer Middleware

 Global Scalability
 Load Balancing
 Local Optimization
 Adjusting to dynamic host availability
 Security of data
 Anonymity, deniability, and resistance to censorship
(in some applications)

Routing Overlays

 A routing overlay is a distributed algorithm for a
middleware layer responsible for routing requests from
any client to a host that holds the object to which the
request is addressed.
 Any node can access any object by routing each request
through a sequence of nodes, exploiting knowledge at
each of them to locate the destination object.
 Globally unique identifiers (GUIDs), also known as opaque
identifiers, are used as names but do not contain location
information.
 A client wishing to invoke an operation on an object
submits a request including the object’s GUID to the
routing overlay, which routes the request to a node at
which a replica of the object resides.
Routing Overlay
 A distributed algorithm known as a routing
overlay.
 The routing overlay locates nodes and objects. It
is the middleware layer responsible for routing
requests from clients to the hosts that hold the
object to which each request is addressed.
 The main difference is that routing is implemented
in the application layer (besides the IP routing
at the network layer)



Distribution of information in a routing
overlay
(Figure: the routing knowledge held at four nodes A, B, C, and D,
showing how knowledge of objects and nodes is distributed across the
overlay.)
Routing Overlays
Basic programming interface for a distributed hash table (DHT) as implemented
by the PAST API over Pastry

put(GUID, data)
The data is stored in replicas at all nodes responsible for the object identified by
GUID.
remove(GUID)
Deletes all references to GUID and the associated data.
value = get(GUID)
The data associated with GUID is retrieved from one of the nodes responsible for it.

The DHT layer takes responsibility for choosing a location for each data item, storing it
(with replicas to ensure availability) and providing access to it via the get()
operation.
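The put/get/remove interface above can be illustrated with a toy in-memory DHT (a sketch, not the PAST implementation): "responsibility" is modelled as numerical closeness of a node id to the GUID, with each item replicated at the two closest nodes.

```python
# Toy DHT sketch: a handful of simulated nodes stand in for the overlay,
# and responsibility means numerical closeness of node id to the GUID.

class ToyDHT:
    REPLICAS = 2

    def __init__(self, node_ids):
        self.nodes = {nid: {} for nid in node_ids}   # node id -> local store

    def _responsible(self, guid):
        # The REPLICAS nodes whose ids are numerically closest to guid.
        return sorted(self.nodes, key=lambda n: abs(n - guid))[:self.REPLICAS]

    def put(self, guid, data):
        # Store replicas at all nodes responsible for the GUID.
        for nid in self._responsible(guid):
            self.nodes[nid][guid] = data

    def get(self, guid):
        # Retrieve from any one of the responsible nodes.
        for nid in self._responsible(guid):
            if guid in self.nodes[nid]:
                return self.nodes[nid][guid]
        return None

    def remove(self, guid):
        # Delete all references to the GUID and its data.
        for store in self.nodes.values():
            store.pop(guid, None)

dht = ToyDHT([10, 40, 70, 95])
dht.put(42, "hello")
```

The caller never learns which nodes hold the item; the DHT layer hides placement and replication behind put() and get(), exactly the division of labour the API above describes.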
Routing Overlays
Basic programming interface for distributed object location and routing
(DOLR) as implemented by Tapestry

publish(GUID)
GUID can be computed from the object. This function makes the node performing a
publish operation the host for the object corresponding to GUID.
unpublish(GUID)
Makes the object corresponding to GUID inaccessible.
sendToObj(msg, GUID, [n])
Following the object-oriented paradigm, an invocation message is sent to an object in
order to access it. This might be a request to open a TCP connection for data transfer
or to return a message containing all or part of the object’s state. The final optional
parameter [n], if present, requests the delivery of the same message to n replicas of
the object.

Objects can be stored anywhere, and the DOLR layer is responsible for
maintaining a mapping between GUIDs and the addresses of the nodes at
which replicas of the objects are located.
Pastry

 All the nodes and objects that can be accessed through
Pastry are assigned 128-bit GUIDs.
 In a network with N participating nodes, the Pastry
routing algorithm will correctly route a message
addressed to any GUID in O(logN) steps.
 If the GUID identifies a node that is currently active, the
message is delivered to that node; otherwise, the
message is delivered to the active node whose GUID is
numerically closest to it (the closeness referred to here is
in an entirely artificial space: the space of GUIDs).
Pastry
 When new nodes join the overlay they obtain the data needed
to construct a routing table and other required state from
existing members in O(logN) messages, where N is the
number of hosts participating in the overlay.
 In the event of a node failure or departure, the remaining nodes
can detect its absence and cooperatively reconfigure to reflect
the required changes in the routing structure in a similar
number of messages.
 Each active node stores a leaf set: a vector L (of size 2l)
containing the GUIDs and IP addresses of the nodes whose
GUIDs are numerically closest on either side of its own (l above
and l below)
 The GUID space is treated as circular: GUID 0’s lower
neighbor is 2^128 - 1
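The notions "numerically closest" and "circular GUID space" can be made concrete with a short Python sketch (an illustration using a small 2^8 space rather than 2^128):

```python
# Sketch of closeness in a circular GUID space. SPACE is a small
# stand-in for Pastry's 2**128 identifier space.

SPACE = 2**8

def circular_distance(a, b, space=SPACE):
    # Distance measured both ways around the ring; take the shorter arc,
    # so GUID 0 and GUID space-1 are neighbours.
    d = abs(a - b) % space
    return min(d, space - d)

def closest_node(guid, nodes, space=SPACE):
    # The active node whose GUID is numerically closest to guid.
    return min(nodes, key=lambda n: circular_distance(n, guid, space))
```

For example, with nodes 250, 20, and 100 in a 256-value space, GUID 5 is closest to node 250 (distance 11 around the ring), which is exactly the wrap-around behaviour the leaf set relies on.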
Pastry- Routing algorithm

 The full routing algorithm involves the use of a routing
table at each node to route messages efficiently, but for
the purposes of explanation, we describe the routing
algorithm in two stages:
 The first stage describes a simplified form of the
algorithm which routes messages correctly but
inefficiently without a routing table
 The second stage describes the full routing algorithm
which routes a request to any node in O(logN)
messages.
Pastry- Routing algorithm

Stage 1:
 Any node A that receives a message M with destination
address D routes the message by comparing D with its
own GUID A and with each of the GUIDs in its leaf set, and
forwarding M to the node amongst them that is
numerically closest to D
 At each step M is forwarded to a node that is closer to D
than the current node, and this process will eventually
deliver M to the active node closest to D
 Very inefficient, requiring ~N/(2l) hops to deliver a message
in a network with N nodes
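Stage 1 can be simulated directly (hypothetical node GUIDs on a small ring; leaf sets of size 2l = 4): each hop forwards to the leaf-set member numerically closest to the destination, stopping when no member is closer than the current node.

```python
# Sketch of leaf-set-only routing (Stage 1) on a small circular ring.

SPACE = 2**8
NODES = sorted([3, 20, 65, 120, 180, 240])   # active node GUIDs (toy values)
L = 2                                         # l: leaf-set half-size

def dist(a, b):
    # Shorter arc around the circular GUID space.
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)

def leaf_set(node):
    # The l nodes on either side of node on the ring.
    i = NODES.index(node)
    return [NODES[(i + k) % len(NODES)] for k in range(-L, L + 1) if k != 0]

def route(start, dest):
    hops = [start]
    current = start
    while True:
        candidates = leaf_set(current) + [current]
        # Prefer the current node on ties so the loop always terminates.
        nxt = min(candidates, key=lambda n: (dist(n, dest), n != current))
        if nxt == current:
            return hops          # current node is the numerically closest
        hops.append(nxt)
        current = nxt
```

For instance, routing from node 3 toward GUID 175 hops to node 180, the active node numerically closest to the destination; with only 2l leaf-set entries per node, long routes need on the order of N/(2l) such hops.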
Pastry- Routing algorithm

The diagram illustrates the routing of a message from node
65A1FC to D46A1C using leaf set information alone, assuming
leaf sets of size 8 (l=4)
Pastry- Routing algorithm
Stage 2:
 Each Pastry node maintains a routing table giving GUIDs and
IP addresses for a set of nodes spread throughout the entire
range of 2^128 possible GUID values
 The routing table is structured as follows: GUIDs are viewed
as hexadecimal values and the table classifies GUIDs based
on their hexadecimal prefixes
 The table has as many rows as there are hexadecimal digits
in a GUID, so for the prototype Pastry system that we are
describing, there are 128/4 = 32 rows
 Any row n contains 15 entries – one for each possible value
of the nth hexadecimal digit excluding the value in the local
node’s GUID. Each entry in the table points to one of the
potentially many nodes whose GUIDs have the relevant
prefix
Pastry- Routing algorithm
Stage 2 (cont.):
p = GUID prefixes and corresponding node handles n

Row 0: prefixes 0, 1, 2, ..., F (one node handle n per entry)
Row 1: prefixes 60, 61, 62, ..., 6F
Row 2: prefixes 650, 651, 652, ..., 65F
Row 3: prefixes 65A0, 65A1, 65A2, ..., 65AF
The routing table is located at the node whose GUID begins 65A1
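How a destination GUID selects a cell of this table can be shown in a few lines (a sketch using 4-digit GUIDs as in the example above; real Pastry GUIDs have 32 hex digits): row p is the length of the longest common hex prefix with the local GUID, and column i is the next hex digit of the destination.

```python
# Sketch: mapping a destination GUID onto a routing-table cell.

def common_prefix_len(a: str, b: str) -> int:
    # Length of the longest common hexadecimal prefix of two GUIDs.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def table_cell(local: str, dest: str):
    # Assumes dest != local (otherwise there is no differing digit).
    p = common_prefix_len(local, dest)
    i = dest[p]          # the (p+1)th hexadecimal digit of dest
    return p, i
```

For the node whose GUID begins 65A1: destination D46A shares no prefix, so it falls in row 0, column D; destination 65B2 shares "65", so row 2, column B. Each hop therefore extends the matched prefix by at least one digit, which is what yields O(log N) routing.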
Pastry- Routing algorithm
Stage 2 (cont.):
To handle a message M addressed to a node D (where R[p,i]
is the element at column i, row p of the routing table):
1. If (L_-l < D < L_l) { // the destination is within the leaf set or is the current node
2.   Forward M to the element L_i of the leaf set with GUID closest to D or the current
     node A
3. } else {
     // use the routing table to dispatch M to a node with a closer GUID
4.   Find p (the length of the longest common prefix of D and A) and i (the (p+1)th
     hexadecimal digit of D)
5.   If (R[p,i] ≠ null) forward M to R[p,i] // route M to a node with a longer common
     prefix
6.   else { // there is no entry in the routing table
7.     Forward M to any node in L or R with a common prefix of length p, but a
       GUID that is numerically closer
8.   }
9. }
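The pseudocode above can be rendered as runnable Python for a single node (the leaf set and the sparse routing table here are hypothetical toy state; GUIDs are 4-digit uppercase hex strings, so string comparison matches numeric order):

```python
# Runnable rendering of the Pastry next-hop decision for one node.

A = "65A1"                                        # local node's GUID
LEAF = ["659F", "65A0", "65A3", "65A5"]           # leaf set, l = 2 each side
R = {(0, "D"): "D13A", (1, "0"): "60FF", (2, "0"): "6503"}  # sparse R[p,i]

def prefix_len(a, b):
    # Length of the longest common hexadecimal prefix.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(dest):
    # Case 1: destination lies within the span of the leaf set.
    if LEAF[0] <= dest <= LEAF[-1]:
        return min(LEAF + [A], key=lambda g: abs(int(g, 16) - int(dest, 16)))
    # Case 2: routing-table entry with a longer common prefix.
    p = prefix_len(A, dest)
    entry = R.get((p, dest[p]))
    if entry is not None:
        return entry
    # Case 3: fall back to any known node with prefix >= p that is
    # numerically closer to dest than the local node.
    known = LEAF + list(R.values())
    closer = [g for g in known
              if prefix_len(g, dest) >= p
              and abs(int(g, 16) - int(dest, 16)) < abs(int(A, 16) - int(dest, 16))]
    return closer[0] if closer else A
```

With this state, a message for D46A takes the row-0 table entry D13A, a message for 65A4 is handed to leaf-set member 65A3, and a message for 60AB uses the row-1 entry 60FF.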
Tapestry
 Tapestry is another peer-to-peer model similar to Pastry. It hides
the distributed hash table from applications behind a distributed
object location and routing (DOLR) interface, making replicated
copies of objects more accessible by allowing multiple entries in
the routing structure.
 Identifiers are either NodeIds which refer to computers that
perform routing actions or GUIDs which refer to the objects.
 For any resource with GUID G, there is a unique root node with
GUID R_G that is numerically closest to G.
 Hosts H holding replicas of G periodically invoke publish(G)
to ensure that newly arrived hosts become aware of the
existence of G. On each invocation, a publish message is routed
from the invoker towards node R_G.
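The root-node and publish mechanics can be sketched as follows (a simplified model using the node ids from the figure below; only the location mapping laid down at the root is modelled, not the hop-by-hop publish path):

```python
# Sketch of Tapestry-style publication: each object GUID G has a root
# node R_G (the participating node id numerically closest to G), and
# publish(G) records a location mapping for G so later sendToObj
# messages can be directed to a replica holder.

NODE_IDS = [0x4228, 0x4377, 0x4A6D, 0xAA93, 0xE791]   # toy overlay members
locations = {}                      # GUID -> set of hosts holding replicas

def root_node(guid):
    # R_G: the participating node numerically closest to G.
    return min(NODE_IDS, key=lambda n: abs(n - guid))

def publish(host, guid):
    # The publish message travels toward root_node(guid); here we model
    # only the mapping it lays down, not the caching along the path.
    locations.setdefault(guid, set()).add(host)

def send_to_obj(guid):
    # Route via the root's mapping to any replica holder.
    holders = locations.get(guid, set())
    return sorted(holders)[0] if holders else None

publish(0x4228, 0x4378)
publish(0xAA93, 0x4378)
```

As in the figure, node 4377 is the root for object 4378 because its id is numerically closest, and messages for 4378 are resolved to one of the replica hosts 4228 or AA93.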
Tapestry
(Figure: Tapestry routings for node 4377, the root for object 4378;
publish paths from nodes 4228 and AA93, location mappings for 4378,
and the routes taken by send(4378).)
Replicas of the file Phil’s Books (G=4378) are hosted at nodes 4228 and AA93. Node 4377 is the root node
for object 4378. The Tapestry routings shown are some of the entries in routing tables. The publish paths show
routes followed by the publish messages laying down cached location mappings for object 4378. The location
mappings are subsequently used to route messages sent to 4378.
Pastry & Tapestry Routing
 Prefix routing based on GUIDs
 Narrow the search for the next node along the route by
applying a binary mask that selects an increasing
number of hexadecimal digits from the destination
GUID after each hop.

