

Introduction:

o The approach followed in coordination-based systems is a clean separation between
computation and coordination.
o The coordination part of a distributed system handles the communication and
cooperation between processes.
o The focus is on how coordination between the processes takes place.
Taxonomy of coordination models.

Taxonomy of coordination models in 2 dimensions (temporal and referential)

o Referentially coupled, temporally coupled: direct coordination
– Processes know the addresses and/or names of the processes they communicate with
– Processes know the form of the messages
– Processes must be up at the same time

o Referentially coupled, temporally decoupled: mailbox coordination
– Like persistent message-oriented communication
– Processes know the form of the messages in advance
– Processes do not have to be up at the same time

o Referentially decoupled, temporally coupled: meeting-oriented coordination
– Often implemented by means of events
– Publish/subscribe with no persistence in the dataspace
– Processes meet at a certain time to coordinate

o Referentially decoupled, temporally decoupled: generative communication
– Independent processes make use of a shared persistent dataspace of tuples
– Processes do not agree on the structure of tuples in advance
– Tuple tags distinguish between tuples representing different kinds of information
– Publish/subscribe with persistence in the dataspace

Architecture.

o Data items are described by a series of attributes.
o A subscription is passed to the middleware with a description of the data items that the
subscriber is interested in.
o A subscription is described as a set of (attribute, value) pairs, possibly with a range for each
attribute and with predicates similar to those in SQL.
o When matching data items are found, the middleware either sends a notification to the
subscribers so that they can read the items, or forwards the data items directly (no storage
is needed if they are forwarded directly).
o Publishing processes publish events, which are data items with a single attribute.
o Matching data items against subscriptions should be efficient and scalable.

[Figure: the principle of sharing data items between publishers and subscribers.]
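
The matching step described in the bullets above can be sketched as follows. This is a minimal illustration in Python, not the interface of any particular middleware: the `Subscription` class, the `in_range` and `equals` helpers, and the attribute names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A predicate decides whether a single attribute value is acceptable.
Predicate = Callable[[object], bool]


@dataclass
class Subscription:
    """Hypothetical subscription: one predicate per attribute of interest."""
    subscriber: str
    predicates: Dict[str, Predicate]

    def matches(self, item: Dict[str, object]) -> bool:
        # Every constrained attribute must be present and satisfy its predicate.
        return all(attr in item and pred(item[attr])
                   for attr, pred in self.predicates.items())


def in_range(low, high) -> Predicate:
    return lambda value: low <= value <= high


def equals(expected) -> Predicate:
    return lambda value: value == expected


def notify(subscriptions: List[Subscription], item: Dict[str, object]) -> List[str]:
    """Return the subscribers whose subscription matches the published item."""
    return [s.subscriber for s in subscriptions if s.matches(item)]


subs = [
    Subscription("alice", {"symbol": equals("ACME"), "price": in_range(10, 20)}),
    Subscription("bob", {"symbol": equals("INIT")}),
]
matched = notify(subs, {"symbol": "ACME", "price": 12.5})
```

The linear scan over all subscriptions is the simplest possible matcher. It is exactly the part that has to be made efficient and scalable in a real system.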

Traditional Architectures.

o The simplest approach is a centralized client-server architecture.
o It is adopted by many publish/subscribe systems, such as IBM WebSphere and Sun JMS.
o Generative communication models such as Sun Jini and JavaSpaces are based on central
servers.
JavaSpaces.

o To read a tuple instance, a process provides a typed template tuple for matching.
o A field in the template tuple either contains a reference to an actual object or contains the
value NULL.
o Two fields match if they hold a copy of the same reference or if the template tuple field is
NULL.
o A tuple instance matches a template tuple if their respective fields match pairwise.
o Read and Take (which removes the tuple after reading) block the caller until a matching
tuple is available.
o Some implementations return immediately if there is no matching tuple, or allow a timeout
to be set.
o A centralized implementation makes complex matching rules easier and can also be used
for synchronization.
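
The matching and blocking rules above can be sketched with a small in-memory tuple space. This is a Python illustration under stated assumptions, not the JavaSpaces API: the class name `TupleSpace` is hypothetical, `None` plays the role of NULL in a template, and fields are compared by equality instead of by reference.

```python
import threading


class TupleSpace:
    """Hypothetical single-server tuple space with blocking read and take."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()  # wake up blocked readers and takers

    @staticmethod
    def _matches(template, tup):
        # Same number of fields, and each template field is None or equal.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup))

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def _get(self, template, timeout, remove):
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout)
            if not found:
                return None  # timed out without a match
            tup = self._find(template)
            if remove:
                self._tuples.remove(tup)
            return tup

    def read(self, template, timeout=None):
        return self._get(template, timeout, remove=False)

    def take(self, template, timeout=None):
        return self._get(template, timeout, remove=True)


space = TupleSpace()
space.write(("temp", "room1", 21))
first = space.read(("temp", None, None))       # tuple stays in the space
taken = space.take(("temp", "room1", None))    # tuple is removed
missing = space.read(("temp", None, None), timeout=0)
```

With `timeout=None` a call blocks until a matching tuple is written, and with `timeout=0` it returns immediately. Because all operations go through one lock, a single server like this can also be used for synchronization.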
TIB/Rendezvous.

o Instead of using central servers, we can immediately send published tuples to subscribers
using multicasting.
o A data item is a message tagged with a compound keyword describing its content
(its subject).
o The system uses broadcast, or unicast if the subscribers' addresses are known.
o Each host has a rendezvous daemon, which takes care that messages are sent and
delivered according to their subject.
o The daemon has a table of (process, subject) entries. Whenever a message on
subject S arrives, the daemon checks its table for local subscribers and forwards the
message to each one.
o This architecture can also allow complex matching of published data items against
subscriptions.

[Figure: the principle of a publish/subscribe system as implemented in TIB/Rendezvous.]
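
The table lookup performed by a rendezvous daemon can be sketched as follows. This is a Python illustration only, not the TIB/Rendezvous API: the class name, the in-memory inboxes, and the subject strings are assumptions for the example.

```python
class RendezvousDaemon:
    """Hypothetical per-host daemon that forwards messages by subject."""

    def __init__(self):
        self.table = []    # list of (process, subject) entries
        self.inboxes = {}  # process name -> list of delivered messages

    def subscribe(self, process, subject):
        entry = (process, subject)
        if entry not in self.table:
            self.table.append(entry)
        self.inboxes.setdefault(process, [])

    def on_message(self, subject, message):
        # Look up local subscribers for this subject and forward to each one.
        receivers = [p for p, s in self.table if s == subject]
        for process in receivers:
            self.inboxes[process].append((subject, message))
        return receivers


daemon = RendezvousDaemon()
daemon.subscribe("p1", "news.sports")
daemon.subscribe("p2", "news.weather")
daemon.subscribe("p3", "news.sports")
receivers = daemon.on_message("news.sports", "goal scored")
```

Here matching is exact equality on the subject string. More complex matching would replace the comparison `s == subject` with a richer test.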


Peer-to-Peer Architectures.

o For scalability, restrictions on describing subscriptions and data items may be necessary.
o Keywords or (attribute, value) pairs are hashed to unique identifiers for published data,
which can be efficiently implemented in a DHT-based system.
o For more advanced matching rules, we can use gossip-based publish/subscribe systems.
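
Hashing an (attribute, value) pair to an identifier and locating the responsible node can be sketched as follows. The notes only say "DHT-based", so the Chord-style successor rule, the 16-bit identifier space, and the function names are assumptions made for this Python example.

```python
import hashlib
from bisect import bisect_left

ID_BITS = 16  # assumed size of the identifier space


def key_id(attribute, value):
    """Hash an (attribute, value) pair to an identifier in [0, 2**ID_BITS)."""
    digest = hashlib.sha1(f"{attribute}={value}".encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** ID_BITS)


def responsible_node(node_ids, key):
    """Return the first node whose identifier is >= key, wrapping around."""
    ids = sorted(node_ids)
    index = bisect_left(ids, key)
    return ids[index % len(ids)]


nodes = [100, 2000, 40000]
rendezvous = responsible_node(nodes, key_id("subject", "news"))
```

A publisher and a subscriber that hash the same pair arrive at the same node, which can then match publications against subscriptions locally. This only works for exact keys, which is the restriction mentioned above.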
Gossip-Based Publish/Subscribe Systems.

o A subscription S is a tuple of (attribute, value/range) pairs.
o As in CAN (Content Addressable Network), attribute values are mapped to floating-point
numbers, and subscriber nodes are organized in a 2-dimensional space of floats to form groups.
o Nodes are clustered into M different groups, such that nodes i and j belong to the same group if
and only if their subscriptions S_i and S_j intersect.
o Each node maintains a list of references to other neighbours (a partial view) so that it knows
the intersecting subscriptions.
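
The intersection test that drives the grouping can be sketched as follows. In this Python illustration a subscription is a dictionary from attribute name to a (low, high) range, an exact value is the degenerate range (v, v), and an attribute that a subscription does not mention is treated as unconstrained. These representation choices and the function names are assumptions for the example.

```python
def ranges_overlap(a, b):
    """True if the closed intervals a and b share at least one value."""
    return a[0] <= b[1] and b[0] <= a[1]


def intersect(s1, s2):
    """Two subscriptions intersect if all commonly constrained ranges overlap."""
    return all(ranges_overlap(s1[attr], s2[attr])
               for attr in s1.keys() & s2.keys())


def partial_views(subscriptions):
    """For each node, list the other nodes with an intersecting subscription."""
    return {
        node: sorted(other for other in subscriptions
                     if other != node
                     and intersect(subscriptions[node], subscriptions[other]))
        for node in subscriptions
    }


subscriptions = {
    "n1": {"price": (0, 10)},
    "n2": {"price": (5, 15)},
    "n3": {"price": (20, 30)},
}
views = partial_views(subscriptions)
```

In a gossip-based system each node would learn these neighbours gradually by exchanging partial views, instead of computing them from global knowledge as done here.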
Discussion

o A related approach, similar to gossip-based systems, works as follows.
o Each attribute a_i is handled by a separate process P_i, which in turn partitions the range of
its attribute across multiple processes.
o When a data item d is published, it is forwarded to each P_i, where it is stored at the
process responsible for d's value of a_i.
Mobility and Coordination.

o Knowing whether a mobile peer has received a message is problematic.
o Two solutions are suggested:
1- The receiving mobile process saves older messages to make sure it does not accept
duplicates.
2- Routers keep track of the mobile peers and know which messages they have
received (harder to implement).
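
The first solution can be sketched as a receiver that remembers what it has already accepted. In this Python illustration the receiver stores message identifiers instead of whole messages, and the class name and identifier format are assumptions for the example.

```python
class MobileReceiver:
    """Hypothetical mobile process that filters out duplicate messages."""

    def __init__(self):
        self.seen_ids = set()  # identifiers of messages already accepted
        self.delivered = []    # payloads handed to the application

    def receive(self, msg_id, payload):
        """Accept the message once; return False for a duplicate."""
        if msg_id in self.seen_ids:
            return False
        self.seen_ids.add(msg_id)
        self.delivered.append(payload)
        return True


receiver = MobileReceiver()
results = [
    receiver.receive("m1", "hello"),
    receiver.receive("m2", "world"),
    receiver.receive("m1", "hello"),  # retransmitted after a reconnect
]
```

The set of seen identifiers grows without bound in this sketch. A practical receiver would need some rule for discarding old entries.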

Lime.

o In Lime, each mobile process has its own dataspace.
o When processes are connected, their dataspaces become shared.
o Formally, the processes should be members of the same group and use the same group
communication protocol.
o The local dataspaces of connected processes form a transiently shared dataspace that
allows the exchange of tuples.
o To control how tuples are distributed, dataspaces can carry out "reactions". A reaction specifies
an action to be executed when a tuple matching a given template is found in the local
dataspace.
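
The idea of local dataspaces with reactions can be sketched as follows. This is a Python illustration only and does not follow the actual Lime API: the class `Dataspace`, the method names, and the `shared_view` helper are assumptions, and `None` acts as a wildcard in templates.

```python
def matches(template, tup):
    """A template field of None matches any value in that position."""
    return len(template) == len(tup) and all(
        t is None or t == v for t, v in zip(template, tup))


class Dataspace:
    """Hypothetical local dataspace of one mobile process."""

    def __init__(self, owner):
        self.owner = owner
        self.tuples = []
        self.reactions = []  # list of (template, action) pairs

    def react_to(self, template, action):
        self.reactions.append((template, action))

    def out(self, tup):
        """Store a tuple locally and fire every reaction whose template matches."""
        self.tuples.append(tup)
        for template, action in self.reactions:
            if matches(template, tup):
                action(tup)


def shared_view(connected):
    """Tuples visible while the given dataspaces are connected."""
    return [tup for space in connected for tup in space.tuples]


log = []
a, b = Dataspace("a"), Dataspace("b")
a.react_to(("job", None), log.append)
a.out(("job", 1))
a.out(("note", "ignored by the reaction"))
b.out(("job", 2))
visible = shared_view([a, b])
```

The reaction registered on `a` fires only for tuples placed in the local dataspace of `a`, which mirrors the rule that a reaction is triggered by a match in the local dataspace.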

Communication.

o In Java-based systems, communication generally proceeds through remote method
invocations.
o In wide-area networks, the system should be self-organizing or use content-based
routing to ensure that a message reaches only its intended subscribers.
Content-based routing.

o Routers can take routing decisions based on the message content, so a router can cut off
routes that do not lead to receivers of the message.
o Clients can tell the servers which messages they are interested in, so that the
servers notify them when a relevant message arrives. This is done in 2 layers,
where layer 1 consists of a shared broadcast tree connecting the servers using
routers.
o In simple subject-based publish/subscribe, each message is tagged with a unique
(non-compound) keyword.
o We can send each published message to every server, as in TIB/Rendezvous, or
let every server broadcast its subscriptions to all other servers so that each can compile a
list of (subject, destination) pairs.
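
Compiling the list of (subject, destination) pairs from broadcast subscriptions can be sketched as follows. The function names and the server and subject labels in this Python example are assumptions for illustration.

```python
from collections import defaultdict


def compile_routes(announcements):
    """Build subject -> set of destination servers from (server, subject) pairs."""
    routes = defaultdict(set)
    for server, subject in announcements:
        routes[subject].add(server)
    return routes


def destinations(routes, subject):
    """Servers that should receive a message published on the given subject."""
    return sorted(routes.get(subject, ()))


# Each server broadcasts the subjects its clients subscribed to.
announcements = [
    ("server-A", "weather"),
    ("server-B", "weather"),
    ("server-B", "traffic"),
]
routes = compile_routes(announcements)
weather_targets = destinations(routes, "weather")
```

A message on a subject without subscribers has no destinations and can be dropped at the first server, which is the saving compared with sending every message to every server.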
[Figure: naive content-based routing.]

[Figure: a partially filled routing table.]


o Each server broadcasts its subscriptions across the network so that routers can
compose routing filters.
o When a node leaves the system, it should cancel its subscriptions and essentially
broadcast this information to all routers.
o Comparing subscriptions with the data items to be routed can be computationally
expensive.
Supporting Composite Subscriptions.
o When subscriptions are expressed in more sophisticated ways, the simple content-based
routing just described is no longer sufficient and another approach is needed.
o One such case is a composition of subscriptions, in which a process specifies in a single
subscription that it is interested in very different types of data items.
o Routers can be designed analogous to rule databases, where subscriptions are transformed
into rules stating the conditions under which published data should be forwarded.

Processes.

o There is nothing special about processes here: we just need efficient mechanisms to search
in a large collection of data.
o The main problem is devising schemes that work well in distributed environments.

Naming.

o Every published data item has an associated vector of n (attribute, value) pairs, and
processes can subscribe to data items by specifying predicates over these attribute
values. In general, this naming scheme can be readily applied, although systems differ
with respect to attribute types, values, and the predicates that can be used.
o A data item that has only one associated (attribute, value) pair is referred to as an event.
Support for subscribing to events, and notably composite events, largely dictates the
discussion on naming issues in publish/subscribe systems.
Describing composite events:

o Some examples of composite events are,

[Figure: examples of events in distributed systems (subscriptions S1 to S5).]


o The first two subscriptions are relatively easy. S1 is an example that can be
handled by a primitive discrete event, whereas S2 is a simple composition of two
discrete events. Subscription S3 is more complex, as it requires that the system can
also report time-related events. Matters are further complicated if subscriptions involve
aggregated values required for computing gradients (S4) or averages (S5).
o The basic idea behind an event-composition language for distributed systems
is to enable the formulation of subscriptions in terms of primitive events.

[Figure: the finite state machine for subscription S3.]

Subscriptions can be translated into finite state machines (FSMs).

o These FSMs can often be decomposed into smaller FSMs that communicate
by passing events to each other. Note that such an event communication would
normally trigger a state transition at the FSM for which that event is intended. For
example, assume that we want to automatically turn off the lights in room R4.20 after 2
seconds when we are certain that nobody is there anymore (and the door is locked).

[Figure: two coupled FSMs.]
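
The lights example can be sketched as two FSMs that communicate by passing events. Since the original figure is not reproduced here, the state names, the event names ("enter", "leave", "lock", "vacant", "occupied", "tick"), and the use of explicit timestamps instead of a real timer are all assumptions made for this Python illustration.

```python
class LightFSM:
    """States: 'on', 'waiting' (countdown running), 'off'."""

    DELAY = 2.0  # seconds to wait before switching the lights off

    def __init__(self):
        self.state = "on"
        self.deadline = None

    def handle(self, event, now):
        if event == "vacant" and self.state == "on":
            self.state, self.deadline = "waiting", now + self.DELAY
        elif event == "occupied":
            self.state, self.deadline = "on", None
        elif event == "tick" and self.state == "waiting" and now >= self.deadline:
            self.state, self.deadline = "off", None


class RoomFSM:
    """Tracks occupancy and the door lock, and passes events to the light FSM."""

    def __init__(self, light):
        self.light = light
        self.occupied = True
        self.locked = False

    def handle(self, event, now):
        if event == "enter":
            self.occupied, self.locked = True, False
            self.light.handle("occupied", now)
        elif event == "leave":
            self.occupied = False
        elif event == "lock":
            self.locked = True
        # Only when nobody is inside AND the door is locked is the room vacant.
        if not self.occupied and self.locked:
            self.light.handle("vacant", now)


light = LightFSM()
room = RoomFSM(light)
room.handle("leave", now=0.0)
state_after_leave = light.state
room.handle("lock", now=1.0)
light.handle("tick", now=2.0)
state_before_deadline = light.state
light.handle("tick", now=3.0)
state_after_deadline = light.state
```

The room FSM never touches the lights directly. It only emits events, and each event triggers a transition in the FSM it is intended for, as described above.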


Synchronization.

Synchronization in coordination-based systems is generally restricted to systems supporting
generative communication. Matters are relatively straightforward when only a single server is
used. In that case, processes can be simply blocked until tuples become available, but it is
also simpler to remove them. Matters become complicated when the shared dataspace is
replicated and distributed across multiple servers.

Consistency and Replication.



o Replication plays a key role in the scalability of coordination-based systems,
and notably those for generative communication. The following are the standard
approaches explored in a number of systems such as JavaSpaces.
General Considerations to Static Approaches.

An efficient distributed implementation of a JavaSpace has to solve two problems:

1. How to simulate associative addressing without massive searching.
2. How to distribute tuple instances among machines and locate them later.

[Figure: a JavaSpace can be replicated on all machines. The dotted lines show the
partitioning of the JavaSpace into subspaces. (a) Tuples are broadcast on write. (b) Reads
are local, but removing an instance when calling take must be broadcast.]
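
The fully replicated scheme from the figure captions above can be sketched as follows: writes are broadcast to every machine, reads are served locally, and a take must broadcast the removal. This Python illustration runs in a single process and ignores the hard part, namely two machines trying to take the same tuple at the same time. The class names and the broadcast counter are assumptions for the example.

```python
def matches(template, tup):
    """A template field of None matches any value in that position."""
    return len(template) == len(tup) and all(
        t is None or t == v for t, v in zip(template, tup))


class ReplicatedSpace:
    """Hypothetical tuple space with a full copy on each of n machines."""

    def __init__(self, n):
        self.replicas = [[] for _ in range(n)]
        self.broadcasts = 0  # number of broadcast operations issued

    def write(self, tup):
        # (a) Tuples are broadcast on write: every machine stores a copy.
        self.broadcasts += 1
        for replica in self.replicas:
            replica.append(tup)

    def read(self, machine, template):
        # (b) Reads are local: only this machine's replica is searched.
        for tup in self.replicas[machine]:
            if matches(template, tup):
                return tup
        return None

    def take(self, machine, template):
        # Removal must be broadcast so that every copy disappears.
        tup = self.read(machine, template)
        if tup is not None:
            self.broadcasts += 1
            for replica in self.replicas:
                replica.remove(tup)
        return tup


space = ReplicatedSpace(3)
space.write(("task", 7))
seen_on_2 = space.read(2, ("task", None))
taken_on_0 = space.take(0, ("task", None))
left_on_1 = space.read(1, ("task", None))
```

Reads cost no communication at all, while every write and every successful take costs one broadcast, which is the trade-off this scheme makes.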

DYNAMIC REPLICATION:

Replication in coordination-based systems has generally been restricted to static policies for
parallel applications. In commercial applications, relatively simple schemes in which entire
dataspaces or otherwise statically predefined parts of a data set are subject to a single policy are
seen (GigaSpaces, 2005). Inspired by the fine-grained replication of Web documents in Globule,
performance improvements can also be achieved when differentiating replication between the
different kinds of data stored in a dataspace. This differentiation is supported by GSpace.

GSPACE OVERVIEW:

Fault Tolerance.

Examples of fault Tolerance in TIB/Rendezvous



Fault Tolerance in shared dataspace:

Security.

Security in coordination-based systems poses a difficult problem. Processes should remain
referentially decoupled, yet the integrity and confidentiality of data must still be ensured.

Confidentiality:

One important difference between many distributed systems and coordination-based ones is that,
in order to provide efficiency, the middleware needs to inspect the content of published data.
Without being able to do so, the middleware can essentially only flood data to all potential
subscribers. This poses the problem of information confidentiality, which refers to the fact that
it is sometimes important to prevent the middleware from inspecting published data. This
problem can be circumvented through end-to-end encryption.
Decoupling Publishers from Subscribers:

A special accounting service (AS) is introduced to protect data and subscriptions from the
middleware. The AS sits between the clients (publishers and subscribers) and the actual
publish/subscribe middleware.

A publisher registers itself at any node of the publish/subscribe network, that is, at a broker. The
broker forwards the registration information to the accounting service, which then generates a
public key to be used by the publisher, and which is signed by the AS. Of course, the AS keeps
the associated private key to itself.
When a subscriber registers, it provides an encryption key that is forwarded by the broker.

[Figure: decoupling publishers from subscribers using an additional trusted service.]

Secure Shared Dataspaces:

A common approach is to simply encrypt the fields of data items and let matching take place only
when decryption succeeds and content matches with a subscription.

One of the major problems with this approach is that keys may need to be shared between
publishers and subscribers, or that the decryption keys of the publishers should be known to
authorized subscribers.
