
Introduction to Distributed Systems

Distributed Systems Course


Sergio Arévalo e Isabel Muñoz
Departamento de Sistemas Informáticos
Universidad Politécnica de Madrid

Sept, 2019 Introduction. ETSISI-UPM 1


CONTENTS

1. Motivation
2. Distributed abstractions
3. Examples of distributed applications
4. Model
5. Cloud computing. A distributed system architecture
Bibliography
• Introduction to Reliable Distributed Programming. Rachid Guerraoui, Luis Rodrigues. Springer-Verlag, 2006. Chapters 1 & 2.
1. Motivation

• Distributed computing has to do with algorithms for a set of processes that cooperate.
• Moreover, some of the processes of a distributed algorithm might stop by crashing, while others stay alive and keep operating.
• This uncertainty is what differentiates a distributed system from a concurrent system.



1. Motivation (cont.)

• The challenge is for the processes that are still alive: their cooperation must tolerate failures.
• Other uncertainties for process cooperation:
ü communication asynchrony (unknown delays)
ü and communication link failures.



1. Motivation (cont.)

• The most common form of process cooperation is client-server.
[Figure: a Client making requests to a Server]
• Tolerating failures would mean that:
ü if the server fails, the client should send the request to another server;
ü if some clients fail, the server should continue offering services to the other clients.
1. Motivation (cont.)

• Another form of cooperation: multiparty (or peer-to-peer) interaction.

[Figure: peers p1-p5 cooperating in a file download: each peer issues (get file a) and contributes a different chunk with (put chunk n file a)]

• Tolerating failures in this kind of interaction is more complex.



1. Motivation (cont.)

• Distributed computing means that processes might execute on different physical nodes.
• This implies two more uncertainties:
ü Processes might not share the same clock.
ü Processes might not share the same memory.



1. Motivation (cont.)

• Without the same clock, there is no time ordering of the distributed processes' events.
[Figure: the Photos server logs (Put Alice:bad photos) at 12:59 and (Get Alice) at 13:00; the Friends server logs (get Alice) at 13:01 and (Put Bob:no-friend) at 13:02; Bob observes (friends, 13:01) and (photos, 13:00)]

• The clock of the server "photos" has a shift with respect to the clock of the server "friends".
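The clock-shift problem above is what the logical-clock abstraction (covered later) addresses. Below is a minimal illustrative sketch, not part of the course material, of a Lamport logical clock: the server names and events are hypothetical, but the mechanism (counters merged on message receipt) is the standard one.

```python
# A minimal sketch of Lamport logical clocks: each process keeps a counter,
# increments it on local events, and attaches it to outgoing messages.
# On receipt, the counter jumps past the sender's timestamp, so causally
# related events are ordered even when physical clocks disagree.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Return a timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """Merge the sender's timestamp: jump past it."""
        self.time = max(self.time, msg_time) + 1
        return self.time

# The "photos" server's physical clock is shifted, but logical time still
# orders the put before the get that causally follows it.
photos, friends = LamportClock(), LamportClock()
t_put = photos.send()            # Alice puts a photo
t_get = friends.receive(t_put)   # the friends server learns of it
assert t_put < t_get
```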
1. Motivation (cont.)

• Without shared memory, there is no instantaneous global state. For example: how many tokens are in the servers? (Message m1 moves a token from S1 to S3.)
[Figure: a client issues (Get Global state) twice to Server1, Server2 and Server3, collecting (s1,s2,s3) and later (s1',s2',s3'), while message m1 moves the token from S1 to S3]
• State (s1,s2,s3) is not possible: it would require an instantaneous observation.
• State (s1',s2',s3') is possible, but there is a problem with m1': two tokens are counted instead of one!
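The token example can be simulated in a few lines. This sketch (hypothetical server names, single-threaded simulation) shows how reading each server's state at a different instant counts the in-transit token twice:

```python
# A minimal sketch of why reading each server's state at a different instant
# yields an inconsistent global state: if S1 is read before it sends the
# token and S3 is read after it receives it, the token is counted twice.

servers = {"S1": 1, "S2": 0, "S3": 0}   # one token, held by S1

def read_state(name):
    return servers[name]

snapshot = {}
snapshot["S1"] = read_state("S1")       # S1 read: still holds the token
# message m1 delivers the token from S1 to S3 between the two reads
servers["S1"] -= 1
servers["S3"] += 1
snapshot["S2"] = read_state("S2")
snapshot["S3"] = read_state("S3")       # S3 read: token has arrived

assert sum(servers.values()) == 1       # the real system still has one token
assert sum(snapshot.values()) == 2      # the snapshot sees two!
```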
2. Distributed abstractions

• To understand distributed systems we need to capture the properties/abstractions that help distinguish the fundamental from the accessory.
• We will abstract the underlying physical system:
basic abstractions
• Then we will show some recurring interaction patterns in distributed applications: application abstractions.



2. Distributed abstractions

• The two main basic abstractions are:
ü Processes, which abstract the active entities that perform computations (a computer, a processor, a thread of execution).
ü Links, which abstract the physical and logical network that supports communication among processes.
• These abstraction descriptions must show how
they behave with respect to the passage of time
and to failure occurrences.
2. Distributed abstractions

• Typical application abstractions are:


ü Reliable and efficient communication. Because there
are failures and asynchrony periods, some
abstractions to get reliable and efficient links are
needed.
ü Logical clocks. Because there is no global clock, an abstraction to time-order the events of the distributed system is needed.
ü Distributed global states. Because there is no global
state, an abstraction to obtain a consistent
distributed global state is needed.
2. Distributed abstractions

• Typical application abstractions are (cont):


ü Multicast primitives. Because there is no reliable and synchronous hardware broadcast mechanism, a multicast abstraction to communicate groups of processes, implementing different qualities of service, is needed.
ü Shared memory. Because there is no shared physical memory among processes, an abstraction to allow processes to share memory is needed.



2. Distributed abstractions

• Typical application abstractions are (cont):


ü Consensus. Some groups of application processes need to reach consensus on some value to advance in their computation; an abstraction to reach this consensus is needed.
ü Failure detectors. The system's asynchrony creates uncertainty about the knowledge of process failures; an abstraction to detect failures is needed.



2. Distributed abstractions

• Typical application abstractions are (cont):


ü Atomic commitment. A group of processes needs to execute some step only if all of them agree to do it; otherwise the step is not done. An abstraction to perform this commitment is needed.
ü Leader election. A group of processes needs to elect a leader among themselves when the previous leader fails; an abstraction to elect a leader is needed.



3. Examples of distributed applications

• Information dissemination:
ü Processes may produce information, publishers
ü Processes may consume information, subscribers
ü Also called publish-subscribe paradigm
ü If several processes are interested in the same notification, a multicast primitive with a reliable delivery property is needed
ü An example is a social network application
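The paradigm can be sketched in-process in a few lines. The `Broker` class and topic names below are hypothetical illustrations; a real publish-subscribe system adds reliable multicast delivery across nodes, which this sketch omits.

```python
# A minimal in-process sketch of publish-subscribe: subscribers register
# interest in a topic, and a publish delivers the notification to every
# current subscriber. It shows the decoupling of publishers from
# subscribers, not the reliability machinery of a real system.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> delivery callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for deliver in self.subscribers[topic]:
            deliver(message)

broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("photos", inbox_a.append)
broker.subscribe("photos", inbox_b.append)
broker.publish("photos", "Alice: new photo")
assert inbox_a == inbox_b == ["Alice: new photo"]
```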



3. Examples of distributed applications

• Process Control Applications:


ü Software processes must control the execution of a physical activity.
ü They might control the dynamic location of aircraft, the temperature of nuclear installations, the automation of car production, …
ü Some of the processes typically have a sensor connected to them.
ü To tolerate process failures, a group of replicas may reach consensus on their input sensor values in order to offer a reliable output value.
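One common way (a sketch of the voting step only, not the full protocol) for replicas to derive a reliable output from redundant sensors is a deterministic vote such as the median, which masks a single faulty reading. A real system would combine this with consensus so that every replica votes on the same set of exchanged values.

```python
# A sketch of deterministic voting over replicated sensor readings: each
# replica applies the same function to the same set of readings, so all
# replicas derive the same reliable value, masking one faulty sensor.
import statistics

def vote(readings):
    """Deterministic vote applied identically by every replica."""
    return statistics.median(readings)

readings = [99.8, 100.1, 250.0]       # one sensor is faulty
assert vote(readings) == 100.1        # the outlier is masked
```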
3. Examples of distributed applications

• Process Control Applications:

[Figure: fault-tolerant control loop. Left: a simple control loop (sensor, processor, reference, actuator). Right: processor replicas that reach consensus on the input sensor values, with a comparator before the actuators]
3. Examples of distributed applications

• Cooperative work:
ü Internet users may cooperate in building a common
software or document, or setting up a distributed
dialogue (virtual conference).
ü They can use a space abstraction with read and write operations on it.
ü These abstractions can be a distributed shared memory,
or a distributed file service.
ü To maintain a consistent view of the shared space, processes must agree on the order of operations.



3. Examples of distributed applications

• Distributed Databases:
ü In distributed systems several transaction managers
might cooperate to service each transaction.
ü When a transaction ends, a distributed atomic commitment algorithm must be executed in order to decide whether the transaction must commit or abort.
ü A transaction manager might decide to abort the
transaction if it detects a violation of the database
integrity, a deadlock problem, a disk error, etc.
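The classic algorithm for this step is two-phase commit. The sketch below (a hypothetical `Manager` class, with no crash handling or logging, which a real protocol requires) shows only the voting and completion phases:

```python
# A minimal sketch of two-phase atomic commitment: the coordinator collects
# votes from every transaction manager; the transaction commits only if all
# vote yes, otherwise it aborts, and all managers apply the same decision.

def two_phase_commit(participants):
    # Phase 1 (voting): ask every manager whether it can commit.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (completion): broadcast the uniform decision.
    for p in participants:
        p.finish(decision)
    return decision

class Manager:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.outcome = None

    def prepare(self):
        return self.can_commit   # e.g. no deadlock, no integrity violation

    def finish(self, decision):
        self.outcome = decision

ok = [Manager(True), Manager(True)]
assert two_phase_commit(ok) == "commit"

bad = [Manager(True), Manager(False)]    # one manager detects a problem
assert two_phase_commit(bad) == "abort"
assert all(m.outcome == "abort" for m in bad)
```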



3. Examples of distributed applications

• Highly Available Services:


ü They are built using the state-machine replication approach.
ü Several processes (replicas) execute the same code in
different nodes (independent probability of failure).
ü They receive the same inputs (messages) in the same
order with a total-ordered multicast.
ü All replicas pass through the same states if they run the same deterministic code.
ü If one replica fails, nothing happens: the others continue offering the replicated service.
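The approach can be sketched in a few lines. Here the total-ordered multicast is abstracted away as a shared input list, and the replicated state (a balance) and commands are hypothetical illustrations:

```python
# A minimal sketch of state-machine replication: replicas run the same
# deterministic code and consume the same totally ordered input sequence,
# so their states stay identical and any replica can answer for the others.

class Replica:
    def __init__(self):
        self.balance = 0               # the replicated state

    def apply(self, command):          # deterministic state transition
        op, amount = command
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw":
            self.balance -= amount

ordered_inputs = [("deposit", 100), ("withdraw", 30)]  # same order everywhere
replicas = [Replica(), Replica(), Replica()]
for cmd in ordered_inputs:
    for r in replicas:
        r.apply(cmd)

# All replicas reach the same state; losing one changes nothing for clients.
assert all(r.balance == 70 for r in replicas)
```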
4. Model. Distributed Computation

• Processes are the units of computations.


• The system can be static or dynamic in its set of processes.
• Processes might know the process identifiers of
the system (known membership) or not (unknown
membership).
• Unless explicitly stated otherwise, it is assumed
that the set is static and the membership is
known.



4. Model. Distributed Computation

• No assumption is made on the mapping of processes to actual processors, operating-system processes, or threads.
• Processes communicate by exchanging messages, and messages are uniquely identified by (proc_id, seq_num).
• Messages are exchanged through communication
links.
• A distributed algorithm is a collection of
distributed automata, one per process.
4. Model. Distributed Computation

• A process step consists of receiving (delivering) a message (global event), executing a local computation (local event), and sending a message (global event).
• Only one process step occurs in the distributed system at a time (a virtual global scheduler).
• Some of the step events can be “nil” (nothing is
done).
• Unless specified otherwise we will consider
deterministic algorithms.
4. Model. Processes

• Unless it fails, a process is supposed to execute the algorithm assigned to it.
• The unit of failure is the process (atomic
component).
• When it fails, all its components fail as well at
the same time.
• Process abstractions differ according to the nature of the failures that are considered.



4. Model. Processes (cont.)

• Failure modes of a process:
ü Crashes
ü Omissions
ü Crashes & recoveries
ü Arbitrary



4. Model. Processes (cont.)

• Arbitrary failure mode:


ü It happens when a process's execution deviates arbitrarily from the algorithm assigned to it.
ü It is the most general failure mode.
ü A process can produce any output, at any time.
ü They are also called Byzantine failures or malicious failures.
ü They are the most expensive to tolerate.
4. Model. Processes (cont.)

• Omissions failure mode:


ü It happens when a process does not send (or
receive) a message it is supposed to send (or
receive) according to the algorithm.
ü In general these faults are due to buffer overflows or network congestion.
ü With omissions, a process deviates from its assigned algorithm because messages are lost.



4. Model. Processes (cont.)

• Crash failure mode:


ü It happens when a process stops executing
after some time t.
ü It is called a crash failure and it is said that
we have a crash-stop process abstraction.
ü Algorithms typically assume up to F failures: during an execution, the number of processes that actually crash will be less than or equal to F.



4. Model. Processes (cont.)

• Crash-recovery failure mode:


ü In this mode a process can recover after a crash.
ü Two options: to have stable storage or not.
ü In a crash, all the volatile memory is lost, but not the stable storage. After recovery, the stable storage can be read.
ü Processes: permanently up; eventually up;
eventually down; permanently up&down.



4. Model. Communication links (cont.)

• The link abstraction:


ü The link is used to represent the network components of the distributed system.
ü Unless otherwise stated every pair of
processes is connected by a bidirectional link,
providing a full connectivity among processes.
ü In practice, different topologies may be used to implement this abstraction, possibly using routing algorithms: a fully connected mesh, an Ethernet, a ring, the Internet.
4. Model. Communication links (cont.)

• The link abstraction (cont):


ü Some algorithms do not consider a fully connected system.
ü In this case the algorithm should route the
messages by itself.
ü Messages are uniquely identified.



4. Model. Communication links (cont.)

• Link failures:
ü Links can lose messages (omission) and delay messages (timing).
ü A process can retransmit messages if they are lost.
ü Using fair-loss links we can implement reliable links.



4. Model. Communication links (cont.)

• Link failures:
ü The Fair-loss link properties are:
o Fair-loss: if a process p sends an infinite number of messages to a process q, then q will deliver an infinite number of messages, provided p and q do not crash.
o Finite duplication: if p sends a message m to q a finite number of times, m cannot be delivered an infinite number of times by q.

o No creation: if m is delivered, then m was previously sent.
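These properties suffice to build a reliable link by retransmission and duplicate filtering. The sketch below simulates a lossy link in-process; the classes and loss rate are illustrative assumptions, not a networked implementation.

```python
# A sketch of a reliable link built on a fair-loss link: the sender
# retransmits each uniquely identified message until it gets through, and
# the receiver filters duplicates. Fair-loss guarantees a retransmitted
# message eventually arrives if neither process crashes.
import random

random.seed(1)

class FairLossLink:
    def __init__(self, loss_rate=0.5):
        self.loss_rate = loss_rate
        self.inbox = []

    def send(self, msg):
        if random.random() >= self.loss_rate:   # the message may be dropped
            self.inbox.append(msg)

class ReliableReceiver:
    def __init__(self):
        self.delivered_ids = set()
        self.delivered = []

    def on_message(self, msg_id, payload):
        if msg_id not in self.delivered_ids:    # finite duplication: dedupe
            self.delivered_ids.add(msg_id)
            self.delivered.append(payload)

link, receiver = FairLossLink(), ReliableReceiver()
for msg_id, payload in enumerate(["m1", "m2", "m3"]):
    # retransmit until the message is through (fair-loss guarantees this)
    while not any(m[0] == msg_id for m in link.inbox):
        link.send((msg_id, payload))

for msg in link.inbox:
    receiver.on_message(*msg)
assert receiver.delivered == ["m1", "m2", "m3"]
```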
4. Model. Timing assumptions

• Types of timing systems


ü The lack of a global clock and the uncertainty in communication delays produce different types of timing systems.
ü These timing systems are: asynchronous, synchronous, and partially synchronous.



4. Model. Timing assumptions
• Asynchronous systems:
ü Processes: there is no upper bound on processing delays.
ü Communication links: there is no upper bound on message transmission delays.
ü The most realistic model (e.g. the Internet).
ü Some algorithms are difficult or impossible to build: consensus, atomic broadcast, membership services.



4. Model. Timing assumptions
• Synchronous systems:
ü Processes: there is a known upper bound on processing delays.
ü Communication links: there is a known upper bound on message transmission delays.
ü Less realistic: only real-time systems.
ü Process failures can be detected reliably and easily.



4. Model. Timing assumptions
• Partially Synchronous systems:
ü Processes: there is an upper bound on processing delays, but it is unknown.
ü Communication links: there is an upper bound on message transmission delays, but it is unknown.
ü It is a realistic model.
ü Process failures can be detected, unreliably, with adaptive timeouts.
ü It is possible to implement consensus, atomic broadcast, and membership services.
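An adaptive-timeout failure detector for this model can be sketched as follows. The class and its doubling rule are illustrative assumptions; a real detector measures heartbeat delays over a network.

```python
# A minimal sketch of an unreliable failure detector for a partially
# synchronous system: a process is suspected when its heartbeat is late,
# and the timeout adapts (here: doubles) whenever a suspicion turns out to
# be wrong, so eventually the unknown delay bound stops being exceeded.

class EventuallyPerfectDetector:
    def __init__(self, initial_timeout):
        self.timeout = initial_timeout
        self.suspected = set()

    def check(self, process, heartbeat_delay):
        if heartbeat_delay > self.timeout:
            self.suspected.add(process)          # heartbeat late: suspect
        elif process in self.suspected:
            # False suspicion: the process was alive, just slow.
            self.suspected.discard(process)
            self.timeout *= 2                    # adapt to the unknown bound

detector = EventuallyPerfectDetector(initial_timeout=1.0)
detector.check("q", 1.5)                 # heartbeat late: suspect q
assert "q" in detector.suspected
detector.check("q", 0.8)                 # q answers: revise the timeout
assert "q" not in detector.suspected
assert detector.timeout == 2.0
```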
4. Model. Communication paradigms
• Communication paradigms among entities:
ü Direct communication
ü Indirect communication
• Direct communication: one to one
ü Senders and receivers know each other
ü Messages have the identification of the receiver and
the sender
ü Both exist at the same time
ü Examples:
u Message passing (Socket API),
u request/reply protocols,
u RPC, RMI
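A minimal sketch of direct one-to-one message passing with the socket API, using a connected socket pair within a single process to stand in for two communicating processes; the request and reply payloads are hypothetical.

```python
# A sketch of direct communication: a request/reply exchange over a
# connected pair of sockets. Sender and receiver know each other (the
# connected pair) and both must exist at the same time, which is what
# distinguishes direct from indirect communication.
import socket

client, server = socket.socketpair()

client.sendall(b"GET file_a")          # request
request = server.recv(1024)
assert request == b"GET file_a"

server.sendall(b"chunk 1 of file_a")   # reply
reply = client.recv(1024)
assert reply == b"chunk 1 of file_a"

client.close()
server.close()
```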
4. Model. Communication paradigms (cont.)

• Indirect communication (typically one-to-many)


ü The communication involves a third entity
ü Senders and receivers may not know each
other => space uncoupling
ü Senders and receivers do not need to exist at
the same time => time uncoupling
ü Examples:
u GCS (multicast, membership, failure detection)
u Message queues
u Publish-subscribe systems
u Distributed Shared Memory
5. Cloud computing. A distributed
system architecture
• Cloud computing properties:
Ø Appearance of infinite computing resources on demand
Ø Elimination of resource commitment by Cloud users
Ø Ability to pay for the use of computing resources on a short-term basis, as needed
Ø Economies of scale due to very large data centers
Ø Higher resource utilization by multiplexing of
workloads from different users
Ø Simplified operation and increased utilization via virtualization.
5. Cloud computing. A distributed
system architecture
• Example of a cloud architecture (Pepper, Yahoo!):

Ø Uses Big Data technologies.
Ø Uses part of the Apache Hadoop ecosystem:
Ø ZooKeeper, a distributed coordinator.
Ø HDFS, a distributed file system.
Ø YARN, a distributed task manager.


5. Cloud computing. A distributed
system architecture
[Figure: numbered interaction flow among Users, proxy servers, a ZooKeeper manager, Hadoop YARN (resource manager, job manager, node managers, app manager, task manager) and HDFS (name node, data nodes)]
5. Cloud computing. A distributed
system architecture
• Interaction between the cloud and the Internet of Things:
Ø This will be the typical architecture in the Master's Project.

