
Introduction to Distributed Systems

Distributed Systems Course


Sergio Arévalo e Isabel Muñoz
Departamento de Sistemas Informáticos
Universidad Politécnica de Madrid

Sept, 2019 Introduction. ETSISI-UPM 1


CONTENTS

1. Motivation
2. Distributed abstractions
3. Examples of distributed applications
4. Model
5. Cloud computing. A distributed system architecture
Bibliography
• Introduction to Reliable Distributed Programming. Rachid Guerraoui, Luis Rodrigues. Springer-Verlag, 2006. Chapters 1 & 2.
1. Motivation

• Distributed computing has to do with algorithms for a set of processes that cooperate.
• Moreover, some of the processes of a distributed algorithm might stop by crashing, while others stay alive and keep operating.
• This uncertainty is what differentiates a distributed system from a concurrent system.



1. Motivation (cont.)

• The challenge is for the processes that are still alive: their cooperation must tolerate failures.
• Other uncertainties for process cooperation:
ü communication asynchrony (unknown delays)
ü and communication link failures.



1. Motivation (cont.)

• The most common form of process cooperation is client-server.
[Figure: a Client making requests to a Server]
• Tolerating failures would mean that:
ü if the server fails, the client should send the request to another server;
ü if some clients fail, the server should continue offering services to the other clients.
1. Motivation (cont.)

• Another form of cooperation: multiparty (or peer-to-peer) interaction.

[Figure: peers p1-p5 cooperating in a file download: each peer issues (get file a) and contributes a different chunk with (put chunk n file a)]

• Tolerating failures in this kind of interaction is more complex.



1. Motivation (cont.)

• Distributed computing means that processes might execute on different physical nodes.
• This implies two more uncertainties:
ü Processes might not share the same clock.
ü Processes might not share the same memory.



1. Motivation (cont.)

• Without the same clock, there is no time ordering of the distributed processes' events.
[Figure: the Photos server logs (Put Alice:bad photos) at 12:59 and (Get Alice) at 13:00; the Friends server logs (get Alice) at 13:01 and (Put Bob:no-friend) at 13:02; Bob observes (friends, 13:01) and (photos, 13:00)]

• The clock of the server "photos" has a shift with respect to the clock of the server "friends".
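The clock-shift problem above is what the logical-clock abstraction (covered later) addresses. Below is a minimal illustrative sketch, not part of the course material, of a Lamport logical clock: the server names and events are hypothetical, but the mechanism (counters merged on message receipt) is the standard one.

```python
# A minimal sketch of Lamport logical clocks: each process keeps a counter,
# increments it on local events, and attaches it to outgoing messages.
# On receipt, the counter jumps past the sender's timestamp, so causally
# related events are ordered even when physical clocks disagree.

class LamportClock:
    def __init__(self):
        self.time = 0

    def tick(self):
        """Local event: advance the counter."""
        self.time += 1
        return self.time

    def send(self):
        """Return a timestamp to attach to an outgoing message."""
        return self.tick()

    def receive(self, msg_time):
        """Merge the sender's timestamp: jump past it."""
        self.time = max(self.time, msg_time) + 1
        return self.time

# The "photos" server's physical clock is shifted, but logical time still
# orders the put before the get that causally follows it.
photos, friends = LamportClock(), LamportClock()
t_put = photos.send()            # Alice puts a photo
t_get = friends.receive(t_put)   # the friends server learns of it
assert t_put < t_get
```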
1. Motivation (cont.)

• Without shared memory, there is no instantaneous global state. For example: how many tokens are in the servers? (Message m1 moves a token from S1 to S3.)
[Figure: a client issues (Get Global state) twice to Server1, Server2 and Server3, collecting (s1,s2,s3) and later (s1',s2',s3'), while message m1 moves the token from S1 to S3]
• State (s1,s2,s3) is not possible: it would require an instantaneous observation.
• State (s1',s2',s3') is possible, but there is a problem with m1': two tokens are counted instead of one!
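The token example can be simulated in a few lines. This sketch (hypothetical server names, single-threaded simulation) shows how reading each server's state at a different instant counts the in-transit token twice:

```python
# A minimal sketch of why reading each server's state at a different instant
# yields an inconsistent global state: if S1 is read before it sends the
# token and S3 is read after it receives it, the token is counted twice.

servers = {"S1": 1, "S2": 0, "S3": 0}   # one token, held by S1

def read_state(name):
    return servers[name]

snapshot = {}
snapshot["S1"] = read_state("S1")       # S1 read: still holds the token
# message m1 delivers the token from S1 to S3 between the two reads
servers["S1"] -= 1
servers["S3"] += 1
snapshot["S2"] = read_state("S2")
snapshot["S3"] = read_state("S3")       # S3 read: token has arrived

assert sum(servers.values()) == 1       # the real system still has one token
assert sum(snapshot.values()) == 2      # the snapshot sees two!
```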
2. Distributed abstractions

• To understand distributed systems we need to capture the properties/abstractions that help distinguish the fundamental from the accessory.
• We will abstract the underlying physical system:
basic abstractions
• Then we will show some recurring interaction patterns in distributed applications: application abstractions.



2. Distributed abstractions

• The two main basic abstractions are:
ü Processes, which abstract the active entities that perform computations (a computer, a processor, a thread of execution).
ü Links, which abstract the physical and logical network that supports communication among processes.
• These abstraction descriptions must show how
they behave with respect to the passage of time
and to failure occurrences.
2. Distributed abstractions

• Typical application abstractions are:


ü Reliable and efficient communication. Because there
are failures and asynchrony periods, some
abstractions to get reliable and efficient links are
needed.
ü Logical clocks. Because there is no global clock, an abstraction to time-order the events of the distributed system is needed.
ü Distributed global states. Because there is no global
state, an abstraction to obtain a consistent
distributed global state is needed.
2. Distributed abstractions

• Typical application abstractions are (cont):


ü Multicast primitives. Because there is no reliable and synchronous hardware broadcast mechanism, a multicast abstraction to communicate groups of processes, implementing different qualities of service, is needed.
ü Shared memory. Because there is no shared physical memory among processes, an abstraction to allow processes to share memory is needed.



2. Distributed abstractions

• Typical application abstractions are (cont):


ü Consensus. Some groups of application processes need to reach consensus on some value to advance in their computation; an abstraction to reach this consensus is needed.
ü Failure detectors. The system's asynchrony creates uncertainty about the knowledge of process failures; an abstraction to detect failures is needed.



2. Distributed abstractions

• Typical application abstractions are (cont):


ü Atomic commitment. A group of processes needs to execute some step only if all of them agree to do it; otherwise the step is not done. An abstraction to perform this commitment is needed.
ü Leader election. A group of processes needs to elect a leader among themselves when the previous leader fails; an abstraction to elect a leader is needed.



3. Examples of distributed applications

• Information dissemination:
ü Processes may produce information, publishers
ü Processes may consume information, subscribers
ü Also called publish-subscribe paradigm
ü If several processes are interested in the same notification, a multicast primitive with a reliable delivery property is needed
ü An example is a social network application
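The paradigm can be sketched in-process in a few lines. The `Broker` class and topic names below are hypothetical illustrations; a real publish-subscribe system adds reliable multicast delivery across nodes, which this sketch omits.

```python
# A minimal in-process sketch of publish-subscribe: subscribers register
# interest in a topic, and a publish delivers the notification to every
# current subscriber. It shows the decoupling of publishers from
# subscribers, not the reliability machinery of a real system.
from collections import defaultdict

class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> delivery callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for deliver in self.subscribers[topic]:
            deliver(message)

broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe("photos", inbox_a.append)
broker.subscribe("photos", inbox_b.append)
broker.publish("photos", "Alice: new photo")
assert inbox_a == inbox_b == ["Alice: new photo"]
```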



3. Examples of distributed applications

• Process Control Applications:


ü Software processes must control the execution of a physical activity.
ü They might control the dynamic location of aircraft, the temperature of nuclear installations, the automation of car production, …
ü Some of the processes typically have a sensor connected to them.
ü To tolerate process failures, a group of replicas may reach consensus on their input sensor values in order to offer a reliable output value.
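One common way (a sketch of the voting step only, not the full protocol) for replicas to derive a reliable output from redundant sensors is a deterministic vote such as the median, which masks a single faulty reading. A real system would combine this with consensus so that every replica votes on the same set of exchanged values.

```python
# A sketch of deterministic voting over replicated sensor readings: each
# replica applies the same function to the same set of readings, so all
# replicas derive the same reliable value, masking one faulty sensor.
import statistics

def vote(readings):
    """Deterministic vote applied identically by every replica."""
    return statistics.median(readings)

readings = [99.8, 100.1, 250.0]       # one sensor is faulty
assert vote(readings) == 100.1        # the outlier is masked
```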
3. Examples of distributed applications

• Process Control Applications:

[Figure: fault-tolerant control loop. Left: a simple control loop (sensor, processor, reference, actuator). Right: processor replicas that reach consensus on the input sensor values, with a comparator before the actuators]
3. Examples of distributed applications

• Cooperative work:
ü Internet users may cooperate in building a common
software or document, or setting up a distributed
dialogue (virtual conference).
ü They can use a space abstraction with read and write operations on it.
ü These abstractions can be a distributed shared memory,
or a distributed file service.
ü To maintain a consistent view of the shared space, processes must agree on the order of operations.



3. Examples of distributed applications

• Distributed Databases:
ü In distributed systems several transaction managers
might cooperate to service each transaction.
ü When a transaction ends, a distributed atomic commitment algorithm must be executed in order to decide whether the transaction must commit or abort.
ü A transaction manager might decide to abort the
transaction if it detects a violation of the database
integrity, a deadlock problem, a disk error, etc.
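The classic algorithm for this step is two-phase commit. The sketch below (a hypothetical `Manager` class, with no crash handling or logging, which a real protocol requires) shows only the voting and completion phases:

```python
# A minimal sketch of two-phase atomic commitment: the coordinator collects
# votes from every transaction manager; the transaction commits only if all
# vote yes, otherwise it aborts, and all managers apply the same decision.

def two_phase_commit(participants):
    # Phase 1 (voting): ask every manager whether it can commit.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2 (completion): broadcast the uniform decision.
    for p in participants:
        p.finish(decision)
    return decision

class Manager:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.outcome = None

    def prepare(self):
        return self.can_commit   # e.g. no deadlock, no integrity violation

    def finish(self, decision):
        self.outcome = decision

ok = [Manager(True), Manager(True)]
assert two_phase_commit(ok) == "commit"

bad = [Manager(True), Manager(False)]    # one manager detects a problem
assert two_phase_commit(bad) == "abort"
assert all(m.outcome == "abort" for m in bad)
```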



3. Examples of distributed applications

• Highly Available Services:


ü They are built using the state-machine replication approach.
ü Several processes (replicas) execute the same code in
different nodes (independent probability of failure).
ü They receive the same inputs (messages) in the same
order with a total-ordered multicast.
ü All replicas pass through the same states if they run the same deterministic code.
ü If one replica fails, nothing happens: the others continue offering the replicated service.
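The approach can be sketched in a few lines. Here the total-ordered multicast is abstracted away as a shared input list, and the replicated state (a balance) and commands are hypothetical illustrations:

```python
# A minimal sketch of state-machine replication: replicas run the same
# deterministic code and consume the same totally ordered input sequence,
# so their states stay identical and any replica can answer for the others.

class Replica:
    def __init__(self):
        self.balance = 0               # the replicated state

    def apply(self, command):          # deterministic state transition
        op, amount = command
        if op == "deposit":
            self.balance += amount
        elif op == "withdraw":
            self.balance -= amount

ordered_inputs = [("deposit", 100), ("withdraw", 30)]  # same order everywhere
replicas = [Replica(), Replica(), Replica()]
for cmd in ordered_inputs:
    for r in replicas:
        r.apply(cmd)

# All replicas reach the same state; losing one changes nothing for clients.
assert all(r.balance == 70 for r in replicas)
```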
4. Model. Distributed Computation

• Processes are the units of computations.


• The system can be static or dynamic in its set of processes.
• Processes might know the process identifiers of
the system (known membership) or not (unknown
membership).
• Unless explicitly stated otherwise, it is assumed
that the set is static and the membership is
known.



4. Model. Distributed Computation

• No assumption is made on the mapping of processes to actual processors, operating-system processes, or threads.
• Processes communicate by exchanging messages, and messages are uniquely identified by (proc_id, seq_num).
• Messages are exchanged through communication
links.
• A distributed algorithm is a collection of
distributed automata, one per process.
4. Model. Distributed Computation

• A process step consists of receiving (delivering) a message (global event), executing a local computation (local event), and sending a message (global event).
• Only one process step occurs in the distributed system at a time (a virtual global scheduler).
• Some of the step events can be “nil” (nothing is
done).
• Unless specified otherwise we will consider
deterministic algorithms.
4. Model. Processes

• Unless it fails, a process is supposed to execute the algorithm assigned to it.
• The unit of failure is the process (atomic
component).
• When it fails, all its components fail as well at
the same time.
• Process abstractions differ according to the nature of the failures that are considered.



4. Model. Processes (cont.)

• Failure modes of a process:
ü Crashes
ü Omissions
ü Crashes & recoveries
ü Arbitrary



4. Model. Processes (cont.)

• Arbitrary failure mode:


ü It happens when a process's execution deviates arbitrarily from the algorithm assigned to it.
ü It is the most general failure mode.
ü A process can produce any output, at any time.
ü They are also called Byzantine failures or malicious failures.
ü They are the most expensive to tolerate.
4. Model. Processes (cont.)

• Omissions failure mode:


ü It happens when a process does not send (or
receive) a message it is supposed to send (or
receive) according to the algorithm.
ü In general these faults are due to buffer overflows or network congestion.
ü With omissions, a process deviates from its assigned algorithm because messages are lost.



4. Model. Processes (cont.)

• Crash failure mode:


ü It happens when a process stops executing
after some time t.
ü It is called a crash failure and it is said that
we have a crash-stop process abstraction.
ü Algorithms typically assume up to F failures: during an execution, the number of processes that actually crash will be less than or equal to F.



4. Model. Processes (cont.)

• Crash-recovery failure mode:


ü In this mode a process can recover after a crash.
ü Two options: to have stable storage or not.
ü In a crash, all the volatile memory is lost, but not the stable storage. After recovery, the stable storage can be read.
ü Processes: permanently up; eventually up;
eventually down; permanently up&down.



4. Model. Communication links (cont.)

• The link abstraction:


ü The link is used to represent the network components of the distributed system.
ü Unless otherwise stated every pair of
processes is connected by a bidirectional link,
providing a full connectivity among processes.
ü In practice, different topologies may be used to implement this abstraction, possibly using routing algorithms: a fully connected mesh, an Ethernet, a ring, the Internet.
4. Model. Communication links (cont.)

• The link abstraction (cont):


ü Some algorithms do not consider a fully connected system.
ü In this case the algorithm should route the
messages by itself.
ü Messages are uniquely identified.



4. Model. Communication links (cont.)

• Link failures:
ü Links can lose messages (omission) and delay messages (timing).
ü A process can retransmit messages if they are lost.
ü Using fair-loss links we can implement reliable links.



4. Model. Communication links (cont.)

• Link failures:
ü The Fair-loss link properties are:
o Fair-loss: if a process p sends an infinite number of messages to a process q, then q will deliver an infinite number of messages, provided p and q do not crash.
o Finite duplication: if p sends a message m to q a finite number of times, m cannot be delivered an infinite number of times by q.

o No creation: if m is delivered, then m was previously sent.
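These properties suffice to build a reliable link by retransmission and duplicate filtering. The sketch below simulates a lossy link in-process; the classes and loss rate are illustrative assumptions, not a networked implementation.

```python
# A sketch of a reliable link built on a fair-loss link: the sender
# retransmits each uniquely identified message until it gets through, and
# the receiver filters duplicates. Fair-loss guarantees a retransmitted
# message eventually arrives if neither process crashes.
import random

random.seed(1)

class FairLossLink:
    def __init__(self, loss_rate=0.5):
        self.loss_rate = loss_rate
        self.inbox = []

    def send(self, msg):
        if random.random() >= self.loss_rate:   # the message may be dropped
            self.inbox.append(msg)

class ReliableReceiver:
    def __init__(self):
        self.delivered_ids = set()
        self.delivered = []

    def on_message(self, msg_id, payload):
        if msg_id not in self.delivered_ids:    # finite duplication: dedupe
            self.delivered_ids.add(msg_id)
            self.delivered.append(payload)

link, receiver = FairLossLink(), ReliableReceiver()
for msg_id, payload in enumerate(["m1", "m2", "m3"]):
    # retransmit until the message is through (fair-loss guarantees this)
    while not any(m[0] == msg_id for m in link.inbox):
        link.send((msg_id, payload))

for msg in link.inbox:
    receiver.on_message(*msg)
assert receiver.delivered == ["m1", "m2", "m3"]
```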
4. Model. Timing assumptions

• Types of timing systems


ü The lack of a global clock and the uncertainty in communication delays produce different types of timing systems.
ü These timing systems are: asynchronous, synchronous, and partially synchronous.



4. Model. Timing assumptions
• Asynchronous systems:
ü Processes: there is no upper bound on processing delays.
ü Communication links: there is no upper bound on message transmission delays.
ü The most realistic model (e.g. the Internet).
ü Some algorithms are difficult or impossible to build: consensus, atomic broadcast, membership services.



4. Model. Timing assumptions
• Synchronous systems:
ü Processes: there is a known upper bound on processing delays.
ü Communication links: there is a known upper bound on message transmission delays.
ü Less realistic: only real-time systems.
ü Process failures can be detected reliably and easily.



4. Model. Timing assumptions
• Partially Synchronous systems:
ü Processes: there is an upper bound on processing delays, but it is unknown.
ü Communication links: there is an upper bound on message transmission delays, but it is unknown.
ü It is a realistic model.
ü Process failures can be detected, unreliably, with adaptive timeouts.
ü It is possible to implement consensus, atomic broadcast, and membership services.
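An adaptive-timeout failure detector for this model can be sketched as follows. The class and its doubling rule are illustrative assumptions; a real detector measures heartbeat delays over a network.

```python
# A minimal sketch of an unreliable failure detector for a partially
# synchronous system: a process is suspected when its heartbeat is late,
# and the timeout adapts (here: doubles) whenever a suspicion turns out to
# be wrong, so eventually the unknown delay bound stops being exceeded.

class EventuallyPerfectDetector:
    def __init__(self, initial_timeout):
        self.timeout = initial_timeout
        self.suspected = set()

    def check(self, process, heartbeat_delay):
        if heartbeat_delay > self.timeout:
            self.suspected.add(process)          # heartbeat late: suspect
        elif process in self.suspected:
            # False suspicion: the process was alive, just slow.
            self.suspected.discard(process)
            self.timeout *= 2                    # adapt to the unknown bound

detector = EventuallyPerfectDetector(initial_timeout=1.0)
detector.check("q", 1.5)                 # heartbeat late: suspect q
assert "q" in detector.suspected
detector.check("q", 0.8)                 # q answers: revise the timeout
assert "q" not in detector.suspected
assert detector.timeout == 2.0
```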
4. Model. Communication paradigms
• Communication paradigms among entities:
ü Direct communication
ü Indirect communication
• Direct communication: one to one
ü Senders and receivers know each other
ü Messages have the identification of the receiver and
the sender
ü Both exist at the same time
ü Examples:
u Message passing (Socket API),
u request/reply protocols,
u RPC, RMI
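A minimal sketch of direct one-to-one message passing with the socket API, using a connected socket pair within a single process to stand in for two communicating processes; the request and reply payloads are hypothetical.

```python
# A sketch of direct communication: a request/reply exchange over a
# connected pair of sockets. Sender and receiver know each other (the
# connected pair) and both must exist at the same time, which is what
# distinguishes direct from indirect communication.
import socket

client, server = socket.socketpair()

client.sendall(b"GET file_a")          # request
request = server.recv(1024)
assert request == b"GET file_a"

server.sendall(b"chunk 1 of file_a")   # reply
reply = client.recv(1024)
assert reply == b"chunk 1 of file_a"

client.close()
server.close()
```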
4. Model. Communication paradigms (cont.)

• Indirect communication (typically one-to-many)


ü The communication involves a third entity
ü Senders and receivers may not know each
other => space uncoupling
ü Senders and receivers do not need to exist at
the same time => time uncoupling
ü Examples:
u GCS (multicast, membership, failure detection)
u Message queues
u Publish-subscribe systems
u Distributed Shared Memory
5. Cloud computing. A distributed
system architecture
• Cloud computing properties:
Ø Appearance of infinite computing resources on demand
Ø Elimination of resource commitment by Cloud users
Ø Ability to pay for the use of computing resources on a short-term basis, as needed
Ø Economies of scale due to very large data centers
Ø Higher resource utilization by multiplexing of
workloads from different users
Ø Simplified operation and increased utilization via virtualization.
5. Cloud computing. A distributed
system architecture
• Example of a cloud architecture (Pepper, Yahoo!):

Ø Uses Big Data technologies.
Ø Uses part of the Apache Hadoop ecosystem:
Ø ZooKeeper, a distributed coordinator.
Ø HDFS, a distributed file system.
Ø YARN, a distributed task manager.


5. Cloud computing. A distributed
system architecture
[Figure: numbered interaction flow among Users, proxy servers, a ZooKeeper manager, Hadoop YARN (resource manager, job manager, node managers, app manager, task manager) and HDFS (name node, data nodes)]
5. Cloud computing. A distributed
system architecture
• Interaction between the cloud and the Internet of Things:
Ø This will be the typical architecture in the Master's Project.

