
Distributed Database Systems

Autumn, 2007

Chapter 12 – Part 3 of 3

Distributed DBMS Reliability
12.1 Reliability Concepts And Measures
12.2 Failures And Fault Tolerance In Distributed Systems
12.3 Failures In Distributed DBMS
12.4 Local Reliability Protocols
12.5 Distributed Reliability Protocols



Section 12.6

Dealing with Site Failures



Problem with 2PC
• 2PC is designed for dealing with system crashes. Two properties are desirable:
◦ Independent recovery: a failed site can recover properly without consulting other sites.
◦ Non-blocking termination: an operational site can terminate properly without waiting for the recovery of the failed site.
• Independent recovery and non-blocking protocols exist only for single-site failures.
Problem with 2PC
• 2PC is inherently blocking!



Subsection 12.6.1

Termination and Recovery Protocols of 2PC
State transition in 2PC protocol



Coordinator time-outs
• The coordinator can time out in the WAIT, ABORT, and COMMIT states.
• WAIT
◦ The coordinator is waiting for the local decisions from the participants.
◦ Solution: the coordinator decides to globally abort the transaction by writing an abort record in the log and sending a "global-abort" message to all participants.



Coordinator time-outs
• COMMIT or ABORT
◦ The coordinator is not certain whether the commit or abort procedures have been completed by the LRMs (local recovery managers) of all participants.
◦ Solution: resend the "global-commit" or "global-abort" message to the sites that have not acknowledged.
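
The coordinator time-out cases above can be sketched as a small handler. This is only an illustrative sketch, not part of the protocol definition: the function name `on_coordinator_timeout`, the `acked` set, and the list used as a log are assumptions made for the example.

```python
from enum import Enum


class CoordState(Enum):
    INITIAL = "INITIAL"
    WAIT = "WAIT"
    COMMIT = "COMMIT"
    ABORT = "ABORT"


def on_coordinator_timeout(state, participants, acked, log):
    """Return (new_state, messages) for a coordinator time-out in 2PC.

    messages is a list of (recipient, message) pairs to send.
    `acked` holds the participants that already acknowledged the decision.
    """
    if state is CoordState.WAIT:
        # A vote is missing: decide global abort, log it, notify everyone.
        log.append("abort")
        return CoordState.ABORT, [(p, "global-abort") for p in participants]
    if state in (CoordState.COMMIT, CoordState.ABORT):
        # The decision is already made: resend it only to silent sites.
        decision = "global-commit" if state is CoordState.COMMIT else "global-abort"
        return state, [(p, decision) for p in participants if p not in acked]
    raise ValueError(f"no time-out is defined for state {state.name}")
```

For example, a time-out in COMMIT with participants P1 and P2, where only P1 has acknowledged, yields a single resend of "global-commit" to P2.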



Participant time-outs
• A participant can time out in the INITIAL or READY state.
• INITIAL
◦ The participant is waiting for a "prepare" message.
◦ The coordinator must have failed in the INITIAL state.
◦ Solution: the participant unilaterally aborts the transaction. If the "prepare" message arrives later, the participant can respond by
▪ voting abort, or
▪ just ignoring the message. This causes the coordinator to time out in the WAIT state (see the previous discussion of this case).



Participant time-outs
• READY
◦ The participant must have voted commit and therefore cannot change its vote and unilaterally abort.
◦ Solution: the participant remains blocked until it can learn (from the coordinator or other participants) the ultimate fate of the transaction.
• In a centralized communication structure, a participant has to ask the coordinator for its decision. If the coordinator has failed, the participant remains blocked.
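
A minimal sketch of the participant time-out rules under the centralized communication structure follows. The function name and the `coordinator_reply` parameter are assumptions for illustration; `None` stands for a coordinator that cannot be reached.

```python
def on_participant_timeout(state, coordinator_reply=None):
    """Return the participant's next state after a time-out in 2PC.

    coordinator_reply is "global-commit", "global-abort", or None when
    the coordinator does not answer.
    """
    if state == "INITIAL":
        # No vote has been cast yet, so a unilateral abort is safe.
        return "ABORT"
    if state == "READY":
        if coordinator_reply == "global-commit":
            return "COMMIT"
        if coordinator_reply == "global-abort":
            return "ABORT"
        # Voted commit and the decision is unknown: the participant stays blocked.
        return "READY"
    raise ValueError(f"no time-out is defined for state {state!r}")
```

The READY branch with no reply is the blocking case: the function can only return the state it was given.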



Can the blocking problem be overcome?

• No!
• 2PC is an inherently blocking protocol.



Analysis
Assumptions and definitions
• Assume participants can communicate with each other.
• Let Pi be the participant that times out in the READY state, and Pj be the participant that is asked.



All the cases in which Pj can respond
1. Pj is in the INITIAL state. This means Pj has not voted yet. Pj can unilaterally abort the transaction and reply to Pi with a "vote-abort" message.
2. Pj is in the READY state. Pj does not know the global decision and cannot help.
3. Pj is in the COMMIT or ABORT state. Pj can send a "global-commit" or "global-abort" message to Pi.



How Pi interprets these responses
1. Pi receives "vote-abort" from all Pj. Pi just proceeds to abort the transaction.
2. Pi receives "vote-abort" from some Pj, but some other participants are in the READY state. Pi goes ahead and aborts the transaction.
3. Pi receives the information that all Pj are READY. Pi is blocked, since it has no knowledge of the global decision.



How Pi interprets these responses
4. Pi receives either "global-abort" or "global-commit" messages from all Pj. Pi can go ahead and terminate the transaction according to the message.
5. Pi receives either "global-abort" or "global-commit" messages from some Pj, but others are in READY. Pi takes the same action as in (4).

These are all the alternatives that a termination protocol needs to handle.
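
The five alternatives collapse into one small decision rule, sketched below. The function name `resolve_in_ready` and the reply label "ready" (for a Pj that reports being in READY) are assumptions for the example. The consistency check reflects the fact that a correct 2PC run cannot produce a global commit while some participant has not yet voted or has aborted.

```python
def resolve_in_ready(responses):
    """Decide what Pi, timed out in READY, does with the replies it collected.

    responses: iterable of "vote-abort", "ready", "global-commit",
    or "global-abort", one per participant Pj that answered.
    Returns "commit", "abort", or "blocked".
    """
    replies = set(responses)
    aborting = {"vote-abort", "global-abort"} & replies
    if "global-commit" in replies and aborting:
        raise ValueError("inconsistent replies: both commit and abort reported")
    if "global-commit" in replies:
        return "commit"    # cases 4 and 5, commit variant
    if aborting:
        return "abort"     # cases 1 and 2, and cases 4 and 5, abort variant
    return "blocked"       # case 3: every Pj is READY (or nobody answered)
```

A single informative reply is enough to unblock Pi; only when every reachable Pj is itself in READY does Pi remain blocked.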



Recovery protocols
• Recovery protocols are the protocols that a failed coordinator or participant can use to recover when it restarts.
• Assumptions:
1. Writing a log record and sending the corresponding message form one atomic action.
2. The state transition occurs after the message is sent.



Coordinator site failure
• The coordinator fails while in the INITIAL state.
◦ Action: restart the transaction.
• The coordinator fails while in the WAIT state.
◦ Action: restart the commit process by sending the "prepare" message once more.
• The coordinator fails while in the COMMIT or ABORT state.
◦ Action: if all ACK messages have been received, no action is needed; otherwise follow the termination protocol.



Participant site failures
• A participant fails while in the INITIAL state.
◦ Action: upon recovery, the participant should abort the transaction unilaterally.
• A participant fails while in the READY state.
◦ Action: same as a time-out in the READY state; follow the termination protocol (ask for help).
• A participant fails while in the ABORT or COMMIT state.
◦ Action: no action.
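
The coordinator and participant recovery actions listed above can be collected in one lookup, sketched here. The function name, the role strings, and the returned action labels are illustrative assumptions; how the failed site learns its state on restart (normally from its log) is outside this sketch.

```python
def recovery_action(role, state, all_acks_received=False):
    """Return the 2PC recovery action for a site that restarts in `state`.

    role is "coordinator" or "participant"; all_acks_received only
    matters for a coordinator that failed in COMMIT or ABORT.
    """
    if role == "coordinator":
        if state == "INITIAL":
            return "restart transaction"
        if state == "WAIT":
            return "resend prepare"
        if state in ("COMMIT", "ABORT"):
            return "no action" if all_acks_received else "run termination protocol"
    elif role == "participant":
        if state == "INITIAL":
            return "abort unilaterally"
        if state == "READY":
            return "run termination protocol"
        if state in ("COMMIT", "ABORT"):
            return "no action"
    raise ValueError(f"unknown role/state combination: {role!r}, {state!r}")
```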



Additional cases
• The first assumption of the recovery protocols is relaxed, i.e. it is possible to fail after writing a log record but before sending the message to a site.
• The coordinator fails after begin_commit is written in the log but before the "prepare" message is sent.
◦ Action: same as a failure in the WAIT state; send the "prepare" message upon recovery.
• All other additional cases can be treated on the basis of the techniques discussed in this chapter.



Subsection 12.6.2

Three-Phase Commit Protocol


3PC – A non-blocking protocol
• A commit protocol that is synchronous within one state transition is non-blocking if and only if its state transition diagram contains neither of the following:
1. A state that is adjacent to both a commit and an abort state;
2. A non-committable state that is adjacent to a commit state.
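
The two conditions can be checked mechanically on a state transition diagram. The sketch below is an illustration under stated assumptions: the edge lists are the usual coordinator diagrams for 2PC and 3PC, "adjacent" is taken to mean one transition apart in either direction, and PRE-COMMIT is treated as a committable state. The function name `violations` is hypothetical.

```python
def violations(transitions, committable, commit="COMMIT", abort="ABORT"):
    """List the states that break either non-blocking condition."""
    neighbours = {}
    for src, dst in transitions:
        neighbours.setdefault(src, set()).add(dst)
        neighbours.setdefault(dst, set()).add(src)
    found = []
    for state, adjacent in sorted(neighbours.items()):
        # Condition 1: adjacent to both a commit and an abort state.
        if commit in adjacent and abort in adjacent:
            found.append((state, "adjacent to both commit and abort"))
        # Condition 2: non-committable yet adjacent to a commit state.
        if commit in adjacent and state not in committable:
            found.append((state, "non-committable but adjacent to commit"))
    return found


# Assumed coordinator diagrams (edges as (from, to) pairs).
TWO_PC = [("INITIAL", "WAIT"), ("WAIT", "COMMIT"), ("WAIT", "ABORT")]
THREE_PC = [("INITIAL", "WAIT"), ("WAIT", "ABORT"),
            ("WAIT", "PRE-COMMIT"), ("PRE-COMMIT", "COMMIT")]

print(violations(TWO_PC, {"COMMIT"}))
print(violations(THREE_PC, {"PRE-COMMIT", "COMMIT"}))
```

Under these assumptions the WAIT state of 2PC breaks both conditions, while the 3PC diagram produces no violations.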



Action diagram
• COMMIT – committable state
• WAIT, READY – non-committable states
• Add a PRE-COMMIT state between WAIT and COMMIT for the coordinator, and between READY and COMMIT for participants.



State transitions



Termination protocol
• Coordinator time-outs
1. In the WAIT state
▪ Same as in 2PC. The coordinator unilaterally aborts the transaction and sends a "global-abort" message to all participants that have voted to commit.
2. In the PRE-COMMIT state
▪ All participants must be at least in the READY state (they have voted to commit).
▪ The coordinator globally commits the transaction and sends a "global-commit" (GC) message to all operational participants.
3. In the COMMIT (or ABORT) state
▪ No action to take.



Termination protocol
• Participant time-outs
1. In the INITIAL state
▪ Same as in 2PC.
2. In the READY state
▪ The participant does not know the global decision.
▪ Solution: elect a new coordinator, which terminates the transaction according to the termination protocol discussed below.
3. In the PRE-COMMIT state
▪ The participant is waiting for the "global-commit" message from the coordinator.
▪ Solution: same as case 2.



For cases 2 and 3 above
• The new coordinator (elected from the old participants) may be in the WAIT, PRE-COMMIT, or ABORT state.
• If the new coordinator is in WAIT, it globally aborts the transaction. The participants may be in
◦ INITIAL
◦ READY
◦ ABORT
In all three states, taking the global abort action causes no problem.
• A participant may also be in PRE-COMMIT: add an edge from PRE-COMMIT to ABORT to the state transition diagram so that it can follow the global abort.



For cases 2 and 3 above
• If the new coordinator is in the PRE-COMMIT state, the participants can be in READY, PRE-COMMIT, or COMMIT (but none can be in ABORT).
◦ Solution: globally commit the transaction and send a "global-commit" (GC) message to all participants.
• If the new coordinator is in ABORT, all participants have to move to ABORT.
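
The rule followed by the newly elected coordinator can be sketched as below. Both function names are hypothetical, and the sketch assumes the participant diagram already includes the extra edge from PRE-COMMIT to ABORT, so that a participant in PRE-COMMIT can follow a global abort.

```python
def new_coordinator_decision(state):
    """Return the global message sent by the newly elected coordinator."""
    if state in ("WAIT", "ABORT"):
        return "global-abort"
    if state == "PRE-COMMIT":
        return "global-commit"
    raise ValueError(f"unexpected new-coordinator state: {state!r}")


def apply_decision(participant_state, decision):
    """Move one participant to its final state, rejecting impossible cases."""
    if decision == "global-abort":
        if participant_state == "COMMIT":
            raise ValueError("a committed participant cannot abort")
        # INITIAL, READY, PRE-COMMIT (via the added edge) and ABORT all end in ABORT.
        return "ABORT"
    if decision == "global-commit":
        if participant_state in ("INITIAL", "ABORT"):
            raise ValueError("a participant that did not vote commit cannot commit")
        return "COMMIT"
    raise ValueError(f"unknown decision: {decision!r}")
```

The two `ValueError` branches mark combinations that the analysis above rules out, so hitting them would indicate a protocol violation rather than a normal outcome.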



Recovery Protocols
Only the differences from the 2PC recovery protocols are indicated.

• The coordinator fails while in the WAIT state. This causes the participants to time out (see the discussion above).
◦ Solution: the recovered coordinator asks around to determine the fate of the transaction.
• The coordinator fails while in the PRE-COMMIT state. This causes the participants to time out in the PRE-COMMIT state.
◦ Solution: ask around upon recovery.
• A participant fails while in the PRE-COMMIT state.
◦ Solution: ask other participants upon recovery.



More about 3PC
• Advantages
◦ Non-blocking
• Disadvantages
◦ Fewer independent recovery cases
◦ More messages



Section 12.7

Network Partitioning



Network Partitioning
• Simple partitioning
◦ The network is partitioned into two parts.
• Multiple partitioning
◦ The network is partitioned into more than two parts.
• In general, it is not possible to find a non-blocking termination protocol in the presence of network partitioning.
• It is possible to design atomic non-blocking protocols that are resilient to simple partitioning.



Design decision
• Allow partitions to continue their operations and compromise database consistency, or
• Guarantee consistency by permitting only one partition to work, while the sites in the other partitions remain blocked.



The End of Chapter 12
