1, JANUARY 1985 67
to the timeout technique, a deadlock detection scheme aborts a transaction only when the transaction is involved in a deadlock. Most deadlock detection schemes [8], [9], [12], [15] detect deadlocks by finding cycles in a transaction wait-for graph, in which each node represents a transaction, and a directed edge from one transaction to another indicates that the former is waiting for a data item locked by the latter transaction. In a distributed database system, the problem is, in essence, one of finding cycles in a distributed graph where no single site knows the entire graph.
The deadlock detection scheme presented in this paper does not construct any transaction wait-for graph, but follows the edges of the graph to search for a cycle (called an edge-chasing algorithm by Moss [13]). It is assumed that each transaction is assigned a priority in such a way that priorities of all transactions are totally ordered. When a transaction waits for a data item locked by a lower priority transaction, we say that an antagonistic conflict has occurred. When an antagonistic conflict occurs for a data item, the waiting transaction initiates a message to find cycles of transactions, in which each transaction is waiting for a data item locked by the next. If the message comes back to the initiating transaction, a deadlock cycle is detected.
Our algorithm presumes a point-to-point network with a reliable message communication facility, and it is not applicable for detecting communication deadlocks [4], [14].
The distinguishing features of the proposed deadlock detection scheme are as follows.
1) For a given deadlock cycle, it is possible to compute the exact number of messages that have been generated for the purpose of deadlock detection. If the number of messages generated is used as a complexity measure, the proposed algorithm is not inferior to any of the other algorithms reported in the literature.
2) When a deadlock is detected, the detector has information about the highest and the lowest priority transactions of the cycle, and this can be used for deadlock resolution. Thus, resolution does not need any new computation.
3) In the absence of failures (site failures or explicit abort of a waiting transaction by the user), it does not detect any phantom deadlock.
4) Even after a transaction is aborted to resolve a deadlock, other member transactions of the cycle continue to retain information about the remaining transactions. This, in turn, helps to detect, with fewer messages, deadlocks in which the remaining transactions (or any subset of them) may get involved in the future.
5) The resolution scheme adopted guarantees progress of computation, and avoids the problem of cyclic restart.
6) The basic algorithm can be easily extended to a locking scheme that provides both share locks and exclusive locks, and to a scheme in which a transaction can acquire several locks simultaneously.
7) It can also be extended to detect and resolve deadlocks which may occur in an environment where transactions can be nested within other transactions.
In the literature, several authors have proposed algorithms for deadlock detection in which the wait-for graph is not constructed explicitly [3], [13], [14]. In comparison to the algorithm of Chandy and Misra [3], our algorithm has the following advantages.
1) In our scheme, a deadlock computation is initiated only when an antagonistic conflict occurs. In contrast, in their scheme, a computation is initiated whenever a transaction begins to wait for another. Hence, our algorithm generates fewer messages to detect a deadlock.
2) In our scheme, there is no separate phase for deadlock resolution.
Our scheme has some similarities (e.g., initiation of deadlock computation only when an antagonistic conflict occurs) with the algorithm proposed by Moss [13]. However, in comparison to his scheme, our algorithm has the following advantages.
1) In Moss' scheme, a transaction does not maintain any information regarding transactions that wait for it, directly or indirectly. Hence, his scheme requires transactions to initiate deadlock detection computations periodically. Thus, his scheme would, in general, require more messages, and it is not possible to compute the exact number of messages generated before a deadlock is detected.
2) In our scheme, a transaction continues to retain the above information even after the resolution of a deadlock, and this in turn speeds up detection and resolution of future deadlocks.
3) Our algorithm is less prone than Moss' scheme to detecting phantom deadlocks that may involve nested transactions. In our scheme, a detected deadlock is made phantom only when a waiting transaction aborts, either explicitly or implicitly. In contrast, in Moss' scheme, sometimes a detected deadlock is made phantom even when an active transaction aborts, say due to some application considerations. We discuss this further in Section VI-C.
4) In our scheme all messages have an identical short length, whereas Moss' scheme has messages of varying lengths.
In the following section, we introduce a distributed database model in order to set the context, and in Section III we describe the basic distributed deadlock detection algorithm. We analyze the cost of the algorithm in Section IV. The basic algorithm is applicable when only exclusive locks are used. However, it has been reported in the literature [9] that 80 percent of access is only for reading data. Taking this into account, we show in Section V how the basic algorithm can be modified to include share locks as well as simultaneous acquisition of multiple locks. In Section VI, we describe a nested transaction model and extend the algorithm to detect and resolve deadlocks taking into account nested transactions. We conclude the paper with suggestions for further improving the algorithm.

II. THE DISTRIBUTED DATABASE MODEL
A database is a structured collection of information. In a distributed database system, the information is spread across a collection of nodes (or sites) interconnected through a communication network. Each node has a system-wide, unique identifier, called the site-identification-number (site-id, in short), and nodes communicate through messages.
All messages sent arrive at their destinations in finite time, and the network filters duplicate messages and guarantees that messages are error-free. The site-to-site communication is
SINHA AND NATARAJAN: DISTRIBUTED DEADLOCK DETECTION ALGORITHM 69
pipelined, i.e., the receiving site gets messages in the same order that the sending site has transmitted them.
Within a node, there are several processes and data items (or objects). A process is an autonomous active entity that is scheduled for execution. Every process has a system-wide unique name, called process-id, and processes communicate with each other through messages. To access one or more data items, which may be distributed over several nodes, a user creates a transaction process at the local node. The transaction process coordinates actions on all data items participating in the transaction and preserves the consistency of the database. Henceforth, we use the term transaction to denote the corresponding transaction process.
Data items are passive entities that represent some independently accessible piece of information. Each data item is maintained by a data manager which has the exclusive right to operate on a data item. If a transaction wants to operate on a data item, it must send a request to the data manager that manages the data item. A data manager can maintain several data items simultaneously. However, to simplify the exposition, we shall assume that a data manager maintains only one data item.
In addition to data manipulation operations, a data manager provides two primitives to control access to the data item that it maintains: Lock(data_item) and Un_Lock(data_item). A transaction must lock a data item before accessing it, and it must unlock the data item when it no longer needs to access it. A data item can be in one of two lock modes, null or free (N, i.e., absence of a lock), and exclusive (X, i.e., presence of a lock). A data manager honors the lock request of a transaction if the data item is free; otherwise it keeps the lock request pending in a queue, called request_Q. A transaction which has locked the data item is called the holder of the data, whereas a transaction which is waiting in the request_Q is called a requester of the data item. When a holder unlocks the data item, the data manager chooses a lock request from the request_Q, and grants the lock to that requester. The scheduling scheme followed by the data manager does not guarantee avoidance of deadlocks [5], e.g., it may follow an arrival order scheduling scheme.
Transactions can be in one of two states: active or wait. If a transaction waits in a request_Q of a data manager, it is in wait state; otherwise it is in active state. An active transaction process may or may not be running on a processor. The state of a transaction changes from active to wait when its lock request for a data item is queued by the data manager in its request_Q. The state of the transaction changes from wait to active when the data manager schedules its pending lock request. In either case, the manager informs the transaction of its change of state. We assume that a transaction acquires locks one after another (i.e., at any time it has only one outstanding lock request), and it follows the two-phase lock protocol [7].
Each transaction is assigned a priority in such a way that priorities of all transactions are totally ordered. To assign priorities to transactions, we use the timestamp mechanism. When a transaction is initiated, it is assigned a unique timestamp. Timestamps induce priorities in the following manner: a transaction is of higher priority than another if the timestamp of the former is less than that of the latter. Unlike the timestamp synchronization scheme [2], which uses timestamps to schedule lock requests of transactions (and, in turn, prevents deadlocks), here timestamps are used only to assign priorities to transactions.
For generating timestamps, we assume that every node has a logical clock (or counter) that is monotonically increasing, and the various clocks are loosely synchronized [11]. A timestamp generated by a node i is a pair (C, i) where C is the current value of the local clock and i is the site-id of the node. Greater than (>) and less than (<) relations for timestamps are defined as follows.
Let t1 = (C1, i1) and t2 = (C2, i2) be two timestamps. Then
    t1 > t2 iff C1 > C2 or (C1 = C2 and i1 > i2);
    t1 < t2 iff C1 < C2 or (C1 = C2 and i1 < i2).
Each transaction is denoted by an ordered pair of the form (p, t), where p is the process-id of the corresponding transaction process, and t is the timestamp of the transaction. The process-id is used for communication purposes.
If two transactions T1 and T2 are denoted by the pairs (p1, t1) and (p2, t2), respectively, we say that T1 > T2, i.e., priority of T1 is higher than that of T2, if t1 < t2.
Further, we say that there is an antagonistic conflict at a data item if the item is locked, and there is a requester of higher priority than the holder. In such a case, we also say that the requester faces the antagonistic conflict.

III. DISTRIBUTED DEADLOCK DETECTION AND RESOLUTION
In this algorithm, a deadlock is detected by circulating a message, called probe, through the deadlock cycle. The occurrence of an antagonistic conflict at a data site triggers initiation of a probe. A probe is an ordered pair (initiator, junior), where initiator denotes the requester which faced the antagonistic conflict, triggering the deadlock detection computation, and initiating this probe. The element junior denotes the transaction whose priority is the least among transactions through which the probe has traversed.
A data manager sends a probe only to the holder of its data, while a transaction process sends a probe only to the data manager from which it is waiting to receive the lock grant. Transaction processes (or data managers) never communicate among themselves for purposes of deadlock detection.

A. The Basic Deadlock Detection Algorithm
The basic deadlock detection algorithm has three steps.
1) A data manager initiates a probe in the following two situations.
a) When the data item is locked by a transaction, if a lock request arrives from another transaction, and requester > holder, the data manager initiates a probe and sends it to the holder.
b) When a holder releases the data item, the data manager schedules a waiting lock request. If there are more lock requests still in the request_Q, then for each lock request for which requester > new holder, the data manager initiates a probe and sends it to the new holder.
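The timestamp ordering of Section II and the two probe-initiation situations of step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name Transaction and the helpers higher_priority, probes_on_request, and probes_on_release are hypothetical names introduced here, and a probe is modeled as a plain (initiator, junior) tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    """A transaction (p, t): process-id p and timestamp t = (C, i)."""
    pid: str
    clock: int   # C: value of the local logical clock at creation
    site: int    # i: site-id of the creating node

    @property
    def timestamp(self):
        # Python compares tuples lexicographically, which matches the
        # rule in the text: compare C first, break ties on the site-id.
        return (self.clock, self.site)


def higher_priority(a, b):
    """True if a > b in priority, i.e. a has the smaller timestamp."""
    return a.timestamp < b.timestamp


def probes_on_request(requester, holder):
    """Situation a): a lock request arrives while the item is locked."""
    if higher_priority(requester, holder):      # antagonistic conflict
        return [(requester, holder)]            # (initiator, junior)
    return []


def probes_on_release(new_holder, request_q):
    """Situation b): a new holder was scheduled; check remaining requesters."""
    return [(r, new_holder) for r in request_q
            if higher_priority(r, new_holder)]


# Hypothetical transactions used only for illustration.
t1 = Transaction("p1", clock=5, site=2)
t2 = Transaction("p2", clock=5, site=3)   # same clock value, larger site-id
t3 = Transaction("p3", clock=4, site=9)

print(probes_on_request(t1, t2))          # t1 outranks t2: one probe
print(probes_on_request(t2, t1))          # no antagonistic conflict
print(probes_on_release(t2, [t1, t3]))    # both requesters outrank t2
```

Each returned probe would be sent to the holder named in its second component; requests from lower priority transactions generate no messages at all.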
70 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-11, NO. 1, JANUARY 1985
When a data manager initiates a probe it sets

    initiator := requester;
    junior := holder;

We shall presently assume that a data manager sends a probe as soon as the above situations occur. However, as we shall elaborate in Section VII, in order to improve performance, a data manager can wait for a while before sending a probe.
2) Each transaction maintains a queue, called probe_Q, where it stores all probes received by it. The probe_Q of a transaction contains information about the transactions which wait for it, directly or transitively. Since we have assumed that a transaction follows the two-phase lock protocol, the information contained in the probe_Q of a transaction remains valid until it aborts or commits.
After a transaction enters the second phase of the two-phase lock protocol, it can never get involved in a deadlock. Hence, when it enters the second phase, it discards the probe_Q. During the second phase, any probe or clean message (discussed later in this section) received is ignored.
A transaction sends a probe to the data manager where it is waiting in the following two cases.
a) When a transaction T receives probe(initiator, junior), it performs the following.

    if junior > T
        then junior := T;
    save the probe in the probe_Q;
    if T is in wait state
        then transmit a copy of the saved probe to the data manager where it is waiting;

b) When a transaction issues a lock request to a data manager and waits for the lock to be granted (i.e., it goes from active to wait state), it transmits a copy of each probe stored in its probe_Q to that data manager.
3) When a data manager receives probe(initiator, junior) from one of its requesters, it performs the following.

    if holder > initiator
        then discard the probe
    else if holder < initiator
        then propagate the probe to the holder
    else declare deadlock and initiate deadlock resolution;

When a deadlock is detected, the detecting data manager has the identities of two members of the deadlock cycle, initiator and junior, i.e., the highest and the lowest priority transactions, respectively. In order to guarantee progress, we choose to abort junior, i.e., the lowest priority transaction (hereafter called victim). When victim restarts, its priority does not change, i.e., it uses the same timestamp that was assigned to it when it was initiated.

B. The Deadlock Resolution and Post-Resolution Computation
This consists of the following three steps.
1) To abort the victim, the data manager that detects the deadlock sends an abort signal to the victim. The identity of the initiator is also sent along with the abort signal: abort(victim, initiator). Since victim is aborted, it is necessary to discard those probes (from probe_Qs of various transactions) that have victim as their junior or initiator. Hence, on receiving an abort signal, the victim does the following.
a) It initiates a message, clean(victim, initiator), sends it to the data manager where it is waiting, and enters the abort phase. Since initiator is the highest priority transaction of the deadlock cycle, its probe_Q will never contain any probe generated by other members of the cycle. Consequently, probe_Qs of transactions, from initiator to victim in the direction of probe traversal, will not contain a probe having victim either as junior or as initiator. Hence, the clean message carries the identity of initiator, beyond which it need not traverse.
b) In abort phase, the victim releases all locks it held, withdraws its pending lock request, and aborts. During this phase, it discards any probe or clean message that it receives.
2) When a data manager receives a clean(victim, initiator) message, it propagates the message to its holder.
3) When a transaction T receives a clean(victim, initiator) message, it acts as follows.

    purge from the probe_Q every probe that has victim as its junior or initiator;
    if T is in wait state
        then if T = initiator
            then discard the clean message
            else propagate the clean message to the data manager where it is waiting
        else discard the clean message;

A transaction discards a clean message in the following two situations: 1) the transaction is in active state, or 2) the transaction is the same as the initiator of the clean message received.
After "cleaning" up its probe_Q as described above, each member transaction of the deadlock cycle continues to retain the remaining probes in its probe_Q. In the future, if the remaining members (or any subset of them) get involved in a deadlock cycle, it will be detected with fewer messages, since probes have already traversed some edges of the cycle.

IV. THE COST OF DEADLOCK DETECTION
To compare our algorithm to other deadlock detection and resolution algorithms, we consider three factors which determine the cost of any deadlock detection algorithm:
1) Communication Cost: the number of messages that must be exchanged to detect a deadlock;
2) Delay: the time needed to detect a deadlock once the deadlock cycle is formed (presuming that every message exchange, whether it is an intersite communication or an intrasite communication, takes equal time); and
3) Storage Cost: the amount of storage needed by transactions and data managers specifically for purposes of deadlock detection and resolution.
In our scheme, the communication and the delay costs of detecting a deadlock depend on the configuration of a deadlock cycle. The configuration indicates which transaction waits for which other transaction. We describe a configuration using a
Fig. 1. An edge of a TWFG.
data manager D, will initiate a deadlock detection computation by initiating probe(T1, T1), and sending it to the transaction TI.
Fig. 3. Deadlock cycle: intermediate configuration.
A data item can have many requesters but only one holder,
respectively. For N = 2, the maximum and the minimum are identical, namely 2.

B. The Delay
The delay is defined to be the time taken to detect the deadlock after the deadlock cycle is formed. Note that irrespective of the configuration of a deadlock cycle of length N (best, worst, or any intermediate), the maximum amount of delay is the time taken to exchange 2 * (N - 1) messages. The delay is maximum if the highest priority transaction of the cycle is the last transaction to enter the wait state, closing the deadlock cycle. If a transaction other than the highest priority transaction is the last to enter the wait state, the delay is less. This is because the probe initiated by the highest priority transaction would have traversed part of the cycle before the cycle is formed.
Suppose, in the configuration shown in Fig. 2 (prior to the formation of a deadlock cycle), all edges except the edge T(J+1) -> TJ (where 1 <= J <= N - 1) are formed, i.e., T(J+1) is still active. When T(J+1) requests a lock on data item ObjJ held by TJ, it enters the wait state, closing the deadlock cycle.
Case 1: If probe(T1, TN), initiated due to the antagonistic conflict T1 -> TN, has reached the transaction T(J+1) before it entered the wait state, the delay to detect the deadlock will be equal to the time taken to exchange (2 * J - 1) messages.
Case 2: If probe(T1, TN) is yet to reach the transaction TN, i.e., transactions T1 and T(J+1) entered the wait state in quick succession (closing the deadlock cycle), and the time gap was too small compared to the time taken to exchange one message. In this case, the delay to detect the deadlock will be equal to the time taken to exchange 2 * (N - 1) messages.
Hence, if a deadlock cycle is closed by transaction T(J+1), then the time taken to detect the deadlock will be anywhere between the time taken to exchange (2 * J - 1) messages and the time taken to exchange 2 * (N - 1) messages, for J = 1, ..., (N - 1).
For the configuration given in Fig. 2, the delay will be minimum (i.e., the time taken to exchange one message) if 1) the cycle is closed by transaction T2 by waiting for T1, the highest priority transaction of the cycle, and 2) the probe initiated due to the antagonistic conflict T1 -> TN has reached T2 before the latter entered the wait phase.
From this result we can generalize that for any configuration the minimum time taken to detect a deadlock is the time taken for exchange of one message, and this can happen only when 1) the cycle is closed by a transaction waiting for the highest priority transaction of the configuration, and 2) the probe initiated by the highest priority transaction had reached the cycle-closing transaction before the latter entered the wait phase.

C. The Storage Cost
In this algorithm, each transaction requires storage space to maintain its probe_Q, and a probe_Q exists until the transaction enters the second phase of the two-phase lock protocol. The size of a probe_Q depends upon the number of higher priority transactions which wait for it directly or transitively. A probe_Q shrinks only when the transaction receives a clean message, but not otherwise.
If the maximum number of transactions that can run concurrently in the system is N, then the length of a probe_Q can grow at most up to (N - 1).

D. Costwise Comparison to Other Algorithms
In comparison to the algorithm of Chandy and Misra [3], our algorithm has less communication cost since it initiates a deadlock computation only upon the occurrence of antagonistic conflicts, but not otherwise. Furthermore, the resolution of deadlock does not involve any extra cost.
Unlike Moss' algorithm [13], we have separated the cost of reliable network communication from that of deadlock detection. Incorporation of this distinction in our algorithm enables us to compute exact communication and delay costs of deadlock detection, for a given configuration.
In the distributed database model considered by Obermarck [15], transactions migrate from one data site to another, and there is a deadlock detector at each site which builds a transaction wait-for graph for that site (by extracting information from lock tables, and other resource allocation tables and queues). In computing the communication cost to detect a deadlock cycle (which is N * (N - 1)/2 exchanges of messages, in the worst case, among deadlock detectors), he does not include the expenses of transaction migration and construction of a TWFG by deadlock detectors in terms of messages. In contrast, in our model, the transmission of information from a transaction to a data manager and from a data manager to a transaction cost one message each. If the above two expenses are also included in terms of messages, the communication cost for his algorithm will become equal to that of ours.

V. EXTENSIONS TO THE DEADLOCK DETECTION ALGORITHM
In this section, we extend the algorithm to take care of two refinements:
1) availability of share lock (S_lock) mode as well, and
2) allowing a transaction to acquire locks on more than one data item simultaneously, either in share mode or in exclusive mode.

A. Share and Exclusive Locks
The Distributed Database Model with Share and Exclusive Locks: We extend the basic model, discussed in Section II, by distinguishing a share lock (S_lock) request from an exclusive lock (X_lock) request. Correspondingly, a locked data item can be either in S_mode or in X_mode. The desired lock mode is specified as a parameter of the lock request primitive: Lock(data_item, mode). In order to distinguish between the two kinds of lock requests, a data manager splits its request_Q into S_request_Q and X_request_Q, for storing pending S_lock and X_lock requests, respectively.
If a data item is free, a transaction can lock it in any mode. When a transaction has locked a data item in X_mode, and become the X_holder, no other transaction can lock the data item in any mode. A transaction can lock a data item in S_mode, and become an S_holder, even if the item is already locked in S_mode. Thus, a data item in S_mode can have several S_holders whereas it can have only one X_holder. When the X_holder releases the lock, if the data manager decides to
can be granted if there is no S_holder or X_holder for the item, and either
a) there is no S_retainer or X_retainer for the item, or
b) each S_retainer (X_retainer) is either T, or an ancestor of T.
For example, suppose in the transaction tree of Fig. 7, F requests an S_lock for a data item for which E is an X_holder. The S_lock can be granted to F only when either there is no X_retainer or X_holder for the item, or A becomes the only X_retainer for the item, i.e., when C, D, and E commit or abort.
When an S_holder releases the lock and if it introduces an S_retainer to the data item, it may result in simultaneous scheduling of a descendant X_requester (if any). Similarly, when an X_holder releases the lock and if it introduces an X_retainer to the data item, it may result in simultaneous scheduling of a descendant X_requester (if any), or one or more descendant S_requesters (if any).

B. Nested Transactions and Deadlock Detection and Resolution
We shall now discuss the scheme for detecting deadlocks that can arise in the nested transaction model described above. The basic detection algorithm needs to be modified, in order to take into account the fact that a transaction now waits for its descendants to commit/abort. As in the basic algorithm, we shall use priorities for transactions in order to determine when to initiate a deadlock computation, as well as for deadlock resolution. Timestamps induce priorities among transactions as described earlier. However, the scheme for assigning timestamps needs to be modified to take into account nested transactions.
When a nonnested transaction (i.e., the root of a tree) is created, a (C, i) pair is generated as described in Section II, and this pair is assigned as the timestamp of the transaction. When a nested transaction is created, a (C, i) pair is generated, and a timestamp is generated for the transaction by concatenating this (C, i) pair with the timestamp of the parent transaction. Thus, the timestamp of a nested transaction is a sequence of (C, i) pairs, the length of the sequence being determined by the depth of nesting. Based on the ordering on (C, i) pairs described in Section II, timestamps of transactions are totally ordered in the following way.
Given two timestamps, X and Y, of the form X1 X2 ... Xm and Y1 Y2 ... Yn respectively, where each Xi or Yi is a (C, i) pair, their relations are defined as follows.
X is greater than Y, if either
    1) m > n, and for all i, 1 <= i <= n, Xi = Yi,
or
    2) for some i, 1 <= i <= min(m, n), X1 = Y1, X2 = Y2, ..., X(i-1) = Y(i-1), and Xi > Yi.
Note that in this order, the priority of a transaction is higher than that of its descendants.
Deadlock Detection: We now extend the deadlock detection algorithm described in Section V-A, to take into account nested transactions.
The probe_Q of a data manager is split into S_probe_Q and X_probe_Q: the former stores the probes received from S_requesters, and the latter stores the probes received from X_requesters. A transaction has only one probe_Q.
1) If a data manager cannot grant a lock requested by a transaction, it acts as follows.

    if the lock request of a transaction, T, cannot be honored
    then begin
        for each X_retainer and the X_holder (if any), Tx,
        do
            if Tx < T
                then initiate probe(T, Tx) and send it to Tx;
        if X_lock requested
            then for each S_retainer and each S_holder, Ts,
            do
                if Ts < T
                    then initiate probe(T, Ts) and send it to Ts
    end;

Note that in no case will a transaction send a probe to its ancestor since an ancestor always has higher priority.
2) When a transaction begins to wait for a data item, or for its children to commit/abort, it transmits each probe in its probe_Q to the data manager, or to its children.
3) When a transaction T receives a probe P, it performs the following.

    if P.junior > T then P.junior := T;
    save P in the probe_Q;
    if T is waiting for its children to commit/abort
        then transmit a copy of the saved probe to each child
    else if T is waiting for a data item
        then transmit a copy of the saved probe to the data manager;

4) When a data manager receives a probe P from a transaction T, it acts as follows.

    if T is waiting for an S_lock
        then save the probe in S_probe_Q
        else save the probe in X_probe_Q;
    if P.initiator is either a retainer or the holder,
       or
       P.initiator is a descendant of a retainer or of the holder
        then declare deadlock and initiate deadlock resolution
        else begin
            for each X_retainer and the X_holder (if any), Tx,
            do
                begin
                    if P.initiator > Tx
                        then propagate the probe P to Tx
                end;
            if T is waiting for an X_lock
                then for each S_retainer and each S_holder (if any), Ts,
                do
end;
5) When a new retainer or holder is introduced for a data item, the data manager acts as follows. (Note that when a new retainer is introduced, the data manager may have simultaneously scheduled a descendant X_requester, or one or more descendant S_requesters, i.e., the introduction of a new retainer may result in simultaneous introduction of new holder(s) as well.)

   if an S_holder or an S_retainer, Ts, is introduced then
      begin
         for each requester, T, in X_request_Q
         do
            if T > Ts
               then initiate probe(T, Ts) and send it to Ts;
         for each probe, P, in X_probe_Q
         do
            if P.initiator > Ts
               then send a copy of P to Ts
      end
   else % an X_holder or an X_retainer, Tx, is introduced
      begin
         for each requester, T, in S_request_Q or X_request_Q
         do
            if T > Tx
               then initiate probe(T, Tx) and send it to Tx;
         for each probe, P, in S_probe_Q or X_probe_Q
         do
            if P.initiator > Tx
               then send a copy of P to Tx
      end;

In this extended algorithm, it is possible that a transaction may receive more than one probe with the same value for initiator. This may arise because the transaction as well as some of its ancestors may be retainers or holders for a data item simultaneously. In such cases, the transaction needs to process only the probe that it receives first, and it may discard others. In Section VII, we discuss this issue again.

Deadlock Resolution and Post-Resolution Computation: As in the basic algorithm, we abort only the lowest priority transaction to resolve the deadlock. However, the scheme for handling clean messages requires some modifications as given below.

1) When a transaction receives a clean message, it acts as follows.

   if T is in wait state
      then if T = initiator
         then discard the clean message
         else if T is waiting for its children
            then propagate a copy of the clean message to every child
            else propagate the clean message to the data manager where it is waiting.

2) When a data manager receives a clean message, it updates its S_probe_Q and X_probe_Q, and propagates the message to all holders and retainers.

Fig. 8. A deadlock cycle with nested transactions.

An Illustrative Example: Let us illustrate the working of this extended algorithm for detecting deadlocks, through an example.

Consider the scenario shown in Fig. 8. A transaction T1 requests an X_lock for the data item Obj1. The lock cannot be granted since another transaction T2 is an X_holder for Obj1. T2 has created a child T21 and is waiting for T21 to commit. T21 is waiting for an S_lock on another data item Obj2, which has T1 as an X_retainer. (T1 had earlier created a child T11 which held the item Obj2 in X_mode, and it has committed.)

In the above situation, a deadlock T1-T2-T21-T1 occurs when T1 begins to wait for Obj1. Let us illustrate how this deadlock is detected. We consider two possible cases.

Case 1: T1 > T2. By definition, it follows that T1 > T21.
When the data manager of Obj1, D1, receives the lock request from T1, it originates probe(T1, T2) and sends it to T2. When T2 receives this probe, it saves the probe in its probe_Q and propagates it to its child T21.
When T21 receives probe(T1, T2), it modifies it to probe(T1, T21), saves it in its probe_Q, and propagates it to D2, the data manager of Obj2.
When D2 receives probe(T1, T21), it detects a deadlock since the initiator of the probe, T1, is an X_retainer for the item. The deadlock is resolved by aborting T21.

Case 2: T2 > T1. By definition, it follows that T21 > T1.
Before T1 issues its X_lock request for the data item Obj1, its probe_Q contains probe(T21, T1). This is due to the fact that when D2 cannot grant the S_lock to T21, it initiates probe(T21, T1) and sends it to T1. Upon receiving this probe, T1 saves it in its probe_Q.
When T1 waits for an X_lock on Obj1, it propagates probe(T21, T1) contained in its probe_Q to D1.
Upon receiving probe(T21, T1), D1 detects a deadlock since the initiator of the probe, T21, is a descendant of T2, which is the X_holder of Obj1. The deadlock is resolved by aborting T1.

C. Comparison to Related Work

Moss [13] has also proposed an edge-chasing algorithm for detecting deadlocks taking into account nested transactions. As described earlier, a major difference between his algorithm and ours is that in Moss' scheme, probes are not stored within transactions and data managers, and his scheme relies on periodic retransmission of probes to ensure eventual detection of deadlocks. Apart from this, in Moss' scheme, a data manager sends a probe not to the holders of the item, but always to the "potential" retainers. Because of this, his algorithm is prone to detect phantom or false deadlocks.

For example, consider the scenario shown in Fig. 9. There are two transactions T1 and T2, where T1 > T2. T2 has created two children T21 and T22. T1 waits for an X_lock on an item
78 IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-11, NO. 1, JANUARY 1985
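
Step 5 of the extended algorithm, in which a data manager reacts to a newly introduced holder or retainer, can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code. The function names are hypothetical. A probe is represented as an (initiator, junior) tuple. A timestamp is a tuple of (C, i) pairs compared as plain tuples, and a smaller timestamp is assumed to mean a higher priority.

```python
def higher_priority(a, b):
    """True if transaction a has higher priority than b.

    ASSUMPTION: priorities are the reverse of the timestamp order, so an
    ancestor (a proper prefix) outranks its descendants.
    """
    for ai, bi in zip(a, b):
        if ai != bi:
            return ai < bi
    return len(a) < len(b)


def on_new_owner(new_owner, mode, s_request_q, x_request_q,
                 s_probe_q, x_probe_q):
    """Return the (probe, destination) messages produced by step 5.

    mode is "S" when an S_holder or S_retainer is introduced and "X" when
    an X_holder or X_retainer is introduced.
    """
    if mode == "S":
        # Only X_requesters conflict with a new shared owner.
        requesters, probes = x_request_q, x_probe_q
    else:
        # Every requester conflicts with a new exclusive owner.
        requesters = s_request_q + x_request_q
        probes = s_probe_q + x_probe_q

    messages = []
    for t in requesters:
        if higher_priority(t, new_owner):
            # Initiate probe(T, new_owner) and send it to the new owner.
            messages.append(((t, new_owner), new_owner))
    for p in probes:
        if higher_priority(p[0], new_owner):
            # Forward a copy of the stored probe to the new owner.
            messages.append((p, new_owner))
    return messages
```

Returning the messages instead of sending them keeps the rule separate from the transport, so the same function can be exercised with plain lists standing in for the request and probe queues.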
Fig. 10. Propagation of an external probe in a deadlock cycle.
[Figure: a chain of transactions TN, TN-1, TN-2, ..., T5, T4, T3, T2, T1 and data items ObjN, ObjN-1, ObjN-2, ..., Obj4, Obj3, Obj2, Obj1.]

tiated by an external transaction (called an external probe) enters the deadlock cycle. T2 will save the probe in its probe_Q, and since it is waiting for Obj1, will propagate probe(Tx, T2) to D1.

If T1 > Tx, i.e., the external transaction's priority is lower than the highest priority transaction of the cycle, D1 will discard the probe. On the other hand, if Tx > T1, D1 will propagate the probe to T1. Once this probe has crossed over the highest priority transaction of the deadlock cycle, it will cover the entire cycle and will be saved in probe_Qs of all member transactions (and data managers). This is correct since the external transaction Tx waits directly or transitively on all member transactions of the deadlock cycle. But since Tx > T1, the probe will keep circulating the cycle indefinitely (until the cycle is broken) and a member transaction may receive a probe whose initiator is the initiator for some probe already stored in its probe_Q. Such a probe can be considered to be a duplicate, and it should be discarded. To discard these duplicate probes, the following modification to the basic algorithm is needed.

   When a transaction receives a probe from a data manager, it discards the probe, if there exists a probe in its probe_Q which has an identical initiator.

F. Fair Scheduling of Exclusive Locks

The policy discussed in Section V, of granting an S_lock request when an X_lock request is already pending, is unfair to X_requesters. A fair scheduling policy would be as follows.

   When a transaction T requests an S_lock, it is granted if there is no X_holder, and no X_requester of higher priority than T.

Such a scheme ensures that an X_requester will never encounter antagonistic conflicts incrementally. However, even in this case, S_holders are introduced incrementally, and to take into account transitive wait on these additional S_holders, we need to maintain probe_Qs within data managers. Further, now an S_requester may encounter antagonistic conflicts with some S_holders, and in such cases probes must be sent to those S_holders.

We must point out here that this fair scheduling policy is not directly applicable in the case of nested transactions since we have to take into account retainers also. For example, suppose for some data item there is a retainer Tr and an X_requester Tx and let us assume that Tx > Tr. Now, when a descendant of Tr requests an S_lock, it must be granted, even though its priority is less than that of Tx. Otherwise, Tx waits for Tr, which waits for its descendant to commit, and the latter waits for Tx, resulting in a deadlock. Hence, in the case of nested transactions, the above fair scheduling policy can be enforced only when no ancestor of the requesting transaction is a retainer (S_retainer or X_retainer) of the data item. Thus, in this case, an X_requester may encounter antagonistic conflicts incrementally.

G. Computation of Cycle Length

Since we use an edge-chasing algorithm, it is quite simple to compute the length of a deadlock cycle. For this purpose, a probe should have an additional parameter, say length (l), which is set to one to start with. When a transaction receives a probe P, it increments P.l by one before saving it in its probe_Q. On receiving a probe P, if a data manager detects a deadlock, then the value of P.l gives the length of the deadlock cycle.

H. Voluntary Abort by a Transaction

Though the algorithm is designed for detection and resolution of deadlocks, it can be used by transactions to abort voluntarily rather than wait until a deadlock cycle is formed, detected, and resolved. When a transaction receives a probe, it can decide to abort voluntarily on either of two conditions: 1) a transaction with very high priority waits for it directly or transitively, or 2) the value of P.l is very high, i.e., a big wait-for chain is already formed.

ACKNOWLEDGMENT

The authors thank the referee for his comments and suggestions. They are also thankful to Prof. K. Mani Chandy and Prof. M. Stonebraker for their helpful discussions.

REFERENCES

[1] R. Bayer, K. Elhardt, J. Heigert, and A. Reiser, "Dynamic timestamp allocation for transactions in database systems," in Distributed Databases, H. J. Schneider, Ed. Amsterdam, The Netherlands: North-Holland, 1982, pp. 9-20.
[2] P. A. Bernstein and N. Goodman, "Concurrency control in distributed database systems," ACM Comput. Surveys, vol. 13, pp. 185-221, June 1981.
[3] K. M. Chandy and J. Misra, "A distributed algorithm for detecting resource deadlocks in distributed systems," in Proc. ACM SIGACT-SIGOPS Symp. Principles of Distributed Computing, Ottawa, Ont., Canada, Aug. 1982.
[4] K. M. Chandy, J. Misra, and L. M. Haas, "Distributed deadlock detection," ACM Trans. Comput. Syst., vol. 1, pp. 144-156, May 1983.
[5] E. G. Coffman, Jr., M. J. Elphick, and A. Shoshani, "System deadlocks," ACM Comput. Surveys, vol. 3, pp. 66-78, June 1971.
[6] C. T. Davies, "Recovery semantics for a DB/DC system," in Proc. ACM Nat. Conf., vol. 28, 1973, pp. 136-141.
[7] K. P. Eswaran, J. N. Gray, R. A. Lorie, and I. L. Traiger, "The notion of consistency and predicate locks in a database system," Commun. ACM, vol. 19, pp. 624-633, Nov. 1976.
[8] V. D. Gligor and S. H. Shattuck, "On deadlock detection in distributed systems," IEEE Trans. Software Eng., vol. SE-6, pp. 435-440, Sept. 1980.
[9] J. N. Gray, "Notes on database operating systems," in Operating Systems, An Advanced Course (Lecture Notes in Computer Science 60). Berlin, Germany: Springer-Verlag, 1978, pp. 398-481.
[10] R. C. Holt, "Some deadlock properties of computer systems," ACM Comput. Surveys, vol. 4, pp. 179-195, Dec. 1972.
[11] L. Lamport, "Time, clocks and ordering of events in a distributed system," Commun. ACM, vol. 21, pp. 558-565, July 1978.
[12] D. A. Menasce and R. R. Muntz, "Locking and deadlock detection in distributed databases," IEEE Trans. Software Eng., vol. SE-5, pp. 195-202, May 1979.
[13] J. E. B. Moss, "Nested transactions: An approach to reliable distributed computing," Lab. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, Tech. Rep. 260, Apr. 1981.
[14] N. Natarajan, "Communication and synchronization in distributed programs," Ph.D. dissertation, National Centre for Software Development and Computing Techniques, Tata Inst. Fundamental Res., Bombay, India, Nov. 1983.
[15] R. Obermarck, "Distributed deadlock detection algorithm," ACM Trans. Database Syst., vol. 7, pp. 187-208, June 1982.
[16] D. J. Rosenkrantz, R. E. Stearns, and P. M. Lewis, "System level concurrency control for distributed database systems," ACM Trans. Database Syst., vol. 3, pp. 178-198, June 1978.

Mukul K. Sinha was born in Patna, India, on September 27, 1950. He received the B.Sc. (Engineering) degree in electrical engineering from Bihar Institute of Technology, Sindri, India, in 1968, the M.Tech degree in electrical engineering from Indian Institute of Technology, Kanpur, India, in 1971, and the Ph.D. degree in computer science from the University of Bombay, Bombay, India, in 1983.
He is currently working as a Scientific Officer at the National Centre for Software Development and Computing Techniques, Bombay. From September 1979 to August 1980, he was a Visiting Engineer in the Computer Systems Research Group at Massachusetts Institute of Technology, where he worked on concurrency control problems in distributed systems. He has designed and implemented various systems which include compilers, general purpose graphics systems, multiprocessor operating systems, and a file server for a local area network. His current research interests are operating systems, database concurrency control, and local area networks.

N. Natarajan was born in Madras, India, on June 28, 1950. He received the B.E. (Hons.) degree in electronics and communication engineering from the University of Madras, Madras, in 1972, the M.E. degree in automation from Indian Institute of Science, Bangalore, India, in 1974, and the Ph.D. degree in computer science from the University of Bombay, Bombay, India, in 1983.
He has been working with the National Centre for Software Development and Computing Techniques, Tata Institute of Fundamental Research, Bombay, since 1974, where he has worked on compilers, an operating system for a multiprocessor, and the design of a local area network. He visited the Laboratory for Computer Science, Massachusetts Institute of Technology, during 1979-1980. His research interests include operating systems, programming languages, computer networks, and distributed systems.