BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR92-1277 
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Using Consistent Subcuts for Detecting Stable Properties
AUTHOR:: Marzullo, Keith 
AUTHOR:: Sabel, Laura S.
DATE:: April 1992
PAGES:: 27
NOTES:: replaces 91-1205
ABSTRACT:: 
We present a general protocol for detecting whether a property holds in a 
distributed system, where the property is a member of a subclass of stable 
properties we call the locally stable properties. Our protocol is based 
on a decentralized method for constructing a maximal subset of the local 
states that are mutually consistent, which in turn is based on a weakened 
version of vector time stamps. The structure of our protocol lends itself to 
refinement, and we demonstrate its utility by deriving some specialized 
property-detection protocols, including two previously-known protocols that 
are known to be efficient.
END:: CORNELLCS//TR92-1277 
BODY::
Using Consistent Subcuts
for Detecting Stable Properties*
Keith Marzullo
Laura Sabel
TR 92-1277
(replaces TR 91-1205)
April1992
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
This work was supported by the Defense Advanced Research Projects Agency
(DoD) under NASA Ames grant number NAG 2-593 and by grants from IBM and
Siemens. The views, opinions and findings contained in this report are those of the
authors and should not be construed as an official Department of Defense position,
policy or decision.
Using Consistent Subcuts for Detecting Stable
Properties*
Keith Marzullo			Laura Sabel
Cornell University
Department of Computer Science
23 March 1992
Abstract
We present a general protocol for detecting whether a property
holds in a distributed system, where the property is a member of a
subclass of stable properties we call the locally s?able propcr?i?s. Our
protocol is based on a decentralized method for constructing a maximal
subset of the local states that are mutually consistent, which in turn is
based on a weakened version of vector time stamps. The structure of
our protocol lends itself to refinement, and we demonstrate its utility
by deriving some specialized property-detection protocols, including
two previously-known protocols that are known to be efficient.
1 Introduction
It is conceptually simple to determine whether the global state of a dis-
tnbuted system satisfies a stable property; that is, a property ? that satis-
fies ? ? O?. One can have a process use a snapshot algorithm such as the
one given in [CL85] to collect the relevant local and channel states and then
test to see if the condition holds over the collected state. This technique can
This work was supported hy the Defense Advanced Research Projects Agency (DoD)
under NASA Ames grant number NAG 2-593, and by grants from IBM and Siemens. The
views, opinions, and findings contained in this report are those of the authors and should
not be construed as an official Department of Defense position, policy, or decision.
be used to detect any stable property. However, for many stable properties
of interest, such as deadlock, termination, and lack of a token, there exist
specialized protocols (for example, [Mis83,Mat87,BT84,CMH83]) that are
more efficient than a straightforward application of [CL85]. As well as being
more efficient, many of these specialized protocols are very elegant and their
relation to snapshots is not apparent.
It would be useful if one could derive such special-purpose protocols
by refinement of a general snapshot protocol. Unfortunately, the protocol
of [CL85] was not developed with refinement in mind, and we have not
found it conducive to such refinement. In this paper, we present a different
protocol for detecting stable properties that has proven to be more conducive
to refnement.
A naive general detection protocol is as follows: every time a process
executes an event, it appends its current state to a queue maintained in
local memory. A separate process Po periodically retrieves these queues of
local histories and extracts from them the latest global state. Process Po then
tests to see if the property holds in this global state, Unfortunately, this
protocol is impractical since it has a large execution overhead and requires
unbounded local memory. This can be fixed (at a cost of generality) by
having each process record only its current state at appropriate times and
by having Po consider some subset of these local states that could be part
of a sensible globai state of the system. Not all stable properties can be
detected this way, but it turns out that most stable properties that have
been discussed in the literature can.
In this paper, we present a method to detect a subclass of stable prop-
erties. The method can be easily expressed as a decentralized protocol and
can be customized for different properties in order to yield efficient special-
purpose protocols. We demonstrate its utility by using it to derive such
protocols including two previously-known protocols that are known to be
efficient.
2
2 Definitions
2.1 System Model
We consider an asynchronous distributed system consisting of a set of n
nonfaulty processes P = ?P1.P2..? ,p?J. Between any two processes p? and
p? there exist two unidirectional nonfaulty FIFO channels: Ci,j from p? to
and Cj,j from pj to p?. These channels have unbounded delivery time, and
processes communicate only by sending and receiving messages over these
channels.
Processes execute events, which are partitioned into send events, receive
events, and local events. We will denote the tth event executed by process
p? as et? and the resulting local state a2t. Thus, the execution of process p?
? e1 1 2 2
can be denoted (a? o? e? ?j o+..). Note that the state att reflects the
execution of events e11 through e?. When the ordinality of an event or state
is not important, we will drop the superscript, e.g. the execution of event
e? results in the local state ?j.
An arbitrary collection of local states may not constitute a sensible global
state: the local state of a process in the collection may reflect the receipt of
a message while no process' local state reflects the sending of that message.
Such sets of local states are called inconsistent; a sensible collection of local
states is called consistent. ([CL85])
A global state is defined to be a consistent set ? = fai, a2...., an) of the
processes' local states. We assume that channel states are captured in the
local states of the processes. There are many ways to do this, for example
by having each process maintain a history of all messages that it sends and
receives. In practice, one must ensure that the representation of the channel
states does not require an unbounded number of messages to be recorded.
A consistent cut is defined to be a set of events C = (e1,e2,. .
such that the set of states ?a1,a2,.. .,an) produced by C is a global state.
Thus, each consistent cut has a corresponding global state, and vice versa.
In this sense, global states and consistent cuts are equivalent notions. When
3
defining properties of a distributed system, it is convenient to refer to states;
our protocol uses events. For this reason, we define both global states and
consistent cuts for use in different contexts.
A property is a predicate over the global state of the system. A stable
property is an invariant: once it becomes true, it continues to be true1. The
most commonly studied examples of stable properties in distributed systems
are deadlock of a subset of the processes, termination of a distributed com-
putation, and the lack of a token among the processes. There are, of course,
other stable properties of interest. For example, in a token-passing system
that can lose but not generate tokens, the predicate "there are no more than
two tokens in the system" is a stable property.
Let aiI? (read Q' relative to ?) be the values of the subset of variables
of ? that are referenced in the formulation of property ?. An event et? is
relevant to a property ? if att.?lI? # crtt.I?; that is, if ett. changes ?j'5 local
state relative to ? by changing the value of a variable in the formulation
of ?. For example, if ? is "a subset of the processes are deadlocked" then
the relevant events are those that request a resource and those that grant
a resource, since ? is formulated in terms of resource requests and grants.
Note that local events, send events, and/or receive events can be relevant,
depending on how ? is formulated.
2.2 Vector Clocks
Our protocol is based on a variant of vector clocks [Mat89]. In the usual
definition of a vector clock V, each event e? has an fl--Hcomponent vector
V(ej)[1..n] associated with it. V(e?) is called the vector timestamp of e?.
The components of V(e?) are:
o+ V(e1?)[i] = t; that is, v(ett.)[i] is the number of events that p? has
executed up to and including ett..
1Some authors define an invariant property to be one that is valid in all states of the
system.
4
o+ v(et?)[j],j $ i is the number of events pj has executed that causally
precede ??.
As an example, Figure 1 shows a space?time diagram of a two-process system
with the events labeled by vector clocks.
x := 1			u := 1
t			t
I			L
(1,0)			(2,1)(3,1)			(4,1)			(5,1)			(6,4)
(0,1)			(0,2)			(3,3)(3,4)			(4,5)
x := 2
$
(7,4)
(4,6)
L
y := 2			y ;= 3
Figure 1: Execution with vector clocks.
A simple implementation of vector clocks has process p? maintain an
n--Helement vector Vi of counters. Process p? increments vj[i] whenever it
executes an event e?. If e? is a local event or a send event, then V(e?) --H
Vi. If e? is a send event, then p? includes Vi in the message. If ? is the
corresponding receive event, then process p? sets Vk : k $ j : v5[k] to the
maximum of the previous value of v5[k] and the value of Vi[kj in the message,
and V(e?) = V5.
The following three relations hold between vector clocks and global
states, where is the happened-befrre relation defined in [Lam78]. Equa-
tion 1 defines the happened-before relation in terms of vector clocks, Equa-
tion 2 defines when two events are consistent with each other (we call two
such events pairwise consistent), and Equation 3 defines when a set of local
states (a1,.. .,anl produced by events fe1,. . .,e?) comprise a (consistent)
global state:
Vi,j: i $ j: V(e?)[i] < V(e?)[i]			e?			e5
5
(1)
(V(ei)[ij > V(e5)[i]) A
> V(e?)[j]) = e? and e5 are pairwise consistent (2)
Vi,j: V(ei)[i] > V(e5)[i] = ?a1,...,anJ is aglobal state (3)
Equation 2 can be derived by noting that two events e?? and e? are incon-
sistent only if (without loss of generality) ett. e3u. A ?et?' : et? e,t' e3u,
or in terms of vector clocks, v(ett.)[i] < v(e3u.)[i]. Equation 3 can be derived
from Equation 2 by noting that all events in a consistent cut are pairwise
consistent. Observe that ? = ....... , ani is a global state if and only if
C = ....... , e?J is a consistent cut.
When vector clocks are used in actual protocols, not all events cause a
process' vector clock to be updated. For example, some causal broadcast
protocols are based on vector clocks that are updated only at send or broad-
cast events; other events (i.e., receive and local events) do not increment the
local component of V [Pet87,BSS90]. For our purposes, only the execution
of relevant events update the local counter of a vector clock. This is because
the execution of a nonrelevant event does not change the state of the system
with respect to ?.
If not all send events are relevant events, however, then Equations 2
and 3 need not hold. For example, suppose Ctj and e5 are relevant events
that are pairwise inconsistent: et? e3u A ?et?' : ett. etti e3u. If no such
ett.' is a relevant event, then v(ett.)[i] = v(e3u.)[i] and v(e3u)[j] > v(ett.)[j],
which satisfies the left side of Equation 2 even though et? and e3u are pairwise
inconsistent. On the other hand, att.I? = att.II?, and so the fact that ett. and
e3? are pairwise inconsistent is irrelevant with respect to ?, as long as e?' and
are pairwise consistent.
We therefore define a type of vector clock for which a weaker version of
Equation 2 holds. Let the weak vector clock V? for ? be the vector clock
in which V?(e?)[i] counts only the events relevant to ? that p? has executed
through e?. Therefore, the vector timestamp associated with several events
of p? may have the same value, but all such events result in the same local
6
state relative to ??2 For example, in the case of deadlock the relevant
events are sending a request for a resource, sending a grant of a resource,
and receiving a grant of a resource (see Section 5.2). If a process p requests
a resource and then sends an unrelated message, then the send event does
not change the local state of p with respect to possible deadlock. So, the
send event is given the same weak vector timestamp as the resource request
event.
We say that two events et? and ett.t are equivalent with respect to ?, written
N? e? , if et? and et?' have the same weak vector timestamp. We say
that two local states att. and ast' are equivalent with respect to ?, written
ti
a? N? cTj , if e1t. N? 4' Similarly, two global states ? = ?a1tl,. . . , antny and
= ?aitl',..., antnll are equivalent with respect to ?, written ? N? ?`, if for
all i, atti N? a2t.?. The following versions of Equations 1, 2, and 3 hold for
both vector clocks and weak vector clocks:
Vi,j: i $ j: V?(e2to+)[i] < v?(e3u.)[i] --H
(V?(etto+)[i] > v?(e3u.)[i]) A
(v?(e3u.)[j] > v?(e2t)[j])
?etj',ef.(ett,' N? etto+)A(e3?'
4			e?U
(4)
?4', e,u.': (ett.' N? 4) A (ef N? e3U):
e? and e3u are pairwise consistent(S)
Vi,j: V?(ej)[i] > V?(e5)[i] =--H ? global state ?`:
(6)
Figure 2 shows weak vector clock values for the execution shown in Fig-
ure 1, where we assume that the predicate of interest references x and y,
but not n nor any of the channel states. Note that although the events
x := 1 and y := 3 do not form a consistent cut, their timestamps satisfy
Equation 6 since there exist several cuts equivalent to this inconsistent cut
(all necessarily having (x = 1, y = 3)) and they are therefore consistent with
2Note that two events of different processes may have the same weak vector timestamp
as well.
7
respect to ?.
x
f
n			1
(1,0)			(1,o)(1,o)			(1,0)			(1,0)			(1,1)
x			2
k
(2,1)
(1,2)
y			2
II I
r
y := 3
Figure 2: Execution with weak vector clocks.
2.3 Locally Stable Properties
Our protocol will detect a subset of the stable properties that we call locally
stable properties. Informally, a stable property ? is locally stable if no pro-
cess involved in the property can change its state relative to ? an unbounded
number of times once ? holds. For example, suppose ? is "processes p? and
are deadlocked." The property ? is locally stable because once ? be-
comes true, neither p? nor p?, the processes involved in ?, can execute any
event that could affect ? (e.g., requesting or granting a resource). Hence,
QI? and a?J? remain constant once ? holds.
More formally, let g be the set of all global states that the system can
attain. For any ? E q, define ??? to be the subset of ? that is referenced
in the formulation of ?, and given a set of processes A define ?A to be
the subset of ? that consists of the states of the processes in A. We will
call ? locally stable if it is stable and if it satisfies the following condition:
consider any ? E q that satisfies ?, and let A be the set of processes p? such
that a? ? does not change an unbounded number of times in any execution
starting at ?. Then, for all ?` E ? such that ?1? = ?AI?, ? holds in ?`.
8
In other words, ? can be determined from only the states of the processes
in A. Note that A can be empty, but only for trivial stable properties; if A
is empty, then ? can be determined without knowledge of the state of any
process or channel and must therefore be valid or not valid in all states. For
this reason, we will assume in this paper that A is nonempty.
The most commonly studied stable properties--Hdeadlock, termination,
and lack of a token--Hare all locally stable. For example, if ? is a deadlock
state, then A includes the deadlocked processes, and so the presence of
deadlock can be determined by considering only the states of the processes
in A. An example of a stable property that is not locally stable is the
property "there are no more than k : k > 0 tokens" in a system where
tokens cannot be created but can be lost when passed. This is because if
? is a state in which there are k tokens, then every process can execute a
relevant event an unbounded number of times (namely, it can pass tokens),
thereby changing its local state relative to ? an unbounded number of times,
and so A is empty. The property, however, is not valid in all states of the
token passing system.3
For most locally stable properties of interest, the processes in A cannot
change their local states relative to ? at all once ? holds (i.e., the bound
on the number of future relevant events that they can execute is zero). In
this case, if our protocol presented below is initiated in a state in which ?
holds, it is guaranteed to detect ?. For locally stable properties for which
the bound is not zero, however, the protocol may not detect ? if initiated
in a state in which ? holds. However, the system will eventually reach a
state in which all processes in A will execute no further relevant events. If
the protocol is initiated in such a state, it is guaranteed to detect ?. Thus,
3A need not be empty for a stable property to be not locally stable. For example,
suppose ? is again "there are no more than k k > 0 tokens" and the token passing
system consists of red and green processors. Furthermore, only red processors can lose
tokens and a green processor never passes a token to a red processor. In this system, A
is the set of green processors (green processors never execute a relevant event), yet the
validity of ? depends on the states of both the green and red processors. Hence, ? is not
locally stable.
9
though the protocol eventually detects ? in all cases, it detects ? "sooner"
when the bound is zero.
3 Protocol
3.1 Basic Protocol
We first assume that a process Po will determine whether the global state of
the processes P = ?Pi, ,Pn) satisfies a locally stable property ?. Later,
we will change this protocol so that any number of processes in P may
concurrently assume the role of Po
Our protocol is based on the notion of a consistent subeut a set of
events whose timestamps satisfy Equation 6. (The state of a single process
is trivially a consistent subcut.) Informally, the protocol works as follows.
Whenever a process p? executes a relevant event e?, p? records in a buffer Bi
its local state relative to ? and the vector time stamp V?(e?) as B?.a and
Bj.V, respectively. Process Po periodically collects the values of the buffers
in any order, yielding a set B = ?B1, B2, ..., BnJ. Once Po has constructed
this set, Po determines if there exists a maximal consistent subcut of B such
that the states associated with the timestamps in the subcut satisfy ?. If
Po can find such a subcut, then ? must currently hold. Note that Po need
not examine all consistent subcuts; if A' c A and ? holds in ??IA', then
? will also hold in ?? A, so we need examine only the maximal consistent
subcuts of B. Of course, ? may be of the form Vpi : ?(p?), in which case
only a full consistent cut will satisfy ?.
Unfortunately, the number of maximal subcuts of a set of n weak vector
clocks is ?(2n) Fortunately, it is not necessary for Po to examine all of
these subcuts. Suppose the set of buffer values contains Bt and Bj that are
inconsistent: Bi.V[i] < Bj.V[i]. These two states violate Equation 6, and
so both cannot be part of the same consistent subcut. However, Bi.V[i] <
Bj.V[i] implies that p? executed a relevant event between the time that
B?.a was recorded and the time that B?.a was recorded. Therefore, Po need
10
o+ Each process p? C P records a? and V?(e?) in buffer Bj upon executing
a relevant event e?.
o+ Periodically, Po collects all of the buffers Bi and extracts from them
the latest subcut fa? : Vj : Bj.V[i] < Bi.V[i]).
o+ Po detects ? if ? holds on the latest subcut.
Figure 3: Basic Protocol
not consider subcuts containing Bi.a: if the system is in a state such that
the processes involved in ? will execute no more relevant events, then B?.a
cannot be necessary for the detection of ? and so need not be considered.
Otherwise, the system will eventually reach such a state. If B?.cr is involved
in determining ?, then Bi will be recorded such that Bj.V[i] > B3.V[i]
for all j. Thus, given a set of buffered values B and the partial order
VBj,Bj E B . ? Bj%? Bi.V[j] > Bj.V[j], Po need only find the greatest
elements of!3 with respect ?, which can be done in ?(n2) time.4 We call
this subcut the latest subeut of B. The latest subcut is clearly a maximal
subcut, since all states that are not part of the latest subcut are inconsistent
with some state in the latest subcut. This gives us the protocol shown in
Figure 3.
The soundness of this protocol is straightforward. We now argue that the
protocol is complete as well; that is, if ? holds, then our protocol will detect
?. Let ? be the first global state in which ? holds. Since ? is locally stable,
there is a nonempty set of processes A each of which executes a bounded
number of relevant events after ?; these processes will not change their states
relative to ? nor update their vector clocks an unbounded number of times
4The greatest elements of ? can be found by discarding any values B3 such that
Bi.V[j] > B3.V[j], which can be done in 0(n2) time using a straightforward algon'thm.
And, if all values are incomparable then all the values are?greatest elements of ?. To
determine that they are all incomparable takes n2 comparisons, and so the problem is
11
in ally run starting at ?. Suppose that Po initiates the protocol in or after
? (i.e., when ? holds). Because the processes in A can each execute only
a bounded number of relevant events after ?, and because message delivery
time is finite, there is some global state ?` reachable in finite time from ?
after which the processes in A execute no more relevant events. Therefore Po
will eventually collect the states ??IA. From the definition of ?, the state
of a process p? in A must be in any latest subcut constructed by Po because
p? will execute no more relevant events. Since ? is stable, ?` satisfies ?, and
since ? is locally stable, ? can be detected by examining ?`AI? Hence, Po
will detect ?.
3.2 Decentralization
In the above protocol, Po's role is to collect the local states, determine the
latest subcut, and check if ? holds in this subcut. We can decentralize these
steps by collecting the local states in a token.
Consider a token K that consists of n entries ....... , Dn? where each
entry Dj = (Bi.QBi.V[i]); that is, Di will hold the state of pi relevant to ?
and the local component of Pi `5 vector clock when it generated B?.a. Assume
that there exists a special value 1 for Di indicating that the state is not in
the token; all of the Di in K are initially set to 1.
To determine whether ? holds, a process generates a token K, inserts
its state and vector clock value into K, and passes the token to any other
process. When a process p? receives a token K, it takes the following steps:
1. Set Dj to (Bj.QBj.V[j]).
2. For all non-I values of Dk that are not in the latest subcut, set Dk to
I.
3. Determine whether the state values in K satisfy ?. If so, then the
detection is made; otherwise, p? forwards the token to a process Pk,
chosen fairly, that has Dk = I. If there is no such process, then p?
12
can either drop the token or, when p? computes a new value of Bj, pj
can restart at Step 1 with this token.
Note that when process p? executes rule 2, Bj must be part of the latest
subcut; if it were not, then there would exist a recorded value of B? in De
such that Bj.V[jj < B?.V[j]. This implies that Pe knows of a relevant event
executed by p? that results in a state causally after the state recorded in Bj,
which violates the definition of Bj. Thus, only the earlier values Dk need be
tested with respect to Bj. From above, the value Bk in Dk can be discarded
if Bk.V[k] < Bj.V[kj. The value Bk.V[ki is stored in Dk.V, so R' carries
enough information for p? to make this test.
The resulting protocol is summarized in Figure 4. Note that we have
no a priori restriction on how many tokens there can be in the system at
any time or on the order in which the token is passed, other than that it is
passed in a fair manner. These decisions can be made when the protocol is
applied to a particular problem.
If this protocol is initiated in a state in which ? holds and after which
no process executes a relevant event, then ? will be detected with no more
than n token passes. However, if processes do execute relevant events after
the protocol is initiated, then the initial detection may not be successful and
the protocol must be restarted. If the number of relevant events that can be
executed after ? holds is bounded by A, then detection can take up to an
additional An token passes. For large A, our protocol could perform worse
than a snapshot protocol. In practice, however, we do not expect A to be
large.
4 Termination Detection
We now instantiate the general protocol given above to obtain a protocol
that detects termination in a distributed system. There are many vari-
ations of this property; the earliest that we know of is due to Dijkstra
and Scholten [DS8O]. The following definition is the same as that given
13
o+ Each process pi E P records ? and V?(ei) in buffer Bj Upon executing
a relevant event e?.
When p? wants to detect ?, p? generates a token (D1 I,... ,
I?, sets Dj to (B?.a, Bi.V[i]), and forwards the token to any other
process.
When p? receives a token:
o+ p? sets Dj to (B5.a,B?.V[j]).
o+ For each Dk : k 96 5, Dk 96 I, p? sets Dk := I if Dk.V = Bk.V[k] <
Bj.V[k].
o+ p? determines if ? holds on the state values in the token. If not, p?
forwards token to any Pk such that Dk = I.
Figure 4: Decentralized Protocol
in [Mis83].
All processes are either active or idle. Only active processes can send
messages. An active process may become idle at any time, and an idle
process can become active only upon receipt of a message. The system
is terminated when all processes in the system are idle and there are no
messages in transit.
The local state of a process relative to termination consists of whether
the process is active or idle and whether there is a message on an incoming
channel. Therefore, the events that are relevant to termination are sending
a message, receiving a message, becoming idle, and becoming active. Each
process will update its (weak) vector clock upon executing any of these
events. Note that for this problem, we do not need to keep track of the
contents of the messages exchanged between processes; only the number of
messages is important. To capture the channel states, we have each process
keep track of how many messages it has sent and received on each adjacent
14
channel. The combined information of all of the processes will then yield
the number of messages in transit on each channel: if p? has sent more
messages to p? than p? has received from p?, then there is at least one
message on channel Ci,j. In this way, we can represent the relevant channel
states without recording an unbounded number of messages.
We instantiate the general protocol given in Section 3.2 as follows.
Each process p? maintains the following local state variables:
o+ active?: Boolean = true if and only if p? is active.
o+ sen?[1..n]: Integer array. send??[j] = the number of messages that p?
has sent to p?. All are initially 0.
o+ recvi[1..n]: Integer array. recv?[j] = the number of messages that pj
has received from p?. All are initially 0.
When p? sends a message to p?, sen?[j] is incremented. When p? receives
a message from Pi, recv?[j] is incremented. When p? becomes active or idle,
active? is set appropriately.
At some point, an idle process will start the detection protocol by cir-
culating a token as described in Section 3.2. The termination condition can
only be evaluated over a total global state (as opposed to a consistent proper
subset of the process states), so a positive determination can be made only
by the process pj that is the last to add its state to the token.
Process PI detects termination if and only if the following three condi-
tions hold:
1. The timestamps in the token form a consistent cut;
2. All processes are idle: Vi : active? = false;
3. There are no messages in transit: Vi,j : sen?[j] = recv5[i].
The following theorem and corollary show that item 1 is redundant. The
theorem assumes for simplicity that the buffered states are not collected in
a token; the corollary removes this assumption.
15
Theorem 1 Let 13 = ?B?: i = 1,2,... n? be a a set of buffered state values
that were recorded in a system that does not collect states in a token D. If in
13,Vi,j: sendj[j] = recv?[i], then the global state defined by 13 is consistent:
Vi,j: Bi.V[i] > B5.V[ij.
Proof: Suppose by way of contradiction that item 3 holds over 13 but the
timestamps in 13 form an inconsistent cut: ?i,j: Bi.V[i] < Bj.V[i]. Bj.V[i]
is advanced only when p? receives a message, and events local to p? affect
only Bj.V[j]. Therefore, in order for Bj.V[i] to advance beyond the recorded
Bi.V[i], there must have been a chain of messages between p? and p? between
the time that Bi was collected and the time that Bj was collected. This
implies that there is some k such that the recorded send,[k] < recv?[i],
contradicting the assumption that item 3 holds.
Corollary 2 Let 13 = ?B?: i = 1,2,... n) be a a set of buffered state values
that were collected in a token D. If in D,Vi,j: send?[j] = recv5[i], then the
global state defined by D is consistent: Vi,j: Dj.V[i] > Dj.V[i].
Proof: None of the events executed in collecting the buffered states into a
token are relevant. Hence, collecting the states in this way has no effect
on their consistency with respect to termination. The buffered states will
therefore be consistent with respect to termination when items 2 and 3 hold.
Corollary 2 implies that the vector clocks need not be maintained. Fur-
thermore, these checks can be done incrementally. For example, we can
assign a total order to the processes and have the token passed along that
total order. When process Pk receives the token, it tests to see if
?active? A (V?: 1 <  < k : (send?[l] = recvj[k]) A (sendt[k] = recv?[l])).
If this condition does not hold, then Pk can drop the token. If the
condition holds and k --H n then termination is detected; otherwise, Pk fills
in Dk and passes the token to Pk+i
16
This yields the protocol given in [Mat87] as the channel counting pro-
tocol, which requires only n messages to detect termination once it holds,
and which can be further refined into a protocol that is space-efficient. This
is a good example of how our general protocol, which constructs consistent
(sub)cuts explicitly, can be used to derive a much simpler protocol that
constructs consistent cuts implicitly.
5 Deadlock Detection
5.1 k-out-of-m Deadlock
We now instantiate the general protocol given in Section 3 to obtain a proto-
col that detects k-out-of-m deadlock in a distributed system. This problem
was first formulated and solved in [BT84]. In this formulation, a process
can request k resources from a pool of m resources.
A process is either active or blocked, where an active process is one
that is not waiting for any other process. Active processes may issue k-
out-of-m requests in the following way. When an active process p? requires
k processes to carry out some request, it sends request messages to each
of the m processes that can perform this action. Process p? then becomes
blocked, and waits until the action requested is carried out by at least k of
the m processes. A process can not send any further requests while blocked,
but a process can receive request messages while blocked.
Only active processes can carry out a requested action. If a process p3
receives a request while active, it will either become blocked or carry out
?j?5 requested action within finite time. In the latter case, p? will send a
grant message to p?. When p? receives k grant messages, it becomes active
again. It then relinquishes the requests made to the rest of the processes to
which it sent request messages by sending them relinquish messages. We
assume that the recipient of a relinquish message acknowledges the message
and that the sender of a relinquish message waits for ali acknowledgements
before sending another request message. By doing so, we guarantee that
17
p? can discard any grant messages received after the first k are received.
The state of a process p? relative to k-out-of-m deadlock consists of
the number of grants needed for p? to become active and the current set
of processes that p? is waiting for. We capture this state by having each
process keep track of the processes on which it is blocked and the number
of grant messages that it has sent and received on each adjacent channel.
We instantiate the general protocol given in Section 3.2 as follows. Each
process p? maintains the following local state variables:
o+ k?: Integer = the number of grant messages required for Pt to become
active (initially 0).
o+ g?send?[1..n]: Integer array. g?send??[j] is the number of grant mes-
sages that p? has sent to p? (all are initially 0).
o+ g?recv?[1..n]: Integer array. g?recv?[j] is the number of grant messages
that p? has received from pi (all are initially 0).
o+ wi;: Integer set. These are the processes that p? is waiting for. When
p? sends a request message to p?, Wfj := Wfj U (j?; when p? receives
a grant message from p? or sends a relinquish message to p?, Wfj
Ufj?fj)
Deadlock is determined by constructing and reducing the system waits-
for graph. This graph is constructed as follows:
o+ a waits-for edge is drawn from p? to pj if wi; ? j A (g?send5 [i] --H
??recvi[j])). That is, p? is waiting for a resource from p? and no grant
message is in transit from p? to p?.
o+ the number of grants ?? needed for p? to be unblocked is ki --H Ivi
g?send5[i] --H g?recvj[j]J. That is, ?? is the number of grants that p? is
waiting for less the number of grants in transit to p?.
Deadlock is tested by reducing this graph as follows: if an edge points from
p? to p? and p? is active, then the edge can be erased and ?? can be reduced
18
by one; and if a process has ?? = 0, then all of its outgoing edges can be
erased. The system is deadlocked if and only if there are edges that cannot
be removed by following these two rules.
In this system, the relevant events of p? are those that change wi;, kj,
g?sen? and ??`ecVi. Hence, the relevant events are requesting a resource,
sending a grant message, receiving a grant message and sending a re-
linquish message. We can now argue that k-out-of-m deadlock is locally
stable: a deadlocked process can execute only a bounded number of relevant
events (namely, it can receive up to kj --H 1 grant messages), and any valld
global state that contains the local states of the deadlocked processes still
yields an irreducible waits-for graph.
The deadlock detection protocol is as follows. When a process p? wishes
to test for deadlock, p? generates a token, fills Dj with ((Wfj, g?send,, g?recv?,
ki), Bi.V[i]), and forwards the token to some $ p?. Upon receiving a to-
ken, aprocess p? sets Dj to ((wf5,g?send?,g?recv5,kj),Bj.V[j]) and discards
all values Dk that are inconsistent with Bj by setting Dk to 1. p? then
checks to see if deadlock holds on the remaining values by constructing the
waits-for graph and reducing it. If deadlock does Rot hold, then p? forwards
the token to any process Pk such that Dk = 1.
We can improve this protocol by choosing the process to which the token
is passed more carefully. Since we would like to detect deadlock as quickly
as possible, the forwarding process should choose a process that is likely to
add information leading to the detection of a deadlock. A reasonable choice
is a process p? such that D3 = 1 and such that p? is in Wfj for some Di # I.
The full protocol is presented in Figure 5. We assume that the process
p? that generates the token does so because it suspects that it is involved in
a deadlock; that is, Wfj is not empty.
5.2 RPC Deadlock Detection
i-out-of-i deadlock is a special case of k-out-of-m deadlock that lends itself
to further optimization. This type of deadlock is called RPC deadlock be-
19
when p? receives token ??....., D??:
begin
Dj.?			k?,??sen4',y?recvj, wj;;
Dj.V			Bi.Vki;
for all Dj : Dj.V < B?.V[j]: Dj := I;
ifthereexistspj : (Dj = I)A(?pi :j E wfj)
then forward token to one of these p?
else begin
construct waits--Hfor graph;
reduce waits--Hfor graph;
if graph is not fully reduced then signal deadlock
else drop the token
end
end
Figure 5: Protocol for Detecting k-out-of-m Deadlock
cause it can occur in a remote procedure call system, where making a remote
procedure call is analogous to requesting a resource from a single processor.
The waits-for graph is constructed as for k-out-of-m deadlock, except that
kj = WI; and thus need not be represented in the wait-for graph. Fur-
thermore, relinquish messages are not needed and the waits-for graph is
reducible if and only if it does not contain a cycle.
We can instantiate our protocol for detecting RPC deadlock as follows.
As before, the relevant events are requesting a resource (here, making an
RPC request), sending a grant message (here, sending the reply to the RPC
request), and receiving a grant message. Any blocked process p? can decide
to detect deadlock by generating an empty token, inserting its buffered state
into Di, and passing the token to the (single) process in wI;. When p? receives
a token from pj, p5 will verify that p? waits-for p? and will pass the token
on to the process blocking p?. A process detects deadlock when it receives
a token that contains a complete cycle. The resulting protocol is shown in
20
RPCDeadlock(p?): cobegin
do forever when (Wfj $ ?) and (waited too long)
create empty token Ic;
K.D5 (g?sendi[w%], B?.V[j]);
pass K to
? do forever when receive token K from p?
if K.D5 = I then
if wf5 $ ? and K.D?.g?sen?[j] = g?recv5[i]
and VK.D?: K.Dk $ ?: K.Dk.V[k] > B5.V[k]
coend
then
K.D?:= (g?sen4[w%],B5.V[j]);
pass K to Wfj
else skip /* drop token K */
else
if iC.D?.g?send??[j] = g?recv5[i] then detect deadlock
else skip /* drop token K */
Fignre 6: RPC Deadlock Protocol, Original
21
Figure 6. Note that if the waits-for graph contains a d--Hcycle, then the token
need be passed only d times.
This protocol can be further simplified by applying the following two
theorems.
Theorem 3 If (K.D?.g?sen?[j] = g?recn,[i]) then p? has executed no rele-
vant event since setting R'.Dj.
Proof: Assume (K.D?.g?send?[j] = y?recv?[i]). The last relevant event that
can have executed before setting k.Dj was to send a request message to
p?. The first relevant event that p? can have executed after setting K.Dj is
the receipt of a grant message from p?. Since (K.D?.g?sendi[j] = g?recv5[i]),
p? has sent no grant messages to p? since p? sent the request to p??. Hence,
p? can have executed no relevant event since setting K.Dj.
Theorem 4 If(K.D?.g?sen?[j] = g?recvj[i]) then no process that has set
its value in K has subsequently executed a relevant event.
Proof: Let ? be the number of values Dk : Dk $ I, and assume that
(I?.D?.g?sen?[j] = g?recv5[i]). We will use induction on
Base case = 1). Follows directly from Theorem 3.
Induction case (> 1). By the induction hypothesis, no process prior to
p? had executed a relevant event when p? received K. No process prior to
can execute a relevant event until p? does, and by Theorem 3 p? has not
executed a relevant event since forwarding K to
0
Theorem 4 implies
(g?senc4[j] = g?recv5[i]) ? (Vk: Dk $ ?: Dk.V = Bk.V[k] > Bj.V[k]).
Thus, the vector clocks can be omitted and the token need only carry the
identity of the process that initiated the test for deadlock. The resulting
protocol, shown in Figure 7, first appeared in [CMH83] specialized for m =
22
RpCDeadlock(pj): cobegin
do forever when (Wfj $ ?) and (waited too long)
send (y?sendj[w%],j) to wf5
I do forever when receive (s,k) from p?
ifk $j then
if Wfj $ ? and 5
then send (g?send5[wf?],k) to Wfj
else
ifs = g?recvj[i] then detect deadlock
coend
Figure 7: RPC Deadlock Protocol, Refined
The protocol in Figure 6 can be easily generalized to detect detect and--H
deadlock (m-out-of-m requests), since a cycle in the waits-for graph is equiv-
alent to deadlock in this case as well. The only change necessary is that when
p? passes the token, it must replicate the token and pass a copy to each pro-
cess in Wfj. With and--Hdeadlock, however, a process can execute a relevant
event while deadlocked a deadlocked process can receive a proper subset
of the required grant messages. Thus, if the waits-for graph contains a
cycle, then even if tokens are generated by a deadlocked process and passed
along cycles, such tokens may be dropped up to n --H d times before the dead-
lock is detected. However, the protocol in Figure 7 can be effectively run
in parallel by having p? send (g?send5[u], k) to all the processes u E Wfj in
which case a token passed along a cycle will not be dropped. The resulting
deadlock detection protocol is the one presented in [CMH83].
6 Conclusion
This paper defined a proper subclass of the stable properties which we denote
the locally stable properties. This suliclass is interesting in that a process
that is "involved" in establishing the stable property is limited in what it
23
can do, and will eventually cease changing its local state with respect to
the stable property. llence, in order to detect a locally stable property, a
consistent cut need not be explicitly constructed the relevant local states
will form a consistent subcut implicitly. This leaves only the problem of
detection.
In order to make this observation, we needed to define consistent cuts
with respect to a global state predicate, and slightly extend the notion of
vector clocks to accommodate our definition. We then gave a simple and
decentralized protocol that detects when a locally stable property occurs in
a distributed system. The protocol can be easily refined, which we illustrate
by refining it to a known protocol for termination detection, a new proto-
col for k-out-of-m deadlock detection, and known protocols for m-out-of-m
deadlock detection.
In the reductions to the two known protocols, the vector clocks proved
redundant. This was because the processes involved in ? could execute
no relevant events once they established the condition of interest, and the
detection algorithm also ensured that the channels carried no undelivered
relevant messages. In both cases, the receipt of a relevant message was the
only relevant event that a process involved in ? could execute, and so an
empty channel between two processes implied pairwise consistency of the
recorded states of those two processes. This observation is similar to one of
the steps in the refinement of a termination protocol given in [CM86], yet
we have not been able to refine our protocol to their termination protocol.
The class of locally stable properties was defined in proving the protocol
correct. We would like to determine what kinds of properties are locally
stable. We know of two general classes: the locally stable properties of
distributed garbage detection, termination, and global deadlock can all be
expressed as detecting no token in a generalized token passing system, yet
deadlock of a subset of the processes does not seem to be so expressible. We
are interested in whether there are other classes of locally stable properties.
Not all interesting stable properties are locally stable, however. For
24
example, the property "the number of tokens is less than k > 0" in a token
passing system that can lose but not regenerate tokens is stable but not
locally stable. We do not know if there are protocols that are more message-
efficient than snapshot protocols for detecting such properties.
Our work was motivated by trying to derive message-efficient special-
purpose detection protocols from a general detection protocol. We have only
been partially successful. Our protocol is most efficient when no process can
execute a relevant event after the condition of interest holds. Furthermore, in
our derivation of the m-out-of-m deadlock detection protocol in Section 5.2,
our protocol could generate 0(n) extra messages. Hence, we would like to
better understand the notion of relevant events and weak vector clocks.
Acknowledgements We would like to thank 6zaip Babao?lu, Gil Neiger,
Fred Schneider, and Sam Toneg for their contributions to the ideas in this
paper. We would also like to thank Ken Birman, Bard Bloom, Navin Bud-
hiraja, Tushar Chandra, and Mark Wood for their valuable comments on
earlier drafts of this paper.
References
[BSS9O]
[BT84]
[CL85j
K. Birman, A. Schiper, and P. Stephenson. Fast causal mul-
ticast. Technical Report TR-90-1105, Cornell University, April
1990. Submitted for publication.
Gabriel Bracha and Sam Toueg. A distributed algorithm for gen-
eralized deadlock detection. In Proceedings of the Third Sym-
posium on Principles of Distributed Computing, pages 285--H301.
ACM SIGPLAN/SIGOPS, August 1984.
K. Mani Chandy and Leslie Lamport. Distributed snapshots: de-
termining global states of distributed systems. ACM Transactions
on Computer Systems, 3(1):63--H75, February 1985.
K. Mani Chandy and Jayadev Misra. An example of stepwise
refinement of distributed programs: Quiescence detection. ACM
Transactions on Programming Languages and Systems, 8(3):326--H
343, July 1986.
[CM86]
25
[CMH83] K. Mani Chandy, Jayadev Misra, and Laura M. Haas. Distributed
deadlock detection. ACM Transactions on Computer Systems,
1(2):144--H156, May 1983.
[DS8O]
[Lam78]
Edsger W. Dijkstra and C.S. Scholten. Termination detection for
diffusing computations. Information Processing Letters, 11(1):1--H
4, 1980.
Leslie Lamport. Time, clocks, and the ordering of events in a
distributed system. Communications of the ACM, 21(7):558--H565,
July 1978.
[Mat87] Friedemann Mattern. Algorithms for distributed termination de-
tection. Distributed Computing, 2(3): 161--H175, 1987.
[Mat89]
[Mis83]
Friedemann Mattern. Time and global states of distributed sys-
tems. In Michel Cosnard et. al., editor, Proceedings of the Inter-
national Workshop on Parallel and Distributed Algorithms, pages
215--H226. North-Holland, October 1989.
Jayadev Misra. Detecting termination of distributed computa-
tions using markers. In Proceedings of the Second Symposium on
Principles of Distributed Computing, pages 290--H294. ACM SIG-
PLAN/SIGOPS, August 1983.
Larry L. Peterson. Preserving context information in an IPC ab-
straction. In Proceedings of the 6th Symposium on Reliability in
Distributed Software and Database Systems, pages 22--H31, March
1987.
[Pet87]
26
