BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR94-1413
ENTRY:: 1994-04-19
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Simulating Fail-Stop in Asynchronous Distributed Systems
AUTHOR:: Sabel, Laura S.
AUTHOR:: Marzullo, Keith
DATE:: March 1994
PAGES:: 24
ABSTRACT::
The fail-stop failure model appears frequently in the distributed systems 
literature. However, in an asynchronous distributed system, the fail-stop 
model cannot be implemented. In particular, it is impossible to reliably 
detect crash failures in an asynchronous system.

In this paper, we show that it is possible to specify and implement a failure 
model that is indistinguishable from the fail-stop model from the point of 
view of any process within an asynchronous system. We give necessary 
conditions for a failure model to be indistinguishable from the fail-stop 
model, and derive lower bounds on the amount of process replication needed to 
implement such a failure model. We present a simple one-round protocol for 
implementing one such failure model, which we call simulated fail-stop.
END:: CORNELLCS//TR94-1413
BODY::
Simulating Fail-Stop in
Asynchronous Distributed Systems*
Laura Sabel**
Keith Marzullo
TR 94-1413
March 1994
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
This work was supported by the Defense Advanced Research Projects Agency
(DoD) under NASA Ames grant number NAG 2-593, and by grants from IBM and
Siemens. The views, opinions, and findings contained in this report are those of the
authors and should not be construed as an official Department of Defense position,
p*olicy, or decision.
* This author is also supported by an AT&T PhD Scholarship.
Simulating Fail-Stop in Asynchronous Distributed Systems*
Laura 5. Sabelt
Cornell University
Department of Computer Sdence
Keith Marzullo
University of California, San Diego
Department of Computer Sdence
10 March 1994
Abstract
The fail-stop failure model appears frequently in the distributed systems literature. How-
ever, in an asynchronous distributed system, the fail-stop model cannot be implemented. In
particular, it is impossible to reliably detect aash failures in an asynchronous system.
In this paper, we show that it is possible to spedfy and implement a failure model that is
indistinguishable from the fail-stop model from the point of view of any process within an
asynchronous system. We give necessary conditions for a failure modd to be indistinguishable
from the fail-stop model, and derive lower bounds on the amount of process replication needed
to implement such a failure model. We present a simple one-round protocol for implementing
one such failure model, which we call simulated fail-stop.
1 Introduction
The fail-stop failure model appears frequently in the distributed systems literature. The fail-stop
model makes two assumptions about the failure behavior of processes: processes fail only by
permanently crashing, and when a process crashes, surviving processes will eventually detect
that failure. The fail-stop model is appealing because it makes distributed algorithms easier to
formulate: fail-stop failures are easy to tolerate.
For example, supposethatasetofprocesses f1, 2,..., ?1 wish to solve the election problem: at any
point, no more than one process of the set can be the leader, and as long as all processes do not fail,
it is always the case that there will eventually be a leader. Assuming a fail-stop failure model leads
to a very siinple solution. Each process maintains a local copy of the list (1,2,..., n?, and the first
element of this list denotes the leader. When process i detects the failure of process j, i removes j
froms its local copy of the list. When i finds itseff the first element of its list, i knows that it is the
leader. Since a process becomes the head of the list only when all lower-numbered processes have
failed, there is no more than one leader at any time; and, as long as a process eventually detects
the failure of the lower-numbered processes, it will eventually become the leader.
This work was supported by the Defense Advanced Research Projects Agency (DoD) under NASA Ames grant
number NAG 2-593, and by grants ftom IBM and Siemens. The views, opinions, and findings contained in this report
are those of the authors and should not be construed as an offidal Departrnent of Detense position, policy, or dedsion.
tThis author is also supported by an AT&T PhD Scholarship.
A serious limitation of assuming a fail-stop failure model is that it is often an unrealistic
assumption. In particular, in an asynchronous distributed system (i.e., a system with no shared
memory? arbitrary message delivery times, no global dock, and arbitrary process speeds), the
fail-stop model cannot be implemented. This is because it is impossible to reliably detect crash
failures in an asynchronous system (see Theorem ?).
On the other hand, there are systems (e.g., Is's [BJ87I) that provide crash-failure detection with-
out making synchrony assumptions. They do this by allowing failures to be detected erroneously,
e.g., by using timeouts and gossip messages ([RB9?]) to attain agreement among a set of processes
that a process p has failed even though that process p may not have crashed. Hence, they provide
a failure model that resembles fail-stop in some ways but is not strictly fail-stop.
In this paper, we present a failure model, called simulated Jail-stop, that is internally indistin-
guishable from fail-stop, meaning that under this model, no process in the system can determine
that it is not running in a system in which the fail-stop assumption holds. We give a set of con-
ditions that are necessary in order for any model to be indistinguishable from fail-stop, and we
prove that simulated fail-stop is indistinguishable from fail-stop. We give lower bounds on the
number of processes needed for a oneround implementation of the simulated fail-stop model to
tolerate failures, and show that these bounds hold for any model that is indistinguishable from
fail-stop. Fmally, we show that the bounds are tight by giving a protocol that attains them.
The paper is organized as follows. Section 2 describes the system model used throughout
the paper, including notation, definitions, and a forrnal logic used to describe system properties.
Section 3 specifies the fail-stop and simulated fail-stop models, introduces the notion of indistin-
guishability of failure models, and proves that certain conditions are necessary and/or suffident
for a failure model to be indistinguishable from fail-stop. Section 4 gives lower bounds on the
number of processes needed to tolerate t failures for oneround failure detection protocols imple-
menting the simulated fail-stop model, and shows that these bounds hold for any model that is
indistinguishable from fail-stop. Section 5 shows that these lower bounds are tight by presenting
a protocol that meets them. Section 6 concludes the paper and discusses the work that remains to
be done on this topic.
2 System Model
We consider a distributed system consisting of a set of n processes P = ?l,2,..., nl A process
fails by simply stopping execution (crashing), and a failed process does not recover. The system
is asynchionous, meaning that the rate of execution of any process with respect to any other is
unbounded and there are no physical clocks. Between any two processes and j there exist two
unidirectional FIFO channels: Ci,3 from i to j and C5,i from j to?. Processes communicate only
by sending and receiving messages over these channels. The channels are nonfaulty: they do
not lose, generate, or garble messages. Message delivery time is unbounded. We assume for
2
simplicity that channels have infinite buffers and that all messages m are unique (they can easily
be made so by including in m its source and a sequence number). The state of a channel is the
sequence of messages that have been sent along the channel but not received along the channel.
A process is defined by a set of states, one of which is denoted the initial state. The state of
a process i consists of the values of all internal variables of the process, plus the values of n + 1
additional boolean variables that are defined as follows:
o+ crash?. This variable is initially false and can become true at any time. Once aash? becomes
true, the state of i does not change further. (This models the failure of i.)
o+ Vj E P: failed?(j). This variable is initially false for all values of j, and becomes true when
detects the crash of process j. Once failed1 (5) becomes true, it remains true forever. Exactly
when failed?(j) becomes true with respect to when crash3 becomes true is discussed in this
paper.
A global state of the system is a set of process and channel states. An initial global state is the
global state in which each process state is an initial state and each channel state is the empty
sequence.
An event e is a function that maps global states to global states. An event e applied to a global
state ? yields a new global state ?` that diifers from ? in the local state of exactly one process
and the state of at most one channel incident on i. We say in this case that e is an event of?, and
that e changes the state of?.
ff an event e of process i changes the state of Ci,5 for some ii then we call e a send event. A send
event changes the state of a channel by appending a message m to the sequence of messages on
that channel. ff ? changes the state of C5,1 for some 5, then we call e a receive event. A receive event
changes the state of a channel by removing a message from the head of the sequence of messages
on that channel.
We define events, runs, and predicates formally in Appendix A.1. Informally, send, receive,
crash, and failure detection events are defined as follows:
o+ send?(j, m) denotes the event whereby process i sends the message m to process 5.
o+ recvj(5, m) denotes the event whereby process? receives the message m from process 5.
o+ crashj denotes the event whereby crash? becomes true.
o+ failed? (5) denotes the event whereby failed?(5) becomes true.
Definition I A run ofthe system is an infinite sequence ofglobal states ofthe system: T = (?o, ?i, ?2,...),
where?0 isan initial global state and thereexistsasequenceofevents (?o, ci,e2, .) such that for all? > 0,
3
Definition 2 Given any run r = (?o. ?1, ?2, .), the history of r, denoted li?, is the sequence ofevents
(e0, e1, e?,..?) such that for all? >0 ?i+1 =
Note that for any run r, 1tr is uniquely determined. Furthermore, r can be constructed from a
history Nr and the initial global state ?o.
Throughout this paper, we use the notation 1ir = .... ..?........). This denotes that TLr
is of the form (x; e?; ? e5; z; ek; w), where e?, e?, and e? are events, x, y, and z are finite sequences of
events, and w is an infinite sequence of events.
We specify properties of systems using predicate logic over global states and linear-time
temporal logic over (infinite) suffixes of runs [Pne77J. We define the boolean predicates SENDi(j, m)
and RECVj(j, rn) as follows.
o+ V?, j, m: SENDj(j, m) and RECVi(j, m) are false in an initial global state.
o+ send?( m)(?) ? SENDi(j, m). That is, SENU)j(i, m) becomes true when send?(j, m) has oc-
curred.
o+ recv?(j, m)(?) k RECVj(J, m). That is, RECVi(j, m) becomes true when rccv?(j, m) has oc-
curred.
Furthermore, both SENDj(j, m) and RECVj(j, m) are stable by definition: once such a predicate
becomes true in a run, it remains true for the remainder of the run. ((CL85])
We define the boolean predicates CRASHi and FAI?J?Di(j) as follows. Let ? be a global state.
o+ ? CRASHi ffandonlyifaash? istruein?.
o+ Vj: ? ? E?IU?Dj(j)ifandorlyiffailedj(j)istruein?
Note that both CRASHi and FAll??Dj(j) are stable by assumption: once these local variables become
true m the local state of i, they remain true thereafter.
Let a = (?o. ??. .2....) be a sulfix of a run, let y be a predicate, and let P be a temporal logic
formula.
o+ (a,k)??iff?kk?.
o+ (s,k)k?Piff?j>k: (s,j)k?
o+ (s,k) # ??iffVj > k: (a,)) ?
Furthermore, we abbreviate (r, 0) ? P as r ?
We define the failed-before relation as follows:
Definition 3 If r ? ?FA1LED5(?) in some run ?, we say that ? failed before) in r.
4
Note that it is possible that both CRASHi and CRASH5 hold in some global state yet neither i failed
before j nor j failed before ?.
We use a version of the happens before relation of [lam78]. Given two events ei and e2, define
e2 (read "el happens before e?") in some history `1ir if one of the three following conditions
holds:
1. el and e? are of the same process, and either el = e2 or el precedes e2 in ltr;
2. el = send?(j, m) for some value of ?,j, and m, and e? = recv5(i, rn);
3.			there exists an event e such that el			e and e
The happens-before relation as defined here is the same as that given in [Lam78], except that
our relation is reflexive. This is for notational convenience. Note that for all e1 ? e21 e1 e2
implies that el precedes e2 in 1tr The converse does not hold, however.
Let r be a run. Let r? be the sequence of states of i in r, with repeated states removed (i.e., so
that adjacent states are distinct). ff x and y are runs, then we say that run x is isomorphic to run y
with respect to process i, denoted  Ej Y, if and only if j = y?. In other words,  =? y if and only
if runs x and ? are indistinguishable to process ?. Similarly, r? for Q C P is the sequence of states
of processes i E Q in r with repeated states removed and x =? y if and only if XQ = y?. (See
(CM86] for a detailed discussion of the ramifications of indistinguishability of runs.)
3 Specification of Failure Models
A failure model describes the manner in which the components of a system can fail. For our
purposes, a failure model constrams how crash events and failed events can occur with respect to
each other. We give these constraints as a set of properties and define the failure model as the set
of runs that satisfy these properties.
3.1 The Fail-Stop Failure Model
The minimal set of fail-stop assumptions found in the literature is that in any infinite run of the
system, a process's failure is eventuaily detected by all processes that don't aash, and that there
are no false detections offailure. These two conditions specify the failure model defined in [Sch84i.
Hence, we adopt this as the definition of the fail-stop failure model.
Formally, the two fail-stop conditions are:
FSI: Vr,i: r ? ?(CRAsHi ? Vj: ?(CRASHj V FAILED5(i)))
F52: Vr,i,j: r ? E(FAILED5(?)?CRASHi)
We denote with FS the set of runs satisf?ng properties F51 and FS2.
5
`Theorem I In an asynchronous system in which crash failures are possible, properties FS1 and FS2 are
impossible to implement.
Prnof: In (C?T91], an algoritlim is given for solving Consensus with a Strong Failure Detector. A
Strong Failure Detector is shown to be strictly weaker than a Perfect Failure Detector, implying
that a Perfect Failure Detector can also be used to solve Consensus. A solution to Consensus
contradicts the result of [FIP851; therefore, a Perfect Failure Detector cannot be constructed.
A Perfect Failure Detector is defined in [CT91] as a failure detector satisfiiing Strong Com-
pleteness and Strong Accuracy. These two properties are identical to FSI and FS2. Therefore,
implementing FS is equivalent to implementing a Perfect Failure Detector, and is therefore im-
possible.
E]
3.2 IndistInguishable Failure Models
A process determines which event to execute based on its state and the messages that it has
received. A run r is isomorphic to a run r' with respect to a process if executes the same events
in both r and r'. We know that the two runs are isomorphic with respect to i if i starts in the same
initial state in both runs, receives the same messages in the same order in both runs, and makes
the same nondeterministic choices (if any) in both runs. Consider a run r of a system. ff r is not
in FS but is isomorphic with respect to i to a run r'in FS, then the events i executes are the same
as if it were "inning in a system satisfying the fail-stop assumptions. Hence if r =p r', then no
process in P can determine that r is not in FS.
Definition 4 Afailure model M is indistingui?hablefrom thefail-stop model iffor any run r E M, there
exists a run r' E FS such that r =p r' (that is, r is indistinguishable from r' to every process in P).
Consider the election protocol described in Section 1. If a run of this protocol is in a failure
model M that is indistinguishable from, but not identical to FS, then there may be more than one
leader in some global state, but no process will be able to determine this. Thus, internally the
execution is the same as if there were only one leader at a time.
Recall that the reason that FS can not be implemented in an asynchronous system is because
the crash of a process cannot be reliably detected. A failure model M that can be implemented
and is indistinguishable from FS must be weaker than FS. However, it cannot be too weak; at the
very least, a process i must not be able to determine that some process j executes an event after
i detects that j has crashed. Furthermore, if a process detects the failure of i then i must crash
at some point, and process crashes must have been able to occur in some total order. Hence, the
following three conditions are necessary for indistinguishability from FS.
Condition I For all runs r, if r ? ?FAWEDi(j), then r ? ?CRASHj.
6
Condition 2 Thefailed-before relation must beacyclic. That is,forall runs randfor all k, there cannot exist
prncessesx1,r2? . .,x?suchthatr k FAILEDxi(X2)AFAILEDr2(X3)A' . AFAILED?k?l(Xk)AFAWEDxk(X1)
Condition 3 For all runs r, there cannot be an event e ofprocess j such that failed?(j) e in tir.
`Theorem 2 Iffailure model M is indistinguishablefrom ES, then all runs of Al sah's? Conditions 1--H3.
Proof:
Condition 1 In order for two runs to be isomorphic, their histories must contain the same events,
For every run r that satisfies FS, falled?(j) E Jir ?crash5 E ?r' Therefore, the same must be
true of every run that satisfies M.
Condition 2 For contradiction, suppose that there is some run r of M such that r does not satisfy
Condition 2. We show that there is no run r' satisfying FS that is isomorphic to r with respect
toP.
ff r does not satisfy Condition 2, then there is some set of processes ?xo, xi,. . ., xkj such that
= (`..faiied?0(x1)'' .falled?1(x2)faiThdz???(Xk) ` faiied?(X0)???) Foranyrun
r' satisfying FS, ?r' must contain crash?? for all 0 < i < k. Furthermore, Uas?? must occur
before failed?te1 (xj) and faiThd?? (xiei) must occur before cras?? where 0 and ? are --H and +
modulo k + 1 respoctively. By transitivity? this leads to circular constraints on lir': crash?0
must occur before ,ailed??? (x0), which must occur before crashx?, which must occur before
,alled???1(Xk)i???? crash?1 must occur before ,alled? (x?), which must occur before crash?. It
is impossible to satisfy all of these ordering constraints in a valid run. Therefore, there is no
run r'isomorphic to r that satisfies FS.			0
Condition 3 For contradiction, suppose that there is some run r of M such that r does not satisfy
Condition 3. We will show that there is no run r' satisfying FS that is isomorphic to r with
respect to P.
If r does not satisfy Condition 3, then `lir = .... faiThd?(j).. Sendj(k,mk)''' recv??(?, m5).
e5 `?), where send?(k, m?) recv??(?, m?). For any r' isomorphic to r, 7ir1 must maintain
the order of ,alled?(j), Sefldj(k, m?), and recv?? (?, m5) in order to satisfy the happens-before
relation. However, for r' to satisfy FS, crash? must occur beforefailed?(j) in ?r" This means
that in 1*r', crash5 must occur before recv5 (?, m5), which contradicts the definition of crash5.
Therefore, there is no run r'isomorphic to r that satisfies FS. 0
We have shown that Conditions 1, 2, and 3 are necessary for a failure model to be indistin-
guishable from fail-stop. However, these conditions are not sufficient.
`Theorem 3 There exists a run r that satisfies Conditions 1--H3 such that ??r' : r' =p r A r' E FS.
7
Proof: Let r be the following run:
falledy(x);sendy(a?ma);reCVa(y?Tna);Cra5ha;f?iThd?(a)?5?fldb(X??x)???CVx(b??flx)?C??5hx??
For any r' isomorphic to r, we have the following ordering constraints on TLr':
o+ falled?(x)			send?(a, ma)			recva(y, ma)			crasha
o+ failed6(a)			send?(x, m?)			recv?(b, m?)			crash?
o+ crashx must occur before falled?(x)
o+ crasha must occur before failed6(a)
It is impossible to satisfy all of these ordering constraints in a valid run. Therefore, there is no
run r' isomorphic to r that satisfies FS.			0
Theorem 3 implies that a failure model M that satisfies Conditions 1-3 may not be indistin-
guishable from FS. In the next seetion, we give a set of conditions that are sufficient, though not
all are necessary.
3? Simulated Fail-Stop
We give four properties that comprise a model that is indistinguishable from fail-stop. We call
this model the simulated fail-stop model (sF5).
To construct conditions for the sF5 model, we weaken one of the conditions of the fail-stop
model. Weakening F51 yields a model in which some failures may be undetected. Under such a
model, it could be impossible for a system to make progress. Therefore, we follow [Cm1,CHm2,
RB9l] and weaken F52. This yields a model in which nonexistent failures may be detected.
FSi is a liveness property? In a real system, it would be be implemented using timeouts: each
process would periodically send a message to every other process. If process i were not to receive
a message from process j within some predetermined length of time, then i would (perhaps
erroneously) detect the failure of j. We assume for the remainder of this paper that there is some
mechanism provided by the underlying system to implement FSI.
We replace F52 with the following four conditions:
sFS2a:Vr,i,j: r ? 0(FAILEDj(j)? ?cRAsH5)
This condition states that if process i detects that process j has crashed, then eventually j will crash
even if i's detection was erroneous. In conjunction with F51, this condition implies Condition 1:
if faiThd??(j) occurs in TLr, then crash3 occurs in Jir.
sFS2b : The failed-before relation is always acyclic.
8
sFSl:
sFS2a:
sFS2b:
sFS2c:
sFS2d:
FSl
r k ?(FAILEDi(5) ? ?cRAsH5)
The failed-before relation is acyclic.
r ?
r k ?[FAILETh-(j) A ?ENDj(k, m) ?
E((SENDi(k, m) A RECVk(i, m)) ? FAILEDk(j))]
Figure 1: Simulated Fail-Stop Conditions
This is Condition 2.
sFS2c:Vr,?: r k fl?FAILEDi(i)
This condition states that a process never detects its own failure. That is, failed?(i) does not occur
inltr.
sFS2d : Vr,i,j,k: r k C[FAILEDj(j) A ?SENDj(k,m) ?
E((SENDi(k, m) A RECVk(i, m)) ? FAIL?Dk(j))i
This condition states that once detects the failure ofj, then any subsequent messages sent by
to any process k will not be received until k has also detected the failure of j. That is, if sendj(k, m)
occurs after failed?(j) in +tr, thenfailed?(j) occurs before reco?(i, m) in Jir.
Properties sFS2c and sFS2d together imply Condition 3, as shown in the following lemma.
Lemma 4 If sFS2c and sFS2d hold in a run r, then there cannot be an event e of process j such that
failed?(j) e in ?ir.
Proof: Consider any run r. ff i = ii then the lemma is trivially true, because from sFS2c, f'tiThdj(?)
does not appear in 1jr Assume that i # j. For contradiction, let e be an event of j such that
failed??(j) e in ltr. Since falled?(j) and e are of different processes, from the definition of the
happen?beforerelationthereisa sequenceofeventsfalled?(j) send?(k1, m?1) recok?(i, m?1)
sendk1 (k2, m&,) ?...? recV?j(kt, mkt+i) e. From sFS2d, each process in this chain, induding j,
must have detected the failure of j by the time it receives its message. Therefore, ffiiled??(j) must
occur in TLr, which contradicts sFS2c.			E
The sFS conditions are summarized in Figure ?.
Theorem 5 The simulated fail-stop model (sFS) is indistinguishable from the fail-stop model (FS).
9
The full proof of this theorem is given in Appendix A.2. An outline of the proof is given below.
Consider a run r that satisfies USi and sFS2a-d but violates FS2. Then, there exists at least one
pair of processes i and j such that r k ?(FAiLEQ(i) A ?CRASHi). For each such pair, by sFS2a,
r k ?CRAsHj. Therefore, ?r = (...failed5(i)... crashj...). It can be shown that an event e can
be moved withm ?ir, resulting in `lir' such that r' p r, as long as the happen?before relation is
maintained in `lir'. We show in Appendix A.2 that ?fwled5(i) crashj), and that therefore, crashj
and all events e between failed5(i) and crashj in +tr such that e CraShi can be moved to precede
failed5(i) in ?ir'. Thus, if r satisfies sFS2a-d, then the events in `lir can be rearranged so that crashj
precedesfailed5(i) for all i,j in ltr'.
4 Lower Bounds
The simulated fail-stop properties (FS1, sFS2a-d) put restrictions on the way in which failures are
detected. Implementing these properties requires that processes follow a protocol for detecting
failures. In this section, we give lower bounds on message complexity and replication for failure
detection protocols implementing sF5.
A one-round protocol for detecting a failure is one in which each process i exchanges one round of
messages with other processes before executingfailed?(j). Any protocol simpler than a one-round
protocol would allow at least one process to unilaterally detect the failure of some other process.
Such a protocol, however, would limit which processes another process could detect as faulty
For example, suppose that process i can unilaterally decide that process j has failed. Process
can executefailed?(j) concurrently with any event of process j, and so process j can never execute
,alled5(i). Hence, we will consider the class of on?round protocols in order to determine message
and replication complexity
We say that a process i initiates a failure detection protocol when it "suspects" the failure
of another process j (e.g., due to a timeout at a lower level). In the first half of the round,
process i sends a message to all other processes; in the second hall of the round, processes send
an acknowledgement message to?. We call the first message SUSPi,5 and the acknowledgement
message ACK.SUSPj,j. Upon completion of the failure detection protocol, i will execute either crashj
orfalled??(j) for some j $ i.
A oneround protocol that implements sF5 must avoid cycles in the failed-before relation since
all runs in sFS satisfy sFS2b. Implementing sFS2b requires that in any run there is at least one
process that partidpates in all failure detections. To see why this is so, consider the problem of
avoiding cycles involving exactly two processes. Suppose that process a suspects the failure of
process b. Before a can executefaileda(b), the failure detection protocol must ensure thatfailed6(a)
has not been executed and thatfalled?(a) will not be executed in the future.
The failure detection protocol cannot require a to communicate with b directly, because b may
have indeed crashed. Therefore, the protocol must require a to receive information from, and
distribute information to, other processes. In particular, a must receive information from enough
other processes to be sure thatjailed6(a) has not been executed, and a must distribute information
to enough other processes to be sure that iffaileda(b) is executed, thenfalled?(a) will not be executed
in the future.
The relevant information that a must disseminate is that a suspects the failure of b. In order
for a to know that this information has been received by other processes, it must receive messages
from other processes acknowledging that the failure of b is suspected.
Definition 5 The quorum set Qij of failed?(j) is the set ofprocesses from which has received acknowl-
edgement messages relating to its suspicion of I's crash. Formally, Qij = E P : SENDi (k. SUSPi,j) A
RECVj(k, ACKSUSPj,j)1.
The set Qa6 must be large enough to ensure that b, after hearing from Qba, will not execute
falled?(a). In particular, the sets Qab and Qba must have a non-null intersection.
We call this property the Witness Property (W), because the quorum sets for any two failure
detections must have at least one process (the witness) in common. It can be shown that the same
property must hold in order to avoid cycles of any size. The Witness Property can be stated
formally as follows:
n			Qij#?
Vi,? FAILEDi(j)
That is, there is some process w that is in the quorum set of all failure detections. Note
that this is a stronger condition than what is necessaryr, for example, in the update of replicated
variables [Gif791 in which only each pair of quorum sets must intersect.
Theorem6 (Vr: r k ?sFSTh) ? (Vr: r ?
It was argued above that (r ? E]sFS2b) ? ? if only cydes of size two are possible.
The full proof of the theorem is given in Appendix A.3.
Since sFS2b (Condition 2) is necessary for indistinguishability from FS (see Section 3.2), Th?
orem 6 implies that W is necessary for any oneround protocol that implements a failure model
indistinguishable from FS. Let t be the maximum number of crashes in any run, induding those
that arise from erroneous suspicions. The necessity of the Witness Property places a constraint on
as a function of n and on the number of messages that a process must wait for before detecting
afailure.
The simplest way to ensure that W holds in a onewund protocol is to require a process to wait
for responses from every other process, except for those that are suspected to have failed, before
detecting a failure. If there is always at least one process that never fails, nor is suspected of failing,
then this process will be a witness to every failure detection that is executed. This implementation
only requires that t < n. However, if n is large and 1 is small, then each failure detection requires
a process to wait for many messages, which in practice could take a long time.
An alternative implementation is to require a process to wait for a fixed, predetermined number
of responses before detecting a failure. This approach reduces the size of the quorum for which a
process must wait, but it places a stronger restriction on the number of failures that can occur.
`Iheorem 7 Ifthe size ofthe quorum set is a fixed and equal sizefor eachfailure detection, then to guarantee
that r ? OW when t failures are possible, the size of each quorum set must be strictly greater than n( ?1)
Proof: We assume that in any run, no more than t failures will occur. Therefore, the largest possible
cyde in a run satisfying (simulated) fail-stop involves t processes. We must guarantee that any t
quorum sets ..... Qt have a nonempty intersection.
Let the size of a quorum be x. Let y = n --H x. Suppose y = Then there is a set
oft quorum sets such that ViE P : : i ? Q?. In particular, let Qi = P --H
Q2 = P--Hfy+?,y+2,...,2y),..,Qt = P--H?n--Hy+1,n--Hy+2,...,n?. Byconstruction,
each process is not a member of at least one quorum. Therefore, fl Q? = ?. Clearly, such a set of
i=i
quorum sets can also be constructed if y> [??t? Therefore, we must have y < [??t1
x = n --H y ?			x > n --H
nt --H n
?			i
? ? > [fl(t--Hl)j
Therefore, the size of a quorum must be an integer strictly greater than n( Hi).
0
Corollary 8 If the minimum quorum size is used in a one-round protocol forfailure detection, then it must
be the case that n > t2
Prnof: In a oneround protocol, the size of the quorum is equal to the number of ACK.SUSPi,j
messages that process must receive before executingfaiThd?(j). Since i is in its own quorum,
must wait for [n(?i)j messages before detectingj's failure. In order for the one-round protocol
to make progress, at least this many other processes must remain alive. Therefore, we have
n?t>[n(t?l)i ? n?1>[n?%>fl?t?tJ
t			--H			t
?			_
? t2<t[?ji<fl
? t2 < n
12
0
5 Upper Bounds
We give a simple one-round protocol that implements sFS2a-d. We assume that a failure can be
suspeeted spontaneously (e.g., due to a timeout), but that no more than failures are suspected in
any run. In this protocol, SUSPi,5 = ACK.SUSPj,j = "i failed".
o+ When process i suspects the failure of process j, sends the message "i failed" to all processes
(including itself). Process waits for messages of the form "i failed" from other processes
and takes no other action except for acknowledging "x failed" messages until it completes
the protocol or crashes.
o+ When process has received messages of the form "I failed" from more than n( tT?) processes
(including itself), i executes ffiiied?(I).
o+ When process x receives a message of the form "r failed", x executes crnsh?.
o+ When process r receives a message of the form "y failed", x suspects the failure of y.
We will argue informally that this protocol implements the simulated fail-stop properties.
sFS2a: Process i cannot executefailed?(j) without sending a message of the form "j failed" to ail
other processes, induding j. Since channels are nonfaulty? j will eventually receive such a
message, upon which j will crash.
sFS2b: The full proof is given in Appendix A.4. We give an outline of the proof for cydes of length
2. Suppose that the protocol generates a run r such that r k ?(FAwEDi(J) A FAILEDj(i)). By
Theorem 7, r k ? W holds. Therefore, there is some witness w such that i received "i failed"
from w and j received "i failed" from w. Process w sends these messages to all processes. ff
w sends "i failed" before it sends "i failed", then process j will receive "i failed" and crash
before it can executefaiThd??(?). Similarly, if w sends "i failed" before it sends "i failed", then
process i will receive "i failed" and crash before it can execute fai1ed?(j). Therefore, it is not
possible for bothfai1ed?(j) andfaiThd??(i) to be executed in a run.
sFS2c: Process i cannot execute faiied?(?) without receiving at least one message of the form
failed". Upon receiving such a message, i crashes. Therefore, fai1ed? (?) is never executed.
sFS2d: Since channels are FIFO, any message m sent by i to k afterfr?iThd?(j) is executed must be
received after the message 1,j failed". Upon receiving "I failed" from ?, process k suspects
the failure of j and initiates the failure detection protocol. Process k does not receive m
until either crash? or failed?(j) is executed. Therefore, message Tn is not received by k unless
falled?(I) has been executed.
6 Discussion
In Section 3.2, we showed that Conditions 1, 2, and 3 are necessary for any failure model to be
indistinguishable from the fail-stop model. In Section 4, we showed that the Witness Property
is necessary for any oneround protocol implementing Condition 2. We then showed that the
Witness Property imposes lower bounds on the number of messages that must be received before
a failure can be detected and on the number of failures that can be tolerated in a system.
We gave a protocol in Section 5 to demonstrate that these bounds are tight. This protocol,
however, was derived from conditions that are not necessary for indistinguishability. There may
be a failure model weaker than sFS that is indistinguishable from FS. However, such a failure
model is subject to the same bounds on t as sFS, and so we do not expect such a failure model to
be substantially more interesting than sFS.
The bounds on t arise from sFS2b. A failure model satisfying only the other sFS assumptions
would not require a process to wait for any messages before detecting a failure: the other sFS
properties can be implemented simply by having process broadcast a message "i failed" after
suspecting i's failure and before unilaterally executingJailed?(i). Such a failure model would, of
course, be distinguishable from FS, but if a collection of processes are insensitive to cyclic failures,
then they could be run in this cheaper simulated failure model. We do not know of any protocols
in the literature that are insensitive to cyclic failure detection, however.
As an example of sensitivity to sFS2b, consider the problem of deterinirung the last process to
fail ((Ske85]). Solving this problem requires that processes record information about the failures
that they detect (that is, their view of the failed-before relation). Then, when processes are
recovering after a total failure, the recovering processes can determine when the last processes to
fail have recovered. ff cyclic failure detection is possible, then the problem is not solvable. For
example, suppose P = ?1, 2), process ? falsely detects 2's failure, and then crashes. Process 2
detects ?`s failure, proceeds with its work, and finally crashes. if process 1 were to then recover, it
would condude that it was the last to fail. In general, if cyclic detection is possible then the only
possible recovery is to always wait for ail crashed processes to recover.
There are other protocols that require failure models even stronger than FS. For example, if
the failed-before relation is transitive as well as acyclic, then detecting the last process to fail can
be implemented so that as soon as the last processes to fail have recovered, then the processes can
determine this. if the failed-before relation is not transitive, then it is necessary to wait for more
processes to recover. The failed-before relation of sFS is not transitive. We are currently looking
into several stronger versions of fail-stop, whether they are implementable given fail-stop, and
into how they too can be simulated.
The protocols described in this paper are very simple and are easily implementable. Failure
detection such as described here is typically done as part of a group membership service (e.g.,
[RB9i,MPS91,ADKM92]). We believe that the protocols here could be used as the basis of a failure
14
detector that could be used outside of a system built using a grou?membership protocol. This
would allow for consistent failure detection on top of any kind of lower4evel communication,
including point4?point communication.
References
[ADKM92] Yair Amir, Danny Dolev, Shiomo Kramer, and Dalia Maiki. Transis: A communica-
tion sub-system for high availability. In Proceedings of the 22nd Annual International
Symposium on Fault-Tolerant Computing (FTCS), pages 7684, July 1992.
[BJ87]
Kenneth Birman and Thomas A. Joseph. Exploiting virtual synchrony in distributed
systems. In Proceedings of the Eleventh Annual ACM Symposium on Operating System
Principles, pages 1?138. ACM, 1987.
[CHT921 Tushar Deepak Chandra, Vassos Hacizilacos, and Sam Toueg. The weakest failure
detector for solving consensus. In Proceedings of the Eleventh Annual ACM Symposium
on Principles of Distributed Computing. ACM, August 1992.
[CL851 K. Mani Chandy and Leslie Lamport. Distributed snapshots: detennining global
states of distributed systems. ACM Transactions on Computer Systems, 3(1):6?75,
February 1985.
[CM86] K. M. Chandy and Jayadev Misra. How processes learn. Distributed Computing,
1(1):42--H5O, 1986.
[Cml] Tushar Deepak Chandra and Sam Toueg. Unreliable failure detectors for asyn-
chronous systems. Technical Report TR91-1225, Department of Computer Science,
Cornell Universityr, August 1991.
[FLP85]
[Gif79]
[Lam78]
Michael J. Frischer, Nancy A. Lynch, and Michael 5. Paterson. Impossibility of dis-
tributed consensus with one faulty process. Journal of the ACM, 32(2):37?382, April
1985.
David K. Gifford. Weighted voting for replicated data. In Proceedings of the Symposium
on Operating Systems Principles, pages 15?162. ACM SIGOPS, December 1979.
Leslie Lamport. ?Fime, clocks, and the ordering of events in a distributed system.
Communications of the ACM, 21(7):558?565, July 1978.
[MP591j Shivakant Mishia, Larry L. Peterson, and Richard D. Schlichting. A membership
protocol based on partial order. In Proceedings of the International Working Conference
on Dependable Computing for Critical Applications, February 1991.
15
[Pne77]
[RB9l]
[Sch84j
A. Pneuli. The temporal logic of programs. In Proceedings of the 18th Annual Symposium
ofFoundations of Cornputer Science. ACM, November 1977.
Aleta Ricciardi and Kenneth Birman. Using process groups to implement failure
detection in asynchronous environments. In Proceedings of the Thnth Annual ACM
Symposium on Principles of Distributed Computing. ACM, August 1991.
Fred B. Schneider. Byzantine generals in action: Implementing fail-stop processors.
ACM Transactions on Computer Systems, 2(2):14?154, May 1984.
[Ske85] Dale Skeen. Determining the last process to fail. ACM Transactions on Computer
Systems, 3(1):1530, February 1985.
16
A Appendices
A.1 Formal Definition of Events, Runs, and Special Fredicates
Recall that an event e is a function that maps global states to global states. An event e applied to a
global state ? either
o+ yields ?, in which case we say that e is a null event; or
o+ yields a new global state ?` that differs from ? in the local state of exactly one process i and
the state of at most one channel inddent on i. We say in this case that e is an event of i, and
that e changes the state of i.
A non-null event e is uniquely defined by the process i whose state it changes, the state s of
immediately before e is applied, the state s' of resulting from e, the states of the channels inddent
on i before e is applied, and the states of the channels inddent on i after e is applied. Let ?i,5 be
the state of channel Ci,5. Let Xj,* be the n-tuple (Xi,i, Xj,2,.. ., Xj,n) and let 6?*,j be the n-tuple
(X1,i, 1?2,i, . . ., X?,?).'Ihen, e is defined by the 7-tuple (i, s, s', r?i,*, ??,j, 614,j?? such that:
o+ if Wj,? $ X'j,* (e is a send event), then X*,i = X'*,j, there exists exactly one j $ i such that
Xi,j $ X'j,j? and X'j,j = (Xi,j :: m) for some message m (where:: is the catenation operator).
o+ if X*,i $ X'*,i (e is a receive event), then Xi,* = X'i,*j there exists exactly one j $ such that
X5,j $ X?,i, and (m:: X,',i) = X?,i for some message m.
If e is a null event, then e is not of any process i and therefore is not represented by a 7-tuple.
Definition 6 We say that (non-null) e = (i, s, s', X?,?, X'i,*, X*.,i, X',i') can occur in global state ? ifand
only if:
o+ the state cf process i in ? is s,
o+ the states of the incoming channels incident on i in ? are Xi.*, and
o+ the states of the outgoing channels incident on i in ? are X*i.
A null event can occur in any state.
Lete = (i,s, s', X?,*, X'i,*, X*,i, `?`*,i) We abbreviate send and receive events as follows.
o+ If e is a send event of i and C?',5 = (Ci,5 :: m) for some j, then e is denoted send?(j, m).
o+ If e is a receive event of ? and (m:: C?,i) = C5,i for some j, then e is denoted recv?(j, m).
We define `,aash" events and "failure detection" events as follows:
17
o+ ff aash? is false in s and true in s', then e is denoted crashj. By assumption, crashj changes
only the local variable aash?.
o+ If ?: failed?(j) is false in s and true in s', then e is denotedfalled?(j).
The events send?(j, m), recv?(j, m), crashj, andfalled?(j) are atomic; each event only changes the
relevant state variables of the process on which it occurs. For example, if aash? is false in local
state s of i when send?(j, m) occurs, then crash? is false in the resulting state of?.
Definition 7 Let r = (?o,?1,?2,...) be an infinite sequence of global states of the system. We say that
of the system if and only if ?o is an initial global state and there exists a sequence of events
o+ .) such that for all? > 0 e? can occur in ?? and ?i+1 =
r is a run
(eo, e?, e2,			_
o+ V?o+j, m: SENDi(j, m) and RECVi(j, m) are false in an initial global state.
o+ Let e = send?(j, m) and let ? be a global state such that e can occur in ?. Then send?(j, m)( ?) k
SENDi(j, m). That is, SENDi(j, m) becomes true when sendj(j, m) has occurred
o+ Let e = recv?(j, m) and let ? be a global state such that e can occur in ?. Then recv?(j, m)(?) ?
RECVi(j, m). That is, RECVi(j, m) becomes true when recoi(j, m) has occurred.
A.2 Proof of Theorem 5
Ibeorem 5 The simulated fail-stop model (sFS) is indistinguishablefrom the fail-stop model (FS).
In order to prove that for any run r that satisfies FS1 and sFS2a-d, there is an isomorphic run
r' that satisfies FSl and FS2, we wffl need to determine the conditions under which an event m a
history ?r can be moved to yield a history lir' such that r =p
Consider 1ir = (...ei,ei+i,ei+2...) corresponding to run r = (...,?,?+1,?+2,...)o+ By
definition, e? can occur m ?? and e?+i can occur in e?(??) = ?i+1 Assume that e? and e?+i are
non-null events.
Suppose that e? and e?+1 are of the same process k. Since e? changes the state of k, the state of
k is not the same in ?? as in ?i+1 Therefore, e?+1 cannot occur in ??.
Now suppose that e? and e?+1 are of two different processes k and e, respectively. The state of
e in ?? is the same as that in ?i+1, because e? does not change the state of e. Therefore, if e?+1 is not
a receive event, then e?+1 can occur in ??. If e?+1 is a receive event, and changes the state of any
incoming channel other than Ck,, then e?+1 can occur in ??, because the states of all other incoming
channels must be the same in ?? and ??+?. However, if e?+1 = recv(k, m) and e? = send?(?, m),
then Cj+1 cannot occur in V?j, because the message m is not part of Wk,? in ?j.
In summary? ?i+1 cannot occur in Vj if and only if
o+ e? and e?+i are of the same process, or
18
= send?(i, m) and e1+i = recv??(k, m).
In other words, e?+i cannot occur in ?? if and only if (e? e?+1).
Assume that e?+i can occur in ??, and let V?j+1 = ?i+i(?i) It can be shown by a similar
argument that e?+1 cannot change the state of k, ??k.?, or X?,k in such a way as to violate the
preconditions for e?, so e? can always occur in ??+?. Furthermore, e?(e?+?(??)) =
Therefore, r' = (???j,?'j+1,?j+2,??)isavalidru???h???TLr? =
Consider r?k,ey and r'?k??. (Recall that repeated states are removed in these sequences.) From
the construction of r', r? = r'k and r? = r?. Since e? and e?+1 do not change the states of processes
other thank and e, r? = r; for all process t ? ?k, ?. Therefore, r Ep
In summaryr, we have shown that if ?(e? e?+i) in 1iri then e?+1 can be moved before e? to
yield 1ir' such that r' Ep r. It can also be shown that for any two events e? and e3 in 1ir such
that i < j and ?(e? e?), e3 can occur in ??, e? can occur in e?(??), and e5(e?(??)) =
Therefore,e? can be moved to directly before e? to yield TLr' such that r Ep
We can now prove the theorem.
Proof: II run r satisfies FS2 then the theorem trivially holds, so we assume that r violates FS2.
Then, there exists at least one pair of processes i and j such that r ?(FA!?Dj (i) A ?CRASHi). For
each such pair, by sFS2a, r ? ?CRASHj. Therefore, lir is of the form ... .plled5(i)... crashj...).
Definition 8 A pair of processes (i,j) is bad in Jir if TLr = (...failed?(?)...crash????). Otherwise,
(i,j) is good in 1ir.
We prove the theorem by induction on the number of bad process pairs in lir
Base case Assume that there is only one bad pair in TLr. Let TLr = (x;failed?(i); y; uashj; z) where
x, y, and z are sequences of events. Let k be the number of events in y. We construct by induction
on k a run r'isomorphic to r such that TLr' = (x'; crash?;faiThd? ( i); y'; z) where x' andy' are sequences
of events.
Base Case (Inner Induction) Assume k = 0 Hr = (x;failed?. (i); crashj; z). Since
crashj and failed?(j) are of different processes, they can be swapped to yield TLr' =
(x; crashj;falled?(i); z) such that r' Ep r. Clearly, r' satisfies FS2.
Induction case (Inner Induction) Assume that the theorem holds for all histories
in which k < e --H 1, and assume that k = e. Jir = (x; falled?(?); el; e2; .?; eL; crashj;
z). By Leruma 4 we know that ??(faiThd?(i) --H crashj). Let e? be the first event of
(ei;? .;crash?) such that ?(Jailed)(i) e?). Since e? is the first such event and is
transitive Vx 1 < x < u: ?(e e?). Let Q c P be the set of processes such that
faiThd? (t), ..... . , e??1 are events of processes in Q. Then e? is an event of a process in Q.
Therefore, there is a history 1ir" = (x;e?;failed5(?);e1;e2; ;e???i;e?+i; . .;ej;crash?)
such that r" Ep r. By the induction hypothesis there is a history 1ir' of the desired
19
form such that r' =p r", and hence r' Ep r.
Inner hiduction
Induction case Assume that there are k bad pairs in 1ir, one of which is (x, y). We wffl show
that we can use the same inductive construction presented in the Base Case to yield a history ?r,
such that r' Ep r, with strictly fewer bad pairs,5o that the Inductive Hypothesis applies to lir'
Overview: Given a bad pair (x, y), consider another pair of processes (a, b). Using a case
analysis on all possible placements of failed6(a) and crasha in ?r with respect to faiThd?(x) and
cras&, we show that using the earlier inductive construction, we can "fix" (r, y) --H i.e., construct
a history `lir' in which (x, ?) is good --H such that:
o+ if (a, b) is bad in Jir, then (a, b) is either good or bad in ?r';
o+ if (a,b) is good in lir, then (a,b) is either still good in lir', or is bad in Jir' but can be
fixed without maldng (x, y) bad again by using a finite number of applications of the same
inductive construction.
There are twelve possible placements of faiied6(a) and crasha with respect to failed?(x) and
crash?. In each case, we consider the effect on (a, b) of applying the inductive construction to
1.			...			crasha ...			failed6(a)			...			falled?(x) ...			crashx
2.			...			failed6(a)			...			crasha			...			falled,,(x) ...			cras?
3.			...			faiThd,,(x)			...			crashr			...			crasha ...			faiied6(a)
4.			...			faiThd?(x)			...			crash?			...			faiied6(a) ...			crasha
5.			...			failed6(a)			... failed? (x) ... crash? ...			crasha			...
6.			...			crasha ...			...			crash? ...			failed6(a)			...
`alled?(x)
Since only events that occur between faiied?(x) and crash? are moved, (a, b) is independent
of (r, y) in these six cases, in that fixing (x, y) has no effect on the goodness of (a, b). Thus,
(x, y) becomes good and (a, b) is unchanged.
... crasha ... crash? ...
7.			... faiied6(a)			`alled?(x)
In this case, the history `flr' resulting from an application of the construction of the base case
has one of two forms, depending on whether or notfaiThd?(x) crasha:
o+ lir' = (...failed6(a)... crasha?? crash?;falled?(x)...)
20
8.
o+ = (...falled6(a)...crash?;falled?(x).? crasha ..)
In either case, (x, y) is now good and (a, b) remains bad.
falled? (2:) o+.			crasha ... crash? ... Jalled?(a) ...
In this case, the history Nr' resulting from an application of the construction of the base case
has one of two forms:
o+ = (...crasha...Jcrashx;ailedy(x) . .failed6(a)...)
o+ = .... crash?;failed?(x)?. crasha? .faiied6(a)...)
In either case, (x, y) is now good and (a, b) remains good.
9.			... crasha ...			... failed6(a) ... crash?
falled?(x)
In this case, the history ltr' resulting from an application of the construction of the base case
has one of two forms:
10.
o+ = .... crasha...faiied6(a).. crash?;falled?(x)...)
o+ = .... crasha?? crash?;faiThd?(x). .failed6(a)...)
In either case, (x, y) is now good and (a, b) remains good.
... faiied6(a) o+. crash? ... crasha
failed?(x)
Inthis case, the history ?r' resulting from an application of the construction of the base case
has one of two forms:
o+ = (...falled6(a)... crash?;faiThd?(x).. crasha..?)
o+ = .... crash?;falled?(x)...faiThd6(a).. crasha ..)
In either case, (x, y) is now good and (a, b) remains bad.
... failed6(a) ... crasha ... crash?
11.			.			faiThd?(x)
In this case, the history ?r' resulting from an application of the construction of the base case
has one of four forms:
o+ = (...failed6(a)...crasha.. crash?;faiThd?(x)...)
o+ = (. ..faimd6(a)... crash;failed?(x).. crasha--?)
o+ = (. ..crash?;failed?(x) . .faiied6(a) crasha
o+ = .... crasha?? crash?;faiThd?(x). .failed6(a)...)
In the first three cases, (x, y) is now good and (a, b) remains bad; in the fourth case, (x, y) is
now good and (a, b) is now good, thus reducing the number of bad pairs by two.
21
12.
falled?(x) ... crasha ... failed6(a).. crnsh?
In this case, the history TLr' resulting from an application of the construction of the base case
has one of four forms:
o+ = .... crash?;falled?(x). ?crasha? .failed6(a)...)
o+ ltr' = .... crasha .?faiThd?(a)???cras&;falleda(x)???)
o+ = (. crasha crash?fai1ed?(x). falled?(a)???)
o+ `lit' = (.``faiThd6(a)..' uash?;fai1ed?(x)'' `crasha''')
In the first three cases, (x, y) is now good and (a, b) remains good. However, in the fourth
case, (x, y) is now good, but (a, b) is now bad. Thus, the number of bad pairs may not be
reduced. Furthermore, for each pair (?,j) such thatft?lled?(i) and crashj appear in lit in the
`same order with respect tofalled?(x) and cras? asfailed6(a) and crasha, there can be one more
badpairinli?' thanthereisinli?.
However, we can construct a history lit" from lit' in the same manner in which lit' was
constructed from lit, such that (a, b) is good in litu and (x, ?) remains good in lit? as follows.
We have lit' = ... .failed6(a)." crash?;falled?(x).'' crasha'''). Recall that in the construc-
tion of 1t?' from lit, an event e between crash? and falled,,(x) was moved if and only if
?fai1ed?(x) e). Therefore, sincefailed6(a) was moved in the construction of lit' and crasha
was not, it must be the case that in both lit and lit'
?falled?(x)			falled6(a)) A (failed?(x)			crasha)			(1)
As shown in the case analysis, there are four possible results of applying the inductive
construction to lit'. Either of the first three possibilities yields a history `lit" in which (a, b)
is good and (t, y) remains good. We dairn that the fourth possibility cannot occur.
Proof: Suppose, for contradiction, that lit" = (...failed?(x) ... crasha;failed6(a). crash?
Then by the earlier argument it must be the case that in lit' and 1L?'1
?failed?(a) Jailed?(x)) A (Ja?led6(a) crash?)
(2)
falled?(x) crasha) in lit' irnplies thatfaileda(x) occurs in lit' by sFS2d and the definition
of happensbefore. Sirnilarly, (faiied6(a) crash?) implies that Jalld?(a) occurs in lit'.
Thus, Equations 1 and 2 imply that in lit' bothfaileda(x) andfailed(a) occur in lit', which
contradicts sFS2b. Therefore, lit" cannot have the assumed form, so both (a, b) and (x, y)
must be good in lit".
Thus, if fixing (x, y) in lit results in t new pairs (aj, bi) that are bad in `lit', then we can fix
all of these pairs in t applications of the inductive construction. (Note that the t bad pairs
22
do not interfere with each other: since all of them are bad, they all fall under one of the first
11 cases. Therefore, fixing one pair (a?, bi) either fixes another pair (a5, b5) or does not affect
(a5, b5).)
Thus, the number of bad pairs in ?r can be reduced by at least one in some finite number
of applications of the inductive construction given in the base case. Furthermore, this number is
bounded by n.
Therefore, we can construct a history lt?? with fewer than k bad pairs such that r' =p r. From
the Induction Hypothesis, there is a run r" that satisfies FS2 such that r' p r11; therefore, r =p
5
A? Proof of Theorem 6
Theorem6(Vr: r k osFS2b) ? (Vr: r k OW).
We will show that (?r: r k ??W) ? (?r: r k ??sFS2b). To do this, we first assume that
W does not hold in some state of r, i.e., that it is possible for k failures to be detected such that
the quorum sets for those detections have an empty mtersection. We then show that using this
assumption, a run can be constructed in which there is a k?cyde in the failed-before relation.
WedividethenprocessesinPintoksetsS0,...,S?isuchthatfor0<?<k--H1 iES?;thatis,
processes 0 through k --H1 are in sets S0 through Sk?1, and the rest of the processes are distributed
among S0 through 5k-1.
Consider the following scenario. For all i :0 < i < --H 1):
?. Process i suspects the failure of process i e 1, and sends the message SUSiN,i?i to all processes
in P. The messages sent to the processes in set Sjei are delayed indefinitely.
2. As a result of Step 1, process i receives a message SUSP50i,5 from process j e 1 for all
j 96 i, 0 < j < k --H 1, where e is subtraction modulo k. Thus, process i does not learn that
another process has suspected it of having aashed.
3.
Before receiving SUSP??ei,5, process i suspects the failure of process j, and sends SUSPi,5 to
all processes in P. The messages sent to the processes in set Siei are delayed behind the
previous messages (recall that interprocess channels are FIFO). Process also acknowledges
any SUSP messages with ACK.SUSP messages.
4. Process i has now received ACKSUSPk,i?1 messages from all processes k in ? S5.
5#iOl
letQj,j? = ? 55 foralli :0< <k--H1. NoprocessinS5 isinQ54?1;inotherwords,forevery
5#iOl
k--H1
process in P, there is some quorum set of which i is not a member. Therefore, fl Qi,iei =
t=O
23
Furthermore, by definition of Qij being a quorum7 every process i has received enough ACK.SUSP
messages to executefailed?(i ? 1). We havefailed0( 1),... ?)alled(k?2) (k --H1), andfalled???1? (0), which
causes a k-cycle in the failed-before relation.			5
AA ProofthattheProtocolofSection5ImplementssFS2b
Lemma9 Given theprotocol ofsection 57 then [rk?S=?1,2,...,k1: (FAtLED1(2)AFAILED2(3)A A
FAIL?Dk?l(k))] ? [3q: (sendq(S, "kfailed") sendq(S, "k --H ifailed") . o+? sefldq(S, "2failed"))
inltr].
Proof: We use the notation SENDj(S, m) as shorthand for (Vp E 5: SENDi(p, m)).
The size of the quora are suffident to ensure W, by Theorem 7. By W, r ? ?q: Vi, j E 5:
FAff?Di(j) ? REcv?(q, "i failed") ? SENDq(5, "i failed"). We prove the lemma by induction on k.
Base case For k = 2, the proof is triviaL let k = 3. 5 = ?1,Z3b r k FAIU?D1(2) A
FAiL?D2(3), and r k SENDq(5, "2 failed") A SENDq(5, "3 failed"). Assume for contradiction that
sendq(S, "2 failed") sendq(S, "3 failed")in?ir. Then,becausechanneIsareFIR),recv?(q, "2 failed")
recv2(q, "3 failed") in lir By the protocol, crash2 failed2(3) in ?r, sor k ?FA!LED2(3). `Therefore,
it must be the case that sendq(S, "3 failed") sendq(S, "2 failed").
Inductioncase Assume that the lemma is true for k = I --H 1. For k = I, we have FAILED1(2) A
FAIL?D2(3) A ... A FAILEDt?1(l). By the induction hypothesis, sendq(S, "I --H 1 failed") ...
Sendq(5, "2 failed") in lir Assume for contradiction that sendq(5, "1 --H 1 failed") sendq(5, "1 failed")
in ??. Then, as in the base case, recvt?i(q, "t --H 1 failed") recvt?i(q, "1 failed"), so crash1?1
failed1?1(l) in ?r and r k ?FAIL?Dt?i(l). Therefore, sendq(S, "I failed")			sendq(5, "1 --H 1 failed")
inTLr.			5
The quorum size for each failure detection is suffident to guarantee W. Assume for contra-
diction that the failed-before relation is not acyclic. Then r k ?5=f1,.. ., k?: FAILEDi(2) A ... A
FAll?Dk?1(k) A FAIU?Dk(l). By Lemma 9, ?q: sendq(5, "1 failed") sendq(S, "k failed")
sendq(S, "2 failed") in 1tr Thus, recvi(q, "1 failed")			recvi (q, "2 failed") in 7tr, crnsh1			failed1 (2)
in ?r, and r k ?FAIU?D1(2).			5
24
