BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR92-1284
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: The Inherent Cost of Achieving Causal Consistency
AUTHOR:: Critchlow, Carol M. 
DATE:: May 1992.
PAGES:: 127
COPYRIGHT:: Carol M. Critchlow 1991 - All Rights Reserved
ABSTRACT::
We consider the problem of distinguishing causally-consistent global states 
in asynchronous distributed systems. Such states are fundamental to 
asynchronous systems, because they correspond to possible simultaneous global 
states; their detection arises in a variety of distributed applications, 
including global checkpointing, deadlock detection, termination detection and 
broadcasting. A consistent-cut protocol is a protocol which in every run will 
designate for each processor a state, in such a way that these states 
together form a consistent cut. We analyze the cost of achieving causal 
consistency in terms of the extent to which a consistent-cut protocol delays 
events of the underlying system, and the message-complexity required by any 
such protocol. We refer to the delaying action of a protocol as inhibition. 
We consider a spectrum of protocol capabilities based on the type of 
inhibition that occurs: we distinguish local versus global inhibition and 
prove fundamental relationships between these concepts and the ability to 
determine causally-consistent states. A protocol using local inhibition may 
cause the delay of some of a processor's events until that processor has 
performed some number of local actions; a protocol using global inhibition 
may force the delay of some of a processor's events until that procesor has 
received some communication from other processors. Based on a variety of 
system and protocol characteristics, including the ability to locally or 
globally inhibit particular types of events, we give several impossibility 
results and examine some existing protocols. We are then able to present a 
thirty-six-case summary of protocols and impossibility results for the 
determination of causally-consistent states as a function of those 
characteristics. In particular, we demonstrate that local inhibition is 
necessary and sufficient to solve this problem for general FIFO systems, 
while global send inhibition is necessary and sufficient for general non-FIFO 
systems. Regarding message complexity, we demonstrate that a globally 
inhibitory consistent-cut protocol requires $O (N)$ messages where $N$ is 
the number of processors in the system. This is true whether the protocol is 
designed to work for FIFO or non-FIFO systems; the exact lower bounds, 
however, differ in the two cases. We also prove that a consistent-cut 
protocol which uses local inhibition only requires $O$ ($N^{2}$) messages, 
or, more precisely, $O$ ($\vert \cal C \vert$), where $\cal C$ is the channel 
set of the system. This latter result illustrates a trade-off between the 
message complexity of a consistent-cut protocol and the degree of inhibition 
which it requires.
END:: CORNELLCS//TR92-1284
BODY::
The Inherent Cost of Achieving
Causal Consistency
Carol M. Critchlow
Ph.D Thesis
92-1284
May 1992
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
THE INHER?ENT COST OF ACHIEVING CAUSAL
CONSISTENCY
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Carol M. Critchiow
August 1991
o Carol M Critchiow 1991
ALL MGHTS R?ESER?VED
The Inherent Cost of Achieving Causal Consistency
Carol M. Critchlow, Ph.D.
Cornell University 1991
We consider the problem of distinguishing causally-consistent global states in
asynchronous distributed systems. Such states are fundamental to asynchronous
systems, because they correspond to possible simultaneous global states; their de-
tection arises in a variety of distributed applications, including global checkpoint-
ing, deadlock detection, termination detection, and broadcasting. A consistent-
cut protocol is a protocol which in every run will designate for each processor a
state, in such a way that these states together form a consistent cut. We ana-
lyze the cost of achieving causal consistency in terms of the extent to which a
consistent-cut protocol delays events of the underlying system, and the message-
complexity required by any such protocol. We refer to the delaying action of
a protocol as inhibition. We consider a spectrum of protocol capabilties based
on the type of inhibition that occurs: we distinguish local versus global inhibi-
tion and prove fundamental relationships between these concepts and the ability
to determine causally-consistent states. A protocol using local inhibition may
cause the delay of some of a processor's events until that processor has per-
formed some number of local actions; a protocol using global inhibition may
force the delay of some of a processor's events until that processor has received
some communication from other processors. Based on a variety of system and
protocol characteristics, including the ability to locally or globally inhibit par-
ticular types of events, we give several impossibility results and examine some
existing protocols. We are then able to present a thirty-six-case summary of
protocols and impossibility results for the determination of causally-consistent
states as a function of those characteristics. In particular, we demonstrate that
local inhibition is necessary and sufficient to solve this problem for general FIFO
systems, while global send inhibition is necessary and sufficient for general non-
FIFO systems. Regarding message complexity, we demonstrate that a globally
inhibitory consistent-cut protocol requires 0(N) messages where N is the num-
ber of processors in the system. This is true whether the protocol is designed
to work for FIFO or non-FIFO systems; the exact lower bounds, however, differ
in the two cases. We also prove that a consistent-cut protocol which uses local
inhibition only requires 0(N2) messages, or, more precisely, O(IcI), where C is
the channel set of the system. This latter result illustrates a trade-off between
the message complexity of a consistent-cut protocol, and the degree of inhibition
which it requires.
Biographical Sketch
Carol Critchiow was born on April 19th, 1964, in Vancouver, British Columbia.
Her sojourn on the west coast was of short duration, however, and the majority
of her childhood was passed in the province of Quebec. Having little sense of
direction at the time of her graduation from high school, Carol was tempted south
of the border by the lure of a liberal arts education. She entered Amherst College
in September, 1981, and spent four years dabbling in subjects ranging from basic
electronics to ancient Greek. She received her B.A. in May, 1985. Her career
interests still being in soft focus ("maybe pure math, or computer science, or how
about operations research?"), she was drawn by the variety of opportunities offered
by the Center for Applied Mathematics at Cornell University. Consequently, she
started graduate school at Cornell in the fall of 1986, and eventually settled into the
study of computer science. Despite the development of an all-consuming addiction
to ice hockey during her third year in Ithaca, Carol earned her M.S. in January,
1990, and her Ph.D. in August, 1991. September of 1991 will see her taking up
the position of assistant professor in mathematics and computer science at Hobart
and William Smith Colleges in Geneva, New York; she hopes it will see her taking
up the position of center on the William Smith hockey team as well.
lii
Acknowledgements
First, I would like to propose a toast to Prakash Panangaden, whose encourage-
ment and support academic, moral, and vinous have been simply invaluable
to me throughout my time at Cornell. If not for him, I really might be a substi-
tute teacher of remedial algebra for high school students in Winnipeg. So here's
to you, Prakash, and merci bien, eh?
I also owe much to Kim Taylor, whose work first introduced me to the world
of consistent cuts, and with whom I enjoyed a brief but rewarding collaboration.
I am indebted to Bard Bloom for his time and attention; to Al Schatz for doing
time on my committee; and to Dolores Pendell for services too numerous to
record.
Many, many, many blessings upon Anne Rogers and Laurie Hendren, who
dragged me kicking and screaming to my first hockey practice, and thereby
created a monster. It is hard to imagine now what the past few years would
have been like without C.S. hockey, and all the activities to which it somehow
led: euchre and lacrosse, bridge and Chinese food, highway hockey and casual
hockey, pignapping and poetry wars. My heartfelt thanks to all the roommates,
teammates, friends, and swine who have shared with me these delights, and have
thus brightened my days at Cornell.
iv
Table of Contents
2
3
4
5
Introduction
1.1			Distributed Systems
1.2			Asynchronous Systems
1.3			Causality and Consistency
1.4			Protocols and Inhibition
1.5			Summary of Results . . -
1.6			Outline - . .
System Model and Definitions
2.1			Basic Definitions - - -			. - -
2.1.1 The Physical Structure
2.1.2			Events. . .			- -
2.2			Enabling Relations			. . .			.
2.3			Causality Relations
2.4			System Definition			- . .			. -
2.5			Partial Runs and Consistent Cuts
2.6			Summary - .			-
Consistent-Cut Protocols
3.1			Protocols
3.2			Consistent-Cut Protocols
3.3			Examples of CCPs			.
3.4			Summary - -			-
The Inhibition Spectrum
4.1			Inhibition -
4.2 Local and Global Inhibition
4.3 The Inhibition Spectrum
4.4 Summary
Preliminary Lemmas
5.1 Partial and Completed Runs
5.2 Causality and Receive Events . . -
2
4
5
8
9
10
11
12
13
15
19
22
24
27
28
29
32
33
42
43
44
45
48
51
52
52
57
v
5.3 Local Delay
5.4 Summary
63
- 64
6
7
8
Impossibility Results for Non-FIFO Systems
6.1 The Impossibility Theorem
6.2 Summary
Impossibility Results for FIFO Systems
7.1			Preliminary			Lemmas .			.
7.2			Primary Lemmas			. .			.
7.4			Further Results			. .			.
7.5			Summary .
Message Complexity
8.1			Results for Non-FIFO Systems . .			. . .			. .
8.1.1 Preliminaries
8.1.2 A Minimizing CCP for Non-FIFO Systems .
8.1.3 Lower Bound Theorem for Non-FIFO Systems
8.2			Results for FIFO Systems. .
8.2.1 Globally Inhibitory Protocols
8.2.2 Lower Bound Theorem for Globally Inhibitory Protocols
8.2.3 Lower Bound Theorem for Locally Inhibitory Protocols
Summary
8.3
9 Conclusions and Related Work
Bibliography
vi
65
66
71
73
75
78
89
91
92
94
94
97
99
113
114
118
119
121
122
126
List of Figures
2.1 Events and the happens-before relation.
2.2 Consistent (a) and inconsistent (b) cuts in a run.
3.1
3.2
3.3
3.4
3.5
4.1
4.2
20
26
Floodi (FIFO channels, system messages consistent) . . .			. . 34
Proof of correctness of Flood1. Part (a) illustrates Case 1			in the
proof; part (b) illustrates Case 2. . . 			36
Flood2 (FIFO channels, all messages consistent)			37
?ee (non-FiFO channels, all messages consistent)39
The three phases of protocol messages sent in a run of the			protocol
Tree. Dashed intervals contain no system send events
40
48
Protocol characteristics.
Consistent-cut protocols and impossibilities in the inhibition spec-
trum.
5.1 Lemma 2. The cancelled dashed arrow indicates "not happens-
before" and the solid arrows connect the sending and reception of
m
5.2 Proof of Lemma 2: message chains during iterative construction.
5.3 Proof of Lemma 2: send(q, qi, mi) H send(q, q?, m?) in the mes-
sage chain, but send(q,q2,m2) H send(q,qi,rni) at processor q.
6.1
7.1
7.2
Proof of Theorem 1. Dashed lines are intervals in which no receive
events occur, arrows represent happens-before, and `X' marks cut
states. (A),(B) follow from Lemma 2. (C),(D) then follow from
Lemma 4. (D) and absence of (E) imply (F).
50
58
60
61
67
Dashed lines represent
Proof of Lemma 7: (a) run r?; (b) run
receive-free intervals
Proof of Lemma 10(a) the original run r; (b) after the first step,
run r1; (c) the final run
Proof of Theorem 3. No event in the dashed portion of j's local
history can happen-before any event in any other processor's cut
state.
77
83
. . . .			. . .			87
7.3
vii
7.4 Proof of Theorem 3: the run r?? The dashed interval is receive-
free.			. .
8.1
8.2
8.3
8.4
8.5
8.6
8.7
89
96
The protocol Tree (a) and the proposed modification (b).
Minimizel T is a depth-first-search spanning tree of the network
rooted at I; parent(i) and chzldren(?) are the parent and children
of z in T98
The qj to c message is the last original message sent by q?; the
message m? lies on both the c to q? and p? to p paths. . .104
A hypothetical run. The solid arrows indicate original messages
and dotted arrows additional messages			. 107
The messages leading up to ml and m?108
The pruned message paths to mj and m?.			. . .			108
Minimize2 T is a depth-first-search spanning tree of the net-
work, rooted at I; parent(i) and children(i) are the parent and
children of i in T. . .			. . .			. .			. .			. 115
viii
Chapter 1
Introduction
1.1 Distributed Systems
A distributed system is a network of autonomous processors which communicate
with each other from time to time, to exchange information or perhaps perform a
system-wide task. Examples of distributed systems include the network of many
computers at various different sites which are connected by the INTER,NET. The
individual machines are all autonomous there is no central machine directing
all the others but users at the various sites may wish to acquire information
that is stored elsewhere. The INTER?NET allows users at different machines to
communicate with each other, and to transfer information between machines. On
a smaller scale, any local area network of workstations is a distributed system.
A typical process-control system or parallel machine are also examples.
Attempting to design protocols for such systems gives rise to difficulties which
do not occur in a strictly sequential setting: for example, problems of synchro-
nization, contention for resources, or liveness. Various formalisms have been
developed to aid in reasoning about these concerns. These include Calculus of
2
Communicating Systems (CCS) [MilSO], Communicating Sequential Processes
(CSP) [Hoa78,Hoa85], Petri Nets [Pet66,Bra8O], 1/0 Automata [LT89], Event
Structures [Win86], and several other algebraic or logical formalisms. Different
models reflect different possibilities in the properties of a distributed system. Eor
instance, the configuration of the processor network may be static or dynamic.
Processors may communicate by message-passing, or by means of shared mem-
ory, or a combination of the two. The most fundamental difference is in the
degree of synchrony present: processors may operate in lock-step or at arbitrar-
ily different rates; in a message-passing system, communication may be done by
means of broadcasts or on a processor-to-processor basis, and there may or may
not be upper bounds on ?he time messages take to be delivered. The status of a
single one of these or other variables can effect the solvability of basic problems
in distributed systems. The classic example is that of distributed consensus: the
processors in a system attempt to agree upon the value of a particular variable.
Solutions to the problem were sought that would be correct in the presence of
some given number of processor failures. It was shown [DDS87J that by varying
the assumptions about system synchronization properties, the problem ranges
from solvable in the presence of arbitrarily many failures to unsolvable in the
presence of even one.
1.2 Asynchronous Systems
The work in this thesis applies to systems in which the processor network is
static, processors communicate only via passing messages along point-to-point
communication channels, and both the processors and the communication are
3
asynchronots. In some distributed systems there may be means for processors
to attain some degree of synchrony in their actions. There may, for instance,
be a global clock to which all processors have access, and which therefore allows
them to perform actions at the same time or in some specific order relative to
each other. Alternatively, processors may have separate clocks, but with a known
upper bound on the difference in the rates at which the clocks proceed. There
may also be a known ceiling on the length of time it takes messages between
processors to reach their destination. All of these things aid the processors in the
system when it is necessary for them to take some kind of co-operative action. In
an asynchronous system, such features are not present. That is: there is no global
clock; processors proceed at arbitrarily different rates; and there is unbounded
delay in the transmission of messages between processors.
In such a system, it is difficult to obtain a measure of when events at different
processors occur in relation to each other. One cannot determine exactly which
events occur at any given real time, and consequently one cannot determine the
global configuration of the system at any point in its execution. This makes
it difficult to discuss what properties do or do not hold of a system at a given
real time. Essentially, an alternative to real time is needed in which to frame
questions about asynchronous systems. Such a framework was provided by Lam-
port [Lam78J in the form of potential causality, which we discuss in the following
section.
4
1.3 Causality and Consistency
In his work on logical clocks, Lamport introduces the notion of potential causality.
Two events in an execution of a system are related under potential causality if
the occurrence of one could possibly influence the occurrence of the other. In our
systems, this could happen in one of three ways.
o+ The two events occur at the same processor; the occurrence of the earlier
one could effect the occurrence of the later one.
o+ One event is the send of a message, and the second is the reception of the
same message.
0 The two events fall in the transitive closure of the two relationships above.
If two events on separate processors are not related in one of these ways, then
no effect of one event could reach the other processor in time to influence the
occurrence of the second event. The relationship is called "happens-before", and
it is a subset of temporal precedence: one event happens-before a second only
when we can be sure that the first occurred at an earlier time.
In our model of a distributed system, the state of a processor at any given
time is represented by the sequence of events that have occurred at that processor
up to that time. The state of a system is then represented by a vector of event
sequences, one per processor, corresponding to the processors' states at that time.
A real-time global state has the property that if a particular event is contained
in it, then all events which happened at an earlier time are also included. Having
replaced real time with potential causality in the study of asynchronous systems,
one considers sets of states with the corresponding property: if a particular event
5
is contained in the set, then all events which happen-before it are also included.
Such a set of states is a consistent global state or consistent cut. It is consistent in
the sense that it represents a configuration in which the system could find itself;
it is a possible real-time global state. The chief characteristic of such a set of
states is that it contains no messages which are received but not sent. In terms
of possible real-time states, it would be impossible for a message to be received
before it is sent. In terms of potential causality, this would violate the closure
under the happens-before relation: a send event happens-before its corresponding
receive, so a receive event cannot be included unless the corresponding send event
is also.
1.4 Protocols and Inhibition
A protocol for a distributed system specifies for each processor a set of events,
and an algorithm to determine the order in which that processor should perform
those events. The protocol events will occur interspersed among the events that
occur at a processor during the course of a normal execution of the system.
The protocol should run "on top of" a given system in the sense that it should
not prevent the system from executing some sequence of system steps that it
would have done in the absence of the protocol. That is, the subsequence of
system events in a sequence of system and protocol events should be a valid
event sequence of the original system. A protocol may, however, have the power
to temporarily obstruct the normal functioning of a processor by forcing it to
wait on the occurrence of some particular protocol event before continuing with
its task. This is termed inhibition. We distinguish two kinds of inhibition: in
6
the first, a protocol only delays events until some number of local actions sends
and internal events have been performed; in the second, it delays events while a
processor waits for communication from another processor. We distinguish these
as local inhibition versus global inhibition.
A consistent-cut protocol or CCP is a protocol that, in every combined system-
protocol run, will distinguish a set of local states that, taken together, form a
consistent cut. This in effect means guaranteeing that if the reception of a par-
ticular message is included in the designated set of states, then so is the corre-
sponding send. Consistent-cut protocols have numerous distributed applications
such as system checkpointing [Rus77,CL85,KT87], deadlock detection [BT87],
distributed termination [FraSO], and broadcasting [BJ87].
Chandy and Lamport give in [CL85] an example of a consistent-cut protocol
for FIFO systems; that is, systems in which messages sent directly between pro-
cessors are received in the same order in which they are sent. Each processor in
the distributed system will either
1. spontaneously record its state, and then send a marker along each channel
incident to it in the network before sending further messages along that
channel; or
2. upon receiving a marker, if it has not yet recorded its state, it will do so
and send out markers as above.
It is proved in [CL85J that the set of states recorded in the course of the exe-
cution of this protocol do indeed form a consistent cut (we present this argument
in Chapter 3). Our reason for introducing the protocol here is that it illustrates
many of the points which we will discuss in our analysis of consistent-cut proto-
7
cols. Note first that this protocol does indeed involve inhibition: there is a period
at each processor during which that processor cannot send out system messages.
There is never any restriction on the reception of messages. One of the issues
that we will consider is whether sends, receives, or both must be delayed in order
to guarantee the existence of a consistent-cut protocol. A second point to note
is that the inhibition is local, as defined above: a processor is prohibited from
sending system messages only until it has sent out a special marker; it does not
need to wait upon communication from another processor. Again, the question
of whether local inhibition suffices in all circumstances is one that we will address
in our study of CCPs. Thirdly, this protocol does not work in systems in which
the communication is not first-in-first-out (FIFO). In this case, the messages sent
by a processor along a channel after the marker (and hence after the processor
has recorded its state) may arrive at the destination processor before the marker
does (and hence possibly before that processor has recorded its state). We there-
fore include amongst the issues to be taken into consideration the question of
whether or not a system is FIFO. Note that some of the markers themselves
are inconsistent: the first marker to reach a given processor is sent after the
sending processor has recorded its state; unless the receiving processor has spon-
taneously initiated the protocol, the marker's reception will be contained in its
recorded state. Thus, it is actually only the set of non-protocol events which is
consistent. Whether or not protocol messages are allowed to be inconsistent with
respect to the cut designated by the protocol will effect the existence of CCPs as
well. Finally, the protocol above requires at least one marker between every pair
of connected processors. We will examine the lower bounds on the number of
8
messages required by CCPs, and the connection between inhibition and message
complexity.
1.5 Summary of Results
We analyze first the inhibitory cost of consistent-cut protocols. The variables
discussed above produce thirty-six possible combinations of system properties
and protocol characteristics. We prove the following results.
o+ Using up to global receive inhibition and local send inhibition, there is no
CCP for non-FiFO systems. Moreover, this is true even if protocol messages
may be inconsistent.
o+ There is no non-inhibitory CCP for FIFO systems, even if protocol messages
may be inconsistent.
o+ There is no CCP for FIFO systems that uses only local send inhibition, if
protocol messages must be consistent.
We also present three well-known protocols, at least one of which pertains to
each of the thirty-six cases in which an impossibility is not implied by the above
three results.
We also prove the following results concerning the number of messages re-
quired by a protocol in order to achieve causal consistency:
o+ Any consistent-cut protocol for non-FIFO systems requires 0(N) messages,
where N is the number of processors in the system. More specifically,
2(N 1) messages are required.
9
o+ Any globally inhibitory consistent-cut protocol for FIFO systems requires
0(N) messages. It may require as many as 2N --H 3 in the worst case.
o+ Any locally inhibitory consistent-cut protocol for FIFO systems requires
0(N2) messages. More specifically, it requires O(ICI) where C is the channel
set of the system.
These last two results illustrate a trade-off between the cost of a protocol in terms
of message complexity, and the cost in terms of the degree of inhibition required.
1.6 Outline
The remainder of this thesis is organized as follows. Chapter 2 contains a de-
scription of our system model, including formal definitions of the happens-before
relation and of causally-consistent cuts. In Chapter 3, we formally define pro-
tocols and consistent-cut protocols, and present three well-known examples of
CCPs. Chapter 4 contains the formal definition of inhibition, and a discussion
of the varying kinds and degrees of inhibition which together form the inhibi-
tion spectrum. The four subsequent chapters contain our impossibility and lower
bound theorems. We start in Chapter 5 with a set of preliminary lemmas which
are used in proving both sets of results. Chapters 6 and 7 contain our impossibil-
ity results, the non-FIFO case being discussed in the former, and the FIFO case
in the latter. The lower bound results appear in Chapter 8. Finally, we present
our conclusions in Chapter 9.
Chapter 2
System Model. and Definitions
We describe here a model of asynchronous distributed systems. A distribnted
system consists firstly of a set of autonomous processors, and secondly of a com-
munication medium which allows these processors to exchange information and
to co-operate in the achievement of various system-wide tasks. We are inter-
ested here in distributed systems in which the processors communicate only by
sending and receiving messages along bi-directional channels; there are no shared
data structures. By asynchronous, we mean that (1) there is no global system
clock, (2) there is finite but unbounded delay in the transmission of messages be-
tween processors, and (3) processors may proceed at different rates. We assume
that the network of processors is connected, though not necessarily completely
connected. We also assume that neither the processors themselves nor the com-
munication network can fail. In particular, we assume that any message sent by
one processor is eventually received by the processor for which it was intended;
any message received by one processor was indeed sent by some other processor;
and no message is altered in transmission, Messages in the system are unique, so
10
11
there can be no confusion as to whether a particular receive event corresponds
to a given send event.
Our model is similar to other models of asynchronous systems [CM85,PT88]:
the system behaviour is represented by a set of possible runs; each run is com-
posed of sequences of events corresponding to the local actions of each processor,
together with a partial order that models potential causality. This notion of
a run differs from that of a linearly ordered run, for example the timed runs
of [HM90]; a detailed discussion of asynchronous runs is contained in [PT89].
Unlike [CM85,PT88], we explicitly model the enabling of events by preceding
event sequences, as is typically done in studies of transition systems or event
structures [Win86,PT89,LT89,Bra80]. We also explicitly separate protocol and
system events (Chapter 3). These two differences will allow us to define inhibition
in Chapter 4 by comparing the enabling of events by a system alone with the
enabling of events by system and protocol together.
2.1 Basic Definitions
Our formal model for distributed systems will consist of five components:
o+ a processor identifier set and a channel set which describe the physical
configuration of the system, which is assumed to be static.
0 a description of the individual events which may occur at each processor;
o+ a description (called an enabling relation) of the manner in which events
may occur at each processor; and
o+ a set of runs which describe the possible executions of a system.
12
We discuss each of these below.
2.1.1 The Physical Structure
A distributed system consists of a set of N processors and a connecting network
of channels by means of which the processors communicate.
Definition 1. If 1 fl,2,... , NJ is a set of natural numbers used to identify
the processors in a system, then C C I x I is a channel set if (i,i) ? C for any
? I.
The channel set of a system describes the physical links between processors.
If i and j are members of I and (i, j) is an element of C, then processor z can
send messages to processor j in the processor network described by C. The
restriction (i, i) ? C states that a processor does not send messages to itself via
the communication network.
We are interested in modeling systems in which the channels are bi-directional:
that is, whenever processor? can send messages to processor j, it is also possible
for j to send messages to i.
Definition 2. A channel set C C I x I is hi-directional if whenever (i,j) ? C for
i,j ? I, then (j,i) ?
A second feature of?he channel sets of the systems which we wish to consider
is connectivity. We will assume that, for any two processors z and 5 in the system,
there is a communication path from i to 5.
Definition 3. The channel set C C I x I is connected if for every i,j E I, there
exist ?j,i2,...,in ? I such that (i,?1),(ij,i2),... ,(in?i,in), (in, 5) CC.
13
We are often interested in referring to pairs of processors in a network which
are capable of exchanging messages directly without necessity of intermediate
processors. Since we will be concerned only with systems for which the channel
set is bi-directional, we define what it means for processors to be neighbours for
bi-directional channel sets only.
Definition 4. If C c I x I is a bi-directional channel set, and (i,j) (and hence
(j, i) also) is in C, then i and j are neighbours in the processor network described
byC.
2.1.2 Events
With each processor in a system we associate a set of evenIs. Events describe
single indivisible local actions of a processor. Our model contains only three
kinds of indivisible operations: send events, in which a single message is sent by
a processor to a single destination; receive events, in which a single message is
received by a processor; and internal events, which do not involve inter-processor
communication. Earlier work by Taylor [Tay89] in the area of consistent-cut pro-
tocols considers an indivisible operation, called atomic receive-send, for receiving
a message and sending multiple responses. Some of the results of this work were
shown in [Cri89] to depend on the presence of this construct, and it was dropped
in later work by Critchlow and Taylor [CT90a], and in this work also. We discuss
atomic receive-send further in Chapter 4 when we define inhibition.
Definition 5. Given a set of messages Msg, an event set for a processor i ? I is
a subset of the set Send? U Peceive? U Interna4, where
Send? fsendj x ?iJ x I x Msg,
14
Receive?			(receive) x fi) x Ix Msg, and
Internal?			(internal) x fi) x Msg.
The event set of processor i is designated E?; any element of Ej is an event of
processor z.
The send event send(i,j,rn) occurs at processor i; i sends the single message m
to the single destination processor j. The reception of the single message mfrom
processor j at processor t is modeled by the occurrence of the event receive(i, j,m)
at processor i. (We assume that messages in the system are unique, so that there
is no ambiguity as to which send event corresponds with a given receive event.)
An internal event of a processor does not involve communication with any other
processor. The event set of a processor i reflects the physical conflg?ration of the
system in that it will contain events of the form send(i,5,m) only if (i,j) ?
and receive(i,j,m) only if (j,i) ?
Events describe single local actions of a processor. We now turn our attention
to sequences of events occurring at a particular processor in the course of the
system's functioning. We describe the state of a processor in terms of these
sequences.
Definition 6. Let Ejbe the event set of processor i. A sequence, finite or infinite,
of events from Ej is called a localhistoryof processor i. Any such finite sequence
is also called a localstate of i.
Definition 7. State?(Ei) is the set of all possible local states consisting of events
from Ej. The empty sequence of events, A, is included in States(Ej).
15
2.2 Enabling Relations
We wish to incorporate into our model some way to describe formally the set
of events that conid occur at any given point in a processor's execution. We do
this by means of a formalism known as an enablin? relation. Conceivably, there
are circumstances under which certain events cannot or should not occur at a
given processor. An enabling relation describes which events, exactly, may occur
next at any given point in a processor 5 execution. This formalism will allow us
to examine the exact effect of a protocol on a system 5 execution: we compare
the enabling relations of the original system with those of a system and protocol
together.
Definition 8. An enabling relation Mi for a processor i with event set Ej is a
subset of States(E?) X Ej. If the pair (li, e?) ? Mj, then we say that local state l?
enables the event e?.
Thus the set of events iej : ((?, e?) ? MjJ describes which events could occur
next at processor i if i is in state l?. Note that this set may have more than one
element; in fact, it may be infinite. It may also, in some cases, be empty.
We have defined States(E?) to be the set of all local states of a processor ?
having event set Ej. However, there may well be many states in States(Ei) in
which processor i will never find itself, in any execution of the system. Since
i's enabling relation describes which events could happen at any point in i's
execution, then any state lj in which i actually finds itself during the course of its
execution must have the property that each event of l? is enabled by the preceding
prefix of ii. We call a local state with this property a feasible local state, and, by
16
extension, any local history with the same property a feasible local history.
Definition 9. A local state or local history i? of processor i with enabling rela-
tion Mj is feasible if for each prefix k1 - e? of ij, 1:' enables e?, i.e. (ls'? e?) Ei Mi.
In other words, local state l? e?' 2 - e? is feasible if (A,e1) E Mj,
(e21-,?2) C Mi, ..., (e?1 ... - e???',e2?.) Ei Mi. At each point, the next event is a
"legal step", and the result is a local state which could actually occur.
We are interested in systems with two restrictions on the manner in which
receive events are or are not able to occur. At any point in its execution, pro-
cessor i may be willing to receive messages from one processor but not from a
second processor. However, processors cannot "peek" at messages prior to hav-
ing received them; they only gain knowledge of a message's contents through the
appropriate receive event. Therefore i may not selectively receive messages from
a single processor where the selection is based on message contents. We encode
this in our formal description by placing the following restriction on enabling
relations.
Property 1. For all feasible local states ii of processor i, if ((?, receive(i,j, m)) e
Mj, then (lj,receive(i,j,m')) Ei Mj for any message ?? from j.
Consequences of the alternative assumption are discussed at the end of Chapter 4.
The second system property that we wish to encode in the enabling relation
also deals with the reception of messages. It should be the case that any message
which is sent to a processor may always eventually be received by that processor.
That is to say, a processor cannot "miss its chance" to receive a message which has
been delivered by the communication medium. A system in which this property
17
did not hold would in effect be inherently unreliable: a message that is delivered
by the communication medium but cannot be received by the processor is in effect
a message that has been lost. It would be impossible, for example, to achieve
any kind of global co-operation in a system where some processor is never willing
to receive messages from any of its neighbours. The following two properties of
enabling relations will guarantee that messages can always be received eventually.
Property 2 states that a processor cannot reach a state in which it can no
longer perform any send or internal events, and yet is unwilling to receive mes-
sages from some neighbour. If such a state were reached, than any of that neigh-
bours messages still or subsequently in transit to that processor could not be
received.
Property 2. For all feasible local states i? of processor i such that no send or
internal events are enabled, all receive events rece?ve(i, j, m) in Ej are enabled.
In any infinite execution of a processor, we wish to guarantee that there are
infinitely many points during the course of the execution at which messages from
a given neighbour could be received.
Property 3. In any infinite feasible local history ij of processor z, for each neigh-
bour j of? there are infinitely many prefixes of ii which enable the reception of
messages from j.
Note that these three properties are necessitated by our assumption that a
processor does have the power to refuse to receive messages at given points in
the course of its execution: receive events are not assumed to be always enabled.
This is in keeping with such concurrent programming languages as Linda [Gel85]
18
and PLITS [Fel79], but differs from such models as those of [LT89,Dil88,Jos91],
in which any of a processor's input events can occur at any time. They take the
view that a processor cannot necessarily control whether or not other processors
send it messages, and hence whether or not there will be a message arriving on
some channel at any given point in its execution. One can contemplate splitting
our recetve events into two separate events: the arrive event, in which a message
actually is received at a processor, and the read event in which an arrived message
is processed. All arrive events would be enabled by any state. Thus, the reception
of a message would be two separate actions, the occurrence of only one of which
is under local control. However, potential causality (see Definitions 10 and 11 of
Section 2.3) would be transmitted through the reading of a message rather than
through its arrival at a processor: until it is actually read, no information about
the state of the sender can be gained. The result is that we would have added
arrive events only to more or less ignore them completely: we would still need
Properties (1) through (3) above to hold (of read events); the causal consistency
or inconsistency of a cut (see Definition 18 of Section 2.5) would depend on read
events only; the cut produced by a consistent-cut protocol (see Definition 22
of Chapter 3) would have to be distinguishable only modulo the occurrence of
arrive events, etc. We therefore choose the simpler model, and only remark that
a simple transformation would convert any impossibility proof in our model into
a proof of a corresponding theorem in the receptive model with separate arrive
and read events.
19
2.3 Causality Relations
The final component of our system description is a set of executions, or runs,
which we associate with each system. In order to describe possible system exe-
cutions, however, we require some means of providing temporal structure to sets
of local histories. In synchronous systems it is possible to describe the relative
order in which events at different processors take place by referring to the global
system clock. In asynchronous systems, however, we do not have this option. It
is possible to determine the order in which events occur at a single processor,
since these processors are assumed to be sequential. Also, if a processor receives a
message from one of its neighbours, then the corresponding send event must have
occurred earlier at the sending processor. The fact that a certain event precedes
other events implies that the first could have a causal effect on the later events.
Hence this is referred to as potential causality, or simply causality. If two events
are not related in one of these ways, or by a chain of such relationships, then
there is no way to determine in which order they actually occurred. However, it
is impossible, in this case, that one has any influence on the occurrence of the
other. Thus potential causality gives us a partial rather than a total ordering on
the events in the run.
Definition 10, Given an N-vector r =(r1,... ,r?), where r? is a local history of
processor i for each i, event e happens-immediately-before event e', written e ``
if either
1. e and ?? are both events in i'j for some i, and e occurs immediately before
e'in r?, or
20
e5
el
e7
Figure 2.1: Events and the happens-before relation.
2. e is send(?, j, rn) and J is receive(j, i, m) for some processors z and j, and
some message m.
Definition 11. The relation happens-before, denoted H, is the transitive closure
I.
of happens-immediately-before: e e in run r if there exist events e1, ..... . ,
such that e H> e1 i.' ... `?` e? H
Equivalently, e H e'if and only if there is an event ?t? such that e H+ e" and
ti
e H e.
Note that the happens-before relation holds exactly when we can be sure that
one event has temporally preceded a second.
Figure 2.1 illustrates the happens-before relation. We depict an execution of
a system by drawing an event line corresponding to each processor. Time moves
from left to right along each line. Individual events are represented by dots on
21
the event lines, The send and receive of a message are connected by a solid arrow.
In this figure, e1 e6 since e1 ,` e2 H e3 `, e4 `? e5 H e6. It is not the case
that e3 H e7? nor that ?7 H
We can now associate a set of runs with each distributed system, correspond-
ing to the possible executions of that system.
Definition 12. Let I ?1,..., Ni be the processor identifier set of a distributed
system; Ej be the event set of processor i for each? e I; and Mi be an enabling
relation for each i Ei I. A vector r of local histories is a run of this system if the
following conditions hold.
1. For all ? ? I, r? is a feasible local history of processor z.
2. The reflexive closure of happens-before is a partial order on the set of events
included in r, i.e. there are no two events e and e' such that e H ?? and
e1 H e in r.
3. For every local history r?, if r? is finite then there is no send or internal
event e? such that (r?,e?) E Mj.
The first condition requires that each component of r represents a legal se-
quence of steps as laid out by the enabling relation of the appropriate processor.
The second rules out the possibility of one event happening-before a second,
which in turn happens-before the first; since the happens-before relation is a
subset of temporal precedence, this would make no sense physically. This final
condition states that no processor will arbitrarily cease execution when it could
yet proceed with some send or internal event. We discuss the question of receive
events in the next section.
22
2.4 System Definition
Using the elements described thus far, we can now formally define a system.
Definition 13. A systemS is a 5-tuple (I,C,?,M,?), where
1. I = ?1, ..., NJ is a set of processor identifiers;
2. C is a bi-directional connected channel set for I;
3. ? = (E1,E2,... , EN) is an N-vector of event sets where for each i, Ej
contains events of the form send(i,j, m) or receive(z,j,m) only if (i,j) ?
4. M =(M1,A?,.. . ,MN) is an N-vector such that for each i Ei I, Mj is an
enabling relation on the set of events Ej, satisfying Properties 1 through 3;
and
5. 1Z is the set of all runs over I, t, and M.
Throughout this thesis we confine our interest to systems in which both the
processors and the communication medium are reliable. Processors cannot fail;
any message that is sent is eventually received at the correct destination; any
message that is received by a processor has actually been sent to that processor
by another; and no message is altered in transmission. Note that our Property 2
of enabling relations does not permit crash failures: there cannot be an internal
"crash" event after which no further messages can be received. However, though
we consider only systems in which Properties 2 and 3 hold of enabling relations,
it is entirely possible that there are vectors of local histories which satisfy the
definition of run, but which do not exploit these properties; that is, runs in which
not every message that is sent is also received. We do not wish at this point to
23
include these in our formal system description for the same reason as that given
above: we are interested in reliable system behaviour, and an unreceived message
is indistinguishable from a lost one. We therefore have the following definition.
Definition 14. A run r is said to be completedfffor every pair of processors i,j ?
I, if the event send(i, j, m) occurs in the local history r?, then receive(j, i, m)
appears in r?.
Note that this condition, together with clause 3 in the definition of run, implies
that if r? is finite in run r, then the only events that are enabled by r? are receive
events recezve(i,j,m) such that either (a) receive(i,j,rn) already appears in r?,
or (b) send(j, i, m) does not appear in r1.
The completeness condition on runs guarantees that all messages that are sent
are eventually received at the correct destination. We need to ensure, however,
that no messages are spawned or altered by the communication medium. This is
formalized by the final restriction in the following definition.
Definition 15. A system S = (I,C,t,M,??) is reliable if 1? is the set of all
completed runs over I, ?, and M, and for each r C ?, r contains no event
rece?ve(i, j, m) for which there is not a corresponding send event send(j, i, m) in
the appropriate local history.
Throughout the remainder of this thesis, the systems under discussion are as-
sumed to be reliable.
We will sometimes be concerned with modeling systems whose channels ex-
hibit flrst-in?first-out behaviour.
24
Definition 16. A system 5 (I, C, ?, M, 1?) is a FIFO system, if for all r ?
and all neighbouring processors i and j in I, whenever send(i, j, m) H send(?, j, m')
in r, then receive(j, i, m) H receive(j, i, m?) in r.
This definition requires that messages sent along a particular channel arrive
at their destination in the same order as they were sent. It does not require that
all communication between processors be completed in the same order as it was
initiated. For instance, suppose that in a run of FIFO system 5, processor z
sends a message m1 directly to processor j, and then sends a message m2 to a
third processor k. If k then "forwards" ?2 to j in the form of the message m3,
it is possible that m? be received by j before m? is.
2.5 Partial Runs and Consistent Cuts
A run is a description of an entire (possibly infinite) execution of a system. We
wish now to describe what the system might look like at some specific, finite
point in the course of its execution.
For each processor a local state will describe the sequence of steps taken;
there will be no infinite local histories. What must be true of the vector of local
states to make it meaningful as an instantaneous picture of the global system
state? Clearly, the first two conditions in the definition of a run that each local
state be feasible, and that there be no circular causality must hold, for the same
reasons as those given earlier. Equally clearly, the third condition that there be
no further send or internal events enabled need not hold. Since the system in
question is assumed to be reliable, we need an additional constraint: that there be
no receive event for which the corresponding send event has not yet occurred. (If
25
the system is unreliable, then a send-less receive is a possibility at any time: the
message being received may have been generated by the communication network.)
Also, if the system in question is FIFO, then the FIFO property must hold of
the vector of local states in order for it to correspond to a possible real-time
configuration of the system.
Definition 17. Let S (I,C,?,M,?) be a system, and r' (r'1,... ,r?') be
a vector of local states. We call r' a partial run of S if the following conditions
hold:
1. for all i ? I, r? is a feasible local history of processor i; and
2. the reflexive closure of happens-before is a partial order on the set of events
included in r.
3. for each receive event in r there must be a corresponding send event in the
appropriate local history.
If 5 is a FIFO system and send(i,j,m) H send(i,j,m') in r, then either (a) nei-
ther of the corresponding receives appears in r, (b) receive(j, i, m) only appears
in r, or (c) receive(j,i,m) receive(j,i,m') in r.
The definition of partial run gives the minimum conditions that must be true
in order for a vector of local states to represent a global state of a system. We
will show in Chapter 5 that the name "partial run" is justified in the sense that
any partial run can be extended to a completed run.
Definition 18. Given a partial run r' of a system, for any run r such that r?'
is a prefix of local history r? for all i, r' is called a consistent cut of r. Equiva-
lently, given a total run r and an arbitrary vector (11,12,... `1N) of prefixes of r.
26
3
Note that a consistent cut has the following property: if an event e is contained
in the cut, then any other event e' in the run such that e' H e is also included in
the cut.
Figure 2.2 illustrates consistent and inconsistent cuts in a run. The states
to the left of the "X" `s in part (a) form a consistent cut, since for every receive
event contained in them, the corresponding send event also appears. In part (b),
however, the event receive(j, k, rn? is included in the indicated cut, while the
event send(k, j, m) is not. We refer to a message such as m as an inconsistent
message.
We have defined partial runs and consistent cuts only in the context of systems
which are assumed to be reliable. If we were to include in our study systems
where communication is unreliable, we could relax clause 3 in the definition of
partial run: as noted above, a send-less receive is a possibility at any time in
1N) is a consistent cut of r if and only if it does not include the reception
of a message whose send is not also included.
Figure 2.2: Consistent (a) and inconsistent (b) cuts in a run.
(a)
z _____ --
3
27
such a system. However, in this case, a partial run will be a consistent cut in an
extending run only if the sends corresponding to these send-less receives do not
appear in the extension.
2.6 Summary
In this chapter, we have presented a formal model for asynchronous distributed
systems. This model includes the notion of an enabling relation, which provides
an exact description of what events could occur at any point in a processor's
execution. We have discussed the properties that should hold of the various
system components in order for them to describe the properties of the particular
systems in which we are interested. We have also introduced potential causality
as a method for imposing order on the events in a system execution, and the
related notion of consistency.
Chapter 3
Consistent-Cut Protocols
In this chapter we give a formal definition of a protocol for a distributed system.
Informally, a protocol specifies for each processor a set of protocol events, and an
algorithm to determine the manner in which that processor should execute the
events so as to accomplish the goal of the protocol. More formally, a protocol
maps an original system to a new one by introducing new events, new enabling
relations on sequences of system and protocol events, and a new set of associated
runs. (The protocol does not, of course, change the physical configuration of the
system.)
We have chosen a model in which system and protocol events are strictly
separate. There can be no "hybrid" events such as the merging of system and
protocol messages; this type of merging occurs in piggybacking techniques, in
which protocol information is added to the contents of system messages. We
discuss piggybacking further in Chapter 9. Here, the distinction between system
and protocol events will allow us to define inhibition, as discussed in Chapter 4
We introduced in Chapter 2 the notion of a consistent cut, and the correspon-
28
29
dence of the consistent cuts in a run to possible global states of the system in the
course of its execution. In the second part of this chapter we define consistent-cut
protocols or CCPs: these are protocols which in every protocol run will designate
a set of local states, one per processor, in such a way that these states together
form a consistent cut. Consistent-cut protocols have numerous distributed ap-
plications such as system checkpointing [Itus77,CL85,KT87], deadlock detection
[BT87], distributed termination [Fra80j, and broadcasting [BJS7]. CCPs differ
from snapshot [CL85,KT87] protocols in that we are not concerned with recording
the states of channels. (Of course, impossibility results for consistent-cut proto-
cols directly apply to protocols, such as snapshots, that perform other tasks in
addition to determining consistent cuts.)
We also present here three well-known consistent-cut protocols, and inves-
tigate the circumstances under which each protocol is successful. We consider
whether (1) the protocol works for non-FIFO systems or only FIFO systems, and
(2) all messages or only system messages must be consistent with respect to the
cut designated by the protocol. We present proofs that the protocols are correct
under the circumstances claimed.
3.1 Protocols
A protocol is, in effect, a function from systems to systems. Introducing a pro-
tocol into a distributed system produces a new system with identical identifier
and channel sets: a protocol does not change the physical configuration of a
system. The remaining three components of our formal system definition will,
however, change. We examine these changes and introduce some terminology
30
before proceeding to the actual definition of a protocol.
Firstly, none of the events of the original system can be lost in the transforma-
tion by the protocol. New events, however, may be introduced. We distinguish
these as system and protocol events.
Definition 19. Let S = (I,c,?,M,1?) be a system and ? a protocol mapping
S to ?(S) = (I, C, ?, M1, ??) For each processor i E I, an event ej such that
E Ej is a system event, while an event e? such that e? Ei E? but e? ? Ej is a
protocol event.
If the event send(i,5, m) is a system (resp. protocol) event, then we also refer
to the message rn as a system (resp. protocol) message.
Given a sequence of events occurring at a processor during an execution of
the new system, we often wish to separate out the subsequence of system events.
For this we have the following definition.
Definition 20. Given a system S = (I, C, ?, M, 9Z) and its image under protocol
?, ?(S) = (1, C, ?, M', 1?'), the function SysEvents maps a sequence of events
from E11 ?... u EN' to its projection onto the set of events E1 ?... U EN.
The enabling relation for each processor? in ?(S) is a subset of States(E?') x
E11. However, it should be the case that in the absence of protocol events, the
enabling relation is as in the original system. That is, until the execution of the
task designated by the protocol has actually commenced, the system functions
entirely as it did originally.
Finally, we require that the presence of the protocol does not prevent the
underlying system from performing its own task. For any run ....... r?') of
31
the new system, the run (SysEvents(r'1)... 5ysEvents(r'?)) should be a valid
behaviour of the original system.
We now give our formal definition of a protocol.
Definition 21. A protocol? is a function which maps a systemS = (I, C, ?, M, ?)
to a new system ?(S) = (I,C,?,M',1?') such that
1. ff?=(E1,...EN)and?=(E1',...E?'),thenEjCE?foreachi?I;
2. for all processors i, all ii E States(Ei), and all e? E Ej, (li, ci) Ei Mi if and
only if (l?, ci) ?
3. for all runs r' = (r'1, ..., r'?) in 1?', the run (SysEvcnts(r'1), ..., SysEvcnts(r'?))
is in??; and
4. if 5 satisfies the FIFO condition then so does ?(S).
Note that because ?(S) is a system, it will automatically satisfy conditions 1
through 5 of Definition 13 in Chapter 2, as well as the reliability conditions of
Definition 15. We refer to runs of this modified system as runs of the protocol with
respect to the original system. Also note that the third requirement in the defini-
tion above is not sufficient to guarantee that the FIFO property is not lost under
a protocol ?: it governs only system messages. To ensure that protocol messages
also obey the FIFO condition, both with respect to system messages and to each
other, we add the fourth clause above. Finally, we remark that the definition
leaves open the possibility that some valid system behaviours may be lost: it is
not guaranteed that for any given run r of the original system there will be a pro-
tocol run r' = (r'?,. . . ,r'?) such that r = (SysEvcnts(r'1),... ,SysEvents(r?')).
The protocols which we present, however, do have this stronger property; and
32
our impossibility results using the weaker definition would of course also hold if
we required all CCPs to have it.
3.2 Consistent-Cut Protocols
We now turn our attention to a specific class of protocols for distributed systems:
namely, consistent-cut protocols.
Definition 22. A consistent-cut protocol (CCP) is a protocol ? which, for every
system S and every protocol run r in ?(S), will designate a set of local states
cut(r) called cut states in such a way that:
1. for all r, cut(r) is a consistent cut of r;
2. for any two protocol runs r and ?? both containing local state ii of proces-
sor i, if i? is i's cut state in run r then it is also i's cut state in run
3. for any partial run r of the original system S, there is a protocol run ?? of
P(S) that extends r; i.e. for each i, r? is a prefix of r'?; and
4. for every run r of the protocol and every processor i, there is a prefix
ii of local history r? such that every protocol event which occurs in r? is
contained in t?, and for every extension l'? of i?, (1:", e?) E AIj implies that
?
The second condition implies that the cut state of each processor is distin-
guishable to that processor; in other words, each processor z "knows" that any
messages sent subsequently to j will not be received in j's cut state, regardless
of the sequence of steps that j may have taken in the meantime. More formally,
each processor achieves concurrent common knowledge [PT8SJ of the fact that it
33
has reached the cut designated by the protocol. The third condition implies that
a CCP cut may occur anywhere in the run of the underlying system; this elimi-
nates CCPs that, for example, only distinguish the cut (A, ..., A) at the beginning
of a run. The final condition states that the action of the protocol must be finite.
We sometimes consider CCPs in which cut(r) may contain inconsistent pro-
tocol messages; this implies that condition (1) is changed to require that
(SysEvents(11),.. ,SysEvents(l?)) be consistent.
3.3 Examples of CCPs
In this section we give three CCPs and the proofs that they perform as claimed.
The first two are flooding algorithms, in which messages are sent along every
channel in the system [Cha82,CL85]. Floodi is essentially the Chandy and Lam-
port checkpointing algorithm [CL85] Flood2is similar to that of [Tay89?, with
the indivisible receive-send mechanism replaced by local receive inhibition. The
third algorithm, Tree, is a two-phase spanning tree protocol from [Tay89j; it is
similar to protocols in [Era80,Awe85j.
In both Floodi and Flood2, messages are sent along every channel in the
system, beginning with messages sent by a distinguished initiator 1. The proces-
sors reach their cut states either immediately upon, or closely following, the first
reception of a protocol message. A consistent cut essentially occurs because any
system message sent after the protocol messages must arrive after the cut due to
FIFO channels. Additionally, either the sending or receiving of system messages
must be suspended during the interval in which protocol messages are sent. In
Ftoodi, in which the cut is reached at the beginning of that interval, system mes-
34
o+ Initiator I, at any time
start CCP; (Cut) send(I, j, Cut) to each neighbour j before send-
ing further system messages to j.
o+ Other processors i, immediately upon the first occurrence of a re-
ceive event of the form receive(i, k, Cut)
(Cut) send(i,j, Cut) to each neighbour j # k before sending fur-
ther system messages to j.
Figure 3.1: Floodi (FIFO channels, system messages consistent)
sages sent to a neighbour after the cut and prior to the protocol message could
be inconsistent and are therefore not permitted by the protocol. In Ftood2, the
cut is reached at the end of the interval. Consequently, it is the reception of sys-
tem messages during this interval which could cause inconsistency and must be
prevented. This phenomenon is called inhibition; we will give a formal definition
and a detailed examination of inhibition in Chapter 4.
Each (Cut) in Figure 3.1 indicates a state in the consistent cut. We as-
sume that an internal protocol event, denoted Start CCP, begins each initiator's
protocol execution.
Note that conditions (2) through (4) in the definition of consistent-cut pro-
tocols are certainly met by Floodi: for each processor, the first protocol event
that occurs in its local history will be the last event of the cut state, and so the
cut is distinguishable; the Start CCP event, and hence all further protocol events
can occur arbitrarily late in the run; and there are only finitely many protocol
35
events, and finite intervals over which events are delayed. We need only show
that the designated states together form a consistent cut with respect to system
messages. (There will be inconsistent protocol messages in any run of Floodi:
they are all sent out after the sending processor has reached its cut state, but no
processor (other than the initiator) reaches its cut state before receiving one.)
Proof of correctness of protocol Flood]: Suppose there is some run r of
protocol Foodi in which the system send event send(?,j, m) occurs after i's cut
state in the run, but receive(j, i, m) appears in j's cut state.
Case 1: Suppose that i does not reach its cut state upon receiving a Cut
message from 3; i.e. either i is the initiator or ? reaches its cut state upon
receiving a Cut message from some processor other than j. In this case, i will
send a Cut message to j after its (i's) cut state, and before sending any post-
cut system messages to j. Thus ? sends the cut message before it sends the
inconsistent message in. Because the system is FIFO, the event receive(j, i, Cut)
will occur at 5 before the event rece?ve(j, i, in). (See Figure 3.2, part (a). The
solid "X" marks i's cut state, and the dotted "X" `s the possible locations of 5's
cut state.) However, 5 will reach its cut state upon receiving the Cut message
from i if it has not already done so. Therefore it is impossible that the event
receive(j, i, in) be included in 5's cut state, contradicting our original assumption.
Thus, if there is an inconsistent system message from i to 5, it must be that i
reaches its cut state upon receiving a Cut message from 5. Let us consider this
case. Case 2: Suppose i reaches its cut state upon receiving a Cut message
from 5. The event receive(i,5, Cut) will appear in processor i's cut state, while
the event send(5, i, Cut) occurs after 5's cut state. However, we have assumed
36
j ________________ _____________ --H			1			______
(a)
Figure 3.2: Proof of correctness of Floodi. Part (a) illustrates Case 1 in the
proof; part (b) illustrates Case 2.
that scnd(i,5, m) occurs after i's cut state, and therefore after receive(?,5, Cut).
Similarly, receive(j, ?, m) occurs in j's cut state, and thus before send(j, i, Cut).
(See Figure 3.2, part (b).) We now have
send(j,i,Cut) rece?ve(i,j,Cut) H send(i,j,m)
receive(j, i, m)			send(j, i, Cut).
But there cannot be circular causality in a run, and we have reached a contra-
diction. In sum, there cannot be an inconsistent system message in any protocol
run. 1
The protocol Flood2 is described in Figure 3.3. Once again, it is easy to see
that conditions (2) through (4) in the definition of CCP are satisfied by Flood2.
We must still show that the designated states form a consistent-cut, this time
with respect to all messages.
Proofofcorrectness ofprotocol FThod2: Suppose that there is an inconsis-
37
o+ Initiator I, at any time
start CCP; send(I,j, Cut) to each neighbour j before receiving
further messages. (Cut)
o+ Other processors i, immediately upon the first occurrence of an
event of the form receive(i, k, Cut)
send(i, j, Cut) to all neighbours j ? k before receiving further
messages (Cut)
Eigure 3.3: Flood2 (FIFO channels, all messages consistent)
tent message ni from processor z to processor j in some run r of protocol Flood2.
If i has sent j a Cut message in run r, then that message will have been received
at j before the inconsistent message m, since z sends all Cut messages before
reaching its cut state, and the system is FIFO. However, j does not receive any
messages from i after the event receive(j,?, Cut) and before reaching its cut state.
Therefore the event receive(j, i, m) would occur after j's cut state. Since m is
an inconsistent message by assumption, we conclude that ? does not send a Cut
message to j in the run.
It must be the case, then, that the Cut message from j was the first Cut
message to be received at i in the run r. Therefore the event receive(i,5, Cut)
occurs in i's cut state. Since send(i,j,m) occurs after i's cut state, we have
recetve(i, j, Cut) H send(i, j, m). But it must also be the case that receive(j, i, m) H
send(j, i, Cut), since there are no messages from i received by j between send(j, i, Cut)
38
and the end of j's cut state. Now, however, we have
send(j,i,Cut) receive(i,j,Cut) H send(i,j,m)
receive(j,?,m)			send(i, 5, Cut).
Since there cannot be circular causality, we must conclude that there can be no
inconsistent message. I
The flooding protocols are correct for FIFO systems because the presence of
the protocol messages between each pair of neighbours forms a barrier against
inconsistency for messages sent afterwards. If the system is not FIFO, how-
ever, these protocol messages cannot perform this service, since system messages
sent afterwards may arrive at the destination processor first. We therefore need
a different approach for non-FiFO systems. Protocol Tree uses a three-phase
method. A spanning tree of the network, assumed to be known in advance, is
used to minimize communication. Again there is a distinguished initiator I (the
root of the spanning tree). Three messages, PrepareCut, Cut, and Resume are
sent respectively down, up, and back down the spanning tree. System send events
are suspended as Cut messages move up the tree, and the suspension is lifted as
Resume messages move down the tree. In sum, any inconsistent message sent
after the cut and after the re-starting of system send events would have to be
received before being sent, due to causal chains from the receiver to the initiator
and from the initiator to the sender.
Protocol Tree is described in Figure 3.4, and illustrated in Figure 3.5. Con-
ditions (2) through (4) being satisfied, we must show that the designated states
form a consistent cut with respect to all messages.
39
Let T be a spanning tree of the network rooted at I
Let parent(?) and children(i) be the parent and children of i in T.
(1) Initiator I: start CCP; send PrepareCut to children(I).
(2) Each internal node ?, after receiving PrepareCut: send
PrepareCut
to children(t).
(3) Leaf nodes i, after receiving PrepareCut: Disable system sends;
send Cut to parent(i). (Cut)
(4) Each internal node i, after receiving Cut from children(?): disable
system sends; send Cut to parent(?). (Cut)
(5) Initiator I, after receiving Cut from children(i): (Cut) send
Resume to ch?ldren(I).
(6) internal nodes i, after receiving Resume from parent(?): send
Resume to children(?); enable disable? system sends.
(7) Leaf nodes i, after receiving Resume from parent(?): enable dis-
abled system sends.
Figure 3.4: T??ee (non-FiFO channels, all messages consistent)
40
7'
"Cut
Figure 3.5: The three phases of protocol messages sent in a run of the protocol
Tree. Dashed intervals contain no system send events.
41
Proof of correctness of protocol Tree: Suppose that there is a run r of
Tree in which there is an inconsistent message ni from processor? to processor 3.
In order to reach a contradiction, we wish to prove the following claim: for any
processors p and q, the last event ep of p's cut state must happen-before the
first event e?q after q's cut state which could be a system send to p. If we can
prove this, then we have a contradiction, because it will then be the case that for
inconsistent message ni,
H e? H send(i,j,m) H receive(j,?,m) H
and there is causal circularity.
Let p be an arbitrary processor. The last event of p's cut state for p ? I is
send(p,parent(p), Cut). (See Figure 3.5.) Each processor waits to receive Cut
messages from all of its children and then relays the Cut message to its parent.
Since I is at the root of the spanning tree T, this means that
send(p, parent(p), Cut) H receive(I, p*, Cut)
where p* is the last of 1's children from which it receives a Cut message in run r.
This is still an event of 1's cut state (in fact, it is the last event), and thus
happens-before 1's first post-cut system send. So far, then, we have proved our
claim for p ? I and q I.
Now, I does not send Resume messages until it has received Cut messages
from all of its children. No processor q ? I sends a Resume message until it
receives one from its parent, nor can it send system messages before this point.
Again, since I is the root of the spanning tree, we have that
rece?ve(I, p*, Cut) H rece?ve(q, parent(q), Resume)
42
and the claim holds for p I and q ? I.
Combining the two results above, we have
send(p, parent(p), Cut) H receive(I, p*, Cut) receive(q, parent(q), J?e?urne)
and our claim holds in the final case of p, q ? I. 1
3.4 Summary
In this chapter we have given formal definitions of protocols and of consistent-cut
protocols. We have presented three examples of consistent-cut protocols that are
correct under various system and protocol assumptions; we have discussed these
circumstances, and presented proofs of correctness. In discussing these protocols
we have indicated that, in each, some subset of system events must be suspended
by the protocol in order for it to succeed. We now proceed to Chapter 4 where
this phenomenon is formally defined as inhibition, and examined with more care.
Chapter 4
The Inhibition Spectrum
We wish to investigate the degree to which a consistent-cut protocol must inter-
fere with an underlying system in order to determine causally-consistent global
states in the course of an execution. The definitions of this section will allow us to
characterize nine different categories of protocols with respect to their inhibitory
characteristics.
A protocol runs on top of, or in conjunction with, the normal execution of
an underlying system: system and protocol events will be interspersed at each
processor, but the subsequences of system events, taken together, should form a
valid run of the original system, Thus the presence of the protocol should not
prevent the underlying system from performing correctly. What the protocol
may do, however, is to prevent a processor from functioning normally for some
finite interval. For example, it may force a processor to wait upon occurrence of
some protocol event before proceeding with system events that would have been
permitted in the absence of the protocol. We refer to this phenomenon as inhibi-
tion; in [BF88],this concept is termed freezing and is defined for a general notion
43
44
of superimposed processes, We make an important distinction between local and
global inhibition: that is, between cases where a protocol only delays events until
some number of local actions sends and internal events have been performed
and cases where it delays events while a processor waits for communication from
another processor. The latter is in some sense a more serious form of inhibi-
tion, as one processor may have to wait upon the reception of a message from a
much slower neighbour. If processors or communication medium are unreliable
then the wait may be indefinite. We will prove in subsequent chapters that the
character local or globJ??f the inhibition that a protocol uses is a crucial fac-
tor in the existence and efficiency of protocols for determining consistent global
states.
In Chapter 3 we presented three consistent-cut protocols and investigated the
conditions under which they succeed. We will now add questions of inhibition to
our analysis of consistent-cut protocols. Together, these issues produce thirty-
six possible combinations of protocol and system properties; we re-investigate to
which cases our three protocols pertain. In each case for which none of these
protocols suffices, we can prove that no protocol with the appropriate properties
and effects exists. These arguments are contained in Chapters 6 and 7.2; this
chapter ends with a tabular summary of all existence and impossibility results.
4.1 Inhibition
First, we define inhibition essentially as in [Tay89?. We take the intermediate
approach of first defining the notion of disabling an event. Suppose that, in an
original system, a given event could occur immediately following a particular
45
sequence of events. If, in the system created by a protocol ?, the same sequence
of system events, with protocol events interspersed, does not enable that event,
then we say it has been disabled (by the protocol).
Definition 23. Let S = (I,C,t,M,?) be a system and ? be a protocol with
= (1, C, C', M',?Z'). An event e? is disabled in state lj of protocol run r if
the subsequence of system events in l? enables e? in the original system, but ii
does not enable e?:
(SysEvents(l?), e?) Ei Mj
?
Definition 24. A protocol is non-inhibilory if no event is disabled in any run of
the protocol. Any protocol which does disable system events is inhibitory.
A non-inhibitory protocol, then, does not interfere with the running of the
underlying system. Note that there need be only one system S and one run
of ?(S) in which events are disabled in order for the protocol ? to be inhibitory.
4.2 Local and Global Inhibition
We distinguish two types of inhibition, local and global: recall that we wish to
separate cases where a protocol delays events only until some number of local
actions send or internal events have been performed from cases where it delays
events while waiting for communication from another processor.
Definition 25. Suppose event e? is disabled by a protocol in state ii of run r.
We say e? is locally delayed if there exists a run r' such that
1. r'? contains an extension li' of li;
46
2. e? is not disa?bled by the protocol in ij' or any extension of l?'; and
3. l?' --H l? contains no receive events.
Note that in the run r itself, there may well be receive events occurring between l?
and the point at which e? is no longer disabled. However, this may be purely
coincidental; thus provided that there is some run where there is no intervening
receive, we can conclude that the duration of the delay is under local control.
We require that e? not be disabled at any point after (`? to avoid the following
scenario. Suppose in some run r there is an event e? "locally" delayed in lj, and
there is a receive event occurring between lj and the end of the disabling. There
will be another run r' with an extension l'? such that l'? --H l? contains no receive
events and e? is no longer disabled. However, e? may subsequently be disabled
again, say in l'7, and in run r' there is a receive event between l'?' and the end of
the disabling. This could be endlessly repeated: the definition would leave open
the possibility that the last disabling of e? requires communication to terminate
it.
We use the terminology that an event may be locally or globally delayed,
whereas in doing so a protocol exhibits local or global inhibition. Given the
definition of local delay, locally and globally inhibitory protocols are defined in a
straightforward manner.
Definition 26. A protocol is locally inhibitory if any event disabled in any run
of the protocol is locally delayed. An inhibitory protocol that does not have this
property is globally inhibilory.
This implies that, in a globally inhibitory protocol, some events are delayed while
waiting for communication from another processor.
47
Finally, we also consider whether or not send events or receive events are
delayed by an inhibitory protocol. Recall that we will be interested in asking
questions about the degree of inhibition necessary to allow a protocol to distin-
guish possible global states in asynchronous runs. Because the occurrence (or
lack thereof) of internal events cannot effect the consistency of a set of states,
the ability (or lack thereof) to inhibit internal events is not an issue.
Definition 27. A protocol exhibits send inhibition if some (locally or globally)
delayed events are send events. Likewise, a protocol exhibits receive inhibition if
some (locally or globally) delayed events are receive events.
Note that local send and receive inhibition together suffice to allow a pro-
tocol to create, in effect, new "atomic" actions, such as the atomic receive-send
of [Tay89J: the protocol need only delay all other events until the requisite series
of send and internal events has been performed.
Consider now the three consistent-cut protocols of Chapter 3: Floodi, Ftood2,
and Tree. We wish to categorize these protocols according to the type of inhi-
bition they display. Floodi does not allow a processor to send system messages
for an interval following the first reception of a Cut message. Flood2, on the
other hand, disables receive events. In both protocols, the inhibition is only local
because a sequence of send events ends the disabling, with no necessary com-
munication from another processor. Thus Floodi exhibits local send inhibition,
and Flood2 local receive inhibition. Tree, in common with Floodi, disables send
events for an interval during its execution. However, in Tree's case, a message
chain from the initiator, ending with the event receive(p, parent(p), Resume), is
required to re-enable the send events of processor p. This is therefore an exam-
48
Name			Channels			Inhibition			Protocol Messages
Floodi			FIFO			local send			inconsistent
Flood2			FIFO			local receive			consistent
Tree			non-FiFO			global send			consistent
Figure 4.1: Protocol characteristics.
ple of global send inhibition. We summarize the properties of these protocols in
Figure 4.1.
4.3 The Inhibition Spectrum
We have defined three types of inhibition based on whether send events are
delayed locally, globally, or not at all, and similarly for receive events. This results
in nine different combinations of protocol capabilities. For example, a protocol
exhibits local receive inhibition and global send inhibition if it delays both receive
and send events, and no delayed receive event requires communication to be re-
enabled.
Since a globally inhibitory protocol may both globally and locally delay
events, it follows that global inhibition of a particular type of event (send or
receive) is at least as strong as local inhibition of that event. Clearly, local inhi-
bition of a particular type of event is at least as strong as no inhibition of that
event. Thus, for example, when other factors remain constant, global send inhibi-
tion is at least as strong as local send inhibition, which is at least as strong as no
49
send inhibition. This is independent of the type of protocol under consideration.
In sum, we now have thirty-six combinations of system and protocol assump-
tions, the issues being whether:
o+ the protocol works for non-FiFO systems or only FIFO systems;
o+ all messages or only system messages must be consistent with respect to
the cut designated by the protocol;
o+ the protocol uses no, local, or global inhibition;
0 send events, receive events, or both must be delayed.
Figure 4.2 illustrates which CCPs are successful under varying system and proto-
col characteristics. For every case in which none of these protocols can be used,
we can demonstrate impossibility as designated by IMP. Specific impossibility
results for the IMPt cases are presented in the following chapters; the remaining
impossibilities are immediate consequences of those results.
Recall that we do not assume that processors can choose to receive some
messages but not others on a single channel; the appropriateness of this capability
depends on the level of software being modeled. Also, the behavior of "selective
receive" is ambiguous in the case of FIFO channels. If this mechanism is allowed
in non-FIFO systems, then there is a protocol symmetric to Tree, in which
receives are globally disabled rather than sends; it is successful regardless of send
inhibition and protocol message consistency. This case is less interesting from
the standpoint of understanding the precise limitations on attaining CCPs.
50
No			Receive			IMP			IMP			IMP			IMPt
Inhibition
Send			Local
Receive			IMP			Flood2			IMP			Flood2
Inhibition			Inhibition
Global
Receive			IMP			Flood2			IMP			Flood2
Inhibition
No
Local			Receive			IMP			IMPt			IMP			Floodi
Inhibition
Send			Local
Receive			IMP			Flood2			IMP			Floodl,Flood2
Inhibition			Inhibition
Global
Receive			IMP			Flood2			IMPt			Floodl,Flood2
Inhibition
No
Global			Receive			Tree			Tree			Tree			Tree,
Inhibition			Floodi
Send			Local
Receive			Tree			Tree,			Tree			Tree,
Inhibition			Inhibition			Flood2			Floodi ,Flood2
Global
Receive			Tree			Tree,			Tree			Tree,
Inhibition			Flood2			Floodl,Flood2
Figure 4.2: Consistent-cut protocols and impossibilities in the inhibition spec-
trum.
51
4.4 Summary
In this chapter, we have given a formal definition of inhibition. We have dis-
cussed the different degrees of inhibition that a protocol can display, distinguish-
ing between local and global inhibition, send and receive inhibition. We have
re-examined the three consistent-cut protocols presented in Chapter 3 to analyze
their inhibitory behaviour. We have indicated to which cases in our thirty-six
combinations of protocol and system assumptions these three protocols apply.
We claim, and will demonstrate in the following chapters, that in all cases to
which none of these protocols pertain, no consistent-cut protocol with the requi-
site characteristics exists.
Chapter 5
Preliminary Lemmas
In this chapter we present six lemmas that will be useful in the impossibility and
complexity results of the following chapters. The first five lemmas concern the
structure of the collection of runs in asynchronous systems in general. Lemma 1
examines the relationship between partial runs and completed runs; the proof is
based on an argument presented in [Tay89]. Lemmas 2 and 3 demonstrate how
the reception of messages can be "pushed back"; that is, a given receive event can
occur almost arbitrarily early in the course of a system's execution. Lemmas 4
and 5 discuss the connection between receive events and the flow of potential
causality. The proofs of the first three of these lemmas originally appeared in
[Tay89]; we present them here for completeness. The final lemma of the chapter
concerns the structure of the set of runs containing locally delayed events.
5.1 Partial and Completed Runs
In Chapter 2 we discussed how a vector of local states could represent a simulta-
neous global state only if certain conditions hold; we called such a vector a partial
52
53
run. The first lemma here demonstrates that the name "partial run" is justified
in that any partial run can indeed be extended to a completed run. Thus, a
vector of local states can represent the global system state in the course of an
execution if and only if it is a partial run.
Lemma 1. If r is a partial run of system S, then there exists at least one com-
pleted run r' of S that extends it; i.e. for all ?, r? is a prefix of Yj
The proof of this first lemma is based on an argument in [Tay89J, which is
in turn derived from ideas in tLT87]. The proof in [Tay89] assumes that the
enabling relations of the processors are finite-branching; i.e. for each processor t
and each local state ii of?, there are only finitely many events e? E Ej such that
(ti, ci) ? Mj. This assumption was relaxed in subsequent work [CT90b], and the
proof accordingly modified. The proof given below is again a modification of this
later proof. The changes arise because we wish to guarantee that every message
that is sent is eventually received by the processor for which it is destined; this
is not true of the construction given in [CT90b].
Proof: We construct a (possibly infinite) completed run r' from r by iterating
through the set of processors and adding appropriately enabled events to the
current partial run. We initialize r' to r. For each processor i, we let Lj be
a list of all receive events rece?ve(?,j, m) such that send(j, i, m) appears in
but receive(i,j, m) itself does not yet appear in r?'. If the system in question
is FIFO, we make sure that the receives appear in Li in the appropriate order,
i.e. for each 5, the events receive(i,j, m) appear in Lj in the same order as the
events send(j,i,m) appear in r;.. Upon each iteration, for each processor i, we
traverse the list Lj until we reach a receive event which is enabled by the current
54
state r;; the event is deleted from the list and added to r'?. We then proceed
to the next processor. If there is no such enabled receive event, but there are
send or internal events that are enabled by r'?, then we arbitrarily pick one of
these and add it to r'?. If the added event is send(i,j, m), then we also add the
event receive(j, i, m) to the back of the list L5. Again, we then proceed to the
next processor. If there are no enabled events in Li, and no enabled send or
internal events, then r'j is left unchanged, and we continue directly to the next
processor. Iterations through the set of processors continue indefinitely, unless
on some iteration no event is added to any local state.
We first show that either the iterations terminate or else every processor is
visited infinitely many times during the course of the construction. To do this,
we must assure that at each stage, and for each i, the list Li is finite; otherwise
the step "traverse the list Lj" will not terminate. This is certainly the case in
the first iteration, since a partial run contains only finitely many events and thus
only finitely many sends to any processor. Since Li consists of receive events for
which the corresponding send events are present, Lj is initially finite. At each
iteration, at most N send events are added to the partial run, and therefore at
most N receive events are added to all the lists Li together. Thus each Li remains
finite at each stage in our construction.
We must show that this construction results in a completed run. R?ecall that
a partial run is a set of local states such that (1) the ?th local state is a feasible
state of processor i, (2) the happens-before relation is an irrefiexive partial order
on the events in the set, and (3) for each receive event there is a corresponding
send event in the appropriate local state. Clearly each step of the construction
55
results in a new partial run, since the original r' (r) is a partial run and all added
events have the necessary enabling local states and (in the case of receive events)
corresponding send events. Also, new events cannot create causal circularity since
they do not happen-before any event in the current partial run: the only way
an added event could happen-before an event of the current partial run would
be for the added event to be send(i, j, m) where rece?ve(j, ?, m) is already in ??
This is impossible since there are no unmatched receives in the original r', and
no receive is added unless the corresponding send is already present.
It is also necessary to show that the additional constraint on runs is satisfied,
i.e. for each i either (1) r'? is infinite, or (2) there is no event e? enabled by r1,,
unless it is the reception of a message from j and there is no corresponding send
event in r?'?. Suppose r'? is finite. This implies that after the last iteration of our
construction in which r?? changed, there were no send or internal events enabled
by r?'.. Property 2 of enabling relations ensures that therefore all receive events
from all neighbours of i are enabled by r?'. If, during subsequent iterations through
the processors, the event send(j, i, m) had been added to r;., then the event
rece?ve(i,j, m) would have been added to Lj at the same time. Following our
construction, receive(i,j, m) would then have been added to r'? in the following
stage. The fact that r'? did not change implies that Li remained empty, and so
no such send events could appear after the last iteration at which r?" changed.
Therefore the only events enabled by r1, are indeed receive events for which the
corresponding send events are missing. Thus r' is indeed a run.
Finally, we need to show that the run is completed, i.e. that every message
that is sent in the run is also received. Note first that if the event send(j, i, m) is
56
added to at some stage in the construction, then the event receive(i,j, m) will
be added to Lj at the same time. We must now ensure that every receive event
added to Li is eventually deleted from it. We have already argued that if r'? is
finite then Lj will be empty, so we need only consider the case where r'? is infinite.
By Property 3 of enabling relations, for each neighbour j of i, there are infinitely
many prefixes of r? at which receives from j are enabled. Since receive events are
added to r2, in preference to send or internal events, this means that infinitely
many receive events may be added. Also, the list Li is always traversed from
front to back, and the first enabled receive event that is encountered is added to
the local history. Since any element of Li is preceded in Lj by only finitely many
other elements, this guarantees that each receive event will eventually be added.
If the system is FIFO, we must guarantee that all messages are received in
the appropriate order. In the initial step, receive events are placed in Lj in
the appropriate order; subsequently they are added at the same time as the
corresponding send event is added to a local state. Thus for each j, the receive
events receive(i,j, m) are added to Lj in the same order as the events send(j, i, m)
are added to r;.. At each step, the first enabled receive encountered in Lj is
removed. By Property 1 of enabling relations, it is impossible that the reception
of one message from a given processor be enabled, while that of a second message
from the same processor is not. Therefore the receive events receive(i, j, m) will
be added to r'? in the correct order for each j. I
Note that Properties 2 and 3 of enabling relations are required for this proof.
If these properties did not hold of the enabling relations in our system, we would
be unable to guarantee that the partial run could be extended to a completed run.
57
If a particular receive event is not at the head of a processor 5 receive list at one
of the finitely many points at which it is enabled, then we will have a message
that is sent but not received. In this case, we would have a slightly weaker result:
namely, that a partial run of system 5i could be extended to a completed run of
some system 52 with identical processor, channel, and event sets. S2 need only
have enabling relations which are supersets of the corresponding relations in Si,
and which in addition do satisfy Properties 2 and 3. In this system the proof
would proceed as above.
An immediate consequence of Lemma 1 is the following: if r is a run of
system 5, (11,..., in) is a consistent cut in r, and for some i there is a send event e?
which is enabled by ij, then there is a run r' of 5 in which e? immediately follows i?.
This is true because (1i, .., ii e?, ..., 1N) is again a partial run and can therefore
be extended to a completed run. Similarly, if e? is an internal event, or a receive
event which does not itself appear in ???...., 1N) but whose corresponding send
does, then there is a run of 5 in which e? is the event following ii.
5.2 Causality and Receive Events
Our second lemma states that the event rece?ve(j, i, m) may occur at processor
j as soon as all events of j which happen-before send(i, j, m) have occurred.
This is true because of the asynchrony of our systems: message transmission
may be arbitrarily fast, and the receiving processor proceed arbitrarily slowly in
comparison. The only constraint is that a message cannot actually travel back
in time, and be received before some event which precedes its sending.
58
M
j
send(i, j, m)
mm
ii
e5 receiv?,z,m)
ii
? send(i,j,m)
rece?ve(j, z, mT
lj
(a)
Figure 5.1: Lemma 2. The cancelled dashed arrow indicates "not happens-before"
and the solid arrows connect the sending and reception of m.
Lemma 2. Let r be a run such that i? send(i,j, m) is a prefix of r? and tj e5 is
a prefix of ry. Suppose that recezve(j, i, m) is enabled by ly and that it has not
occurred already in ly. Then if ej does not happen-before send(i,j, m), there is a
run r' such that ii. send(i,j, m) is a prefix of r'? and ly receive(j, i, m) is a prefix
of r?'. (See Figure 5.1.)
Proof: The proof proceeds by first showing that there is a partial run r" of
S where ii send(?,j, m) is a prefix of r'7 and = ly (without e5). Then
recezve(j, ?, m) is added to r?, still resulting in a partial run which can be ex-
tended to a completed run r1 by Lemma 1.
We find a partial run which satisfies the conditions on r" above, iteratively.
(It is not obvious that such a run exists. Perhaps any combination of local
59
states having the required prefixes for i and j will have a message m potentially
involving other processors which is received but not sent.) Let MinSends(1?,q)
be the minimum local state of processor q in run r which includes the sending
of all messages received by p from q in local history l?. Let max(11,12), where ii
and 12 are local states of a single processor in a single run, be the larger sequence
of the two (and therefore encompassing the other). Initialize a vector of local
states s as follows: 55 1?; Sj max(4 . send(i,j,m),Min5ends(t5,i)); and
5p = MinSends(15,p) for p not equal to z or j. We call this the initial state
vector; of course, this initial set of states is not necessarily a partial run. On
each step of the iteration, find a message m' from any processor p to any other
processor q such that receive(q,p,?I) is in s but send(p,q,m') is not (if such a
message does not exist then we are finished). Add events from rp to 5p until the
necessary send is included. In order to meet the conditions of the theorem, we
must show that (1) the iteration terminates, and (2) the local state of 5 is not
changed, i.e. upon termination 55 = lj.
We first make an observation to be used extensively in the remainder of the
I.
proof: for any inconsistent message rn in the above construc?ion, there is a
message chain beginning with m' and ending with a message received within the
initial state vector. Consider any message m' as above; either receive(p,p??m')
is in the initial state vector, or it is included as the result of adding events until
the sending of some rn?? is included, so receive(p,pt,m') H send(p,pfl,m"). This
argument can be continued resulting in a chain of messages
5efld(p?,p,mI) H receive(p,p?,mI) H send(p',p",m") H
receive(p1,,p?,m") ... H receive(p0,pfl,m0)
60
Figure 5.2: Proof of Lemma 2: message d?ains during iterative construction.
Al"			Mi'
where receive(p0, pfl, m0) is in the initial state vector. (See Figure 5.2. The solid
circles mark the extent of the initial state vector.)
Suppose that the iteration above never terminates. Then there must be mes-
sage chains as above where two messages m1 and m2 are sent by the same proces-
sor. If send(q, qi, mi) H send(q, q2, m2) in the message chain then send(q, q?, m?) H
send(q, qi, m1) also because there is a local state which includes send(q, q', m2)
but not send(q, q', m1) (see Figure 5.3). This cannot occur in any run, hence the
iteration terminates. Suppose that during the iteration state Sj is changed, due
to a message ??? that is, there is a send event send(j, p, m') such that either
send(j, p, m') or e5 H send(j, p, m'), and send(j, p, m') must be included
in s?. Again, there must be a chain of messages ending with a message m0
such that receive(q, q0, m0) occurs in the original states and send(j, p, m')
receive(q, q0, m0). There are three cases to consider, depending on what q is.
(1) If it is a processor other than ? or j, then by the definition of the original set
of states s and MinSends there must be a message mq sent after receive(q, q0, m0)
from q to j such that receive(j, q, mq) .` e>. However, since either e5 = send(j, p,
61
qi
q
Figure 5.3: Proof of Lemma 2: send(q, qi, rni) send(q, q?, m2) in the message
chain, but send(q,q2,m2) H send(q,qi,mi) at processor q.
or H send(j,p,m'), we have
send(j,p,m') H receive(q,q0,rn0) H send(q,j,mq)
receive(5, q, mq) H send(j, p,
and an invalid circularity results. (2) If it is processor j then there is a cir-
cularity in the happens-before relation as in the proof of termination above.
(3) If it is processor i then either (a) m0 arrives before send(i,j, m) or (b) m0
arrives after send(i, j, m) but before the sending of a message m? to j as in
the definition of MinSends. In case (a), send(j, p, in') H receive(i, q0, in0) and
receive(i, q0, in0) H send(i, j, in). However, since either e? H send(j, p, in'), or
e5 send(j,p,in'), this violates the assumption that e5 7+ send(i,j,in). Case (b)
is analogous to case (1). Hence the final Sj is equal to
Let r11 be the final s of the iteration. Then r" is a partial run of S such that
= i? and i? send(i, j, in) is a prefix of r'1. Since rece?ve(j, i, in) is enabled by ij
62
in system 3, adding event recezve(5, z, m) to still results in a partial run of S.
By Lemma 1 this can be extended to a completed run r1 as the theorem states.
Lemma 2 does not quite hold for a FIFO system because adding the reception
of m could violate the FIFO condition. If, however, there is no message m' that
is sent to 5 prior to m but is not received in i?, then adding receive(j, i, m) to 1?
cannot violate the FIFO condition. This fact, along with the proof of Lemma 2,
is sufficient to state the lemma's FIFO counterpart without proof.
Lemma 3. Let run r of FIFO system S contain events on processor 5 and
send(i,j, m) on processor z, i.e. there are local states i? and i? such that ii
send(i, 5, m) is a prefix of Tj and i? % is a prefix of r5. Also, let i? ? not contain
recezve(j, i, rn) and let i? enable receive(j, ?, rn). If e5 74 send(i, 5, m) and there is
not a message m' from i to 5 sent before m but not received in l?, then there exists
a run r1 also in S such that i? send(?,5, m) is a prefix of 4 and (? receive(j, z,
is a prefix of
Potential causality between two events at different processors must be trans-
mitted by a message between those two processors, and through the reception of
that message. If an event e? of i happens-before an event e? of 5, then if e? is
not a receive, e? must happen-before the event immediately preceding e5 as well.
This is the conclusion of the fourth lemma.
Lemma 4. Let r be a run, e? a non-receive event occurring in r?, and 4 the
event occurring immediately before e? in r?. Then for any event e? occurring in
r5, if e1 H e? it must be the case that ? 4.
63
Proof: Since e5 and e? are on different processors and e? is not a receive, e5
cannot happen-immediately-before e?. Therefore, by the definition of happens-
before, there exist events e?1, ..., e?? such that e5 H e?1 H ...e?? H e?. Since e? is
not a receive, only e'? can happen-immediately-before it. Therefore e?? = e'? and
H+ e?1... `?` e'?, so			H e'?. I
Repeated applications of Lemma 4 lead to the following result.
Lemma 5. Let r be a run and e? a non-receive event occurring in r?. Suppose
that i? e'? e02 e?? is the state immediately preceding e? in r?, and that none of
the events e0? through Ct??i5 a receive event. Then for any event e? occurring in
r?, if e5 e? it must be the case that % H
Proof: Lemma 4 implies that e5 H c2??, which, together with Lemma 4 again,
__ n--H1
in turn implies that e5 e? , and so on. I
5.3 Local Delay
Our final lemma concerns the following consequence of local delay: if a set of
events has been locally delayed in the states of a partial run, then in some ex-
tending run, each event is re-enabled with no intervening receives. Note that this
is not immediate from the definition of local delay: that definition guarantees
only that for each delayed event, there is some run where no communication is
required to end the disabling. We need to show that it is not the case that in
each of these runs, some other event must wait upon reception of a message to
be re-enabled.
Lemma 6. Let r be a protocol run; ....... , 1N) be a consistent cut of r; Pi.?? , Pk
be (not necessarily distinct) processors; and events e?1,... ,ep? be locally delayed
64
by the protocol in states `p1... 1Pk' respectively. Then there is a completed
run r' such that (1i. tN) is a consistent cut of r', and, for i = 1,..., k, there
are no receive events between `p? and a state in which e?? is no longer disabled.
Proof: First we apply the definition of local inhibition. There must exist, for
each p?, a run r?? containing an extension of `p?? such that --H `p? contains
no receive events and e?? is not disabled in Let e?1, ....... , e?? denote the
subsequence of non-receive events between `p? and In order for run rrn to exist,
there must be pairs (`p?' ci1), (`p? e?1, e?2), and soon in the enabling relation for?.
Consequently, we can add each of these events to the consistent cut ....... 1N).
resulting in a partial run of the system. (Note that, because these events are not
receives, there is no causal circularity introduced, and no receive event is added
for which the corresponding send will be missing.) This generalizes to all other
processors. Since any partial run of a system can be extended to a completed run
of the system, by Lemma i, the resulting completed run satisfies the conditions
for r' in the theorem statement. I
5.4 Summary
In this chapter, we have presented and proved six lemmas. The first demon-
strates the correspondence between partial and completed runs, and is used in
the proofs of three of the remaining five lemmas. These three lemmas examine
the relationships between the various runs of a given system. The remaining
two lemmas trace the connection between receive events and potential causality.
All these lemmas will be used extensively in the proofs of our impossibility and
complexity results in the following chapters.
Chapter 6
Impossibility Results for
Non-FIFO Systems
In this chapter we present impossibility results for the non-FiFO cases in Fig-
ure 4.2 of Chapter 4. According to the claims made in this figure, global send
inhibition is necessary (and sufficient) for a consistent-cut protocol for non-FiFO
systems, regardless of whether or not protocol messages may be inconsistent in
the designated cut. To prove our claim, we assume the possible use of the ma?i-
mum degree of inhibition other than global send inhibition: that is, global receive
inhibition and local send inhibition. We then show that these are not sufficient
for a protocol to produce cuts in which system messages are guaranteed to be
consistent. We do this by displaying a possible run of the protocol in which there
is an inconsisteny system message. The impossibility results for the remaining
non-FiFO cases then follow immediately, as in these cases the degree of inhibition
involved is weaker and/or the requirements on the designated cut are stronger.
Given that there is no consistent-cut protocol with a particular kind and
65
66
degree of inhibition which is successful for all non-FiFO systems, we might wish to
moderate our demands, and consider whether there are CCPs with the requisite
characteristics that are successful for some subset of non-FiFO systems. For
instance, might there be a consistent-cut protocol using, say, local send inhibition
only, which is correct for all non-FiFO systems in which the processor network
has a tree structure? If we specify a class of systems by some physical property of
the processor network, then we can show that, once again, there is no consistent-
cut protocol using up to global receive inhibition and local send inhibition that
is successful for any class of non-FiFO systems.
6.1 The Impossibility Theorem
The theorem presented below corresponds to the third IMPt case in Figure 4:2;
all other impossibility results for non-FiFO systems in Figure 4.2 are then direct
consequences of this theorem.
Theorem 1. Using up to global receive inhibition and local send inhibition, there
is no consistent-cut protocol for non-FIFO systems. Moreover, this is true even
if we require only that system messages be consistent.
In order to prove Theorem 1, we need only prove that, for any protocol satis-
fying the conditions of the theorem, in some run of the protocol with respect to
some system, the cut produced by the protocol is inconsistent. We can therefore
make worst-case assumptions about the enabling relations on sequences of system
events. We then show that, for any CCP as described, there is necessarily a run
with the following property: for some two processors, there is causal circularity
67
"i
`I
1;?e5
I,
Cj
ej
send(mi5)
run
send(ni5?)
Figure 6.1: Proof of Theorem 1. Dashed lines are intervals in which no receive
events occur, arrows represent happens-before, and `X' marks cut states. (A),(B)
follow from Lemma 2. (C),(D) then follow from Lemma 4. (D) and absence of
(E) imply (F).
between the last states preceding their respective cuts in which they are willing
to receive messages from each other.
Proof of Theorem 1: Given a run r of the protocol, let l? e? be the cut
state of p in r for each processor p. Let i be any processor. Let ii' be the last
proper prefix of i's cut state in which receives from some processor are enabled
and ??? be the event following ij' in r. Let j be any processor receives from which
are enabled in 1?1; from the manner in which l?1 was chosen, there must be at
least one such processor. Let lj1 be the last proper prefix of j's cut state in which
receives from some processor are enabled and e?1 be the event following 1?1 in r.
Let lj11 be the last proper prefix of in which receives from i are enabled and
be the event following i;'. (See Figure 6.1. Note that some of the receive-free
intervals may be empty; e.g., it may be that e? =
68
Let send(z,j, mii) be a send event of processor? that is enabled by the subse-
quence of system events in i? e?. Let send(j, ?, m5?) be a send event of processor
j that is enabled by the subsequence of system events in i? e?. The proto-
col may inhibit the occurrence of send(?, j, mq) in i? e? and of send(j, i, m5?) in
ii. %. However, since this inhibition can only be local, by Lemma 6 there is a
run r1in which l? e? is a prefix of rp? for each p, and no receive events occur
between i? e? (respectively ii e5) and the state in which send(?,j, m??) (respec-
tively send(j, i, m??)) is re-enabled. We can also assume that send(i,j, m?5) and
send(j, i, m5?) occur as soon as they are re-enabled. We consider r
We know that the event rece?ve(?,j, m5?) does not occur in l'?; if it did, the
cut would be inconsistent. We also know that receive(i, 5, m??) is enabled by 12",
by the definition of t'? Suppose that e'? /+ send(j, i, m1i). Lemma 2 then implies
that there is some run r" in which t?', receive(?,j, m??) is a prefix of r7 and
o+ e5 ... o+ send(j, ?, m??) is a prefix of r;.'. In this run, neither it" nor any prefix
of t1? can be i's cut state: if it were, the same local state would be i's cut state
in run r' (by condition (2) of the definition of CCP), which is not the case. Also
by condition (2), i? e? must be 5's cut state in r11o+ In sum, r11 is a possible
run in which the message m?? is sent after 5's cut state and received before i's,
and the cut produced by the protocol is inconsistent. Thus the assumption
that e1? /+ send(j, i, mii) in run r1 leads to a contradiction. Therefore it must
be the case that e1? ` send(j, i, mii). A symmetric argument establishes that
`I
ej H send(i,5,mii). (See arrows (A) and (B) in Figure 6.?).
Recall that no receive events occur in 5's cut state after ?", since l?' is the
last prefix of 5's cut state in which receives from any processor are enabled.
(39
Also, by the choice of ???, no receive events occur between e? and send(j, ?, m??).
By applying Lemma 5, we have that e'? send(j, i, inji) implies e'? H e'?. Simi-
larly, there are no receive events after e'? and before send(i, j, mij). Therefore if
e?11 H send(i,j, m??), it must be the case that H e'?. (See arrows (C) and (D)
in Figure 6.1.)
We claim that there can be no message sequence starting between e' and
e? and ending between %ll. and e?' (the cancelled arrow (E) in Figure 6.1). If
we can show this, then e? H e5 implies e? H % (arrow (F)) and the happens-
before partial order will be violated. So we suppose that there is such a message
sequence. It must have length at least two, since receives from i are disabled
at j in the interval of interest. Suppose the first event of the message sequence is
lt
= send(i, k, m) to some neighbour k of i. The event receive(k, i, m) must occur
in k's cut state; if it did not, then the remainder of the message sequence will
start after k's cut state and end before j's cut state, thereby necessarily including
an inconsistent message. We can assume that either the event immediately after
send(i,j, m??) is a system send event send(i, k, mik) to k, or that system sends to
k have been disabled by the protocol in the state i? ..... . . send(?, j, m?j). In the
latter case, we can consider (again by Lemma 6, since the delay can only be local)
a run which is identical to r' up through send(i,j,mij), send(j,i,m?i) and k's cut
state, and in which there are no receive events occurring between send(i, j, m?5)
and the state in which some system send to k is re-enabled. Assume that the event
following this state is send(i, k, mi?). In either case send(i, k, mi?) immediately
following send(?,j, m??), or after a sequence of non-receive events necessarily
recezve(k, i, m) 74 send(i, k, mik). Otherwise, since there are no receive events
70
between e'/ send(i, k, ni) and send(?, k, niik), recezve(k, i, ni) H send(i, k, mi?)
would imply (by Lemma 5) that receive(k,?, m) H = send(i, k, m), resulting
in circular causality. Now, since receive(k, i, m) 74 send(i, k, mi?), we can apply
Lemma 2 and produce a run in which the message ?ik is inconsistent. Therefore,
by contradiction, there is no message sequence beginning between e'? and e? and
ending between e?ll and
____ I,			____
Thus e? H e5 implies e? e5. We have already shown that e'? e,'. and
that H e??; consequently, there is causal circularity in run r1. This of course
is impossible. Therefore it must be the case either that e'? 74 e,' and hence
e2, 74 send(j, i, rnji) or that e?11 74 e'? and hence e?11 74 send(i,j, m??). Suppose for
definiteness that it is the former. Again we invoke Lemma 2 and condition (2) of
Definition 22 to show that there is a run in which the message rflji is inconsistent.
Since m>? is a system message, our proof is now complete. I
We noted in Chapter 4 that if one allows the selective reception of messages
from a given neighbour of a processor, then there is a CCP using global receive
inhibition which is successful for non-FiFO systems. The proof of our impossibil-
ity theorem should therefore require the non-selectivity assumption; and in fact
it does. For instance, the argument that if receive(k, z, m) can occur in k's cut
state then receive(k, i, mi?) can occur at the same point, is no longer valid.
We have shown that a consistent-cut protocol must use global send inhibition
if it is to be successful for non-FiFO systems in general. One might next ask
whether there is some class of non-FiFO systems for which CCPs with lesser
degrees, or different kinds, of inhibition might exist. The answer, however, is
negative. If we specify a class of systems by some property of the processor
71
network, such as the number of processors or the network topology, then there is
no class of non-FiFO systems for which anything less than global send inhibition
sullices.
Corollary 1. Let K be a class of non-FiFO systems 5 = (I,C, ?, M, ?), where
membership in K is based solely on properties of the sets I and C. Then there
is no consistent-cut protocol using up to global receive inhibition and local send
inhibition that is successful for all systems 5 in k. Moreover, this is true even if
we require only that system messages be consistent.
Proof: The only network characteristic that was exploited in the proof of
Theorem 1 is that there be a pair of neighbouring processors. This is true of every
system in Ic, as a system is required by definition to have a connected channel
set. Since k contains all systems having the required network property, we can
again make worse-case assumptions about the enabling relations on sequences of
system events. The proof then proceeds exactly as for Theorem 1. I
To be strictly accurate, we should rule out the possibility that Ic consists of
all non-FiFO systems in which there is only one processor Since the claim of
such a system to be "distributed" is disputable, we ignore this caveat and state
our corollary as above.
6.2 Summary
We have shown that global send inhibition is necessary in order for a consistent-
cut protocol to be successful for non-FiFO systems: using up to global receive
inhibition and local send inhibition, there is no such protocol. This result holds
whether or not protocol messages would be allowed to be inconsistent. The
72
argument used to prove Theorem 1 suffices to prove an even stronger result: that
global send inhibition is required when designing a consistent-cut protocol for
any class of non-FiFO systems. Theorem 1 also suffices to imply the remaining
impossibility results of Figure 4.2. The impossibility results for FIFO systems
will be presented in the following chapter.
Chapter 7
Impossibility Results for FIFO
Systems
This chapter contains the theorems corresponding to the two IMPt results for
FIFO systems in Figure 4.2 of Chapter 4:
Theorem 2. There is no non-inhibitory CCP for FIFO systems, even if protocol
messages are not required to be consistent.
Theorem 3. There is no CCP for FIFO systems that local send inhibition and
no receive inhibition, if protocol messages are required to be consistent.
Recall from Chapter 4 that there is a consistent-cut protocol (Floodi) for FIFO
systems which uses local send inhibition; however, Floodi does produce cuts in
which there are inconsistent protocol messages. Theorem 3 asserts that it is
impossible to prevent this phenomenon using local send inhibition only. Theo-
rem 2 demonstrates that some form of inhibition is always required, whether or
not one is concerned with the consistency of protocol messages. The remainder
73
74
of the impossibility results for FIFO systems as illustrated in Figure 4.2 follow
immediately from Theorems 2 and 3.
The proofs of these theorems are fairly similar: that of Theorem 2 first ap-
peared in [Cri89], and that of Theorem 3 in [CT90b]. We present the proof of
Theorem 3 first here, as it is slightly the more involved of the two. We then
discuss the modifications of this argument required to prove Theorem 2. In both
proofs, we assume the existence of a CCP with the required characteristics and
derive a contradiction. The proofs differ in that in the non-inhibitory case, one
can assume that the event immediately following a cut state is a system send,
while in the presence of local send inhibition, one cannot. The proof of Theorem 2
produces a possible run in which one of these system messages is inconsistent.
The proof of Theorem 3 also produces a possible run in which there is an incon-
sistent message; however, due to local send inhibition, this message can be made
to be a protocol message. This is the intuition behind Floodi.
The arguments of this chapter are more complicated than those of the im-
possibility proof for non-FiFO systems. The extra difficulty arises because one
cannot as easily "push back" the reception of messages in a run. In the non-FIFO
case, it is sufficient to show that a causal relationship could not hold between
particular events; here, we must ensure in addition that the FIFO condition is
not violated. Thus the fact that the system is FIFO can be made to work for a
hypothetical protocol, and against our attempt to derive a contradiction.
A sketch of the development of our proof of Theorem 3 is as follows. We
assume that there is a CCP for FIFO systems which utilizes local send inhibition
only, and which guarantees that both system and protocol messages are consis-
75
tent. Through a sequence of lemmas we derive a contradiction. First we show
that if event e? is the last event of i's cut state and event e? is in j's cut state,
then e? H e? implies that in fact e? and e5 are the sending and receiving of a
single message, and e5 is the last event of j's cut state (Lemma 8). Next we show
that, if such a message does not occur from i to j, then there must be an "essen-
tial" message that j sends to i before reaching its cut state; the reception of this
message causes i's cut state to occur if it has not previously done so (Lemma 9).
Next, by Lemmas 10 through 13, we prove the existence of a run in which no
cut states occur along a single message as in Lemma 8 and therefore every pair
of processors must exchange essential messages; furthermore, for one processor,
say k, all of the essential messages which it sends arrive after its neighbours' cut
states. Finally, our argument concludes with a proof that there is a possible run
as in Lemma 13, but in which k is forced to reach its cut state before sending
an essential message to one processor. The absence of this message allows an
inconsistent cut to be produced by the protocol.
7.1 Preliminary Lemmas
We now proceed to the sequence of lemmas required for the proof of Theorem 3.
Throughout the remainder of this chapter, we assume that systems are FIFO.
Furthermore, we assume that there exists a consistent-cut protocol for FIFO
systems that uses only local send inhibition, and that guarantees the consistency
of all messages. Again, we can make worst-case assumptions about the enabling
relations on sequences of system events. In particular, for simplicity we assume
that all system receives are always enabled.
76
The following lemma states that if one processor reaches its cut state imme-
diately upon (after) sending a message, then if the receiving processor has not
already reached its cut state, it must do so upon reception of that message. If it
does not, then it leaves open the possibility of a post-cut message being received
before the cut state is reached.
Lemma 7. Let r be a run of the protocol, and suppose that j's cut state in r
occurs upon the send of a message to?. Then i's cut state in r must occur before
the corresponding receive, or immediately after.
Proof: Suppose j's cut state is tj send(j, i, m), and i's cut state does not occur
upon or before receive(i,j, in), i.e. there is at least one event after receive(i,j, in)
which is contained in i's cut state. Let l? be the prefix of i's local history
up to and including receive(i, j, in), and e? be the event immediately follow-
ing receive(i, j, in). We assume that some system send to i is enabled by the
subsequence of system events in j's cut state. Suppose this send event is dis-
abled by the protocol for some (possibly zero) number of states following j's cut
state. By Lemma 6, we know there is some run ?? with identical cut states and
in which there are no receive events between j's cut state and the re-enabling
of this send event. We can also assume that this send occurs as soon as it is
re-enabled. Let send(j, i, in') be the first send to i occurring after j's cut state
(i.e. in' is either a protocol message sent in the receive-free interval following j's
cut state, or it is the system message which we have assumed occurs at the end
of this interval see Figure 7.1 (a).)
If e? H send(j, i, in') then by Lemma 5 and the fact that no receives occur
between send(j, i, in) and send(j, i, in'), we have e? send(j, i, in). However,
ii
77
(a)
ii
j
v
Figure 7.1: Proof of Lemma 7: (a) run r1; (b) run r". Dashed lines represent
receive-free intervals.
send(j, i, m) H receive(i,j, m) H e?, resulting in causal circularity; therefore, it
must be that e? 74 send(j, z, m'). Also, since the system is FIFO and m is received
at z before e? occurs, all messages sent from j to i before ?? have been received
before the occurrence of e?. Therefore by Lemma 3, there is a run r11 in which
ii receive(i,j,m') is a prefix of r'?', and lj send(j, ?, m) ... send(j, i, m') is a
prefix of r?" (see Figure 7.1(b)). The cut state of i in run r1, cannot 6ccur before
recezve(i,j, m'), or it would have done so in run r1 (condition (2) of the definition
of CCP). Therefore in run r'1, m1 is sent after j's cut state and received before i's,
and the cut is inconsistent, which is a contradiction. I
The next lemma states the circumstances under which the last event of one
cut state can happen-before some event in another processor's cut state: namely,
the two events must be the send and receive of a single message, and the latter
event must be the last event in the cut state of the processor at which it occurs.
Lemma 8. Let r be a run of the protocol. Suppose that i's cut state in r is
78
e?, and e? e? where e5 is an event in j's cut state. Then e? is send(z,j,m)
and eJ is recezve(j, i, m) for some message m, and e? is the last event of j's cut
state.
Proof: Since e? H %` there must be a sequence of messages mo,. . - , ?? such
that
1.			e? H send(i,p0,rn0) or e?			send(?,po,rno);
2. recezve(pt,pt?i,rn?) H send(m,pt+1,mt+i) for each t; and
3.			recezveY,pn?i,mn) H			or e?			rece?ve(5,pn--Hi,mn).
Suppose that e? ? send(i,p0,m0). Then send(z,p0,m0) occurs after i's cut
state; however, ? and therefore receive(j,pn?i,mn) occur before j's cut state.
This implies that one of the messages mo...., m? is inconsistent. Therefore
Cj = send(i,p0, rno)- By Lemma 7, processor Po must reach its cut state be-
fore or immediately upon reception of mo. Consequently, we must have mo
m?: otherwise, ..... . , ?? would contain an inconsistent message. Now, we
know that either receive(5, i, mo) H e> or e5 = receive(j, i, mo) (see (3) above).
Again by Lemma 7, j's cut state must occur, at the latest, immediately af-
ter receive(j, i, mo). Thus receive(j, i, mo) -. e5 would imply that e5 is not an
event of 5's cut state, contradicting the assumptions of the lemma. Therefore,
= receive(j, i, mo), and is the last event of 5's cut state. I
7.2 Primary Lemmas
The question of whether a pair of processors' cut states occur as in Lemma 8,
i.e. immediately upon the sending and reception of a single message, is a critical
79
one. If they do, then there is no possibility of an inconsistent message between
the processors: any such message would violate either the FIFO condition or
the happens-before partial order. If they do not, then there could potentially be
an inconsistent message between the processors. We therefore use the following
terminology: in a run r, i's and j's cut states occur along a message if there is
some message m such that i reaches its cut upon send(i, j, m) and j reaches its
cut upon receive(j, i, m), or vice versa. A processor j's cut state occurs along a
message if there is some i such that i's and 5's cut states occur along a message.
Lemma 9 states conditions which must be true of the cut states of processors
which do not cut along a message.
Lemma 9. Let r be a run of the protocol in which i's and 5's cut states do
not occur along a 5 to i message. Let ii. e? be i's cut state in r. Then ii.
must contain an event send(i, 5, m) such that, in all runs ?? of the protocol in
which ii. e? is i's cut state and in which i's and 5's cut states do not occur
along a 5 to i messageeither receive(5, i, m) is the last event in 5's cut state, or
recezve&, i, m) occurs after 5's cut state.
Proof: Let 5's cut state in run r be i? e? and suppose that there are no sends to
5 included in i's cut state in r. If some send event from i to 5 has been disabled
by the protocol in i's cut state, then by Lemma 6 there is a run r in which cut
states remain the same and the re-enabling of this send event occurs without
intervening receives. We can assume that the send occurs immediately upon its
re-enabling. Let ?? be the first message sent from i to 5 after i's cut state. (As
in Lemma 7, m' is either a protocol message sent during the receive-free interval,
or the system message that we have assumed is sent at the end of this interval.)
80
There are no receives between e? and the sending of m'. Furthermore, it is the
case that e5 74 send(?,j, m'), because by the preceding observation and Lemma 5,
e5 H send(i,j, m') would imply ?? H e?; however, by Lemma 8 this could only
happen if ? = send(j,?,m11) and Cj = recetve(?,j,m11), which we have assumed
not to be the case. So ? 74 send(?,j, m'). Since there are no messages sent from
i to j in i? e?, the hypotheses of Lemma 3 are satisfied, and therefore there exists
a run r" in which rece?veU, i, m') occurs immediately after t?. In run r", j's cut
state cannot occur before receive(j, ?, m'); therefore the cut produced in run
is inconsistent.
So i must send at least one message to j in i? e?. Let m be the last such
message. Assume there is a run in which ?`s cut state is (? e?, i's and j's cut states
do not occur along a j to i message, and receive(j, i, m) occurs before e'?, the last
event of 5's cut state. As above, if system sends from i to 5 have been disabled
by the protocol in i's cut state, we can consider a possible run r' and message
such that: r1 has the same cut states as r, m' is the first message sent by i after its
cut state, and there are no receive events between i's cut state and send(i,j, m').
Since i's and 5's cut states do not occur along a 5 to i message, ?` 74 e? so
e?' 74 send(i,j, m') (by Lemma 5, since there are no intervening receives). All
messages sent to jin i? e? have been received before e31 (since m' is the last such
message, and receive(j, i, m') precedes e,'?), and so applying Lemma 3 produces a
run in which the cut is inconsistent. Consequently, either = receive(j, i, m) or
receive(j, i, m) occurs after 5's cut state. I
We call a message such as that described in Lemma 9 essential (i.e. an
essential message is one upon whose reception the receiving processor will reach
81
its cut state, if it has not already done so). If a message is not essential, but
in the run under consideration it happens to be received after or immediately
before the receiver's cut, the message is termed incidental. We have seen that
every pair of neighbours must either have cut states occurring along a message or
exchange essential messages in order to guarantee consistency of the cut. When
essential messages are exchanged, Lemma 10 demonstrates that the reception of
certain messages can be "pushed back" until the receiving processor necessarily
reaches its cut state immediately after the reception of one of them. Later, in
order to complete the proof of our theorem, we demonstrate a run in which this
"pushing back" results in a processor reaching its cut state before it has sent all
the necessary essential messages.
Lemma 10. Let r be a run of the protocol in which there are processors i and 5
whose cut states do not occur along a 5 to i message. Let l? ep be the cut state of
processor p in run r for all p. Then there exists a sequence of messages ..... . ,mk
sent toj in i? e? such that in some run ??, 15.receive(5,i,m1)-. . ..recezve(j,i,m?)
is 5's cut state. Moreover, it is possible to find an r' for which, in addition, the
following two properties hold:
1. for all p such that e> 7> e? (i.e. 5's and p's cut states do not occur along a
j to p message in r note that processor i falls into this case), l? e? is p's
cut state in
2. for all p such that Cj H e? (i.e. 5's and p's cut states do occur along a 5 to
p message in r), l? is a prefix of p's cut state in
Proof: Let m1 be the first message sent to Sin i? e? but not received in lj. Let
.2.... ,?? be any other such messages. By Lemma 9, .1.... ,?? must contain
82
an essential message. If ? receive(j, i, m1), then r itself satisfies the lemma.
Otherwise, let s1 be the vector of local states where (a) sJ = ii. receive(j, ?, mi),
(b) s?' = (? e? for p as in (1) in the statement of the lemma, and (c) Sp1 --H ip for p
as in (2). Then s1 is a partial run as follows. Firstly, each event of each processor
is enabled by the preceding local state, since (1i ..... , 1N e?) is a partial run,
and the only additional event is receive(j, ?, mi). Secondly, rece?ve(5, i, mi) is
the only receive event added, and send(i, j, m1) occurs in i? e?; also, the only
send event deleted from the events in (1i e1,. , 1N e?) is (possibly) e5, and
any corresponding receives have been removed as well (see Lemma 8 and (2)
in the statement of the current lemma). Thus there is no receive event in s1
for which the corresponding send is missing. Finally, the happens-before partial
order is not violated: rece?ve(j, ?, mi) does not happen-before anything in s1, so
no circularity is introduced. Thus, 3 is a valid partial run. The FIFO condition
is not violated, as m1 is the first i to 5 message not received in !?.
The partial run ?i can be extended by Lemma 1 to a total run r1. (See
Figure 7.2(b). Event e? in (a) follows l? in (b) if ? 74 e?.) Ifj's cut state in r1is
ij receive(j, ?, m1), then r1 satisfies the lemma. Otherwise, let 32 be the vector of
local states such that (a) sj2 = i? receive(j,?,mi).receive(j,i,m?), (b) 3?2 =
for p as in (1), and (c) 3p2 = ip for p as in (2). By a similar argument to that
above (a slightly simpler one, as 32 is an extension of si), ?2 is also a partial run.
Since mi...., m? contain an essential message mk, we can repeat this process to
obtain a run satisfying the lemma, in k or fewer steps (Figure 7.2 (c)). 1
The following three lemmas show that there exists a possible run of the pro-
tocol in which one processor sends an essential message to each of its neighbours,
83
MM
Ii
Figure 7.2: Proof of Lemma 10: (a) the original run r; (b) after the first step,
run r1; (c) the final run
I,
j
_______________________			`I_____________________
ii
p
(c) ?
v
ip
lj
ii
ip
m1
ep
(a) j
ip
1'
ii
?1'
mi
ej
#L
84
but all neighbours reach their cut state before that message is received. We start
with an arbitrary run, and progressively massage it into a run satisfying these
criteria. The final argument in the proof of Theorem 3 will then show how this
run can be manipulated to form a possible run in which the cut is inconsistent.
We start by showing that it is possible to find a run in which at least one
processor's cut state ends with a receive event. Below, the degree of a processor
in a network refers to the number of its neighbours. Recall that a protocol must
apply to systems with any network configuration (processor set and channel set).
Lemma 11. There exists a protocol run r in which there is a processor j such
that degree(5) > 2 and j's cut state in r occurs upon a receive event.
Proof: Let S be any system such that the processor network contains two neigh-
bouring processors i and j each with degree at least two. Suppose there exists
a run r of the protocol with respect to S in which neither ?`s nor j's cut state
occurs upon a receive; therefore, they do not occur along a j to' message and i
sends an essential message to j. Lemma 10 can then be applied, and produces a
run in which 5's cut state occurs upon receipt of a message from i. I
Next, we show that it is possible to find a run in which one processor's cut
state occurs upon a receive but not along a message.
Lemma 12. There exists a protocol run in which there is a processor whose cut
state occurs upon reception of a message but not along a message (i.e. a proces-
sorz such that the last event of i's cut state is recezve(i, 5, m) but send(5, i, m) is
not the last event of 5's cut state).
Proof: By Lemma 11 there is a run r and a processor 5 of degree at least two
85
such that j's last cut-state event in r is a receive. Let lj % be j's cut state. Let
i be a neighbour of j such that e5 is not the receive of a message from i; it is
possible to find such an i since degree(j) > 2. By Lemma 9 there is an essential
message sent to i in l? ?; since ey is not a send, this message must be sent in t?.
Let m be the first j to i message that is incidental or essential in run r. If z's last
cut-state event e? is rece?ve(i, 5, m) then i satisfies the lemma. If not, one can
apply Lemma 10 to get a run in which i's cut state does occur upon reception of
a message from 5 and in which i? % is 5's cut state. In this new run, i satisfies
the lemma.I
The third step is to find a run in which each processor's last cut-state event
is the reception of a message not along the cut, and there is one processor which
has sent none of these messages.
Lemma 13. There exists a protocol run in which
1. every processor reaches its cut state upon a receive but not along a message;
and
2. there is a processor 5 such that none of 5's neighbours reaches its cut state
upon receiving a message from 5.
Proof: By Lemma 12, there is a protocol run r in which at least one processor
satisfies the condition described in (a). If i's last cut-state event is receive(i, p, m),
set D = ?iJ, and let Q be a queue consisting of all neighbours of i, with p at
the head. Since i's and p's cut states do not occur along a p to i message, there
exists (by Lemma 10) a run riin which p's cut state occurs upon reception of
a message from i, and in which i's cut state remains unchanged. Add p to D,
86
delete it from Q, and add its neighbours other than? to the back of Q. Repeat
this procedure: at any step, the processors in D are amongst those which, in the
current run rt?i, satisfy condition (1). In addition, the processor p' at the head
of Q is a neighbour of a processor k E D, and p"s and k's cut states do not occur
along a p' to k message. Lemma 10 again gives a run rt of S in which p"s cut
state occurs upon reception of a message from k, and for all d ? D, d's cut state
in rt is identical to that in r??1 Processor p' satisfies the condition of (1) in
run rt; add it to D, delete it from Q, and add all its neighbours which are neither
in D nor already in Q to Q. Since at each step one processor 15 added to D, this
procedure must eventually halt. Since the network is connected, all processors
will eventually be added to Q and thus to D. Therefore the repetition ends with
a run rt satisfying (1).
Let j be the last processor removed from Q; all of j's neighbours must be
in D. However, after the first iteration of the above construction, any processor
in D reaches its cut state upon reception of a message from some other processor
in D. Thus j satisfies (2). I
7.3 ProofofTheorem 3
By the final lemma above, we can find a run r and a processor j such that j's cut
state in r occurs upon a receive but not along a message, and such that all of j's
neighbours reach their cuts upon reception of messages sent by processors other
than 5. Let P'5 cut state in run r be 4 e? for each processor P Since 5's cut state
does not occur along a message, and e? is a receive event,lj must contain the
send of an essential message to every neighbour of 5. Let i be the processor whose
87
5
p
ii
A
send(z,j,m1)			send(i,j,m?)
Figure 7.3: Proof of Theorem 3. No event in the dashed portion of 5's local
history can happen-before any event in any other processor's cut state.
first incidental or essential message from 5 in run r is sent last in ty. Let l? e? be
i's cut state in r. (We know that e? is a receive, but not of a message from 5.) Let
send(5, i, m) be the first send of an incidental/essential message to i in 1?. Then
neither send(j, i, m) nor any event after it can happen-before any event in any
other processor's cut state as follows. If 1?! is the portion of j's cut state before
send(j, i, m), then ly' . send(j, i, m) contains the send of an incidental/essential
message to each of 5's neighbours (see Figure 7.3). By assumption, no processor
reaches its cut upon receiving a message from 5, so each of these messages is
received after the receiver's cut state. Thus any message sequence starting at or
after send(5, i, m) to an event of another processor's cut state will make the cut
inconsistent. Therefore send(j, i, rn) does not happen-before any event in another
88
processor's cut state, and no event which follows it in j's local history can do so
either.
Let mi...., m? be the messages sent from i to j in ij that are not received in
lj' (there is at least one, since by Lemma 9, ? sends j an essential message in i?;
this essential message cannot be received in 1?1, as j's cut state only occurs at
some point after send(j, i, m)). The above argument shows that send(j, i, m) $+
send(i, 5, rni) By an argument almost identical to that in Lemma 10, we can
find a run r1in which every processor p $ 5 has the same cut state as in r, and
lj' receive(j,i,m1) . .. . receive(j,i,rn?) is 5's cut state, for some k < n. (The
only difference in the argument is the justification that there is no receive event
in the intermediate partial run s? for which the corresponding send event is not
also present. In the proof of Lemma 10, we invoke Lemma 8; here we use the
argument in the last paragraph above.)
As in the proof of Lemma 9, if system sends ftom 5 to i have been disabled by
the protocol in 5's cut state in r', we can consider a possible run r11 and message
m' such that: r? has the same cut states as r1, m1is the first message sent by 5
after its cut state, and there are no receive events between 5's cut state and the
sending of m1 (see Figure 7.4). Since e? is a receive event, it does not happen-
before any event in 5's cut state. There are no receive events between 5's cut state
and send(5, i, ml), so e? does not happen-befor? send(5, i, m1) either. Therefore
the hypotheses of Lemma 3 are satisfied, and there is a possible run in which
message i? receive(i,5, m1) is a prefix of i's cut state, so that m1is inconsistent
with respect to the cut formed by the protocol. 1
89
ii
z --H
j
se?n?d?, i, m')
Figure 7.4: Proof of Theorem 3: the run r". The dashed interval is receive-free.
7.4 Purther Results
As we have stated, the proof of Theorem 2 is very similar to, but slightly simpler
than, that of Theorem 3
Proof of Theorem 2: We assume that there is a non-inhibitory consistent-
cut protocol for FIFO systems, and use a sequence of lemmas almost identical
to those used in the proof of Theorem 3. The lemmas differ only in this: rather
than having to take into consideration the fact that the first send by a processor
after its cut state might be a of protocol message, and occur only after a receive-
free interval, we can assume that the first post-cut event is a system send to the
appropriate processor. Because the protocol is non-inhibitory, this send event
cannot be prevented. Thus the inconsistent message that will be produced in the
final phase of the proof is guaranteed to be a system message 1
We now ask, as in Chapter 6, whether there is not some class of FIFO systems
90
for which the impossibility results do not hold. In this case, there is indeed such
a class.
Fact 1. There is a non-inhibitory consistent-cut protocol for the class of star
networks (i.e. networks in which one processor has degree N --H 1 and all others
have degree 1). Moreover, all messages will be consistent with respect to the
designated cut.
Proof: For any such system, let 1 be the processor having degree N --H1. The
algorithm is trivial: 1 sends a Cut message to each processor j $ I. I reaches
its cut state upon sending the last such message; for j $ I, j reaches its cut
state upon the occurrence -of the event receive(j, I, Cut). This protocol is clearly
non-inhibitory. We claim that the designated cut will always be consistent: for
suppose there is some run r of the protocol in which this is not the case. Let m be
an inconsistent message in run r; either the sender or the receiver of message m
must be processor I. However, because the system is FIFO, any message sent
from I to 5 after 1's cut state will be received after the event receive(j, I, Cut)
and therefore after 5's cut state. Thus I must be the recipient of the inconsis-
tent message m. Now if send(j, i, m) occurs after the cut and receive(I, 5, m)
occurs before it, then receive(I, 5, m) H send(I, 5, Cut) " receive(j, I, Cut) H
send(j, I, m) receive(I, 5, m) and the happens-before partial order is violated.
The same non-inhibitory protocol will of course also be successful for any
class K which is a subset of the class of star networks. These, however, are the
only exceptions.
Corollary 2. Let K be a class of FIFO systems S = (I,c,?,M,1t), where
91
membership in K is based solely on properties of the sets I and C. Then the
conclusions of Theorems 2 and 3 hold unless k is a subset of the class of all
star-shaped networks.
Proof: The only characteristic network assumption that is required for the
proofs of Theorems 2 and 3 is that there be two neighbouring processors each
having degree at least two. In any class of systems which does not contain only
star networks, there will be at least one system satisfying this property. All
systems with an identical network configuration will also be included, and so our
proofs follow through as above. 1
7.5 Summary
In this chapter we have presented and proved theorems that together cover all the
impossibility results for FIFO systems that we claimed in Chapter 4. The first
demonstrates that no non-inhibitory CCP can exist, regardless of requirements on
protocol message consistency. The second shows that, while local send inhibition
suffices if protocol messages need not be consistent (as illustrated by Floodi), it
is not sufficient to guarantee consistency of all messages. The conclusions of these
two theorems carry over to all classes of FIFO systems with the sole exception
of the subsets of the class of star networks.
Chapter 8
Message Complexity
In this chapter we look at the cost of achieving causal consistency in terms of the
number of messages required by any consistent-cut protocol. This issue is not, in
fact, separate from that of inhibition. As we will demonstrate, there is, for FIFO
systems, a direct correlation between the degre?local or global?f inhibition
used by a CCP, and the number of messages required. In particular, we show that
any protocol which uses only local inhibition requires O(ICI) messages. On the
other hand, if a consistent-cut protocol for FIFO systems use global inhibition,
then it need use only 0(N) messages; in the worst case, the exact lower bound is
2N --H 3 messages. Thus there is a trade-off between the degree of inhibition and
the message complexity. The quantity 2N --H 3 is a lower bound for a CCP that
is required to work for all FIFO systems. There are classes of systems, however,
for which there are CCPs which use fewer messages. For instance, there are
consistent-cut protocols using only N --H 1 messages which are successful for all
FIFO systems in which the processor network is a tree.
In the non-FIFO case, global inhibition is necessary to any consistent-cut
92
93
protocol. As in the FIFO case, a globally inhibitory protocol requires 0(N)
messages, 2N --H 2 to be precise. However, unlike for FIFO systems, this bound
holds for all classes of non-FiFO systems.
Our results assume that in each run of the protocol there is a distinguished ini-
tiator, and a first protocol event that causally preceded all other protocol events.
Note that this is the case in the three protocols of Chapter 3. One could imagine
a protocol which did not have this property; for instance, one could construct
a protocol resembling Tree of Chapter 3 but in which the PrepareCut phase
of messages was omitted. In this protocol, each leaf of the network spanning
tree would eventually have to start its protocol events spontaneously and inde-
pendently. Given the potential time lags between the points at which each leaf
processor chose to start sending Cut messages, such a protocol seems of limited
practical use or interest.
We start our chapter with a brief discussion of the ideas and techniques that
we will use for the protocols and lower bound proofs for both non-FiFO and
FIFO systems. We then turn our attention non-FiFO systems first. We present
a consistent-cut protocol that uses 2N --H 2 messages and argue that it is indeed
successful for all non-FiFO systems. This argument is followed by our proof that
2N --H 2 is indeed a lower bound on the message complexity of CCPs for non-
FIFO systems. For FIFO systems, we look first at the case of global inhibition.
We present a globally inhibitory CCP for FIFO systems which requires 2N --H 3
messages in the worst case. We argue that the protocol is correct, and that any
other would require at least as many messages. We then turn our attention to
CCPs which use local inhibition only, and prove that any such protocol requires
94
O(ICI) messages. We close the chapter with a discussion of precise lower bounds.
8.1 Results for Non-FIFO Systems
8.1.1 Preliminaries
We start by making precise an idea that has played a part in the non-FiFO
correctness and impossibility proofs which have appeared in previous chapters,
and which plays a key role in our arguments below.
Lemma 14. Let 5 be any system, P be a consistent-cut protocol, and r be a run
of P(S). Let ? and j be any pair of neighbouring processors. Let ii be the last
proper prefix of i's cut state in run r in which receives are enabled, and let e? be
the event following ii. Let t'? be the first extension of i's cut state in run r in which
sends to j are enabled, and let C1j be the event following I:' Define i?,e?,i;., and
similarly with respect to i. Then (1) if e'? is send(i,j, m) for some message m, it
must be the case that ? H e'?; and (2) if is send&, i, m') for some message
it must be the case that e? H
Proof: Suppose that e? send(i,j,rn) but e? 74 e'?. Then by Lemma 2 of
Chapter 5, there exists another protocol run in which l'? e'? is a prefix of i's
local history and i? receive(j, i, m) is a prefix of 5's local history. In this run,
5's cut state cannot occur until after the event receive(j, i, m) and we have an
inconsistent message, and a contradiction. Thus e? H e'?. The argument for the
second conclusion of the lemma proceeds identically. 1
In effect, the lemma states that in any run of a CCP, and for any pair of
neighbours i and 5, the last pre-cut event of i that could be a receive from 5 must
happen-before the first post-cut event of 5 that could be a send to i. We proved
95
protocol Tree correct in Chapter 3 by showing that this property holds in every
protocol run. In an attempt to determine whether or not the 3(N --H 1) messages
of protocol Tree are necessary to guarantee that this property hold, we look at
the protocol a bit more closely.
Why are three phases of messages necessary? What would happen if proces-
sors reached their cut state after forwarding the first-phase message, and then
disabled system sends until receiving a second-phase message from each child?
(See Figure 8.1.) Upon examination, one can see that this would be sufficient
to guarantee the consistency of any message between a pair of processors where
one is an ancestor of the other in the spanning tree, provided that receives were
disabled between each processor's first protocol event and its cut state. (See
Figure 8.1 (b). The inconsistent message m would produce circular causality.)
However, it would still be possible to have an inconsistent message between neigh-
bours where neither is an ancestor of the other (for example, the message m1in
Figure 8.1 (b)). The question then arises: for any giyen system, is there a
spanning tree of the processor network with the property that any neighbouring
processors share an ancestor/descendant relationship in that tree?
Lemma 15. In any connected graph G there is a spanning tree T such that for
any pair of nodes u and v, if u and v are neighbours in & then either u is an
ancestor of v in T or vice-versa.
Proof: Such a tree can be built using a depth-first search of the graph. We
start with an arbitrary node vi and set V' = 4). At each step, we look at all
of the neighbours of the last node v? to have been added to V'. If the node v?
has neighbours which have not yet been added to V,, then we arbitrarily choose
96
(a)
x
I I
k1
Figure 8.1: The protocol Tree (a) and the proposed modification (b).
97
a Vn+j from amongst them. If there are no such neighbours, then we back up
the current tree to the first of V??5 ancestors which does have such neighbours,
and choose Vn+1 froin? amongst them. In either case, the node Vn+1 is added to
V' and the edge (vn, Vn+i) to T. We continue this procedure until all nodes have
been added to V', which will eventually be the case since U is connected.
We claim that T satisfies the requirements of the lemma. For suppose that
there exist nodes u and v which are neighbours in U and such that neither is
an ancestor of the other in T. Of u and v, let u be the node which is added
to V' earlier in the construction. Let v' be the last node that is added to u's
subtree in the construction of T. It must be the case that every node added to V'
between the addition of u and the addition of ?? is a member of u's subtree. By
hypothesis, v is not in u's subtree. But v ? V' at the completion of u's subtree,
or it would be added to V' as a child of u. This contradicts our assumption that
v is not a descendant of u in T. I
We refer to a tree which has the property in the statement of Lemma 15
as a depth-first-search tree. These trees will prove to be instrumental in the
construction of minimum-complexity consistent-cut protocols.
8.1.2 A Minimizing CCP for Non-FIFO Systems
Using a depth-first-search spanning tree, we have for non-FIFO systems the
consistent-cut protocol Minimize1, described in Figure 8.2.
Proofofcorrectness ofprotocol Minimize1: Suppose that in some run r of
Protocol 8.2 there is an inconsistent message m from processor i to processor j.
Because of the form of the spanning tree T, it must either be the case that i is
an ancestor of j in T or vice versa. Suppose first that j is an ancestor of i. Let
98
o+ Initiator I:
--H start CCP; disable all receive events;
--H send to each processor in children(I) a Cut message; (Cut)
--H disable all system send events; re-enable all disabled receive events.
0 Internal node t upon reception of a Cut message:
--H disable all receive events;
--H send to each processor in children(i) a Cut message; (Cut)
--H disable all system send events; re-enable all disabled receive events.
o+ Leaf node i upon reception of a Cut message:
--H (Cut) disable all system sends;
--H send(?, parent(i), Resume);
--H re-enable system sends.
o+ Internal node i, having received Resume messages from all children(i):
--H send(z, parent(i), Resume);
--H re-enable system sends.
0 Initiator I, having received Resume messages from all children(I):
--H re-enable system sends.
Figure 8.2: Minimizel T is a depth-first-search spanning tree of the network
rooted at 1; parent(i) and chitdren(i) are the parent and children of i in T.
99
k be the child of 5 which is also either an ancestor of i or ? itself. Then, since
there are no receive events between send(j, k, Cut) and the end of 5's cut state,
we have
send(j,k,Cut) receive(i,parent(i),Cut) H send(i,5,m) H
recezve(5, i, m) H send(j, k, Cut).
Thus an inconsistent message from i to its ancestor 5 would produce causal
circularity. Therefore it must be that ? is an ancestor of 5.
The processor z cannot send any messages after its cut state until it has
received Resume messages from all of its children in T. Let k be the child of i
which is also either an ancestor of 5 or 5 itself. Then
send(5, parent(S), Resume) H receive(?,k,Resume) send(i,5,m) H
recezve(5, ?, m) H send(j, parent(j), Resume)
and once again we have causal circularity. Therefore there cannot be an incon-
sistent message in any run r of protocol Minimize1. 1
Note that Minimizel uses 2N --H 2 messages in any system, regardless of
network topology, or of the shape of the spanning tree itself. We prove in the
following subsection that 2N --H 2 is a lower bound on the message complexity of
CCPs for non-ElFO systems.
8.1.3 Lower Bound Theorem for Non-FIFO Systems
Theorem 4. The lower bound on the message complexity of consistent-cut pro-
tocols for non-FiFO systems is 2N --H 2.
100
In order to prove our theorem, we need to prove that for any CCP for non-
FIFO systems, there is some protocol run in which 2N --H 2 protocol messages are
required to guarantee the consistency of the designated cut. We can therefore
make any worst-case assumptions about enabling relations on sequences of system
events, and about the occurrence of system events outside of any intervals in
which they are inhibited by the protocol.
Let P be a consistent-cut protocol for non-FiFO systems, and let r be a run
of the protocol with respect to some system 5. We can assume that no system
message is sent in r until after the completion of the protocol (i.e. there is no
system send event which happens-before any protocol event, or which is included
in a local state in which events are still disabled by the protocol). We can also
assume that a series of system sends to i's neighbours are the first post-cut
system events of each processor i. (These sends may occur only after an interval
of protocol events during which system sends are inhibited by the protocol.)
Now, by Lemma 14 there must, in run r, be causal paths, in both directions,
between every pair of neighbouring processors. Moreover, all of the messages on
these paths must be protocol messages. We assume that the protocol uses the
minimum number of messages necessary, and show that this number must be at
least 2N --H 2.
Firstly, let I be the initiator of the protocol in run r. The event Start CCP
occurs at I, and Start CCP happens-before every other protocol event in run r.
Because the processor network is connected, and because there are paths of pro-
tocol messages linking every pair of neighbours, there must be message paths
from the initiator to every other processor in the network. For each processor
101
other than the initiator, consider the first message received by that processor in
the run r. There are N --H 1 such messages, which we refer to as original messages.
If G is a directed graph where the nodes correspond to the processors in 5, and
the edges correspond to messages sent in r, then the edges corresponding to the
original messages together form a spanning tree T, rooted at 1, of G.
If i is j's parent in the tree T, then clearly there must be a channel between
them in the processor network. The original message from i to j could enforce
the necessary causal precedence in the i --H j direction; i.e. this message may
ensure that the last event before i's cut state which could be a receive from j
happens-before the first post-cut system send from j to i. However, we must still
show that the corresponding 5 --H i causal relationship also holds.
Lemma 16. Every internal processor in T must receive a message in addition
to the original one.
Proof: To guarantee that there is an appropriate causal precedence between a
processor i and i's parent in T, there must be a message path from i to parent(i).
The first message m in this path must be sent after the reception of the original
message mo to i, as receive(i, parent(i), mo) is the first protocol event occurring
at i. Therefore, the path from i to parent(i) cannot use the original message m
into parent(i): if it does, then
receive(i,parent(i),mo) H send(i,q',m') H . receive(parent(i),q,m) H
send(parent(i), i, mo)			receive(i, parent(i), mo)
and there is causal circularity in run r, which is impossible. Therefore pareni(i)
must receive some message other than the original one. 1
102
We have thus shown that, in addition to the N --H 1 original messages, there
must be at least as many more messages received as the number Nint of processors
which are internal in T. We also know that, in addition to the original messages,
there must be as many more sent as the number Nicaf of processor which are
leaves in T: each leaf must send one, otherwise there could not be a path from
it to its parent. Unfortunately, some of the additional received messages will of
course be the same as some of the additional sent messages, and so we have not
yet proved that there must be N --H 1 messages in addition to the N --H 1 originals.
In order to prove that there must indeed be N --H1 additional messages we look
at the messages received by each processor in the network. We know that each
internal processor must receive an additional message; if it receives more than
one, we refer to all but one as extra messages. Any additional message received
by a leaf processor we also refer to as extra. Our object is to show that if there
are Nleaf leaves in T, then there are at least N?eaf --H 1 extra messages received
in r; then we will have shown the presence of Nint + Nieaj --H1 = N --H 1 additional
messages in r.
We proceed by showing that if, in run r, there are message paths to a single
processor p from processors in n different branches in T, then those paths must
contain at least n extra messages if p is a leaf in T, and at least n --H 1 extra
messages otherwise. We start with the case where p is a leaf.
Lemma 17. Let p be a leaf processor in the tree of original protocol messages T.
Suppose that there are, in r, paths to p from n processors pi, .., pn where each
p? is a leaf in T. Then there must be at least n extra messages on these paths.
Proof: Let m? be the first message in a p? to p path (there may be more than
103
one such path); m? is not an original message, since p? is a leaf in T. Let qi, ...,
be the (not necessarily distinct) processors which receive these messages.
How could m? not be extra? It would require that q? be an internal node, and
that all non-original messages to q? be included amongst m1, .., m?. Suppose that
there are n1 such q?'s. Then there are n --H n1 extra messages among .?,..., m?
(all but one message to each of the ni q?'s are extra).
We claim that each of the n1 q?'s must send an additional message; i.e. there
must be a path from q? to p whose first edge is non-original. For let c be the
child of q? whose original message m from q? is the last original message sent
by q?. The path from c to q? must use (end with) one of m1, ..., m?, since these
include all non-original messages to q?. This message m5 must be received after
all original messages from q have been sent. However, m5 is also a message in a
path from Pi to p. Therefore the path must continue on the other side of q?. (See
Figure 8.3; straight arrows are direct messages, curved arrows represent message
paths.) The path cannot continue through an original message since m is the
last original message sent by q?; thus q? sends an additional message. Hence there
are n1 additional messages from the q?'s in the paths to p. Again, all of these
are extra unless there is an internal processor q which receives no non-original
messages other than those sent in the first two "rounds" of messages. If there are
n2 such processors, then there will be n1 --H n2 messages which are not extra. We
argue as above that all ?2 such processors must again send an additional message
each, and repeat. If k is the length of the longest path from any p? to p, then
we will have (n--Hn1)+(n1 fl2)+(n3--Hn2)t...t(nk-1 --Hnk)+nk =n extra
messages in total. 1
104
c
qL
Proof: We proceed as in the proof of Lemma 17, except that at each stage,
one of the n? q?'s could be p itself, in which case no additional message need
be sent. This can occur at at most one stage, since all non-original messages
to p must be included among the messages already considered at that stage.
Therefore it is not possible that p be a receiving processor for any message sent
in a subsequent stage. I
Lemma 17 and Lemma 18 give us the minimum number of extra messages that
must be present in paths such as those described. This minimum was determined
Lemma 18. Let p be an internal processor in the tree of original protocol mes-
sages. Suppose there are paths to p from n processors pi,..., p? such that each pi
is a leaf in the tree of original messages. Then there must be n --H 1 extra messages
on these paths.
PI
The case where p is an internal node is argued in a similar manner.
Figure 8.3: The q? to c message is the last original message sent by q?; the
message m5 lies on both the c to q? and pj to p paths.
m
105
by looking at the number of processors that must send non-original messages. In
fact, other processors on the paths may also send such messages, resulting in a
higher count of extra messages, as explained below.
Lemma 19. Let p be a leaf processor in the tree of original protocol messages.
Suppose that there are, in run r, message paths to p from n processors Pi,???, Pn
where each p? is a leaf in T. Let M be the set of all messages appearing in
these paths. Suppose there are processors q such that q receives a non-original
message m ? M, and q sends a non-original message m' ? M. Then there are at
least n t ?? extra messages in the paths.
Proof: The number of extra messages on the paths is the total number of
non-original messages, minus the number of processors all of whose incoming
non-original edges are included in that total. If there are more non-original
edges in the paths than those counted in the proof of Lemma 17, then there are
more extra. I
Lemma 20. Let p be an internal processor in the tree of original protocol mes-
sages. Suppose that there are, in run r, message paths to p from n processors
Pi,..., pn where each p? is a leaf in T. Let M be the set of all messages appearing
in these paths. Suppose there are processors q such that q receives a non-
original message m ? M, and q sends a non-original message Ei M. Then
there are at least n + n? --H 1 extra messages in the paths.
Proof: As above.
We call a protocol message m from processor q to processor p a terminal
message in the run if there is no processor pt and protocol message m' such that
106
receive(p, q, H send(p,p', m'). Note that no original message is terminal.
Also, since the number of messages involved in the run has been assumed to be
the minimum required by the protocol then there will be no terminal message
into a leaf processor (such a message could not be necessary to any child-parent
path).
Suppose there are k terminal messages .1...., mk in r, where m? is a message
from p'2 to p? for i = 1,..., k. For each? we consider the message paths in r which
end in m?; i.e. all those messages whose receptions happen-before send(p'?, pi,
We want to count the leaves of T and the extra messages on these paths; however,
this task is rendered difficult by the fact that some of these may be duplicated.
(See Figures 8.4 and 8.5. The first shows a hypothetical configuration of pro-
tocol messages in a run; the second illustrates the paths ending in the terminal
messages m1 and m2.) To ensure that our count is accurate, we perform the
following pruning operations. For each processor p in the system, let m? be the
last message sent by p that is contained in any of these paths. For any message m
which appears in these paths, m ? m? for any p, and for each path in which m
occurs, delete from the path all messages leading up to m. That is, delete all
messages ?? such that the reception of ?? happens-before the sending of m. If
there is an original message with no non-original messages leading up to it, then
delete that as well. (Figure 8.6 illustrates the effect of the pruning operations
on the message paths of Figure 8.5.) For each ? let Tj be the tree consisting of
what is left of the paths through m?. The Tj's now have the following properties,
which will allow us to count messages precisely.
107
0
`4
` 7n1
2
4'
3			I
m2
4Th
4
5
Fignre 8.4: A hypothetical run. The solid arrows indicate original messages and
dotted arrows additional messages.
108
.1
A
m1
A2
I ?
5			0
.4
A
A2
4			1			5			0
3			2
3
3
Figure 8.5: The messages leading up to m1 and m2.
m2
.1
A,			A
A2
5'
Figure 8.6: The pruned message paths to m1 and m2.
109
Lemma 21. Every non-original message in r has a corresponding edge in exactly
one of the Tj's.
Proof: Every non-original message must have appeared in one of the paths before
the pruning; none of these messages can have been completely excluded after it.
For suppose m is a non-original message from processor q0 to processor q'. Let m1
be the last message sent by q' and q2 be the processor to which it is sent, m2 be
the last message sent by q2 and q3 be the processor to which it is sent, and so
on to m3 = m for some i. None of these messages is deleted under the pruning
instructions, and so m will appear in Tj. No message can appear twice in the T?'s:
each m? appears only once since receive(pi, p?j, mi) does not happen-before any
protocol send event and hence does not appear in 7) for any 5 # i; each other
message m will appear only in the same Tj as the last protocol message that is
sent by the processor which receives m. I
Lemma 22. If processor p receives a message in more than one 7), thenpispi
for some i, i.e. p receives a terminal message m?.
Proof: Suppose p is not one of the ?j'5. Then all messages received by p are
received before p sends its last message m and will thus be included in the same 7)
as m. By Lemma 21, no message appears in more than one of these pruned trees,
and so p will receive messages in that single 7) only. 1
Lemma 23. No Tj can be processor-disjoint from each other 7). Moreover, the
Tj's cannot be partitioned into groups such t;hat each Tj belongs to some group,
but one group is processor-disjoint from all the others.
110
Proof: All the non-original edges are present in the Tj's, and there must be
paths from every child in the tree of original messages to its parent in that tree.
Therefore if Ti has no processor in common with any other 7), Tj must either be
empty or else contain all processors. The argument in the case of groups of Ti's
is identical. I
Recall that T is the tree of original messages in run r, with Nint internal nodes
and Nicaf leaves. We know that there must be Nint messages in addition to those
of T, and are trying to prove that there must be Nieaf --H 1 extra messages, for a
total of 2N --H 2 messages. Having established the properties of the Ti's, we now
endeavour to count the extra messages contained amongst them. We present two
final lemmas which will aid us to accomplish this task.
Lemnia 24. For each i 1,..., k, if flj > 0 is the number of processors contained
in Tj which are leaves in the tree T of original messages, then Ti contains Uj --H 1
extra messages, and these messages do not include m?.
Proof: Recall that each message m? is sent from processor p?j to processor pi, and
that no p? is a leaf in T. Thus the flj leaves of T which are included in Tj lie on
the paths leading up to p'?; they may include p'? itself. in either case, Lemmas 18
and 17 guarantee that there will be flj --H 1 extra messages leading up to p'?, i.e. m?
is not included in the count. By Lemma 21, the edge sets of the Tj's are disjoint,
so no edge can have been counted as extra twice. Also, by Lemma 22, the only
processors which may receive messages in more than one Tj are the ?j'5, and so
we cannot have counted all the non-original edges into some processor as extra. I
Lemma 25. Suppose there are terminal messages Pi1,.. ,PiK which are all re-
ceived by the same processor p. For j = 1,..., K, let n?? be the number of leaves
111
of T which appear in Tij Then there are flj1 + ... + ??K --H 1 extra messages
contained in Tj1 ?... U TiK
Proof: By Lemma 24, each Tij contains flj1 --H 1 (or 0, if flij = 0) extra messages,
with no m?? being counted as extra. Since each is received by the same
processor p, IC --H 1 of these can also be counted as extra. As a result, there are
at least flj1 + ... + ??K --H K + (K --H 1) or ??i +?? + ??K --H 1 extra messages in
TilU...UTiK.I
Proof of Theorem 4: Suppose that k = 1; i.e. there is only one terminal
message m1 in run r, where m1 is sent by p?j and received by Pi Then there are
paths from each of T's leaf processors to pi which end with the message mi, since
by Lemma 21, all non-original messages (and hence all leaves of T) will appear
in the paths through m1. Thus if there is only one terminal message in r we are
done: there will be Nicaf --H 1 extra messages on the paths to pi
If k > 1, then for each?, let Tj consist of the message paths into rflj constructed
as described above, and let Li be the set of leaves of T which appear in Tj. We
can assume without loss of generality (by Lemma 25 and the second part of
Lemma 23) that the processors Pi,..., Pk which receive the terminal messages
mi,...,mk are all distinct; we can also assume that each Tj contains at least one
leaf of T. We set ?i = 1 and pick an ?2 such that Ti1 and Ti2 are not processor-
disjoint; by Lemma 23 it is possible to find such an ?2 By Le?nma 22, there are
four possible relationships between the processor set of Tj1 and that of T?2:
1. some processor p in Ti2 sends (but does not receive) a message in Ti?;
2. some processor p in Ti1 sends (but does not receive) a message in T?2;
112
3. p?2 appears in T?1;
4. pi1 appears in Tj2
(Note that these possibilities are not mutually exclusive.)
We set ni = Lii I, and ?2 = ILi2 --H Li1 I: this latter quantity is the number of
leaves which appear in Ti2 but not in Tj1. We know that there must be at least
n1 --H 1 extra messages in Tj1 and at least n2 --H 1 extra messages in Tj2. We wish
to show that there are in fact at least fli t ?2 --H 1 extra messages altogether. (To
be scrupulously accurate, we remark that if one or both of n1 and n2 is 0, then
any quantity appearing in this argument which might be negative (for example,
--H 1) should be replaced by the number 0.)
Suppose (1) holds, i.e. there is some processor p in Tj2 which sends (but does
not receive) a message in Ti1. If p is a leaf in the tree of original messages T,
then ?2 is actually one less than the number of T's leaves which appear in Ti2.
Therefore there are at least n2 extra messages in Tj2, and at least n1 + n2 --H 1
altogether. Suppose p is an internal node in T. It appears (by assumption) in Ti1;
however, not all the non-original messages received by p are included in Ti1 (none
are). It is not possible that the message from p which is included in Tj1 be original,
or it would have been deleted in the construction of Ti1. Therefore by Lemmas 19
and 20, there is actually one more extra message included in Ti1, and hence there
are at least n1 + n2 --H 1 extra messages altogether.
If (1) does not hold but (2) does, then we use a symmetric argument to show,
as above, that there are at least ?i +?2 --H 1 extra messages in Ti1 and Ti2 together.
Suppose that (3) is the first of the four cases to apply: pi2 appears in Tj1.
There is a non-original message (mi2) which is received by pi2 but which is not
113
included in Tjj. If the message sent by p?2 in Tj1 is non-original, then (by Lem-
mas 19 and 20) there are at least n1 extra messages in Tj1 and we are done. We
claim that the message m sent by pj2 in Tjj must be non-original. For suppose
otherwise. Let c be the child of p?2 which receives m. The message m would
have been pruned out of Tj1 if it were not the last message sent by p?2. We know
that there must be a message path from c to p?2; since m is the last message sent
by pi2, this path must end in a terminal message to pj2. However, c does not
appear in Tj2 (or we would have fallen into case (2) above), and we have assumed
that the processors which receive terminal messages are distinct. This leads us
to a contradiction.
Case (4) is argued symmetrically to case (3).
We continue in this fashion: at the jth stage, we choose Tij+1 which is not
processor-disjoint from Tj1 U ... U Tij The argument above is repeated, with
Tj1 .... U Tj1 in the place of Tj1, Tij+1 in place of Tj2, etc.; thus we show that
there are at least ni ?... + n?+i --H1 extra messages contained in Ti1 .... U Tj1+1.
Finally, we have n1 + . + ?k = Nleaf, and n1 + . + ?k --H 1 = Nleaf --H 1 extra
messages, as desired. Our proof is now complete. I
8.2 Results for FIFO Systems
In this section we look at lower bounds on the message complexity of consistent-
cut protocols for FIFO systems. We consider bounds on both locally and globally
inhibitory protocds. We start by examining the latter, as the protocols and
arguments involved closely resemble those for non-FIFO systems contained in
the preceding sections.
114
8.2.1 Globally Inhibitory Protocols
The ideas behind the globally inhibitory protocol we present for FIFO systems
are much the same as those we discussed in the context of non-FIFO systems in
the first section of this chapter. Once again we use a depth-first-search spanning
tree to minimize communication, and again we are constrained by the necessity
of having causal paths between neighbouring processors. There is, however, one
major difference: a single message m from processor i to processor j may be
sufficient to prevent an inconsistent message between? and j in either direction.
It may prevent an inconsistent j --H? message as in the non-FIFO case; in addition,
if receive events from ? are disabled between the reception of m and the end of
i's cut state, then there can be no inconsistent ? --H j message either, due to the
FIFO condition. Thus in protocol Min?mize2 below, a processor need not disable
system sends to its children, and no processor need send a Resume message to
its parent. Minimize2 uses global send inhibition and local receive inhibition, is
a consistent-cut protocol for FIFO systems which guarantees consistency of all
messages.
Proof of correctness of protocol Minimize2: Suppose that in some run r
of Protocol Min?mize2 there is an inconsistent message m from processor i to
processor j. Because of the form of the spanning tree T, it must either be the case
that i is an ancestor of j in T or vice versa. Suppose first that j is an ancestor
of i. Let k be the child of j which is also either an ancestor of ? or i itself.
Then, since there are no receive events between send(j, k, Cut) and the end of i's
cut state, we have send(j, k, Cut) receive(i, ?rent(?), Cut) H send(i, j, m) H
receive(j, i, m) H send(j, k, Cut).
115
. Initiator I:
--H start CCP;
--H send to each processor in ch?tdren(I) a Cut message; (Cut)
--H disable system sends to all processors other th? those in children(I).
. Internal node i, upon reception of a Cut message:
--H disable all receive events;
--H send Cut messages to children(i); (Cut)
--H re-enable the disabled receive events;
--H disable system sends to all processors other than those in children(?);
--H for each k such that k is one of i's ancestors other than parent(i): if k
is not a neighbour of one of i's descendants, then send(i, k, Resume).
0 Leaf nodes i, upon reception of a Cut message:
--H (Cut); for each k such that k is one of i's ancestors other than
parent(i): if k is a neighbour of i, then send(i, k, Resume);
. Internal node or initiator i, upon the event receive(i,j, Resume):
--H re-enable sends to all ancestors of j in T.
Figure 8.7: Minimize2 T is a depth-first-search spanning tree of the network,
rooted at 1; parent(i) and children(i) are the parent and children of i in T
116
Since there cannot be causal circularity in a run, there cannot be an incon-
sistent message from processor i to its ancestor j.
Suppose then that j is a descendant of i. If? is j's parent in T, then j reaches
its cut state after receiving a Cut message from i and before receiving any other
messages from i. Any message sent by? after its cut state is sent after the event
send(?,j, Cut) and hence will be received at j after the event rece?ve(j,?, Cut).
Thus it will be received after j's cut state and cannot be inconsistent. So i and
j cannot be parent and child. In this case, in accordance with the protocol,
i suspends system sends to j upon reaching its cut state. Either 5 or one of
its descendants in T must send a Resume message to ?, and i must receive this
message before performing any post-cut system sends to 5. Let k be the processor
(either 5 or one of its descendants) which sends the Resume message to?. Then
send(k,?, Resume) rece?ve(?, k, resume) H send(?, 5, m) H
rece?ve(j, ?, m) H send(k, ?, Resume)
and we have causal circularity. Therefore it is impossible that there be an incon-
sistent message from any processor? to any processor 5. I
If the spanning tree used by M?n?m?ze2 has K leaves, then the protocol
requires at most 2N --H 1 --H2K messages, since neither a leaf, nor an internal node
whose children are all leaves, need receive any message besides the original one.
Thus the maximum number of messages that Protocol 8.7 can use in any FIFO
system is 2N --H 3; this occurs when every internal node of the depth-first-search
spanning tree has only one child. This is the case, for example, in any system for
which the channel set C is completely connected or in which there is a Hamilton
117
circuit. We claim that 2N --H 3 is the least number of messages that any CCP can
use under those circumstances.
8.2.2
Lower Bound Theorem for Globally Inhibitory
Protocols
Theorem 5. Any consistent-cut protocol for FIFO systems requires at least
2N --H 3 messages in the worst case.
Proof: We consider a system 5 where the processor network is completely
connected, and a run r of the protocol. Let T be the spanning tree of the network
that is created by the set of messages m? such that m? is the first protocol message
received by processor z. Suppose that there are two processors i and 5 such that
is not parent(j), 5 is not paren!(i), and neither receives an additional message.
Since neither processor sends a message directly to the other, there must be a
message path from i to 5, and also from 5 to ? The message path from i to 5 must
use the original parent(5) to 5 message m5, and the path from 5 to i must use
the original parent(i) to? message m?. Then
send(parenl(i),?,mi) H rece?ve(?,parent(i),mi) H send(?,k,m)
send(parent(5), 5, m?)			receive(5, parent(5), m?) H
send(5, k', m') H send(parent(i), i, m?)
and there is causal circularity in the run. Therefore there can be at most two
processors?ne the parent of the other in the tree of original messages that
do not receive an additional message. There are thus N --H 1 original messages
and at least N --H 2 additional messages, for a total of at least 2N --H 3 messages
altogether.I
118
Note that this proof is much simpler than that given for the lower bound
for non-FiFO protocols. Using the same type of argument as that here (i.e. by
looking at the case of a completely connected network), we could easily have
shown that, in the worst case, 2N --H 2 messages are required. We chose to prove
the stronger result of Theorem 4, which gives a lower bound of 2N --H 2 without
relying on the features of the processor network topology, and hence holds for
any class of non-FiFO systems.
8.2.3 Lower Bound Theorem for Locally Inhibitory
Protocols
In our final section we turn our attention to locally inhibitory CCPs for FIFO
systems. We first prove a general lower bound theorem for these CCPs, and then
discuss exact bounds.
Theorem 6. Any locally inhibitory consistent-cut protocol for FIFO systems
uses O(ICI) messages.
Proof: Let P be a consistent-cut protocol which uses local inhibition (of
send events, receive events, or both) only. It is possible that, in a given run,
and for some processor, the protocol inhibits receive events for some interval
immediately preceding that processor's cut state. However, since the protocol
uses local inhibition only, there is some run where no communication is required
to re-enable any receive events that may have been disabled; thus there will be
no receive events at any processor between the point at which any receive events
are first disabled, and the point at which they are re-enabled. Similarly, in this
particular run there may be send events of some processor which are inhibited by
119
the protocol for some interval immediately following that processor's cut state.
Again, there must be some other run in which these events are re-enabled with
no intervening receives. It is this last run which we wish to examine.
Let r be a run of the protocol in which, for each processor i, there are no
receive events between the first pre-cut state in which there are events disabled
by the protocol, and the first post-cut state after which no events remain disabled.
We may assume that the first system events following a processor 5 cut state are
a series of sends to its neighbours, and that these are the first system sends
occurring at that processor in run r. Suppose that i and j are neighbours such
that i does not send a message to 5 in r, and vice versa. Let ij be the 1st proper
prefix of ?`s cut state in which receives are not disabled by the protocol; we
can assume that receives from 5 are enabled at this point. Let e? be the event
occurring in i?. Define i? and ?` similarly for 5. Finally, let send(?,5, m) be the
first post-cut system send from? to 5, and let send(j, i, m') be the first post-cut
system send from 5 to i. Since there are no messages, either system or protocol,
sent to 5 by i prior to m, Lemma 14 of Section 8.1.1 can be invoked to show that
send((j,i,m')); the same kind of argument shows that e? H send((i,j,m)).
Since there are no receive events between e> and send(j, j, rnI), or between e? and
send(i, 5, m), Lemma 5 implies e? ? and e1 H e?. Thus we have reached a
contradiction, and therefore either? must send a message to jin r or vice versa.l
Recall that if (?,5) Ei C then (5, i) ? C; i.e. C is actually twice the number
of channels in the system. The protocols Ftoodl and Flood2 of Chapter 3 both
then use			--H			--H 1). To see this, note that the initiator sends a message along
each incident channel; every other processor sends a message along all but one
120
incident channels. Thus there will be two messages on all but N --H 1 channels.
These protocols can be modified to perform better in most cases. For example,
in Flood2, let the initiator add the list fIJ to the Cut messages it sends out; let
every processor p add itself to the list L which accompanies the first Cut message
it receives and append this new list to all Cut messages it sends out. With this
modification, a processor need send Cut messages and processor list only to those
of its neighbours which do not appear in L. This modified protocol will require
fewer messages in most instances; however, in any run in which the first Cut
message received by each processor is the one sent by the initiator, there will still
be C --H			--H 1) protocol messages sent. The quantity C --H			--H 1) does provide
an exact lower bound on locally inhibitory CCPs which must work for all FIFO
systems: in any system, N --H 1 messages are required simply to ensure that a
protocol event occurs at each processor, and in a system in which the processor
network is a tree N --H 1			--H			--H 1). Though we strongly suspect that this
quantity is indeed a precise lower bound for all classes of FIFO systems, we have
asyet noproofofit.
8.3 Summary
We have analyzed the cost in message complexity of consistent-cut protocols
for both non-FiFO and FIFO systems. We have proved an 0(N) lower bound
on globally inhibitory CCPs in both cases. We have also illustrated a trade-off
between the degree of inhibition used and the number of messages required for
FIFO protocols: any locally inhibitory CCP must use O(ICI) messages.
Chapter 9
Conclusions and Related Work
In this work we have analyzed the cost of achieving causal consistency in asyn-
chronous distributed systems. We have assessed this cost in terms of the degree
of inhibition and the message complexity of consistent-cut protocols. Concerning
inhibition, we have distinguished local versus global inhibition , and have con-
sequently defined a spectrum of protocol capabilities with respect to inhibition.
We have given a complete analysis of the existence of consistent-cut protocols
as a function of these capabilities, while also considering the FIFO or non-FiFO
nature of the system and whether or not protocol messages are allowed to be
inconsistent with respect to a cut. We have shown that local inhibition is neces-
sary and sufficient to develop a CCP for FIFO systems, while global inhibition
of send events is necessary and sufficient for non-FiFO systems. Concerning
message complexities, we have established that for non-FIFO systems, and CCP
must use 0(N) messages. Any CCP for non-FIFO systems must use global inhi-
bition; we show that g?obally inhibitory protocols for FIFO systems also have a
0(N) lower bound. Locally inhibitory CCPs, on the other hand, require 0(ICI)
121
122
messages, where C is the channel set of the system; this illustrates a trade-off
between message complexity and the degree of inhibition used. Related work
[Awe85] examines message versus time complexity trade-offs in a class of proto-
cols, called synchronizers, which resemble CCPs although they differ somewhat
in their causal constraints. A message-complexity lower bound for synchronizers
is given as a decreasing function of time complexity. However, in that work inhi-
bition is not considered directly, although it is a potential source for increasing
the time complexity of protocols.
Another related issue is that of piggybacking. Non-FIFO channels in which
messages can be "piggybacked" differ from the pure non-FIFO case, because an
order is imposed on messages that are packaged together. The "red-white" algo-
rithm of [LY87,Mat89] is an example of a protocol which piggybacks a protocol
marker onto system messages in order to determine consistent cuts. In that pro-
tocol, a color?ither white or red is associated with each processor at each
point in its local history. A processor starts out white, but may spontaneously
turn red, after which it adds a red marker to each message it sends to a neighbor.
Any white processor about to receive a red marker turns red immediately before
doing so. The white states of the processors form a consistent cut.
Our definition of "inhibitory" assumes a model in which system and protocol
events are distinct, therefore it is difficult to apply it to the red-white protocol.
However, since the protocol only adds to, and does not impede, the sending
or reception of any system message, it could reasonably be considered "non-
inhibitory." Thus one could take the viewpoint that the red-white protocol is
a non-inhibitory, consistent-cut protocol for non-FiFO systems; although, as we
123
will observe below, it does not quite satisfy our conditions on CCPs. Regardless,
we have shown in Theorem 1 that without allowing piggybacking or some other
mechanism stronger than those in our model, such a protocol is not possible to
attain.
The red-white protocol, while certainly determining consistent global states,
does not satisfy the "distinguishability" requirement (condition (2)) of our defini-
tion of a CCP. A particular local state may be the cut state of processor p in one
run (because the next event at p in that run is the reception of a red message)
but not in another (because the following event is, say, some internal event en-
abled by that local state). In effect, the protocol designates a previous state as a
processor's cut state; the processor does not know until after the fact that it was
at the cut. Alternatively, one could assume that a processor is able to "peek" at
the contents of arriving messages. Then a processor could, immediately before
the reception of a red message, designate its current state as the cut state. This
requires stronger capabilities for processors than those granted in our model.
Another approach is to model the reception of a piggybacked message in the
red-white protocol as two separate but consecutive receive events, the first of the
marker and the second of the system message (note that this is FIFO behavior).
The red-white algoritlim could then be modified to designate the state ending
with the reception of the red marker as a processor's cut state. This protocol
satisfies condition (2) of our CCP definition, and has inconsistent protocol mes-
sages. However, if we do model the reception of piggybacked messages as distinct
receive events, guaranteeing their consecutive reception would seem inherently to
require some form of inhibition. This becomes more acute if one looks at a final
124
approach to designing CCPs using piggybacking: that of simulating FIFO chan-
nels by including the entire message history with every message. Note that this
also requires unbounded message size. In sum, the modeling of piggybacked mes-
sages, and their interaction with distinguishability and inhibition, involve many
subtle issues.
Message complexity issues are also affected if piggybacking is introduced.
There need be no distinct protocol messages of course; instead, we might count
the number of protocol markers used. However, any theorems which invoke
arguments on the lines of "there must be a protocol message to prevent the
inconsistency of a system event which might occur later" are rendered invalid: a
protocol need not guard against possible system sends, as it has only to piggyback
protocol markers on actual system messages. The question of what is meant by
the termination of such a protocol must then also be addressed.
Bibliography
[Awe85] Baruch Awerbuch. Complexity of network synchronization. Journal of
the A.C.M., 32(4), October 1985.
[BF88]
(BJ87J
L. Bouge' and N. Francez. A compositional approach to superimposi-
tion. In Proceedings of the A.C.M. Symposium on Principles of Program-
ming Languages, January 1988.
Ken Birman and Thomas Joseph. Reliable communication in the pres-
ence of failures. A.C.AL Transactions on Computer Systems, 5(1),
February 1987
[Bra80j W. Brauer(ed.). Net Theory and Applications, volume 84 of Lecture
Notes in Computer Science. Springer-Verlag, 1980.
[BT87] G. Bracha and 5. Toueg. Distributed deadlock detection. Distributed
Computing, 2(3):127--H138, 1987.
[Cha82] E.J.H. Chang. Echo algorithms: Depth parallel operations on graphs.
LE.E.E. Transactions on Software Engineering, SE?8(4):391--H4OO, 1982.
[CL8s] M. Chandy and L. Lamport. Finding global states of a distributed
system. A.UM. Transactions on Computer Systems, 3(1):63--H75, 1985.
[CM85]
[Cri89]
M. Chandy and J. Misra. How processes learn. In Proceedings of the
Fifth A.C.M. Symposium on Principles of Distributed Computing, pages
204--H214, 1985.
Carol Critchlow. On inhibition and atomicity in asynchronous
consistent-cut protocols. Technical Report 89-1069, Cornell University
Department of Computer Science, December 1989.
[CT9Oa] C. Critchlow and K. Taylor. The inhibition spectrum and the achieve-
ment of causal consistency (extended abstract). In Proceedings of the
Ninth A. C.M. Symposium on Principles of Distributed Computing, pages
31--H42, 1990.
125
126
[CT9Ob] C. Critchiow and K. Taylor. The inhibition spectrum and the achieve-
ment of causal consistency. Technical Report 90-1101, Cornell Univer-
sity Department of Computer Science, February 1990.
[DDS87]
[Dil88J
D. Dolev, C. Dwork, and L. Stockmeyer. On the minimal synchronism
needed for distributed consensus. Journal of the A.C.M., 34(1):77--H97,
January 1987
D.L. Dill. Trace Theory for Automatic Hierarchical Verification ofSpeed-
Independent Circuits. Ph.D. dissertation, Carnegie Mellon University,
1988.
[Fel79j J.A. Feldman. High level programming for distributed computing.
Communications of the A.C.M., 22(6):353--H368, June 1979.
[Fra80] N. Francez. Distributed termination. A.C.M. Transactions on Program-
ming Languages and Systems, 2(1):42--H55, January 1980.
[Gel85] David Gelernter. Generative communication in Linda. A.C.M. Trans-
actions on Programming Languages and Systems, 7(1):80--H112, 1985.
[HM9O]
J. Y. Halpern and Y. Moses. Knowledge and common knowledge in a
distributed environment. Journal of the A.C.M., 37(3):549--H587, July
1990.
[Hoa78j C.A.R. Hoare. Communicating sequential processes. Communications
ofthe A.C.M., 21(8):666--H677, February 1978.
[Hoa85] C. A. R. Hoare. Communicating Sequential Processes. Series in Com-
puter Science. Prentice-Hall International, London, 1985.
[Jos91j
[KT87]
Mark B. Josephs. Receptive process theory. Technical Report (to ap-
pear), Department of Mathematics and Computing Science, Eindhoven
University of Technology, 1991.
Richard Koo and Sam Toneg. Checkpointing and rollback-recovery for
distributed systems. LE.E.E. Transactions On Software Engineering,
SE-13(1):23--H31, January 1987.
[Lam78] Leslie Lamport. Time, clocks, and the ordering of events in a dis-
tributed system. Communications of the A.C.M., 21(7):558--H565, 1978.
N. A. Lynch and M. Tuttle. Hierarchical correctness proofs for dis-
tributed algorithms. Technical Report MIT/LCS/TR-387, M.I.T. Lab-
oratory for Computer Science, April 1987.
[LT87]
127
[LT89] N. Lynch and M. Tuttle. An introduction to input/output automata.
C. W.L Quarterly, 2(3):217--H246, September 1989.
[LY87] T.H. Lai and T.ll. Yang. On distributed snapshots. Information Pro-
cesszng Letters, 25:153--H158, 1987.
[Mat89]
Friedemann Mattern. Virtual time and global states of distributed sys-
tems. In Parallel and Distributed Algorithms (Proceedings of the Inter-
national Workshop on Parallel and Distributed Algorithms, Chateau de
Bonas, France, October 1988), M. Cosnard, Y. Robert, P. Quinton, and
M. Raynal, Editors, pages 215--H226. Elsivier Science Publishers (North-
Holland), 1989.
[Mil8O] R. Milner. A Calculus for Communicating Systems, volume 92 of Lecture
Notes in Computer Science. Springer-Verlag, 1980.
[Pet66] C.A. Petri. Kommunikation mit Automaten. Ph.D. dissertation, Insti-
tut fiir Instrumentelle Mathematik, BONN, 1966.
[PT88] P. Panangaden and K. Taylor. Concurrent common knowledge: A new
definition of agreement for asynchronous systems. In Proceedings of
the Seveth A.C.M. Symposium on Principles of Distributed Computing,
pages 197--H209, 1988.
[PT89]
P. Panangaden and K. Taylor. Concurrent common knowledge: A new
definition of agreement for asynchronous systems. Technical Report 89-
1011, Cornell University Department of Computer Science, May 1989.
[Rus77] D. L. Russell. Process backup in producer-consumer systems. In Pro-
ceedings of the A.C.M. Symposium on Operating Systems Principles,
November 1977.
[Tay89]
Kim Taylor. The role of inhibition in asynchronous consistent-cut pro-
tocols. In Lecture Notes in Computer Science 392: Distributed Al-
gorithms (Proceedings of the Third International Workshop on Dis-
tributed Algorithms, Nice, France, September 1989), J.-C. Bermond
and M. Raynal, Editors, pages 280-291. Springer-Verlag, 1989.
[Win86] Glynn Winskel. Event structures. Technical Report 95, University of
Cambridge, Computer Laboratory, 1986.
