BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1351
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Resource Bounds and Combinations of Consensus Objects
AUTHOR:: Kleinberg, Jon M. 
AUTHOR:: Mullainathan, Sendhil
DATE:: May 1993
PAGES:: 17
ABSTRACT::
The shared-memory model of computation typically provides processes with an 
arbitrary number of copies of the available object types; yet a simple 
argument shows that any consensus protocol can only make use of some finite 
subset of these. Thus we believe it is useful to consider the problem of 
consensus from the point of view of resource bounds, determining whether 
consensus can still be solved when the number of copies of the system's 
shared objects is limited. This approach leads to a general technique which we 
call the combination protocol, in which the number of processes that 
can achieve consensus with a given object increases as more copies of it are 
made available. Such a phenomenon brings up questions about the robustness of 
Herlihy's consensus hierarchy, in that objects are being combined to 
solve $n$-process consensus, even though no single copy can do so 
individually. We show how the ideas in the combination protocol appear even in 
situations where objects are not explicitly being combined with one another; 
we also consider the general question of resource bounds in several known 
consensus protocols. We analyze two such protocols that use seemingly similar 
primitives, achieving a substantial improvement in one case and showing a 
tight lower bound in the other.
END:: CORNELLCS//TR93-1351
BODY::
Resource Bounds and
Combinations of Consensus Objects
Jon Kleinberg
Sendhil Mullainathan
TR 93-1351
May1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
Resource Bounds and
Combinations of Consensus Objects
Jon Kleinberg
Sendliji Mullainathan
Abstract
The shared-memory model of computation typically provides processes
with all arbitrary number of copies of the available object types; yet a simple
argument shows that any consensus protocol can only make use of some finite
subset of these. Thus we believe it is useful to consider the problem of consen-
sus from the point of view of resource bounds, determining whether consensus
can still be solved when the number of copies of the system's shared objects is
limited. This approach leads to a general technique which we call the combi-
nation protocol, in which the number of processes that can achieve consensus
with a given object increases as more copies of it are made available. Such a
phenomenon brings up questions about the robustness of Herlihy's consensus
hierarchy, in that objects are being combined to solve fl-process consensus,
even though no single copy can do so individually. We show how the ideas
in the combination protocol appear even in situations where objects are not
explicitly being combined with one another; we also consider the general ques-
tion of resource bounds in several known consensus protocols. We analyze two
such protocols that use seemingly similar primitives, achieving a substantial
improvement in one case and showing a tight lower bound in the other.
1 Introduction
Any pessimistic model of distributed computing must take into account the perils of
asynchrony. The traditional approaches to process coordination semaphores and
critical sections [CllP] are based on the notion of a shared-memory system in
which processes explicitly take turns modifying a globally accessible data structure
(object). But processes on such a system are running asynchronously, so that the
fast processes must wait for the slower ones to complete their operations. And if a
process should crash while accessing a shared object, that object becomes unusable
by the rest of the system.
A relatively recent development in this area has been the notion of "wait-free
synchronization." One constructs objects in which each invocation is guaranteed
to return in a finite number of steps of the invoking process, regardless of the
actions (or failures) of the other processes; such an object is said to have a wait-
free implementation. The natural question in this setting is the following: one has a
system with a number of primitives that behave atomically (such as atomic registers,
test-and-set registers, and the like) and wishes to know whether a given object type
has a wait-free implementation in this system.
In [111], Herlihy proposed an approach to this problem in which one classifies
shared objects according to their consensus number. The consensus number of an
object 0, as defined in [111], is the maximum number of processes that can achieve
asynchronous consensus using atomic operations on 0 and any number of atomic
registers (recall that the results of [FLP, DDS, LAA] show that even two processes
cannot achieve consensus using only registers). If any number of processes can
achieve consensus using 0 and atomic registers, then 0 is said to have infinite
consensus number. The resulting classification of shared objects is referred to as
the consensus hierarchy. In (111], it is shown that if the consensus number of 0 is
k, the consensus number of 0' is k', and k> k', then 0' cannot provide a wait-free
implementation of 0 in a system of k or more processes.
We are interested the structure of this hierarchy, centering on the question of
whether consensus number, as defined above, is truly a robust way of classifying
shared objects. Specifically, we analyze the effect of combining shared objects is
it possible to combine objects having consensus number k, and obtain a system in
which k+ 1 processes can solve consensus? And when the objects we combine are all
of the same type, this becomes a question of resource bounds how many copies
of an object are needed for some fixed number of processes to achieve consensus?
In order to carry out this analysis, we find it useful to make the model of [Hi]
somewhat more flexible. First of all, we will consider the "consensus number" of
arbitrary (possibly infinite) collections of objects; i.e. we consider the maximum
number of processes that can achieve consensus using precisely objects ..... . , on.
Thus, we do not a priori assume that the system contains an arbitrary number of
2
atomic registers; in this way, we make more explicit the resource requirements of our
consensus protocols. With this variation of the model comes an additional subtlety:
when the system contains an underlying infinite set of registers, certain natural forms
of the consensus problem are reducible to others. With limited resources, however,
this is no longer necessarily the case, and we identify two forms of consensus on
which to concentrate: binary consensus and leader election (defined below). The
definition of consensus number carries over in the obvious way to these problems,
and we refer to the corresponding quantities as bc-number and le-number. We note
that leader election, using unlimited registers, coincides with Herlihy's notion of
consensus (and consensus number); thus, when we speak simply of consensus, we
mean just this leader election with registers.
Our main results center around a technique that we call the combination protocol.
In the basic combination protocol, we consider an object 0 with bc-number and le-
number equal to 2, and show that a collection of k copies of 0 has le-number at
least k + 1 (and bc-number at least [k+21J) Thus, although the consensus numbers
of an individual copy of 0 are very low, an infinite set of them can solve consensus
among an arbitrary set of processes. Hence, for example, even one copy of 0 cannot
be implemented by any of the 2-consensus objects considered in [111].
We believe the protocol itself represents an interesting technique for designing
wait-free algorithms. It consists essentially of an "elimination tournament," which
processes enter in a staggered fashion so that by the end, many processes are able
safely to access a single copy of 0 simultaneously. Moreover, the view of objects
as discrete units represents a departure from a number of previous approaches, in
which all the objects in the system were in a sense "wired together." Our protocol,
on the other hand, is modular; the (n +1)-process version of the protocol is obtained
simply by adding an additional copy of the object and an additional system step to
the fl-process version.
Jayanti has subsequently proved several strong statements about the structure
of Herlihy's hierarchy, using objects that can implement our combination protocol
and were amenable to a number of highly novel "impossibility" arguments [Jaj.
Specifically, he considers the four consensus hierarchies one obtains by varying the
following two parameters:
o+ Do processes have access to just one copy of the object 0, or arbitrarily many
copies?
o+ Do processes have access to an infinite set of atomic registers, or just (the
copies of) 0?
He denotes these by h1, hrn, hri, h?m, where r indicates that registers are available,
and m indicates that multiple copies of 0 are available. A hierarchy is robust if no
set of objects lying below a given level k can provide a wait-free implementation
3
of an object on level k for k or more processes. In his terminology, our results in
Section 3 provide the most basic of these results: h1 is not robust. [Jaj extends this
to the more natural structures hm and h?1 (Herlihy's original hierarchy), showing
that neither is robust. The robustness of hrm is an open question.
The combination protocol appears to be applicable in a number of non-trivially
different contexts. In Section 4, we show an application to the problem of consecutive
m-register assignment, under the definitions and hierarchy of [Hi]. This is based
on an atomic primitive considered by Herlihy in [Hi]: simultaneous assignment to
m registers, which he showed has consensus number exactly 2m --H 2. However, his
protocol relied on the ability of one process to assign to registers very far apart in
memory. We show that this is in fact necessary; when the registers in a simultaneous
assignment must be contiguous in memory, the consensus number is exactly 3, for
all m > 3. We use a version of the combination protocol to show the lower bound
on consensus number the elimination tournament in this case can be continued
for two rounds, but then is halted, in effect, by a combinatorial property of the
one-dimensional array of registers.
Finally, in Section 5, we analyze two other protocols presented in [Hi] from the
point of view of resource bounds. It is worth noting that both of the primitives
involved are simply classified as having infinite consensus number; yet our results
reveal a strong sense in which one is more powerful than the other. Consider a
system with an infinite set of atomic registers, as well as some number of special
pre-initialized registers ..... . , xk. The registers Xj can be read atomically, and the
value in Xj can be atomically copied into Xj. [111] gives a protocol for n processes to
solve consensus using 2n such registers, allowing atomic writing to each Xj as well.
We give an n-process protocol which does not require atomic writing to the Xj and
uses only [2 log n + 2J such registers; it is based on the surprising fact that a set of
two such registers has infinite bc-number.
In contrast to this exponential improvement in resource requirements is the su-
perficially similar case of registers with atomic swap. Here, the contents of the
special registers Xj can be atomically swapped or read. Without assuming the abil-
ity to write atomically to the Xj, [Hi] gives an n-process consensus protocol using
n + 1 such registers; we show that under this model, the protocol is resource-optimal
any n-process consensus protocol in this system requires n+1 registers-with-swap.
In summary, it is our belief that analyzing consensus protocols from the point
of view of resource requirements reveals a number of subtleties in the structure of
the hierarchy of shared objects. We hope that further analysis along these lines
will resolve a number of important questions about the relative power of wait-
free objects, as well as the problem of whether there exists a robust hierarchy for
classifying them.
4
2 Definitions and Initial Results
2.1 The Model
We use essentially the model of computation presented in [111, H2]. A concurrent
system consists of a set of n processes which can access a collection of shared objects.
In what follows, we consider only deterministic processes and objects.
A deterministic object is a possibly infinite-state automaton; in one step, a
process can issue an input to the object, causing it to change state and return an
output. Thus, we will often view an object 0 as a labeled directed graph (Q, ?),
where the vertex set Q is the set of states, and each element of the edge set ? is
labeled by a pair (input, output). When the object is in a given state q E Q and
receives input i, it follows the edge with input label i. Since we are dealing with
deterministic objects, each input label will appear on at most one edge out of q;
if it appears on no edge, the operation is not enabled in q. A finite-state object is
one for which Q and ? are finite sets. We assume that all objects are oblivious:
in a given object state, each process has the same set of possible (enabled) inputs.
Finally, by a schedule for the system we mean simply a finite or infinite sequence of
process names; at each entry in the sequence, the named process is allowed to take
one step (i.e. apply a single input to an object, receive the output, and perform
some internal computation).
The basic task we consider is that of consensus. A consensus problem for a set
....... , pn 1 of processes is characterized by a collection of sets ....... , Un 1. In a
protocol to solve this problem, each participating process p? begins by proposing a
value v E U??; the processes then try to agree on a common value using operations
on the shared objects in the system. Each process decides on a value, then halts.
The protocol must satisfy the following properties.
o+ Wait-freedom: Each process decides and halts in a finite number of steps,
regardless of whether other processes have crashed.
o+ Agreement: Each process decides on the same value.
o+ Validity: This common value is the proposal of some process.
When we simply use the term "consensus," we will mean specifically leader election,
which is the consensus problem in which each process proposes its own name (Ui =
fpjj; the decision can be viewed as agreement on a "leader"). Thus, the validity
condition can be viewed as follows: the decision value must be the name of a process
that has taken at least one step. This essentially is the type of consensus considered
in [Hi]. We will also consider the other most common variation --H binary consensus,
in which all proposals must be 0 or 1; i.e. Uj = [0,11 for each i. The differences
between these two problems will be considered below in Section 2.2.
5
Let us say that a consensus problem is finite-choice if each set Uj is finite. Con-
sider a protocol for this problem which uses an object 0. Moreover, assume that
the state of a process beginning the protocol depends only on the value of its pro-
posal v. Then for a fixed choice of proposals, one for each process, we can view
the set of schedules as a finitely branching tree in which every path is finite. By
K6nig's Lemma, the tree itself must be finite. Since there are only H?i=i lUil possible
trees, we have the following fact; it says in effect that if 0 is infinite-state, it can
be "pruned" down to a finite-state object on which the identical protocol works.
Lemma 1 (Pruning Lemma) Let ? be a protocol for a finite-choice consensus
problem which uses object 0. Then there is a finite-state object On which is formed
by taking a finite subgraph on the states of 0, on which P is still an fl-process
consensus protocol.
2.2 Two Kinds of Consensus
In a system with infinitely many atomic registers, any consensus problem can be
reduced to leader election; if n processes can solve leader election, then they can
solve any consensus problem. Without registers, however, this is not necessarily the
case, so it makes less sense to speak of a single consensus number for an object.
Thus, for a collection C = ....... , OkJ of shared objects, we will define the le-
number and bc-number of this collection to be the maximum number of processes
that can solve leader election and binary consensus, respectively, using C.
It is possible for object's bc-number to exceed its le-number by an arbitrary
amount. Consider the sticky bit of [PI]; initially it is empty, and the first 0 or 1
written to it "sticks" permanently. A single copy of this object has infinite bc-
number, but an le-number of 2.
In the other direction, however, we can show the following tight bound.
Theorem 1 If an object 0 can solve leader election among n processes, then its
bc-number is at least [n/2J. For each n > 4 this bound is achieved by an object On
with le-number n and bc-number [n/2J.
Proof. One direction is easy. Consider an arbitrary object 0 which can solve n-
process leader election, and let k = tn/2J. k processes pi,.. , Pk can use 0 to
solve binary consensus by collectively simulating the execution of a leader election
protocol among the 2k processes q?,... , q?0, q11,. . . , q?1, as follows. In order to propose
u, p? simulates the behavior of q?". When the protocol terminates with leader qy, the
processes decide on v. It is easily verified that this protocol satisfies the properties
of binary consensus.
Now let n > 4, and consider the following object On. The states of On are
q?J; its operations are ....... , XnJ. The operations are defined as follows:
6
in state c, operation Xj leads to q?; in state q?, operation Xj leads to c. For j $ i,
performing Xj in state q? leaves the object in state q?. Finally, On supports an atomic
read operation which returns the current state. It is clear that n processes can elect
a leader using On. The object is initialized to state c; process p? performs operation
Xj and then reads the current state. It is also easily verified that the le-number of
On is in fact exactly n.
We claim that with k = [n/2J, k + 1 processes cannot solve binary consensus
using On. Assume by way of contradiction that such a protocol ? exists. We
consider here the case in which On begins in state c; the case in which it begins in
some q? is analogous. If process p? proposes v and all other processes crash, then we
say p? is executing a solo run; let rtv. denote the state of 0n in which this solo run
terminates.
Clearly, we cannot have rtv. = c for any i, v, since otherwise some p? $ p? would
not realize when it first woke up that v had already been decided. For a similar
reason (since k+ 1 > 3), we cannot have r?9 = rJ for any i,j. Thus by the pigeonhole
principle there is a state q? = rtv. = r,v. for some?$j.
Consider the following execution. Let p? and p? both propose v, and some third
process Pr (since k + 1 > 3) propose v = 1 --H v.
o+ Run p? alone until it is about to leave 0n in state q? = rtv.. Thus 0n must be
currently in state c.
Now schedule p? alone; since it first sees On in state c, it will attempt to leave
the object in state q? as well. Run p? until it is about to execute operation
Xm from state c.
o+ Run p? to completion; it will execute Xm and halt, deciding v.
o+ Let p? take a single step --H the operation Xm --H and crash it. The object has
now returned to state c.
o+ Wake up Pr and let it run to completion, deciding v.
This contradiction completes the proof. I
A similar result clearly holds for m-valued consensus for each 1 < m < n. The
theorem above can also be viewed as an extension of the result of [JT], that there
exists an object which can solve 2-process leader election but not 2-process binary
consensus.
7
3 The Combination Protocol
As noted in the Introduction, there are a number of possible objects one could use
in implementing the combination protocol; we focus on the finite-state object type
S2, defined below, throughout this section.
Define a monotone bit to be a single bit that supports atomic reading and the
writing of the value 1 --H the value 0 cannot be written to a monotone bit. An
object of type S2 consists of two monotone bits (the "left" and "right" bits) with
the following operations:
o+ read: atomically read the value of the two bits simultaneously
o+ R: write 1 to the right bit.
o+ L: write 1 to the left bit.
o+ S: atomically swap the values of the two bits.
We now outline the combination protocol to solve fl-process leader election using
n --H 1 copies of S2. Note that by Theorem 1, this means that [n/2J processes can
solve binary consensus using n --H 1 copies of S2; i.e. there is a protocol to solve
fl-process binary consensus using 2n --H 1 copies of S2. Thus, these results show that
an infinite set of copies of S2 has infinite le- and bc-numbers.
Let ....... ,p? J be the set of processes and ....... ,Sn?i J the set of copies of
S2. Each Sj is initialized with the value 01(0 in the left bit and 1 in the right). The
protocol consists of n --H 1 numbered phases, followed by a final phase. Process p?
participates in phases maxf 1, j --H 1) through n --H 1 in succession, followed by the
final phase; at the end of the final phase, it reaches a decision. Note that there is
no waiting involved; a process begins executing its first designated phase as soon as
it takes its first step.
Phase i (1 < i < n --H 1) involves only operations on s?; the code for phase j is
as follows:
Phase j
Pi,.. ,Pj executes L on
Pj+1 executes 5 on
The code for the final phase is as follows:
Final Phase
each p? reads si, . . ,5n-1 one at a time
if all values read are 11 then
elect Pi
else
8
elect p?, where sj-i is the
highest-indexed copy of S2
whose value was read as 10
Theorem 2 The above protocol is a solution to fl-process leader election.
Proof. It is obvious that it is wait-free. If Pi is elected, then L must have been
executed on s?, which only Pi can do. If p? is elected, then S must have been
executed on si-i, which only p? can do. Thus the elected process has taken a step.
Finally, we must verify agreement. First observe that each copy of Sj assumes at
most two different values over the execution of the protocol. It is initially in state
01. If the first operation invoked on it is 5, then it assumes the value 10, and all
further executions of L will have no effect. If the first operation is L, then its value
becomes 11, and no subsequent operation can have an effect.
Now let a denote the schedule associated with a given execution of the protocol,
and a* denote the prefix of a up to the first point at which some process has
completed all its numbered phases (i.e. is about to begin the final phase). We claim
that the system state following the execution of a* satisfies one of the following two
properties.
(i) Allsj have the value 11, i = 1,...,n--H 1,or
(ii) There is some j > 1 such that s? has the value 10, and 5j+i,...,5n-i have the
value 11 (or j = n --H 1).
To see this, assume that the system does not have property (i), and let j be the
maximal index for which 5? does not have the value 11. We claim it has the value
10, satisfying property (ii). For if not, then it must have the value 01, in which
case no process has invoked an operation on Sj. If j = n --H 1, this contradicts the
definition of a*, since no process could have completed phase n--H 1. And if j <n--H 1,
then Sj+i has the value 11, which can only result from the execution of L by one of
Pi... , p??+i But each of these processes must have taken an earlier step in which
it invoked an operation on 55 again, a contradiction. Thus, 55 has the value 10,
and property (ii) holds.
Hence, if property (i) holds after the execution of a*, then all processes that
complete their final phase will decide on Pi. If property (ii) holds after a*, then
all processes that complete their final phase will decide on Pj+i. In both cases,
agreement is satisfied. I
Let us consider the power of a single copy O of S2. Using fairly straightforward
protocols, two processes can solve either leader election or binary consensus using
0. (In both cases, initialize 0 to 01. For leader election, let Pi execute L and P2
execute 5; for binary consensus, let a process proposing 0 execute 5 and a process
proposing 1 execute L.) Indeed, we can show the following tight bounds.
9
Theorem 3 A single copy of S2 has le-number and bc-number equal to 2.
Proof. Given the discussion above, the result will follow if we show that 3 processes
can solve neither leader election nor binary consensus using a single copy of S2. We
present the proof for binary consensus; the proof for leader election is similar but
easier, and is left to the reader.
Assume by way of contradiction that there is such a protocol ?. To avoid
notational confusion, we will let 0 and 1 denote the values of the bits in the copy of
S2, and use v and u = v to denote the outcomes of binary consensus. Let p, q, and
r denote the three processes.
First consider the case in which the initial state is Ol (or analogously 10). We
again make use of the notion of a process p's solo run, in which it first "wakes up"
with the system in its initial state, and runs to completion, deciding its own value.
Clearly p cannot distinguish between the scenario in which it is the only live process,
and the one in which other processes have executed but left the system in its initial
state.
In view of this, it must be the case that the solo run of a process proposing u
leaves the object in state 10, and the solo run of a process proposing v leaves it in
state 11 (or vice versa). Consider an execution in which p and q propose u, and r
proposes v. We have the following two scenarios.
In Scenario 51, process r runs to completion, deciding v. Scenario 52 is as
follows (see Figure 1).
o+ Run p and q until they are about to execute 5 (which they must do) for the
first time.
o+ Run p to completion; it will decide u.
o+ Now run q for a single step --H the execution of 5 and crash it, leaving the
object in state 10.
o+ Finally, run r to completion.
These two scenarios are indistinguishable to process r, so it will decide v in 52 as
well but this violates agreement.
In the other case, the initial state is 00. If two processes have opposite proposals,
their solo runs clearly cannot leave the object in the same state. Let us therefore
assume that there is a solo run of some process proposing u which leaves the object
in state 10. Furthermore, we consider the case in which a process waking up to find
the object in states 01 or 11 will decide v; all other cases are analogous.
Assume that p proposes u, and its solo run leaves the object in state 10. Schedule
p alone until it is about to write to a register. It cannot execute R, since we then
crash it and wake up q, which also proposes u. But q must decide v, since the object
10
p
run to comp?tion
q
solo run
Figure 1: An indistinguishable scenario
is in state 01, violating validity. Thus, p must first perform operation L. But in this
case, we have the following execution:
o+ Run p until just before its first write operation, which will be L.
o+ Let q propose v and run it until just before its first write operation this
must be R by the same reasoning.
o+ Run p to completion (it decides u).
o+ Schedule q for the single step R, and crash it.
o+ Finally, wake up r. Since the object is in state 11, it must decide v.
This execution violates agreement, which completes the second case and the proof.
As noted earlier, this means that although the consensus numbers of a single copy
of S2 are equal to 2, an infinite set of them can solve consensus among any number
of processes. Hence our belief that consensus number, considered by itself, does not
give a complete description of the power of a shared object to support consensus
protocols. In the terminology of [Ja], the following is an immediate corollary of
Theorems 2 and 3 (recall that h1 is the hierarchy of shared objects in which the
consensus number of 0 is the number of processes that can solve consensus using a
single copy of 0 and no registers).
Corollary 1 The hierarchy h1 is not robust.
11
4 A Variation on the Protocol
For the remainder of this paper, we consider the combination protocol, and resource
bounds in general, from the point of view of the original model of Herlihy [111].
Thus, we will assume that our system contains an infinite set of atomic registers;
again, we will use the terms consensus and consensus number to refer to the problem
of leader election in such systems.
[111] considers the primitive simultaneous m-register assignment. A system with
this primitive consists of an infinite set of atomic registers, and supports an addi-
tional atomic operation in which a process can simultaneously assign m (potentially
different) values to m distinct registers. It is shown in [Hi] that a system with
this primitive has consensus number 2m --H2. It seems natural to consider what the
consensus number would be if we did not allow simultaneous assignment to registers
that can be arbitrarily far apart in memory. Thus we examine the case in which the
m registers to which a process writes must be contiguous in memory.
The technique of the combination protocol finds application here, as we use
it to give a 3-process consensus protocol when m > 3. The idea is as follows.
The advantage of simultaneous assignment is that by writing to sets of registers
that intersect, two processes can determine which took the first step by reading
the registers in the "overlap." So in the first phase, two processes can write to
intersecting intervals in memory. They then move to a different part of memory,
and write to intervals that intersect the opposite ends of the third process's interval.
In this way, a consistent leader can be elected. See the proof below for the actual
protocol.
Unlike the case of the basic combination protocol, one must contend here with
the "geometry" of the array of registers. Specifically, a third phase in this spirit is
not possible; an interval has only two endpoints. In fact, employing a combinatorial
argument about the intersections of four intervals on a line, we can show that 3-
process consensus is the best that can be achieved in this system.
Theorem 4 An infinite set of registers with atomic read, write, and consecutive
m-register assignment, has consensus number 2 if m = 2, and 3 if m > 3.
Proof. The case in which m = 2 is straightforward. Thus let us assume m > 3.
first we give a protocol for 3 processes to solve leader election. Let Pi, P2, P3 be
the names of the processes, and let the registers be numbered 0,1We will
say that pi marks an interval [a, b] by simultaneously writing its name in registers
The protocol is as follows.
Phase 1
Pi marks [0, m --H 1]
12
P2 marks [1,mj
Phase 2
Pi marks [m+1,2mJ
P2 marks [3m--H1,4m--H2]
p3 marks [2m,3m--Hlj
Final phase
Each p? reads registers O,...,4m--H2,
following the technique of Hi]
if p? completed Phase 2 before Pi or P2
then elect p?
else if P2 completed Phase 1 before Pi
then elect P2
else
elect Pi
It is straightforward to verify that this satisfies wait-freedom, validity, and agree-
ment.
To show the impossibility of a solution to 4-process consensus, we follow the
standard structure for impossibility proofs of this sort (see e.g. [FLP, DDS]). As-
sume that such a protocol ? exists for 4-process consensus in this system. Let a
denote the schedule of a partial execution of ? and q(a) the system state follow-
ing the execution of a. We say that q(a) is v-determined if in all extensions of a,
the decision v is reached. A state is determined if it is v-determined for some v,
and undetermined otherwise. By the validity requirement, the starting state q(A)
is undetermined; but a state in which all processes have decided is clearly deter-
mined. Thus, a simple and oft-repeated argument shows that there exists at least
one boundary state q(a): q(a) is undetermined, but every extension of a by one step
results in a determined state.
Thus, we consider a boundary state q(a) in the execution of our supposed 4-
process protocol P. Let the "preference" Vj of Pi be the value it would decide if
all other processes crashed at this point. As in [111], observe that all four steps ex-
tending a must be consecutive assignments (in this case, to "intervals" in memory),
with the properties that the interval of each pi must include a register that no other
process writes to, and that if p? and Pi have different preferences, their intervals
must intersect in some register that no third process writes to. Assume without loss
of generality that Vi # V2, Vi $ v3 (i.e. the preference of Pi differs from those of P2
and P3) Then the interval of Pi intersects those of P2 and P3. There are two cases
to consider.
1. v4 $ v1. Then if ?4'S interval intersects Pi `5, one of P2, P3, or P4 will have no
register that it alone writes to.
13
2. v4 = v1. Then if ?4'5 interval intersects those of both P2 and p?, one of Pi or
p? will have no register that it alone writes to.
As both cases lead to contradictions, the proof is complete. I
5 Resource Bounds
Finally, we analyze the resource requirements of fl-process consensus using two other
primitives considered in [111]. Consider a system with an infinite set of atomic reg-
isters, and also some special pre-initialized move-registers xi,... , xk. These special
registers support the atomic operations read(??), which returns the contents of Xj,
and move(x?, xj), which atomically copies the contents of Xj into Xj. These are
the only two operations allowed on the registers Xj. Alternately, we can consider
a system with additional pre-initialized swap-registers ..... . , y?; here the atomic
operations are read(y?) and swap(?i?, y?), which atomically exchanges the contents of
yj and y?. In the latter case, note the difference between these swap-registers and
the object S2 of Section 3; here, a process can swap between any pair of registers.
In [Hi], protocols are given for n processes to solve consensus using either kind of
primitive; the conclusion is that they both have infinite consensus number (actually,
the protocol using move in [Hi] required the ability to write directly to the move-
registers; below, we give a protocol that does not need this assumption). These
protocols with move and swap require 2n and n + 1 special registers, respectively,
suggesting that there is not much difference in the power of these two primitives.
The results of this section show a sense in which move is actually significantly
more powerful than swap; they reveal an exponential gap between the number of
move-registers and the number of swap-registers needed to solve fl-process consensus.
We give an fl-process consensus protocol which requires only [2 log fl + 2j registers
with move; on the other hand, we show that any fl-process consensus protocol using
swap-registers requires n + 1 such registers; i.e. the protocol of [Hi] is resource-
optimal. We note that the protocol using move still requires n additional ordinary
atomic registers; thus, the exponential gap does not extend to the total memory
requirements of the protocols.
Theorem 5 There is an fl-process consensus protocol in a system with infinitely
many atomic registers, and move-registers x1,... , xk, where k = [2 log fl + 2j.
Proof. The key fact is that two move-registers are in fact sufficient for any number
of processes to solve binary consensus. Let x1 and x2 denote the two registers;
initialize x1 with 0 and x2 with 1. To propose 0, a process copies from x1 to x2; to
propose 1, a process copies from x2 to x1. After the first step taken by any process,
the values of the two registers become the same subsequent move operations have
14
no effect. Thus it is straightforward to verify that it is a solution to fl-process binary
consensus for any n.
Now, in (Pi] a leader election protocol using [log n + iJ sticky bits is presented
(the process is elected "one bit at a time," with n atomic registers used to ensure
that validity is satisfied). It is easy to adapt this technique to our case; here each
sticky bit is replaced by a pair of move-registers. Consequently, we have the stated
bound. I
Theorem 6 Consider a set of arbitrarily many atomic registers, and swap-registers
,Yk. If n processes can solve consensus in this system, then k > n + 1.
Proof. Let ? be any protocol for n processes to solve consensus in this system, and
let q(a) be a boundary state in the execution of this protocol (as in the proof of
Theorem 4). By the usual arguments, we see that all operations scheduled out of a
must be swaps. Let the preference Vj of p? be the value it would decide if all other
processes crashed at this point. Also, it is easy to verify that 2 swap registers will
not suffice for 2 processes; thus, in what follows, we can assume n > 3.
We define a graph G = (V? ) as follows. The vertex set V = ....... , skJ.
There is an (undirected) edge (si, 5?) if and only if some process p? is executing
swap(s?, s?) from a. Clearly two processes cannot perform the same swap operation,
so this is a graph without loops or multiple edges. Hence, we label each edge with
the preference v of the process performing the associated swap.
A straightforward indistinguishability argument establishes the first of the fol-
lowing claims.
Claim 1 If e and e' are edges with different labels in G, then they must share an
endpoint.
Claim 2 If 0 contains a cycle, all edges on the cycle must have the same label.
Proof. Let C be a cycle in 0 containing edges of both labels; we obtain contradictions
in the following three cases.
1) C is a 3-cycle. Let e1,e2,e3 be the three edges of C, and pi,p2,p3 the three
processes performing the swaps. We can assume without loss of generality that v1
v3. Then scheduling p1p?p? from a results in the same system state as scheduling
p?p?m, yet these states are determined for different values.
2) C is a 4-cycle. Let ..... . , e4 be the four edges in "clockwis& order, and
..... . , p? the associated processes. Claim 1 implies that v1 = v3 and v2 = v4; hence
it must be that v1 $ v2. But from a, the schedules p1p?p?p? and p2?4p?p? leave the
system in the same state.
3) C is a ?cycle b> 5 Let e and e' be neighboring edges with different labels v,
v', respectively. Let e1 and e'1 be the other neighboring edges of e and e' respectively;
15
since e1 and e'1 do not share an endpoint, Claim 1 implies that they must have the
same label. But then one of the disjoint pairs fe1, e'J, ?e, e'iJ must have different
labels. I
Claim 3 G is acyclic.
Proof. Suppose C had a cycle C. By Claim 2, all edges of C must have the same
label, say v. Since a is undetermined, some edge e has label v' ? v. But e cannot
share a vertex with all the edges of C; this contradicts Claim 1. I
The theorem now follows easily. Since G is an acyclic graph with n edges, we
must have VI = k > n + 1.1
6 Conclusion
By viewing wait-free primitives as individual objects that can be replicated and
combined, we have been able to consider consensus from the perspective of resource
requirements. The combination protocol of Section 3 presents a general technique
for designing consensus protocols within this framework.
A number of questions remain open. First of all, we are interested in other
applications of the combination protocol in the design of wait-free algorithms. Our
work and that of [Ja] suggests that perhaps it is most natural to define the consensus
number of an object 0 as the maximum number of processes that can solve consensus
using an infinite set of registers and an infinite set of copies of 0. Nevertheless, we
do not know whether, even under this definition, there exist objects Oi and 02, with
consensus numbers k1 < k2 respectively, such that more than k2 processes can solve
consensus when the two objects are used together. A solution to this problem would
settle a basic question about the nature of asynchronous consensus, and would very
likely reveal a number of new ideas in the analysis of consensus protocols.
Acknowledgements
The authors wish to thank Prasad Jayanti, Keith Marzullo, and Sam Toueg for
helpful discussions on the topic of this paper. Thanks also to Cynthia Dwork for
helping to improve the presentation.
16
References
[CHP] P. Courtois, F. Heymann, D. Parnas, "Concurrent Control with Readers and
Writers," Communications of the ACM? 14(1971), pp. 667-668.
[DDSj D. Dolev, C. Dwork, L. Stockmeyer, "On the Minimal Synchronism Needed
for Distributed Consensus," Journal of the ACM, 34(January 1987), pp. 77--H99.
[FLPJ M.J. Fischer, N.A. Lynch, M.S. Paterson, "Impossibility of Distributed Con-
sensus with One Faulty Process," Journal of the ACM 32(April 1985), pp. 374--H
382.
[H2] M. Herlihy, "Impossibility Results for Asynchronous PRAM," Proc. Third
ACM Symposium on Parallel Algorithms and Architectures, 1991.
[111] M. Herlihy, "Wait-Free Synchronization," ACM Transactions on Programming
Languages and Systems, Jan. 1991, pp. 124--H149.
[Ja] P. Jayanti, "On the Robustness of Herlihy's Hierarchy," Proc. Twelfth ACM
Syposium on Principles of Distributed Computing, 1993.
[JTj P. Jayanti, 5. Toneg, "Some Results on the Impossibility, Universality, and De-
cidability of Consensus," Proc. Sixth Workshop on Distributed Algorithms, 1992.
[LAA] M.C. Loui, H.H. Abu-Amara, "Memory Requirements for Agreement Among
Unreliable Asynchronous Processes," Advances in Computing Research, vol. 4,
1987, pp. 163--H183.
[Pl] 5. Plotkin, "Sticky Bits and Universality of Consensus," Proc. Eighth ACM
Symposium on Principles of Distributed Computing, 1989.
17
