BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1357
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Randomized Distributed Edge Coloring via an Extension of the 
        Chernoff-Hoeffding Bounds
AUTHOR:: Panconesi, Alessandro 
AUTHOR:: Srinivasan, Aravind 
DATE:: June 1993
PAGES:: 21
ABSTRACT::
Certain types of routing, scheduling and resource allocation problems in a 
distributed setting can be modeled as edge coloring problems. We present fast 
and simple randomized algorithms for edge coloring a graph, in the synchronous 
distributed point-to-point model of computation. Our algorithms compute an 
edge-coloring of a graph $G$ with $n$ nodes and maximum degree $\Delta$ with 
at most $(1.6 + \epsilon)\Delta + \log^{2+\delta} n$ colors with high 
probability (arbitrarily close to 1), for any fixed $\epsilon,\delta > 0$.

To analyze the performance of our algorithms, we introduce an extension of the 
Chernoff-Hoeffding bounds, which are fundamental tools that are used 
very frequently in estimating tail probabilities. However, they assume 
stochastic independence among certain random variables, which may not always 
hold. Our results extend the Chernoff-Hoeffding bounds to certain types of 
random variables which are not stochastically independent. We believe that 
these results are of independent interest, and merit further study.
END:: CORNELLCS//TR93-1357
BODY::
Randomized Distributed Edge Coloring via an
Extension of the Chernoff-Hoeffding Bounds*
Alessandro Panconesi
Aravind Srinivasan
TR 93-1357
June1993
Department of Computer Science
Cornell University
Ithaca, NY 14853-7501
*A preliminary version of this work appears as "Fast Randomized Algorithms for
Distributed Edge Coloring", in the ACM Symposium on Principles of Distributed
Computing, pages 251-262, August 1992 This research was supported in part by
NSF PYl award CCR-89-96272 with matching support from UPS and Sun
Microsystems.
Randomized Distrubuted Edge Coloring via an
Extension of the Chernoff-Hoeffding Bounds *
Alessandro Panconesi & Aravind Srinivasan
Department of Computer Science, Cornell University
Ithaca, NY 14853
E-mail: fap, srinjccs.cornell.edu
June 9,1993
Abstract
Certain types of routing, scheduling and resource allocation problems in a distributed
setting can be modeled as edge coloring problems. We present fast and simple randomized
algorithms for edge coloring a graph, in the synchronous distributed point--Ht?point model
of computation. Our algorithms compute an edg?coloring of a graph G with n nodes and
maximum degree A with at mcst (1.6+E)A+log'+6 n colors with high probability (arbitrarily
close to 1), for any fixed ?,6 >0.
To analyze the performance of our algorithms, we introduce an extension of the Chernoff-
Hoeffding bo?nds, which are fundamental tools that are used very frequently in estimating
tail probabilities. However, they assume stochastic independence among certain random
variables, which may not always hold. Our results extend the Chernoff-Hoeffding bounds to
certain types of random variables which are not stochastically independent. We believe that
these results are of independent interest, and merit further study.
*A preliminary version of this work appears as "Fast Randomized Algorithms for Distributed Edge Coloring",
in the A CM Symposium on Principles of Distri6uted Computing, pages 251--H262, August 1992. This research was
supported in part by NSF PYl award CCR-89-96272 with matching support from UPS and Sun Microsystems.
1 Introduction
All importallt limitation for a distributed network without global memory ill computillg ally
function is that for all efficient algorithm, each processor can communicate only with those
processors that are within a small radius around it. Models of parallel computatioll like the
PRAM abstract this problem of locallty away by assumillg the existence of a global shared
memory with fast concurrent access. We are interested ill studying how fast individual processors
can compute their portion of the output ill a messag?passillg distributed system, with such
"local" information alone. The model we study is the synchronous distributed point--Ht?point
model, ill which the processors are arranged as the vertices of all n--Hvertex graph G = (V, ),
and where all communication is via the edges of G alone.
The edge coloring problem can be used to model certaln types of jobshop scheduling, packet
routing, and resource allocation problems in a distributed setting. For example, the problem of
scheduling 1/0 operations in some parallel architectures can be modeled as follows [6]. We are
given a set of processes ? and a set of resources ? such that each process p E P needs a subset
J(p) C 1? of the resources where: (i) each process p needs every resource in f(p) for a unit of
time each, and (ii) p can use the resources in f(p) in any order. From this, we can construct
a bipartite graph Gp,? = (P, ?, Ep,?) where Ep,? = ?(p, r)j p E P A r E f(p)Y. An edge
coloring of Gp,? with c colors yields a schedule for the processes to use the resources within c
time units. Optimal colorings correspond to optimal schedules.
Edge coloring can also be used in distributed models in situations where broadcasts are
infeasible or undesirable: an edge coloring of the network results in a schedule for each processor
to communicate with at most one neighbor at every step (at time step i, processors communicate
via the edges colored i only), and using a "small" number of colors reduces the wastage of time
in this schedule.
Note that A colors are necessary to edge color a graph with maximum degree A. Vizing
showed that it is always possible to edge color a graph with A + 1 colors and gave a polynomial
time algorithm [3]. Efforts to parallelize Vizing's theorem have falled so far. The best known
result is an RNC algorithm of Karloff & Shmoys that uses A + A0?5+' colors. The algorithm
can be derandomized in NC by standard techniques by Berger & Rompel, and Motwani, Naor
& Naor [2, 101. In the distributed model, the best known edge coloring algorithm is to apply a
vertex coloring algorithm to the line graph of G. There are fast (polylogarithmic) randomized
vertex coloring algorithms that use (A + 1) and A colors, which translate to (2A --H 1)--H and
(2A --H 2)--H edge coloring algorithms respectively [9, 12]. There are no known polylogarithmic
deterministic algorithms in the distributed setting for (2A --H 1)?dge coloring [1, 12]. Moreover,
distributed A?dge coloring requires ?(diameter(G)) time, even with randomization[12].
We present fast and simple randomized algorithms to edge color G with at most (1.6 + E)A +
0.4 log2+6 n colors with high probability for any fixed E, 6> 0, where A is the maximum degree of
the vertices of G. At the heart of our analysis is an extension of the Chernoff--HHoeffding bounds,
which are key tools in bounding the tall probabilities of certain random variables [4, 5, 13].
The edge coloring algorithm is based on a very simple randomized procedure to color bipartite
graphs. The algorithm can be explained in a few lines. Given a bipartite graph G = (A, B, E)
2
with maximum degree A, each vertex ill B picks distinct colors from (1,2,..., A) at random
for its edges without replacement, i.e., edges incident to the same vertex get different colors.
Then, each vertex v E A checks, for each color a, if more than one of its incident edges has
color a and if so, chooses one of them at random as the winner, and all the other edges of color
a which are incident to v are decolored. The key claim is that for every vertex, the number of
decolored edges incident to it is at most A(1 + ?)/e with high probability, for any fixed E > 0 (e
is the Euler number). Assuming that this holds, we can repeat the above iteration with a set of
A(1 + e)/e fresh colors, and so on. In spite of its simplicity the algorithm requires an interesting
probabilistic analysis based on our extension of the Chernoff-Hoeffding bounds to a certain case
of dependence among the random variables that we call A--Hcorrelation. We believe that these
results have the potential for further applications, and need further study.
2 Notation
A message-psssing distributed network is an undirected graph G = (V, E) where vertices, or
nodes, correspond to processors and edges correspond to bi--Hdirectional communication links.
Each node has its unique ID. The network is synchronous, i.e., computation takes place in a
sequence of rounds; in each round, each node reads messages sent to it by its neighbors in the
graph, does any amount of local computation, and sends messages back to all of its neighbors.
The time complexity of a distributed algorithm, or protocol, is given by the number of rounds
needed to compute a given function
Notice that in this model the cost of sending a message from one vertex to another is
proportional to the shortest path between the two vertices, There is no shared memory and
processors can communicate only by sending messages through the network. Hence, if we want
a protocol to run for t rounds, then each vertex can communicate only with vertices at distance
at most t from it. This is not so in the PRAM model, where the shared memory allows any
two processors to communicate in one unit of time
Given an undirected graph G = (V, E) we denote by A its maximum degree, i.e., the
maximum number of edges incident with any node; by dg we denote the degree of vertex u, and
by 6(u) we denote the set of edges incident with vertex u.
Given a positive integer n, [nj denotes the set (1,2,..., n). Given a matrix M with c
columns and r rows where c < r, the generic entry of M is denoted as Mjk where i denotes
the column number and k the row number. (This is the reverse of the usual definition but
it is more natural when we define the permanent of M.) The permanent of a (possibly non--H
square) matrix M is defined as the natural extension of the permanent of square matrices. Let
= (7r			: [c]			[r], ir is on?to?ne). Then,
perm(M) = ? Mi,?(i) M2,42) ... Mc,ir(c).
irEP
An event A is said to happen with high probability (w.h.p.) if Fr(A) > 1 --H 1/f(n) for some
superpolynomial function f(n) (i.e., ?C = o(f(n)) for all c > 0, and the failure probability of A
is bounded by 1/f(n)).
3
3 The Chernoff-Hoeffding Bounds Extension
In this section we introduce our extension of the Chernoff-Hoeffding bounds which are important
tools used in estimating the tail probabilities of random variables. Given n independentrandom
variables Xi,X2,... , Xn, these bounds are used in deriving an upper bound on the upper tail
probability Pr(X > (1 + E)I), where X = ?t%-i Xi, It = E[X], and E > 0. We extend these
bounds to a certain case of dependency among the Xi `s, which we call A--Hcorrelation. Since the
general definition of A--Hcorrelation is involved, we first introduce this notion in the case where
the Xj's are 0-1 random variables and prove our extension of the Chernoff-Hoeffding bounds.
This special case is simpler to handle and more intuitive but contains all the essential difficulties
of the general case. After the special case is deait with, we will take care of the more general
case.
Before giving our extension of the Chernoff-Hoeffding bounds let us review Chernoff's clas-
sical approach to upper bound the upper tail probability of a random variable X, which is the
sum of independentbinary random variables Xi,X2,... , Xn [4]. Chernoff's basic idea is to use
Markov's inequallty on the random variable etX for an arbitrary, but fixed, t > 0, and minimize
with respect to t. That is, to use the fact that
Pr(X > (1 + E)J) = Pr(etX >
<			E[etXj
--H			e1(1+?)? `
and minimize the last ratio for t > 0 [4]. This is achieved by finding a good upper bound for
the numerator E[etX] by using the fact that X is the sum of independentrandom variables. It
is standard (see, e.g., Raghavan[13]) to use this for showing that in this case,
min E[e?X] < F			e??			(1)
t>o et(l+?)? --H (It,E) = (1 +E)1+?
Hoeffding [5] considers a more general case where X is the sum of n independent and bounded
random variables Xi E [a?, bi], and uses the above approach to show that if E[X] = It? then for
E> 0,
min E[etX]			2j2E2
t>o et(1+?I < G(i,?,aA,t) = exp ??iE[n](bi--Hai)2J (2)
Equations 1 and 2 will be used in our proofs. Henceforth, we refer to these bounds of Chernoff
and Hoeffding as the CH-bounds. If E is a fixed positive quantity no greater than 1 (which will
be true in all our applications), then F(1, E) < e??243. Hence, if It = ?(log1+5 n), then F(It, E) is
the inverse of a superpolynomial function of n, for any fixed ? > 0 (similar considerations apply
to G(It, E, a? ???. This fact makes the CH-bounds a powerful tool for deriving strong performance
guarantees for randomized algorithms and will be used repeatedly in this paper.
4
3.1 0-1 Random Variables
We now present our extension of the CH-bounds to the case of 0-1 A--Hcorrelated random variables.
We begin by defining A--Hcorrelation1 for 0-1 random variables.
Definition 1 Let X1,X2,... , Xn be 0-1 random variables. The Xi's are A--Hcorrelated if for all
nonempty I C [nJ,
Pr AXi=1 <?AflPr(Xi=1).
jEl			jEl
Before proving the extension of the CH-bounds, let us develop some intuition by giving an
example of A--Hcorrelation. Suppose we have A balls that are thrown uniformly and independently
at random into A bins. Let Bj be an indicator random variable that is 1 if bin i is empty and 0
otherwise. These variables are 1--Hcorrelated. To see this, consider a subset J c [A] of bins and
suppose that all these bins are empty. Then, given this information, the probability that some
other bin, say i, remains empty decreases. That is, for all J c [A],
P?Bi=lI?A?jBi=l) = (`?AJ1J1)?
Pr(Bi = 1)
by straightforward induction, this implies that for all nonempty I C [A],
<fIP?Bi=1).
Pr tAEIBi=l			iEI
The CH-bounds say that if X is the sum of n independent 0-1 random variables then Pr(X>
(1 + E)I) < F(?, E). The next theorem shows that if the Xj's are A--Hcorrelated then Pr(X >
(1 + E)I) < A F(j, E). In the statement of the foliowing theorem, ? is an upper bound on
this is sufficient because we are upper bounding the upper tail probability of X.
Theorem 1 Let X be the sum of A-correlated 0-1 random variables, and let E[X] < ?. Then,
Pr(X > (1 + E)I1) < A F(?,E).
P??ooF. As in the classical proof, we start by introducing a positive parameter t and by applying
Markov's inequallty to the variable etx,
Pr(X > (1 + E)I) = Pr(etX > et(l+c)?)
<			E[etX]
--H			et(l+f)?
1This was defined as ?sell--Hweahening with parameter A" in [11].
5
To find a good upper bound on the numerator (we cannot use independence of the Xj's) we
wright down its Taylor expansion, and use the fact that etX is bounded by etn to apply linearity
of expectation to the infinite series:
E[etX] =
oo tkXk
E ?
k=O
oo tkE[XkJ
Focus on the generic term E[xk] of this series. From the fact that X = ? Xi and that the Xj's
are 0-1 random variables it follows that E[XkJ is a linear combination of the form
E(Xk]= 1z???1aIPr tAEIX?=l
for I $ ?, and where all QJ'5 are non-negative (this can be verified by unfolding Xk and by
observing that XtC = Xi for all i and all c $ 0). In order to upper bound the term E[Xk] we
introduce n twin 0-1 random variables ki which have the same distribution as the Xj's except
that they are independent. From the hypothesis of A--Hcorrelation and from the fact the twin
variables are independent it follows that
Hence,
Pr tAEIX?=l			< AllPr(Xi=1)
jEl
= AllPr(Xi=1)
iEI
= A Pr
E[Xk] < A E[kkj.
By upper bounding each term of the series expansion of E[etX] we get
E[etX] < A Ektx].
Notice now that k is the sum of n independent random variables and [X] = E[X] < j. Hence,
by equation 1
Pr(X > (1+ E)?) <
6
E[etX]
A E[etk]
et(1+f)?
A F(?,E).
0
To see all application of this theorem, let us go back to the balls-and-bins experiment, and
suppose we are interested in estimating the number of empty bins after the balls are thrown;
this number is given by the random variable B = ?j Bi. By linearity of expectation and by the
fact that the balls are thrown at random independently of each other,
E[B] = ? E[Bi]
(`--A?
A
e
We already saw that the Bi's are 1--Hcorrelated. Hence, by Theorem 1,
Pr(B > (1 + e)A/e) < F(A/e, e).
Remark. Jain has proved the following lemma [14]: Let a1,a2,... , a? be n random trials (not
necessarily independent) such that the probability that trial a? `succeeds' is bounded above by p
regardless of the outcomes of the other trials. Then if X is the random variable that represents
the number of `successes' in these n trials, and Y is a binomial variable with parameters (n,p),
then: Pr[X > k] < Fr[Y > k], 0 < k < n.
The assumptions of Jain's lemma are strictly stronger than those of i--Hcorrelation. For
instance, in the balls and bins example,
Pr(B?=1I A Bl=o)=AA+?1,,
iE(?--H1]
which, for A > 3, is greater than Pr(B? = 1) ( lie).
3.2 The General Case
In this section we introduce the more general definition of A-correlation and prove the more
general extension of the CH-bounds.
The proof of Theorem 1 is based on the observation that if we can upper bound each term
E[xkj of the Taylor expansion of E[etXj by A E[kk] where k is the sum of independent random
variables, and if E[etk] <B then E(etX] < A B. This motivates the following definition.
Definition 2 Let X1,X2,... , Xn be bounded random variables such that Xj E [a?, bi] and let
X = ?iE(n) Xi. The Xi's are A-correlated if there exists a collection of independent twin random
variables Xj E [ai, bi] such that,
7
. E[X] < E[k], where X = ?iE[n] Xj, and
. for all nonempty I C [n] and positive integers ?i, ..... ., ?III,
E[Exs.i] < A llE[k;'].
jEl			jEl
If we apply this definition to the case of 0-1 random variables we see that to have A--Hcorrelated
0-1 random variables, it suffices to find twin variables Xi such that E[X] < E[kj and, for all
nonempty I C [nJ,
Pr AXi=1 <AllPr(Xi=1)
jEl			jEl
which reduces to definition 1 when Pr(Xi = 1) = Pr(Xj = 1). Notice that for this to hold, it is
not necessary that Pr(X? = 1) = Pr(X? = 1); in typical applications we do not know the exact
value of Pr(X? = 1) but only an approximation (usually an upper bound; we will see examples
in later sections), but this is good enough to apply the CH-bounds extension.
Theorem 2 Let X be the sum of A-correlated random variables Xi,X2,... , Xn, where Xi E
[a?, bi] and let X be the sum of the n twin variables Xi. Then,
Pr(X > (1 +E) E[X]) <A G(i,E,a,b?.
PRooF. Let j = E(X]. As in the proof of Theorem 1, we start with the inequality
Pr(X > (1 + E)I) = Pr(etX >
E[etX]
et(l+c)?
By the hypotheses of boundedness and of A--Hcorrelation it follows that
E[etX] = E
tkxk
tkE[Xk]
tkE[kk]
< A ?
k=O
--H A E[etkj
By the already discussed result of Hoeffding [5], when X is the sum of n independent bounded
random variables Xi E [ai, bi]
o+ E[etk]
m?>i0n et(l+c)?
8
0
In this paper we will use the special case of Theorem 2 where Xj E [0, 1], i E [nj. In this
case, F(i, E) is also an upper bound for the upper tall of X.
Corollary 1 Let X be the sum of n A-corrtlated random variables Xi E [0, 1]. Then,
Fr(X> (1 + E)E[k]) < A F(i, e).
PROoF. Let E(k] = ?. When k is the sum of n independent random variables Xi E [0, 1]
Hoeffding shows that, for t E (0, 1 --H i/n), (cf. Theorem 1 of [5])
Pr k$i>t <(?+Int)?flt(l+n?flj$nt)fl??flt
By setting E = nt/n and by applying the standard approximation 1 + x < e: for x = nt/(n --H
--H nt) we get
Pr(X > (1+E)I) < ((1+$(1+f))' =
4 The Edge Coloring Algorithm
0
We now present our randomized distributed edge coloring algorithm. The algorithm uses an
idea of Karloff & Shmoys to reduce the problem of edge coloring general graphs to that of edge
coloring a special class of bipartite graphs [7].
The Karloff & Shmoys algorithm was proposed for the PRAM model and is as follows. The
input to the algorithm is a graph G = (V, E) of maximum degree A(G) = A.
1.
Compute a random partition of V into black and white vertices (all vertices flip a falr
coin independently and in parallel). Let G[Bj be the subgraph induced by the black
vertices, G[WJ be the subgraph induced by the white vertices, and G[B, W] be the bipartite
subgraph formed by the edges having endpoints of different colors.
2. Optimally edge color G[B, W], i.e., with A(G[B, W]) colors, using the PRAM algorithm
of Lev, Pippenger & Valiant [8].
3. Recurse on G[Bj and G[W] using the same set of fresh new colors on both graphs.
The key fact is that, as long as A is large enough, the maximum degree of the three subgraphs
is at most A/2 + Al/2+f = A/2 + o(A) w.h.p., for any fixed E > 0. (This can be proved by
applying the standard CH-bounds.) Note that the same set of fresh new colors can be used on
9
both G[B] and G[WJ because G[B, W] completely separates the two graphs. The recurrence for
the number of colors used is
C(A) < (?A2 + Al/2+f) + c + Al/2+f)
< A+o(A)
and holds w.h.p. The "high probability" statement holds as long as A = ?(log1+6 n), for any
fixed 6 > 0. In this case, the failure probability given by the CH-bounds is the inverse of a
superpolynomial function. For the case when the degree goes below the ?(log1+6) threshold,
Karloff & Shmoys give a PRAM algorithm to (A + 1)?dge color graphs of this low maximum
degree [7].
The non--Hdistnbuted parts of the Karloff & Shmoys scheme are the subroutine which colors
the bipartite graph optimally (it can be shown that this is impossible in the distributed model
of computation [12]), and the handling of the case A < log1+6 n. The latter can be handled
by Luby's randomized distributed (A + 1)-vertex coloring algorithm [9]. Luby's aigorithm is
simulated on the line graph of the given graph G thus giving a 2A(G) --H 1 edge coloring2. Rence,
if we could get a good distributed algorithm for bipartite edge coloring then we could use the
Karloff & Shmoys scheme to get a good coloring of any graph. We now show how this can be
achieved.
4.1 Distributed Edge Coloring of Bipartite Graphs
We now present a simple Monte Carlo algorithm for edge coloring the special type of bipartite
graphs generated by the Karloff & Shmoys scheme.
Given a bipartite graph G = (A, B, E) we assume that each vertex knows whether it belongs
to A or B. This is an important assumption because such information cannot be computed fast
distributively [12], but it is verified by the bipartite graphs generated by the Karloff & Shmoys
scheme. From now on, we will refer to vertices in A as the top vertices and to the vertices in B
as the bottom vertices.
The algorithm takes O(log n) time and colors a bipartite graph G of maximum degree A with
at most 1.6A+ l6log2+6 n colors w.h.p., for any ? > 0, which is approximately 1.6A+log2 n (the
failure probability will depend on 6). In the algorithm we use a variable Ac that is initialized to
A(G). During any iteration of the aigorithm, Ac is meant to be an upper bound on the degree
of the current graph; we will prove later that this holds w.h.p. Recall that 6(u) denotes the set
of edges incident on vertex n, and let THRESH0LD = log2+5 n.
The algorithm is given two parameters E, 6> 0, and is as follows:
1. PART I: Repeat until Ac <THRESHoLD:
Let Cc be the current graph. Pick a set x of Ac fresh new colors.
2When A(G) = O(polylog(n)) we can compute a --H 1 coloriugs deterministicai1?in O(polylog(n)) time
with an algorithm baeed on the idea of removing maximal matchings. We prefer to use Luby's algorithm here for
conciseness.
10
(Random proposal of bottom vertices) In parallel and independently of the other ver-
tices in B, each vertex v E B assigns a temporary color to each edge in 6(v) with
uniform probability without replacement, i.e. edge e1 is assigned color a E x with
probability 1/Ac, e2 is assigned P E x --H ?aJ with probability 1/(Ac --H 1) and so on.
(Lottery of top vertices) (Comment: The coloring so far is consistent around any
vertex v E B but can be inconsistent around a vertex u E A.) For each u E A, let
Cu(a) be the set of edges in 6(u) with temporary color a. Each vertex u E A selects
a winner uniformly at random in C?(a), for each nonempty C?(a). All other edges,
the losers, are decolored and assigned I.
o+ Set Ac := Ac(1 + ?)/e. Gj, the subgraph of Gc induced by the losers (i.e., by the
I-edges), becomes the new current graph.
2. PART II: Let Gr be the remaining graph. Edge color Gr with 2A(Gr) --H 1 <2 THRESHOLD
colors by executing Luby's vertex coloring algorithm on the line graph of Gr [9].
The key claim is that in every iteration of part (I) above, the maximum degree of the graph
shrinks by a factor of at least (1 + ?)/e w.h.p., as long as A > THRESHOLD. That is
A(Gc)
A(Gi) <(1+ E)			e
with high probability, for any fixed E > 0. The condition Ac > THRESHOLD ensures that the
failure probability given by the extension of the CH-bounds is the inverse of a superpolynomiai
function. The reason for setting THRESHOLD --H log2+6 n will be apparent from the probabilistic
anaiysis.
Once the key claim is established, we can bound both the totai number of colors used and the
running time of the algorithm. To bound the number of colors used observe that if the degree
of the graph shrinks at every iteration by at least a (1 + ?)/e factor w.h.p. then the maximum
degree of Gr is at most log2+6 n w.h.p. Hence, w.h.p., the number of colors used is at most
C(A) <A + A			?+ A +			+
e?' +E) +			?ek(1 E)k 21og2+6n,
where k is the smallest integer such that A(1 + ?)k/ek < log2+6 n. The totai number of colors
used is at most (here, El depends on e and can be made arbitrarily small)
C(A) < (ef 1 +E')A+ (2 ef1			?)log2+6n
<			1.585A+0.4 log2+6n
<			1.59A
when A > 8 log2+6 n. The running time of the aigorithm is O(log n) because part (I) takes
O(log A) time (by the key claim), and part (11), i.e., Luby's aigorithm, takes O(log n) time.
If A > 8 log2+8 n and we use our distributed subroutine for bipartite graphs in the Karloff
& Shmoys aigorithm the totai number of colors is given by the recurrence
11
T(A) <
< 1.6A
c (A2 + Al/2+f) + T (?2 + Al/2+c)
1.59 A + o(A)
If A < 8 log2+5 n, we apply Luby's algorithm directly to get an edge coloring with 2 A --H 1 <
16 ?og2+S n Hence, the total number of colors to color any graph is at most 1.6 A + 16 log2+8 n,
for any 6 > 0, which is approximately 1.6 A + log2 n colors.
The rest of the paper is devoted to establishing the key claim.
5 Probabilistic Analysis
This section is devoted to proving the key claim of the preceding section, namely that given a
graph G and A such that A > A(G) and A > THRESH0LD then, after one iteration of Part (I)
of the bipartite algorithm, the maximum degree of the new graph, A(Gj), is at most (1+ ?)A/e
w.h.p., for any fixed E> 0.
It turns out that the analysis is considerably easier for the top vertices than for the bottom
vertices. We begin with the easy part.
5.1 Analysis of Top Vertices
Let u be a generic top vertex with neighbors Vj, i E [du], and incident edges i = (`L,Vj). Recall
that a loser is an edge which, after having got a tentative color in the random proposal, lost
the lottery and got decolored. So, the new degree of u is given by the number of losers incident
with it.
From the point of view of a top vertex, the random proposal and the lottery are equivalent
to the following random experiment. For each edge i incident on u we introduce a ball i, and for
each color k we introduce a bin k; the assignment of a tentative color to an edge by the algorithm
is equivalent to throwing each ball into one of the A bins independently and uniformly at random,
since the bottom vertices assign tentative colors with uniform probability and independently of
the other bottom vertices. Recalling that we have at most A balls and exactly A bins:
?losers = ?balls --H ?winners
?bins --H ?nonempty bins
?empty bins.
Let X be a random variable denoting the number oflosers. To estimate X and its tall distribution
we will study the random variable B = ?empty bins. For this purpose, we introduce A many
indicator random variables
--H 1 bin i is empty
--H 0 otherwise
12
TOP
e4
e1			e3
BOTTOM
Figure 1: x(ei) 0', x(e2) ?, e1 lost the lottery
and hence, B ZiE[Aj Bi. Notice that X < B always. The variable B was studied in Section 3
where it was shown that E[B] < A/e and that the Bj'5 are 1-correlated, which implies that
Pr(B > (1 + e)A/e) < P(A/e,e). Since E[X] ? E[B] and Pr(X > (1 + e)A/e) ? Pr(B >
(1 + e)A/e) we get
Theorem 3 Let u be a top vertex and X be the random variable denoting the number of losers
incident on it. Then, E[X] < A/e and
Pr(X > (1 + e)A/e) < F(A/e, e).
5.2 Analysis of Bottom Vertices
In this section we analyze what happens to the new degree of a generic bottom vertex VB. This
case is considerably harder to handle than the previous one, because of the way in which the
random variables describing the process are correlated. For a top vertex, the dependency among
the variables was playing for us; given that some edges incident on a top vertex are losers, the
probability of having another loser decreases. For a bottom vertex the situation is reversed:
having some edges losing the lottery might even make the probability of having another loser
increase. The problem can be seen in Figure 1. Suppose we are given that e1 got tentative color
0 and lost the lottery, and that e2 got tentative color ?; we will argue intuitively that given this,
the probability of e2 losing the lottery has increased. Since e1 lost, the probability of e3 getting
tentative color 0 increases, which implies that the probability of e4 getting tentative color p also
increases, and this increases the probability of e2 losing the lottery.
13
For the sake of the analysis we modify the algoritlim as follows: instead of performing all
random proposals in parallel, suppose that the bottom vertices perform their random proposals
sequentially, one after the other. This does not modify the probability distributions because
the choices are still done independently. We want to focus our attention on the last vertex
VB performing the random proposal. We will use the fact that when VB performs its random
proposal, all edges not incident on vB already have a tentative color. By symmetry, any upper
bound on the probabilities we can find for VB will hold for all bottom vertices.
Let i e [dv] denote an edge incident with the bottom vertex VB, with the other endpoint of
i being Uj. We introduce the indicator random variables
-			1			i loses the lottery
--H			0			otherwise
and want to study the expectation and tail probability distribution of X ZiE[dvl Xi. Com-
puting the expectation is easy.
Lemma 1 E[X] ? Ale.
PRooF. Let VB be the bottom vertex. It is sufficient to show that Pr(Xi 1) ? 1/e, for all
i E [dv]. From the analysis of the top vertices, we know that the expected number of losers
incident with Uj is at most A/e and hence that ?jES(u?) Pr(j loses) < A/e. By symmetry,
Pr(j) < 1/e for all j c 6(ni), and hence Pr(Xj 1) < lie.
We now study the tail probability distribution of X. Our goal is to show that X < (1 + e)A/e
w.h.p., for any fixed e > 0. Establishing this claim will take several lemmas.
For technical reasons that will be clear later, we subdivide the edges incident on VB into j&
groups, each of at most ? edges. Let C be any such group and define XG ZiEG X?; we
would like to show that XG is the sum of A--Hcorrelated random variables, so that
VE > 0, Pr(X? > (1 + ()#?/e) ? A F(?/e, E).
Rather than proving this claim, we will prove a weaker one namely that XG is the sum
of A--Hcorrelated random variables with high probability! More precisely, we will define an event
A such that A happens w.h.p., and such that, given that A occurs, then XG is the sum of
A--Hcorrelated random variables. That is,
Pr(X? > (1 + E)VmA/e A) ? A P(?/e, E).
Showing this weaker claim is sufficient because
Pr(X>(l+e)A/e) <
> (1 +
Pr(?G XG > (1 + ?)#A/e A) Pr(A) +
Pr(?C XG > (1 + ?)#?/e AC) Pr(Ac)
< Pr(?C? XG > (1 + e)VmA/e A) + Pr(AC)
MA Pr(Xc> (1 + e)MA/e I A) + Pr(Ac)
MA A P(MA/e, e) + Pr(AC).
14
Another interesting point in the analysis is that A will not be a constant term but an expo-
nentially growing function. However we will be able to show the rate of growth of A is much
slower than that of 1/F(j&/e, e). In other words, we will show that A F(?/e, e) goes to zero
superpolynomially fast.
We now turn to the task of showing that Xc is the sum of A-correlated random variables
w.h.p. Recall from the definition that the Xi's, i E G, are A-correlated if, for all nonempty
I C C,
Pr AXi=1 <?Azji?1P?(Xi=l)
jEl
Consider then a generic subset I c c of edges incident on the bottom vertex VB, and let
us see how to compute Pr (AjEl Xi 1). Recall that we are analyzing the situation where VB
is the last vertex to perform its random proposal. This means that prior to the assignment of
a tentative color to edge i = (v, uj), all other edges incident on Uj already have their tentative
color. Using the balls-and-bins language, we can say that prior to throwing ball ? at random
into one of the bins, all balls coming from the other neighbors of Uj have been thrown. We will
think of i as a red ball and of the other edges at Uj as white balls. Once the red ball is thrown
in, say, bin k E [A], a winner is selected uniformly at random among all (i.e., red and white)
balls in bin k. All other balls, the losers, are discarded. Notice that the probability of discarding
the red ball is itself a random variable which depends on the particular placement of the white
balls prior to throwing the red ball.
Given any placement of white balls at Uj, we construct a vector of probabilities Ci as follows.
Let aik denote the number of white balls in bin k E [A] of vertex Uj, and let Pik = aik/(l +
aik) denote the probabillty that the red ball loses the lottery given that it was thrown in bin
k (equivalently, Pik is the probabillty that edge i loses, given that it got tentative color k).
For each neighbor Uj of our bottom vertex VB, we construct the corresponding vector Ci =
(Pil,Pi2,... ,pi?) Given a set I of neighbors of VB we construct the matrix M1 whose i-th
column is the vector Ci. The next lemma explains why this matrix is relevant to us. From now
on, let p(m,k)= m(m--H 1)...(m--Hk+ 1).
Lemma 2 Let I C G and k = III. Then,
Pr f?Xi=1 =4???)f;)
PRooF. The random proposal of VB is equivalent to choosing an one-to-one function r : I [A]
uniformly at random among the set ? of all such functions. Recall that the entry Mik of M1 is
the probability Pik that edge i loses given that it is given color k. Hence,
Pr(Ai?iXi = 1) = ? Pr(Ai?rXi = 1 iris selected) Pr(iris selected)
7rEP
--H )) (Mlir(l) M2?(2) ... Mk,?(k)) (4 Ak1 A--H?+
perm(Mi)
p(A,k)
15
El
We 110W want to find a good upper bound of perm(Mi). The following lemma gives a sim-
ple upper bound that is sufficient for our purposes. Given a matrix M let S? denote the sum of
the elements of Ci, the i-th column of M
Lemma 3 Let Al [Mik] & a matrix with c columns and r rows, and non--Hnegative entries
Mik. Then, perm(AI) < HiE[cj Si
PRooF. Let P			f7r			: [c]			[r], r is one-to-one]. Then,
perm(M)			? Aii,?(i) M2,?(2) ... Mc,r(c)
?EP
< (M1,1 + ... + Ai1,r) (M2,1 ... . +M2,r)...(Mc,i ... . + Mc,r)
- fiSi.
iE[c]
El
The next lemma relates the value of S? to that of lie > Pr(i loses), i ? ?(v). It is an ap-
plication of the most general definition of A--Hcorrelation. Before the proof of the lemma, we
establlsh
Proposition 1 If 0 < p < 1, q 1 --H p and m is a positive integer, then
PRooF. Let
li			prqrn?r r$I			? _ (1--Hqm+1)
_			p(m + 1)
m
r
prqm?r?+k1
--H			1
Integrating both sides of the binomial expansion
(x + q)m			?rqm?r
between 0 and p, we get
from which the proposition follows.
1--Hqm+l p (1 --H
m+ 1
El
We now return to our scenario where vB is the last bottom vertex to pick tentative colors
for its edges, with its edges having been split into groups. Recall that we are focusing on a par-
ticular group C, and that we want a good upper bound on Pr (AjEl Xi 1), for any nonempty
16
I c c. Lemma 2 gives the upper-bound Pr (AjEIXi = 1) < perm(Mi)/p(A, III), and Lemma 3
gives the upper-bound perm(Mr) < llSi. So, a good upper-bound of S? will hopefully translate
into a good upper-bound for Pr (AjEl Xi = 1). The next lemma says that S? < Ale w.h.p.
Recall that we are analyzing the situation where VB is to give tentative colors for its edges and
that all other edges have alraedy been assigned tentative colors. If we focus on a particular edge
Z = (vB, uj) the situation is equivalent to throwing a red ball i uniformly at random into A bins
where each bin has some number of white balls in it (possibly zero). Notice that the number of
white balls in each bin is itself a random variable.
Lemma 4 Let i = (v, vi) be any edge in I C G and S? be the sum of the elements of Ci, the
i-th column of M1. Then, E[Sj] < Ale = jt and
Ve1 >0, Pr(S? > (1 +q)n) <
PROOF. Let Zk be the random variable denoting the number of white balls in bin k and
Yk = Zk/(Zk + 1) be the random variable denoting the probability that the red ball loses the
lottery given that it ends in bin k. Then, Si = Y = ZkYk. Note that the Yk'5 are bounded
random variables with values in [0, 1]. We will show that E[Y] ? Ale and that the Yk'5 are
1--Hcorrelated (under the most general definition of A--Hcorrelation), which will give our claim.
Firstly, we may assume that we have d = A --H 1 white balls (i.e., the degree of u? is A):
Pr(Y > (1 + q)Ale) is maximized at d = A --H 1, as d varies from 1 to A --H 1. (To see this,
assume d = A --H 1 --H k < A --H 1. Add k yellow balls to the white balls, and run two experiments.
In one experiment throw the white and red balls and compute the probabillty that the red ball
loses the lottery. In the other experiment, throw white, yellow and red balls and again compute
the probability that the red ball loses. In both experiments, let us look at the bin where the
red ball fell. The probabillty that the red ball loses is bl(b + 1) for the first experiment, and
(b + y)/(b + y + 1) for the second, where b and y are, respectively, the number of white and
yellow balls in the bin. Since y >0, bl(b+1) ? (b+y)/(b+y+1). If Yi(d) indicates the variable
Yj when vj has degree d, then Yi(d) < Yj(A) for all i ? [A] and d ? [A].)
First, we will show that, for all i, E[Yj] ? lIe and then we will show that, for any set of k
indices I C [A] and strictly positive integers ?i,
E[HYsi] < tIek.			(3)
iEI
Given this we can apply Corollary 1 by introducing n independent twin 0-1 random variables
Yj such that E[YTh = Pr(Y) = 1) = lIe. Since the Yi's are binary, equation 3 is the same as
E[fIYsi] <fIE[i = HEWiS],
tEl			iEI			iEI
which is to say that the Yi's are 1--Hcorrelated. Noting that 0 < Y < 1 it suffices to show that
E[HYi] < llek			(4)
iEI
17
Without loss of generality we can assume I [k]. We will prove inequality 4 by induction
on k> 1' when k = 1,
E[Y1] =
4			(A1)T$?4)?1T r
_			r+1
(1 --H 1/A)A < lie,
where the second equality follows from Proposition 1. Notice that for all j ? [A], E[Y5] =
E[Y1] < 1/e. When k > 1, the law of conditional probabillties gives
E[ifIE[k] Yi] = E[Y1Y2.. `Yk-1 E[Yk I Y1Y2'??Yk-1]]			(5)
Suppose we show that for all non-zero Cj C [0, 1] with i E [k --H 1],
k--H1			1			(6)
E[Yk			A Yj = c?] < -.
t=1			e
then, since the product YlY2 ```Yk--H1 in equation (5) is zero when any Cj is zero, we see by
induction on k that
k
E[HYi]
i=1
- E[Y1Y2 "Yk-i E[Yk I ....... Yk-1]]
elE[kfilyi]
i=1
? eko+
Hence, the claim follows if we can show that inequality (6) holds,
if a? denotes the number of white balls that fell into bin i, then Cj = a?/(a? + 1). Let
a = zk?1 a- > k --H 1, p = 1/(A --H k + 1), and q = 1 --H p. Then
where
k--H1			k--H1
E[YkIAY%=ei] = E[Y?IAZi=ai]
t=1			i=1
t(r, a),
____1
t(r,a)=			A?;?a			?r???l?a?T			T
r+1
it is easy to check that t(r, a) > t(r, a + 1). As a consequence, the maximum value of
E[YkI A,ffiH Yt- = ci] is attained at a --H k --H 1 in which case we have
18
A--Hi--Ha
? t(r,a) =
r=1
Azkt(r,k 1)
r=i
4
A
q --Hk+i < 1/e,
A--Hk			r
r prqA?k?r ? + ?
by Proposition 1.
Putting all the pieces together (recall that k = Ill):
PT(AiEIXi = 1) =
perm(Mi)
p(A,k)
HiEl5j
p(A,k)
E]
(1+c1)kAk			1
ek p(A,k)
p(A,k) ek'
for any e1 > 0. The			< (1+e1)k Ak			1
first line follows from Lemma 2, the second from Lemma 3, and the third
from Lemma 4 and holds w.h.p. The last line is just a re-writing of the third one. Given c1 > 0,
let A?1 be the event that, for all i ? C, S? < (1 + ei)A/e. Lemma 4 says that A?1 happens
w.h.p., for any e1 > 0. The last line of the above chain of inequalities means that, given that
A?1 occurred, the Xj's are A-correlated with A = (1+ei)kAk/p(A,k). (Alternatively, we can say
that the Xj's are A--Hcorrelated w.h.p.) We now want an estimate of A. The next lemma explains
why we subdivided the edges incident 011 VB into groups.
Lemma 5 For positive integers t and k, tk/p(t, k) < ek2/t, if k ?
PROOF. We first note that ln(1 --H > --H2x for 0 < x ? 1/2. This is true, since if we define
f(x) ln(1 --H x) + 2x, then f(O) = 0 and f'(x) = \ffi2xX? which is non-negative for 0 < x < 1/2.
Now,
p(t,k)			k--Hi (t--H
tk			= fi
t=1
= exp
G(k 7t1)k)
> exp			(since k < t/2)
= exp
k2
> e
19
For t = A and k = I < ?? this lemma implies that Ah/p(A,k) < e and hence, given
A?1, the Xj are A--Hcorrelated with A = (1 + ei)ke < el+?1WA, via the standard approximation
1 + x < ex. Though A goes to infinity superpolynomially fast as A goes to infinity, we can
still upper bound Pr(X? > (1 + e)A/e) by a term that goes to zero superpolynomially fast, as
follows. (Here, ii =
Pr(X? > (1 + e)t) = Pr(X? > (1 + e)jt I Aq) Pr(Aq) + Pr(Xo > (1 + e)jt AECl) Pr(A?)
< Pr(X? > (1 + h? I Aq) + Pr(A?)
< AF(jt,e) + Pr(?GX?>(1+q)?)
? el+??p(?,e) +
< ?l+?? e???'3 +
< ?1+(??J/3)V?A +
Since e1 can be chosen arbitrarily small, we can make the above term go to zero exponentially
fast by choosing e1 such that e1 --H c2/3 < 0.
Remark. We can now see why the parameter THRESHOLD must be Q(log2+? n): the above
failure probability goes to zero superpolynomially fast as long as A = ?(log2+? n), for any fixed
? > 0.
Hence, we have proven that XG < (1 + e)?/e w.h.p., for any fixed e > 0. in turn, this
implies that the new degree of VB after one iteration of the algorithm is at most (1 + e)A/e
w.h.p. To see this,
Pr(X > (1 + e)A/e) < Pr(?C XG > (1 +
MA Pr(X?> (1+ c)MA/e)
MA (e'+(?i??/3)? + MAF(?, ei))
This inequality holds for any e and e1. Hence the new degree of all the vertices is at most
(1 + ?)A/e w.h.p. and we get
Theorem 4 The new degree of the graph after one iteration of Part (I) of the algorithm is at
most (1 + e)A/e w.h.p., for any fixed e> 0.
References
[1] B. Awerbuch, A. V. Goldberg, M. Luby, and 8. A. Plotkin. Network decomposition and
locality in distributed computation. In Proceedings of the lEEF Symposium on Foundations
of Computer Science, pages 364--H369, 1989.
20
[2] B. Berger and J. Rompel. Simulating (logc n)-wise independence in NC. J. Assoc. Comput.
Mach., 38(4):1026--H1046, 1991.
[3] B. Bolloba's. Graph Theory. Springer Verlag, New York, 1979.
[4] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum
of observations. Annals of Mathematical Statistics, 23:493--H509, 1952.
[5] W. Hoeffding. Probability inequalities for sums of bounded random variables. American
Statistical Association Journal, pages 13--H30, 1963.
[6] R. Jain, K. Somalwar, J. Werth, and J. C. Browne. Scheduling parallel i/o operations in
multiple bus systems. Journal of Parallel and Distributed Computing, 16(4):352--H362, 1992.
[7] H. J. Karloff and D. B. Shmoys. Efficient parallel algorithms for edge coloring problems.
Journal of Algorithms, 8:39--H52, 1987.
[8] G. F. Lev, N. Pippenger, and L. G. Valiant. A fast parallel algorithm for routing in
permutation networks. lEFE Transactions on Computers, 30:93--H100, 1981.
[9]
M. Luby. Removing randomness in parallel computation without a processor penalty. In
Proceedings of the IEEE Symposium on Foundations of Computer Science, pages 162--H173,
1988.
[10] R. Motwani, J. Naor, and M. Naor. The probabilistic method yields deterministic parallel
algorithms. In Proceedings of the IEEE Symposium on Foundations of Computer Science,
pages 8--H13,1989.
[11] A. Panconesi and A. Srinivasan. Fast randomized algorithms for distributed edge coloring.
In Proceedings of the ACAI Symposium on Principles of Distributed Computing, pages 251--H
262, 1992.
[12] A. Panconesi and A. Srinivasan. Improved distributed algorithms for coloring and network
decomposition problems. In Proceedings of the ACM Symposium on Theory of Computing,
pages 581--H592, 1992.
[13]
P. Raghavan. Lecture notes on randomized algorithms. Technical Report RC 15340
(#68237), IBM T.J.Watson Research Center, January 1990. Also available as CS661 Lec-
ture Notes, Technical report YALE/DCS/RR-757, Department of Computer Science, Yale
University, January 1990.
[14] R. Raman. The power of Collision: Randomized parallel algorithms for chaining and integer
sorting. In Proceedings, 10th Annual FST ? TCS Conference, Lecture Notes in Computer
Science # 472, pages 161--H175. Springer-Verlag, Berlin, December 1990. Also available as
University of Rochester CS Dept. TR 336, March 1990 (Revised January 1991).
21
