BIB-VERSION:: CS-TR-v2.0
ID:: CORNELLCS//TR93-1378
ENTRY:: 1993-10-14
ORGANIZATION:: Cornell University, Computer Science Department
LANGUAGE:: English
TITLE:: Techniques for Probabilistic Analysis and 
        Randomness-Efficient Computation
AUTHOR:: Srinivasan, Aravind
DATE:: August 1993
PAGES:: 117
COPYRIGHT:: Aravind Srinivasan 1993 - All Rights Reserved
ABSTRACT::
Randomness is well-recognized as an important computational resource in 
theoretical computer science. Not only are there classical problems for which 
the only known ``efficient'' solutions are randomized, but there are problems 
for which randomness can be used to circumvent deterministic lower bounds. 
However, standard tools available for probabilistic analysis such as the 
Chernoff-Hoeffding bounds are not sufficiently powerful in all situations; 
second, since there are no known ``true'' sources of randomness, there is a 
need to develop general techniques for reducing/removing randomness from 
randomized algorithms. This thesis addresses these two issues.

Certain types of scheduling and resource allocation problems in a distributed 
setting can be modeled as edge coloring problems. In Chapter 2, we present a 
fast and simple randomized algorithm for edge coloring a graph, in the 
synchronous distributed point-to-point model of computation. Our algorithm 
computes an edge-coloring of a graph $G$ with $n$ nodes and maximum degree 
$\Delta$ with at most $(1.6 + \epsilon) \Delta + O(\log^{2+\delta} n)$ colors 
with high probability, for any fixed $\epsilon, \delta, > 0$. To analyze our 
algorithm, we introduce techniques for proving upper bounds on the tails of 
random variables, extending the Chernoff-Hoeffding (CH) bounds to some types 
of dependent random variables. This is joint work with A. Panconesi [91]. 

In Chapter 3, we show that the CH bounds for the tails of the sums of bounded 
and independent random variables $X_{1}, \ldots, X_{n}$ require only 
limited independence among the $X_{i}$s. We show that if 
$X_{1}, \ldots, X_{n}$ lie in [0,1] and are $k$-wise independent, then 
$Pr(X \geq E[\sum_{i}X_{i}](1 + \delta))$ can be upper bounded by the CH 
bound, if $k \geq \mu \cdot min\{\delta,\delta^{2}\}$. This leads to 
algorithms that require a reduced amount of randomness for any analysis which 
uses the CH bounds. This is joint work with J.P. Schmidt and A. Siegel [108].

In Chapter 4, we present an $RNC^{2}$ algorithm for the perfect matching 
problem which uses $O(\log Z)$ random bits where $Z$ is any given upper bound 
on the number of perfect matchings in the graph, generalizing results of 
Grigoriev and Karpinski. Underlying our algorithm is a randomness-optimal 
generalization of the Isolating Lemma of Mulmuley, Vazirani \& Vazirani, 
which also leads to other applications. This is joint work with S. Chari and 
P. Rohatgi [26].
END:: CORNELLCS//TR93-1378
BODY::
Techniques for Probabilistic Analysis and
Randomness-Efficient Computation
Aravind Srinivasan
Ph.D Thesis
93-1378
August 1993
Department of Computer Science
Cornell University
Ithaca NY 14853-7501
TECHNIQUES FOR PROBABILISTIC ANALYSIS AND
RANDOMNESS-EFFICIENT COMPUTATION
A Dissertation
Presented to the Faculty of the Graduate School
of Cornell University
in Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
by
Aravind Srinivasan
August 1993
o Aravind Srinivasan 1993
ALL RIGHTS RESERVED
TECHNIQUES FOR PROBABILISTIC ANALYSIS AND
RANDOMNESS-EFFICIENT COMPUTATION
Aravind Srinivasan, Ph.D.
Cornell University 1993
Randomness is well-recognized as an important computational resource in theo-
retical computer science. Not only are there classical problems for which the only
known "efficient" solutions are randomized, but there are problems for which random-
ness can be used to circumvent deterministic lower bounds. However, standard tools
available for probabilistic analysis such as the Chernoff-Hoeffding bounds are not suf-
ficiently powerful in all situations; second, since there are no known `true" sources
of randomness, there is a need to develop general techniques for reducing/removing
randomness from randomized algorithms. This thesis addresses these two issues.
Certain types of scheduling and resource allocation problems in a distributed
setting can be modeled as edge coloring problems. In Chapter 2, we present a
fast and simple randomized algorithm for edge coloring a graph, in the synchronous
distributed point--Hto--Hpoint model of computation. Our algorithm computes an edge-
coloring of a graph 0 with n nodes and maximum degree ? with at most (1.6 +
+ O(log2+6 n) colors with high probability, for any fixed ?, > 0. To analyze
our algorithm, we introduce techniques for proving upper bounds on the tails of
random variables, extending the Chernoff?Hoeffding (CH) bounds to some types of
dependent random variables. This is joint work with A. Panconesi [911.
In Chapter 3, we show that the CH bounds for the tails of the sums of bounded
and independent random variables X1?n require only limited independence
among the Xis. We show that if x1,. .. , Xn lie in [0,1] and are k-wise indepen-
dent, then Pr(X > E[?j Xi](l + 6)) can be upper bounded by the CH bound. if
k > min[6, 621. This leads to algoritlims that require a reduced amount of ran-
domness for any analysis which uses the CH bounds. This is joint work with J. P.
Schmidt and A. Siegel [108].
In Chapter 4, we present an RNC2 algoritlim for the perfect matching problem
which uses O(log Z) random bits where Z is any given upper bound on the number
of perfect matchings in the graph, generalizing results of Grigoriev & Karpinski.
Underlying our algorithm is a randomness?ptimal generalization of the Isolating
Lemma of Mulmuley, Vazirani & Vazirani, which also leads to other applications.
This is joint work with 5. Chari and P. R?ohatgi [26].
Biographical Sketch
Aravind was born to Srimati. Kalyani Srinivasan and Shri. P. R. Srinivasan on the
20th of August, 1968, in Madras, India. After studying at M. Ct. M. Chidambaram
Chettiar Memorial Preparatory School, Sindhi Model Senior Secondary School, and
M. Ct. M. Higher Secondary School, all in Madras, he joined the Indian Institute
of Technology, Madras, in July 1985. He graduated from lIT with a Bachelor of
Technology degree in Computer Science in August 1989, and joined Cornell in Fall
1989. He received a Master of Science degree in Computer Science from Cornell
in January 1993. He spent Spring 1993 at Princeton University as an exchange
student. He has accepted a postdoctoral fellowship offered jointly by the Institute
for Advanced Study and the Center for Discrete Mathematics and Computer Sdeuce
(DIMACS).
iii
A small step for a man, and
An even smaller step for mankind.
--H Alessandro Panconesi, circa 1991
`v
Acknowledgements
It is with immense love that I offer my heartfelt thanks to my parents- Srimati.
Kalyani Srinivasan (Amma) and Shri. P. R. Srinivasan (Appa), and to my Guru, Shri.
Annapurnamba Sahita Shri. Amritananda Natha Saraswati. My parents showered
me with love and made several sacrifices to ensure that I get the best education
possible and over the past four years, Guruji has been a fount of unconditional love
and grace during many a period of confusion.
The influence of my Guru at Cornell, my advisor David Shmoys, has been
tremendous- in addition to his sound technical insights and inputs, he was ever-
supportive. It is difficult to reconcile his young age with his wisdom in guiding a
student- when we started with him, he would guide us every week over even small
points, while later on, he gave us full freedom to pursue whatever we liked. Many of
the problems and approaches considered here became clear due to discussions with
him, with his intuition and wide knowledge helping a lot. He has been my model of
a complete academic, in the accent he places on teaching, and in his superb teach-
ing, writing, and presentation skills. He has been a great influence with his style
of presentation, and indeed has spent so much time in teaching me this skill. He
never scrimped with wise advice, and was always generous with his time. I truly feel
v
indebted to David.
I have had fruitful discuss'ons with Eva Tardos- in particular some important
ideas used in Chapter 2 were suggested by her. Dexter Kozen and Sam Toneg were
sources of suggestions on problems, encouragement and advice. I thank Dexter, Eva
and Sam for serving on my committee. Bob Constable and Juris Hartmanis were
always encouraging, and despite being senior professors with many responsibilities,
were ever-willing to offer their kind advice; thank you! Thanks also to Paul Pedersen
for his constant encouragement and for his infectious enthusiasm for mathematics.
His knowledge of mathematics and spontaneity have been a big inspiration.
The work reported in Chapter 2 is joint work with Alessandro Panconesi, that of
Chapter 3 is joint work with Jeanette Schmidt and Alan Siegel, and Chapter 4 is joint
work with Suresh Chari and Pankaj Rohatgi. My sincere thanks to my co-authors
for all the discussions and for generously sharing their ideas with me. I also thank
Joseph Cheriyan, Devdatt Dubhashi, Radhakrishnan Jagadeesan, Prasad Jayanti,
Steve Mitchefl, Sendhil Mullainathan, Moni Naor, David Pearson, Desh Ranjan,
Sridhar Sundaram, Steve Vavasis, and Vijay Vazirani for valuable discussions.
This work has been supported in parts by NSF PYl award CCR-S9--H96272 with
matching support from UPS and Sun Microsystems, by the U. 5. Army Research
Office through the Mathematical Sciences Institute of Cornell University, and by an
IBM Graduate Fellowship. I am grateful to these sources for their support.
I thank the faculty, staff, and students of the Cornell Computer Science depart-
ment for maintaining such a friendly environment. Dexter Kozen and Sam Toueg
were nothing but supportive in their roles as Graduate Field Representative, while
in the midst of busy academic careers. Jan Batzer could be counted on to make
vi
every bureaucratic hassle vanish, while always maintaining a warm and sweet smile.
The department offered me Teaching Assistantships for six semesters and a summer,
and I have learnt a lot from this experience; thanks to Bob Constable, Bruce Don-
ald, Doug Howe, Keshav Pingali, Amitabh Shah and Kay Wagn&, for whom I have
been a TA. My office-mates Prasad Jayanti, Shyam Kapur and Adam Webber were
models of serenity, whenever I panicked over trivia. I am also thankful to Cornell
University for offering such a nice environment for study, and to the staff of the
Cornell International Students and Scholars Office for helping smoothen the path for
international students and scholars in so many ways.
I spent Spring 1993 at Princeton as an exchange student; I would like to thank
Princeton University in general, and the DINIACS Center, Bob Sedgewick and Bob
Tarjan in particular, for making this possible. I was fortunate to take a wonderful
course on "Communication and Computation" taught by Andy Yao at Princeton.
My heartfelt thanks to Sharon Rodgers, for being so kind and helpful at Princeton.
In addition to my parents, my entire family has been so loving and supportive of
me. Right from childhood, my sisters Smt. Maithreyi Sunderrajan (Maith Akka) and
Smt. Sowmya Sridharan (Chamakka), and my brother Prof. P. 5. Giridharan (Gin
Anna) have tirelessly impressed upon me the value of education; I cannot adequately
measure their affection for me. Though he would never accept it, my brother has
always been a major source of inspiration for me- in his academic excellence and
diligence when I was young, and in the way he conducts his life, now. Above all, he
was the one who introduced me to Guruji. My sincere thanks are also due to my
brothers-in-law Shri. C. P. Sunderrajan (Sunder Athimbare) and Shri. P. Sridharan
(Sridhar Athimbare), and to my sister-in-law Smt. Asha Giridharan (Asha) for their
vii
encouragement.
There have been many who have been sources of good fun, and have offered solace
at times of confusion and depression. In particular, I would like to thank
Prasad Jayanti, for the discussions on India, movies and much more, and Shanta
and him for kindly putting me up in their house when I was down with pneumonia
during one of Ithaca's cold and desolate winter breaks and for always keeping a warm
family atmosphere open for me;
Alessandro Panconesi, whose spontaneity, wit, and several hilarious e-mail messages
made joint work delightful, and who offered me a broader picture of life and never
allowed me to take myself seriously;
Pal and Sridhar, for the continued friendship since our undergrad days and for the
good times;
Alok Baveja, for being a kind friend whose frank criticism I have always benefited
from and who is a source of inspiration in many ways;
Nlike Kalantar the Ever-helpful, for tirelessly inviting me to every BTha"fcelebration
in town and for teaching me the basics of the Baha"ffaith;
Dr. Thangavelu for re-kindling my interest in Thamizh and for introducing me to
many authors, and to Smt. Uma and him for keeping their house always open to
me, and
Venky, for the memorable long walks in scenic Ithaca, our friendship, and for his
timely help in various matters.
I must have certainly forgotten to thank here the many other people who have
influenced me, but my gratitude for them will certainly remain in my heart.
viii
Table of Contents
Introduction
1.1 Related ?Vork
1.2 Contributions and organization of the thesis
2 An Extension of the Chernoff-Hoeffding Bounds with an Applica-
tion to			Distributed Edge Coloring
2.1			Notation			.
2.2			The Chernoff-Hoeffding Bounds Extension . 			. .			. .
2.2.1			Binary Random Variables
2.2.2			The General Case . . . . . . . 			. .			.			.
2.3			The Edge Coloring Algorithm
2.3.1			Distributed Edge Coloring of Bipartite			Graphs			.			.
2.4			Probabilistic Analysis
2.4.1			Analysis of Top Vertices . . . . . 			. .			. .
2.4.2			Analysis of Bottom Vertices
Chernoff--HHoeffding Bounds for Applications with Limited Indepen-
3
dence
3.1
3.2
3.3
The Basic Method, and Applications to Tail Probabilities
3.1.1 Estimating tail probabilities of binary random variables
3.1.2 Tail probabilities of bounded random variables
3.1.3			Redirecting the method. . .			. .
3.1.4 Probability Bounds for Exactly r Successes under Limited In-
dependence
3.1.5 How close to optimal are our results? .
3.1.6 Upper Tail Bounds for some other Distributions
Applications to Computation. . .
3.2.1 Reduced randomness for randomized algorithms . . .
3.2.2 The New Formulation and the Method of Conditional Proba-
bilities. . . . . . .
Conclusions and Open Problems
3
8
11
15
15
17
20
22
24
26
26
28
39
43
44
49
51
58
61
63
67
67
72
79
ix
4 Randomness-Optimal Unique Element Isolation, with Applications
to Perfect Matching and Related Problems
4.1			The Generalized Isolating Lemma
4.1.1			The New Isolation Scheme
4.1.2			Lower bound for the Isolation problem			. . .			.
4.2			Applications to Matching Problems
4.2.1			Algorithms for matching . . . . .			.
4.2.2			Graphs with a polynomial number of perfect matchings
4.2.3			Algorithms with no information on Mat(G)i . .			.
4.3			Other			applications
4.3.1			Randomized reductions from SAT to USAT . .			.
4.3.2			Improved parallel randomness complexity for problems on ma-
troids. . .			.
Randomness??fflcient hash functions
Conclusions and open problems . .			.
4.3.3
4.3.4
5 Conclusions and Open Problems
Bibliography
x
81
S4
S5
90
92
92
95
97
98
loo
t02
104
`05
107
Chapter 1
Introduction
Randomness is well-recognized as an important computational resource in theoretical
computer science now. Not only are there classical problems such as primality testing
for which the only known "efficient" solutions are randomized, but there are prob-
lems for which randomness can be used to circumvent deterministic lower bounds
e.9., approximating the volume of a convex body. The theoretical advances are also
slowly being realized in practice, e.g., the CM-5 parallel computer uses randomized
routing for communication among its processors. However, standard tools available
for probabilistic analysis such as the Chernoff--HHoeffding bounds are not sufficiently
powerful in all situations; second, since there are no known "true" sources of random-
ness, there is a need to develop general techniques for reducing/removing randomness
from randomized algorithms. This thesis addresses these two issues.
There are at least three striking advantages offered by randomness, i.e., the abil-
ity to "flip a fair coin" any number of times independently, to computation. First,
there are classical problems such as primality testing and other number-theoretic
problems which are basic, for instance, to cryptographic protocols, for which the
only known polynomial-time algorithms are randomized (see, for example, Solovay
& Strassen [116], Rabin [98], and Adleman & Huang [2]). Furthermore, there are
classical problems for which the best known deterministic algorithms start with a
2
randomized construction and then derandomize it, e.g., testing if an undirected graph
is connected, within very limited space bounds (Nisan [87,88], Nisan, Szemer6di &
Wigderson [89]). Third and most strikingly, there are several areas in which random-
ization can be used to circumvent lower bounds for deterministic algorithms- fault-
tolerant Byzantine agreement in asynchronous distributed systems, packet routing in
interconnection networks, communication complexity theory etc. A recent important
illustration of this is as follows. Given a convex body in n dimensions presented in a
certain natural form, one can approximate its volume only to within an exponential
factor of n, deterministically in time polynomial in n (Elekes [37], Barany & F?redi
[13]); however, given any ? > 0, Dyer, Frieze & Kannan show how to approximate
this quantity to within ? in time polynomial in n and 1/?, via randomization [36].
The above reasons, combined with the general simplicity of, and the interesting
mathematical techniques brought forth by, randomized algorithms, make random-
ized algorithms an interesting topic of study. See Gill [42] for a complexity-theoretic
grounding for randomized algorithms, Kozen [66] for a proposed semantics for ran-
domized algorithms, and Karp [58], Raghavan [101] and Tompa [125] for some recent
surveys of issues in randomized computation.
Two orthogonal issues in randomized computation are studied in this thesis. On
the one hand, standard tools available for probabilistic analysis such as the Chernoff-
Hoeffding bounds [27,50], are not sufficiently powerful in all situations; more tools
and techniques are needed to design and analyze efficient randomized algorithms. In
particular, one can say informally that while the behavior of independent random
variables is fairly well understood, there is a need to develop more tools for analyzing
the behavior of random variables displaying various types of correlation. On the
other hand, there are no known "perfect" sources of randomness, i.e., those that
can output any number of unbiased, and more importantly, independent random
bits; thus, we need to develop general techniques for reducing and if possible, even
removing the randomness in randomized algorithms, without significantly increasing
3
their complexity. To quote Karp [5S],
It is clear that random bits, if they can be produced at all, will be slow and
costly to generate. For this reason, there has been considerable interest
in reducing the number of truly random bits that algorithms require or
else showing that imperfect sources of randomness are adequate.
We emphasize, however, that in practice, randomized algorithms seem to work quite
well, even with the linear congruential generator as their source of randomness. The
main motivations for studying randomnessefficient computation under this reality
are to have a theoretically sound basis for randomized algorithms, and to make some
advances towards settling the complexity-theoretic question of dassifying problems
according to the amount of randomness they need, to be solved efficiently. Fur-
thermore, such research opens up the possibility of deriving efficient delerministic
algorithms from their probabilistic counterparts, if the randomness requirement is
made small enough.
Thus, this thesis investigates these two issues of developing new techniques for
analyzing randomized algorithms, and inventing general methods to reduce/remove
the randomness from randomized algorithms.
1.1 Related Work
We now give a quick and informal, far from comprehensive, survey of some important
existing tools for probabilistic analysis and randomness-efficient computation. For
the purposes of this thesis, we are interested in tools for probabilistic analysis of a cer-
tain type--Hupper bounds on the probabilities of deviation of certain random variables
from their mean. The Chernoff--HHoeffding bounds are basic tools used in bound-
ing the tail probabilities of the sums of bounded and independent random variables.
They are commonly used in randomized algorithms and in derandomization to upper
bound Pr(IX--HiI > a), where X = ?:ffiiXi and ? = E[X], withX1,X2
4
being bounded and independent random variables. An informal scenario where they
might typically be used is where X denotes the (random) value taken on by a compu-
tational resource, say time, in a randomized algorithm, and the algorithm performs
`?well" when X is "close to" its expected value ?; such bounds show that with high
probability, the algorithm indeed performs well. In the case of martingales, i.e.,
sequences of random variables X0,X1,... such that Vi > 0 E[Xi+i Xi] = Xi ideas
from the Chernoff-Hoeffding bounds have been used in Azuma's inequality to upper
bound Pr(Xj > A?), when X0 =--H 0 and IXi+i --H Xi < c for some constant c (see
Alon, Spencer & Erdo's [8]). See Chapters 5-8 of [8] for some other useful probability
inequalities for correlated random variables.
The following are some known general approaches towards randomness-efficient
computation.
(i) Pseudorandomness. A general method for reducing randomness is the theorv
of pseudorandom number generators (PSkGs) initiated by Blum & Micali [18] and
Yao [139]. A PSRG is a sequence of functions [g??, g? : fO,1?? H
which deterministicalty "stretch" a given seed to a much longer seed; p(n) could be
n2, for instance. A PSRG is such that no polynomial time test can ?distinguish
between" 9n(X) and y, where x is a truly random string of length n and y is a
truly random string of length p(n), even with the help of true randomness. Thus,
a polynomial-time implementable PSRG can be used to safely run a randomized
algorithm which requires p(n) random bits, using just n random bits. An intimate
connection- equivalence- between PSRGs and one-way functions has been found (Im-
pagliazzo, Levin & Luby [52], Hastad [49]); since the existence of one-way functions is
only a conjecture so far, there are no known provable PSRGs. This idea has inspired
the next method, which is more limited in purpose but which does not require any
complexity-theoretic assumptions.
(ii) Using available random number generators. Bach pioneered the idea of running
randomized algorithms judiciously by taking a very short truly random seed, and
then stretching it using standard random number generators such as the linear con-
gruential generator to still guarantee the performance of these algorithms [12]; his
main focus was on certain number-theoretic algorithms. Karloff & kaghavan have
used this idea to judiciously run a randomized version of Quicksort, routing algo-
rithms on the hypercube etc. [56], and Mulmuley has recently used it for certain
geometric algorithms [82].
Given a randomized algorithm which requires n random bits, suppose one could ana-
lyze it by some method which only requires these bits to have some weaker stochastic
property than that of being unbiased and independent (which clearly requires n ran-
dom bits). One can then hope for some savings in randomness, by implementing the
weaker property. The following two approaches exploit this; see also Schulman [111]
and Koller & Megiddo [65] for related work.
(iii) Limited independence. Random variables X1, X2?? are k-wise independent
if every k (or fewer) of them are mutually independent. Note, for instance, that if
a random variable X is such that X = Zi fi(X1,X2??) where each fj is a
function of at most k of X1,X2?n, then E)?] is the same if x1, x2,.. . ,X? are
mutually independent or are k-wise independent; thus, if X denotes the running time
of a randomized algorithm, then the expected running time will be the same under
these two situations and hence, we just require k--Hwise independent Xi, ....... ,
A very important property of such random variables is that they can be generated
from O(k log n) truly random bits, when, for instance, each Xi is an unbiased ran-
dom bit (Joffe [54], Carter & Wegman [24]). So, if k is a constant, then the sample
space is polynomial-sized in n (n0(k)), and thus we can eliminate randomness via
searching all the points in the sample space, incurring just a polynomial overhead in
6
time. This is a very powerful, oft-used technique. It was used implicitly by Narp &
?Vigderson in their breakthrough result that computing a maximal independent set
in a graph is in AC [62], and was made explicit later by Luby [75].
(iv) Biased random variables. A sample space -? for n--Hbit vectors was defined to be
k--Hwise e?biased by Naor & Naor [S5] (see also Vazirani [132]) if
vsc ?1,2,...,nJ,1 <181 <?k,lPr(?xi=1)--HPr(?i=O)l <e
i?S			jES
where ? denotes the XOR? function and Xj denotes the ith bit of an n-bit string
x picked uniformly at random from X. One important property of such a sample
space is that v?e= 1,2,...,k,V[?1,?2,...q) c [l,2,...,n1,V6ib?...be Ei [0,1)',
1Pr(xi1 = b1,xj2 = b2,... ,Xie = 6,) --H 1
Constructions of k--Hwise e?biased sources of size poly(k, log n, -?) were presented in
[85]. Such sample spaces have been shown to have several applications to explicit
constructions and to derandomization, mainly since probabilistic analyses may be ex-
pected to be robust under small perturbations of the probabilities. These results have
been extended by Azar, Motwani & Naor [11], Alon, Goldreich, Hastad & Peralta [6],
Even, Goldreich, Luby, Nisan & Velickovic' [41] and Chari, R?ohatgi & Srinivasan [25].
(v) Deterministic amplification. Given a randomized algorithm A which uses r ran-
dom bits and succeeds with probability p, one can easily boost its success probability
to q > p, by just running it log?1?p)(l --Hq) times independently (assuming that we can
detect the algorithm's "success'). However, this requires r 10g(i--Hp)(1 --H q) random
bits; can we do better? Karp, Pippenger & Sipser [59] and Sipser [115] reduce this
problem to that of the explicit construction of certain types of graphs (dispersers
and expanders). Ajtai, Komlos & Szemere'di show how to do this via random walks
on expanders [3], a connection further explored by Cohen & Wigderson [31] and
9
Pr(X > ij(l + 6)) can be upper bounded by the Chernoff-Hoeffding upper bound,
even if k > 6. Additional methods are also presented, and the aggregate results
are very sharp and provide a better understanding of the proof techniques behind
these bounds. They also yield improved bounds for various tail probability distri-
butions and enable improved approximation algorithms for jobshop scheduling. The
"limited independence" result implies that weaker sources of randomness are sufh-
cient for randomized algorithms whose analyses use the Chernoff-Hoeffding bounds;
further, it leads to algorithms that require a reduced amount of randomness for any
analysis which uses the Chernoff-Hoeffding bounds, e.g., the analysis of randornized
algorithms for random sampling and oblivious packet routing. This is joint work
with J. P. Schmidt and A. Siegel [1081.
In Chapter 4, we study the parallel randomness complexity of perfect matching
and related problems. We present a randomness?efficient RiVC2 algorithm for the
perfect matching problem which uses O(log 7 + log n) random bits where 7 is any
given upper bound on the number of perfect matchings in the graph. This improves
and generalizes the results of Grigoriev & Karpinski [47] who present an iVC3 al-
gorithm when 7 is polynomially bounded. In the worst case the algorithm uses
O(n log(m/n)) random bits, improving on the previously best-known O(m log n)
randomness complexity of Mulmuley, Vazirani & Vazirani. When no such upper
bound is known, this gives an PNC3 algorithm to find a perfect matching which
uses at most O(log M + log n) random bits with high probability, provided that Al,
the number of perfect matchings, is nonzero. See Karp & kamachandran [60] for
a study of the complexity class JVC. The core of our algorithms is a randomness-
optimal generalization of the Isolating Lemma of Mulmuley, Vazirani & Vazirani
[83]. Given a set S and an unknown family ? C 2? with i? < 7, we outline
a scheme to assign polynomially bounded weights to the elements of S using only
O(log 7 + log ISI) random bits, such that the minimum weight set in ? is unique
with high probability. We also prove a matching lower bound for the randomness
7
Impagliazzo & Zuckerman [53], who show, when p is at least some constant, how to
do this using O(r + l0g?i?p?(l --H q)) random bits. Nisan shows. using very different
techniques, how this can be approximated very well, using O(rlog(log?1?p)(1 --H
random bits [87].
The final technique we sketch is of a very different flavor, and is an important
tool for derandomization.
(vi) The method of conditional probabilities. This method is implicit in the work of
Erd6s & Selfridge [40], and has been systematized by Raghavan [100] and Spencer
[117]. Given a joint distribution D on n (say, binary) random variables X1
such that
ED[f(Xl,X2??)] <a,
for some function f and some constant a, call a point i = (xi, 2xn) E [0. 1 1?
good if f(i) < a, and suppose we want to find any good point J = (?i g2
Clearly, some good point exists; how to find it efficiently? The following is one such
method.
for i:= 1 to n do
if
EDW(X) Xj = 0,Affi21(Xj =gj)] < ED[f(X) Xj = l,Affiffi1(Xj =9j)]
then g? := 0
else g? := 1.
The key claim is that
i+1
ED[f(X) A (Xj = 95)] < ED[f(X) A(xj=gi)],i=0,... , n --H 1,
5='			5='
which implies that
8
?.e, that g is good. Thus, if we can compute the above conditional probabilities
efficiently, we can zoom in on a good point fast. Typically, we cannot compute these
conditional probabilities exactly in an efficient manner, but Raghavan shows how
certain approximations to them (which he calls pessimistic estimators) are sufficient
[100].
1.2 Contributions and organization of the thesis
Certain types of routing, scheduling and resource allocation problems in a distributed
setting can be modeled as edge coloring problems; an edge coloring of a graph is an
assignment of colors to its edges such that no two edges incident at the same vertex
get the same color. In Chapter 2, we present a fast and simple randomized algoritlim
for edge coloring a graph, in the synchronous distributed point-to--Hpoint model of
computation. Our algorithm computes an edge-coloring of a graph 0 with n nodes
and maximum degree A' with at most (1.6 + e)A + O(log2+6 n) colors with high
probability (arbitrarily close to 1), for any fixed e, ? > 0. To analyze the performance
of our algorithm, we introduce new techniques for proving upper bounds on the tail
probabilities of certain random variables. Our results extend the Chernoff--HHoeffding
bounds to certain types of random variables which are not stochastically independent.
This is joint work with A. Panconesi [91]. This technique has been recently used by
us [118] to improve certain known results on the distribution of the number of prime
factors of integers (Hardy & ?amanujan [103], Erdo's & Kac [39], Turan [126]), and
has also served in part as motivation for the material presented in Chapter 3.
In Chapter 3, we present a simple technique which gives slightly better bounds
than the Ch&noff?Hoeffding bounds for the tail probabilities of the sums of bounded
and independent random variables, and which more importantly requires only limited
independence among the random variables, thereby importing a variety of standard
results to the case of limited independence `for free". In particular, we show that
if x1,x2,... , Xn are bounded between 0 and 1 and are k-wise independent, then
10
complexity of this problem.
Due to the wide applicability of the Isolating Lemma, this generalization yields
randomness?efflcient RA?C algorithms for several other problems such as variants
on matching and basic problems on linearly representable matroids such as matroid
intersection and matroid matching. This also gives a randomnesseflicient version of
the Valiant-Vazirani reduction from SAT to USAT [131] which yields new proofs
that FewP C ?P (Cai & Hemachandra [22]) and FewP ? C?P (K5bler, Sch5ning,
Toda & Toran [64]). This is joint work with 5. Chari and P. Rohatgi [26]
Chapter 5 concludes and presents some open problems.
We have attempted to keep each chapter self-contained. For the reader in a hurry,
we recommend Sections 2.2.1, 2.3, 3.1.1 and 4.1 to collect the essence of this work.
Chapter 2
An Extension of the
Chernoff-Hoeffding Bounds with
an Application to Distributed
Edge Coloring
An important limitation for a distributed network without global memory in coniput-
ing any function is that for an efficient algorithm, each processor can communicate
only with those processors that are within a small radius around it. Niodels of parallel
computation like the PR?AM abstract this problem of locality away by assuming the
existence of a global shared memory with fast concurrent access. We are interested
in studying how fast individual processors can compute their portion of the output
in a message--Hpassing distributed system, with such ?local" information alone. The
model we study in this chapter is the synchronous distributed point-to--Hpoint model,
in which the processors are arranged as the vertices of an n-vertex graph G = (V, E),
and where all communication is via the edges of 0 alone. This chapter describes joint
work with A. Panconesi [91].
An edge coloring of a graph is an assignment of colors to its edges such that
11
12
no two edges incident at the same vertex get the same color. We present a fast
and simple randomized distributed algoritlim to edge color C with at most (1.6 t
E)A t O(log2+6 n) colors with high probability for any fixed ?. 6 > 0. where A is the
maximum degree of the vertices of C.
One main contribution of this chapter is the tools developed to analyze our al-
gorithm. The Chernoff-FIoeffding (henceforth CII) bounds [27.50] are ffindamental
tools used in bounding the tail probabilities of the sum of bounded and indepen-
dent random variables. The most frequent form in which these bounds are used
in computer science is when there are n ?ndependent random bits Xl X2
X = Z?j=1 Xi, and ? = EF?]; in this case, these bounds show that Pr(X >
is inverse exponential in ji and 62, for 0 < 6 < 1. W??? introduce a new way of look-
ing at the CII bounds and prove that CH-type bounds can be used for the sums of
certain types of dependent random variables too. A generalization to a related form
of dependence is known (Raman [1021), but it is not strong enough to be used in
our applications. We also prove similar results to certain types of dependent non-
binary random variables. These new extensions of the CII bounds are used cruciallv
in the analysis of our algorithms. Furthermore, these methods have served partly
as motivation for recent work of Schmidt, Siegel & Srinivasan [108] and Srinivasan
[1181.
The edge coloring problem can be used to model certain types of jobshop schedul-
ing, packet routing, and resource allocation problems in a distributed setting. For
example, given a set of processes ? and a set of resources 1? such that each process
p Ei ? needs a subset f(p) C 1? of the resources where: (i) each process p needs
every resource in f(p) for a unit of time each, and (ii) p can use the resources in
f(p) in any order, we can construct a bipartite graph Gp?? = (?, 1Z, Ep??) where
Ep,? = f(p, r) p Ei ? A r E f(p)t, and an edge coloring of Cp?? with c colors yields
a schedule for the processes to use the resources within c time units. It has also
been used in resource allocation problems such as the Dining Philosophers problem
13
(Lynch [77], Styer & Peterson [120]). Edge coloring can also be used in distributed
models in situations where broadcasts are infeasible or undesirable: an edge coloring
of the network results in a schedule for each processor to communicate with at most
one neighbor at every step (at time step i, processors communicate via the edges
colored i only), and using a ?small" number of colors reduces the wastage of time in
this schedule,
Note that A colors are necessary to edge color a graph with maximum degree A.
Vizing showed that it is always possible to edge color a graph with A + 1 colors and
gave a polynomial time algorithm [135] (see, for example, Bollobas [19]). Karloff &
Shmoys gave an RNC algorithm for this problem that uses A + A0?5+? colors, which
was derandomized in A?C by Berger & Rompel and Motwani, Naor & Naor [17,81].
In the distributed model, a straightforward edge coloring algorithm is to apply a
vertex coloring algorithm to the line graph of 0. There are fast (polylogarithmic)
randomized vertex coloring algorithms that use (A+ 1) and A colors, which translate
to (2A--H1) and (2A--H2)--Hedge coloring algorithms respectively (Luby [76], Panconesi
& Srinivasan [92]). There are no known polylogarithmic deterministic algorithms in
the distributed setting for (2A --H 1)??dge coloring; moreover, distributed A?dge
coloring requires ?(diameter(G)) time, even with randomization [92].
It has recently come to our attention through Alon & Spencer [7], that meth-
ods of R5d [106] and Pippenger & Spencer [96] can be used to derive randomized
polylogarithmic distributed algorithms to edge color using A( 1 + o( 1)) many colors.
However, we present the material in this chapter with the view that the analysis is
of independent interest, and also in view of the new extension of the CH bounds and
its applications.
This work also belongs to the realm of studying how a distributed system can
solve some problem (related to scheduling and synchronization) on its own topology.
The main problems that have been studied in this regard are computing a maximal
independent set (MIS) in 0 (Alon, Babai & Itai [5], Luby [75]), a (A + 1)--Hvertex
14
coloring of 0 (Awerbuch, Goldberg, Luby & Plotkin [101, Luby [76]), and a A-vertex
coloring of 0 [92]. An NIlS captures the idea of a set of processors working in parallel
without interfering with the decisions made by their neighbors, and a vertex coloring
is a partition of the processors into independent sets, yielding a schedule for the
processors to work in parallel.
An important point about all of these problems is that they need just a local
search for an incremental update, in the following sense. Consider the simple sequen-
tial algorithm to compute an MIS pick any vertex v, add it to the partial MIS
computed so far, remove v and its neighbors from 0, and repeat. Thus, a partial
MIS can be updated incrementally with a local search of radius 1. Similar `local
search" results for incremental updates are known for (?t 1) and %-vertex coloring
(and hence for (2? --H I)--H and (2A --H 2)?edge coloring): local searches of radius 1 and
radius O(log? n) respectively [92]. Hence, the key problem in implementing these
in a parallel setting is symmetry breaking: parallelizing the incremental sequential
algorithm (implied by such a local result) by somehow breaking the symmetrical ef-
fort of all the processors to execute one incremental step of the sequential algorithm.
Two key tools for symmetry breaking are randomization [5,75,76] and network de-
composition (Awerbuch, Goldberg, Luby & Plotkin [10], Linial & Saks [72], Berger
& Cowen [16], Panconesi & Srinivasan [92]).
However, such a local search result is not known for edge coloring with less than
--H 2 colors. Note that if we are allowed 2? --H 1 colors, an uncolored edge is always
surrounded by at most 2A --H 2 colors, and a local search of radius 1 is sufficient for
the update. It is not clear that any local result should hold when we have at most
--H 3 colors. Hence, another main contribution of this chapter is a fast and simple
distributed algorithm for a problem not falling under the hitherto dominant paradigm
of "symmetry breaking and local search". Our algorithm is also very simple; we first
present an algorithm for edge coloring bipartite graphs, and then extend it to general
graphs using an idea of Karloff & Shmoys [57]
15
2.1 Notation
Given an undirected graph c; = (V,E) we denote by A its maximum degree, ic.,
the maximum number of edges incident at any node; by du we denote the degree of
vertex u, and by ?(u) we denote the set of edges incident at vertex `u.
Given a positive integer n, [n] denotes the set fi, 2nJ. The permanent of a
(possibly non--Hsquare) matrix i? ? ?rxc with c _ r is defined as the natural extension
of the permanent of square matrices. Let ? = : [ci [r], r is one--Ht&?onet.
Then,
perm(Ai) = ? fi Ai?(j)?j.
?EP i=I
An event A is said to happen with high probability (w.h.p.) if Pr(A) > 1 --H l/f(n)
for some superpolynomial function f(n) (i.e., nc = o(f(n)) for all c > 0. and the
failure probability of A is at most l/f(n)).
2.2 The Chernoff-Hoeffding Bounds Extension
In this section we introduce our extension of the Chernoff-Hoeffding bounds, which
are important tools used in estimating the tail probabilities of random variables.
Given n independent and bounded random variables Xl, X2Xn, these bounds
are used in deriving an upper bound on the upper tail probability Pr(X > (1 + e)jt),
where X = Z?1 Xj, ? = E[X], and e> 0. We extend these bounds to a certain case
of dependency among the Xj's, which we call A-correlation. We first introduce this
notion in the case where the Xj's are 0-1 random variables and prove our extension
of the CH bounds. This special case is simpler to handle and more intuitive but
contains all the essential difficulties of the general case. After the special case is
dealt with, we will take care of the more general case.
Let us review Chernoff's classical approach to upper bound the upper tail prob-
ability of a random variable X, which is the sum of independent binary random
16
variables X1, X2Xn [27] (this approach apparently dates back to 5. N. Bern-
stein). Let Pr(X? = 1) = pi) and ? = E[X] = ZW--H1m We want good upper bounds
on Pr(X > a(1 t e)), for e > 0. Chernoff implicitly showed that for identically
distributed 0--H1 variables Xl X2)(? and for a > ?,
E[etX]			fl--H?)n?a
Pr(X>a)??m?nt 6at ?L(fl?ii??)=(??)?(? --H a
Hoeffding [50] extended this by showing that L(n,u?a) is an upper bound for the
above minimum even if the Xj's are not identically distributed and range between 0
and 1. Replacing a by ?(1 t c) in the Hoeffding estimate L(.,.,.) gives, for e > 0,
(it(n--H74+()))n--Htt(I+()
> ?(1 + e)) ? F(n,a,e)			(1 + ?)t(1+()
Since L(n,a,a) is symmetric with respect to (a, ii) and (n--H a, n --H ji), the Hoeffding
estimate also shows that
Fr(X < jt(1--Ht)) = Pr(n--H? > n--Hu(1--He)) ? F(n,a,--He) = (1
(1 --H
The following simple upper bounds for F(n.a? 6) and F(n,a --He) are sufficient to
derive most of the useful approximations that have appeared in the literature [50,9,
101,8].
G(a,e) = ?(i+))`+??? (see, for example, [101]); (2.1)
for e < 1, G(jt, 6) ? e??2ItI3 [9];			for e > 2e --H 1, 0(a e) < 2--H(I+?)# [101], and
F(n,a? --He) < 0(a? --He) < e???2/2 [50,9,101,8]
Hoeffding also considers the more general case where X is the sum of n inde-
pendent and boundedrandom variables Xi E [ai, bi], and uses the above approach to
show that if E[X] = a? then for 6> 0,
i?)?n )t??k??a? < H(a, e, ?ffl b) = exp --H ??2??$ai)2 (2.2)
17
Inequalities (2.1) and (2.2) will be used in our proofs. If ? is a fixed positive
quantity no greater than 1 (which will be true in all our applications), then C(ji, ?) ?
??? ?/3 Hence, if ? = ?(log1+? n), then C(?, ?) is the inverse of a superpolynomial
function of n, for any fixed ? > 0 (similar considerations apply to H(?., ?, J, b), when
each bi --H a? is bounded). This fact makes the CH-bounds a powerful tool for deriving
strong performance guarantees for randomized algorithms and will be used repeatedly
in this chapter.
2.2.1 Binary Random Variables
We now present our extension of the CH-bounds to the case of 0-1 A--Hcorrelated
random variables. We begin by defining A--Hcorrelation (this was defined as `?self-
weakening with parameter A" in [91]) for 0-1 random variables.
Definition 1 Let X1,X2,... , ?n be 0-] random variables. The Xi `sare A--Hcorrelated
if, for all nonempty I C [n],
Pr %Xi=l ?AHPr(Xi=1).
i?1
Before proving the extension of the CH-bounds, let us develop some intuition
by giving an example of A--Hcorrelation. Suppose A balls are thrown uniformly at
random and independently into A bins. Let Bj be an indicator random variable that
is 1 if bin i is empty and 0 otherwise. These variables are 1--Hcorrelated. To see this,
consider any nonempty subset J c [A] of the bins and suppose that all these bins are
empty. Then, given this, the probability that some other bin, say i, remains empty
decreases. That is, for all J c [A] and i ?
Pr(Bi =1			=1) = 1?A$ J			?<(1 --H			= Pr(Bi = 1).
By straightforward induction, this implies that for all nonempty I C [A]
jEl ? =			II Pr(Bj = 1).
jEl
`s
The CH-bounds say that if x is the sum of n independent 0-1 random variables
then Pr(X > (1 + e)u) < O(jt,e). The next theorem shows that if the Xis are
A--Hcorrelated then Pr(X > (1 + e)ji) < A C(ii, e). In the statement of the following
theorem, jt is an upper bound on E[X]; this is sufficient because we are upper
bounding the upper tail probability of X.
Theorem 1 Let X be the sum of A-correlated 0-1 random variables, and let E[X] ?
jt. Then,
Pr(X > (1 + e)?) ? A G(?,e).
PROOF. As in the classical proof, we start by introducing a positive parameter
t and by applying Markov's inequality to the variable etx
Pr(Y > (1 + e)?) = PT(etX > et(l+?)?) <
To find a good upper bound on the numerator (we cannot use independence of the
Xj's) we write down its Maclaurin expansion, and use the fact that etX is bounded
by etn to apply linearity of expectation to the infinite series:
E[etXl = E
tk ?k			tkEF?k]
k'			=?			k!
k=o
Focus on the generic term E[Xkl of this series. From the facts that X = Z Xi and
that the Xj's are 0-1 random variables it follows that E[Xk] is a linear combination
of the form
E[xk] =			a? Pr A Xj
for 1 ? ?, and where all aJ's are non-negative (this can be verified by unfolding
Xk and by observing that Xf = Xj for all i and all c > 0). In order to upper
bound the term E[Xk] we introduce n twin 0-1 random variables Xj which have the
19
same marginal (individual) distributions as the Xis except that they are indepen-
dent. From the hypothesis of A--Hcorrelation and from the fact the twin variables are
independent it follows that
Pr )1Xi=1 ?A ;HEITh(X?=l)=A ff1Th(ki=1) APr %Xi=l
Hence, E[xk] ? A E[kk]. By upper bounding each term of the series expansion of
E[etXl, we get
E[etX] < A E[etX].
Note that X is the sum of n independent random variables and E[A] E[X] <
Hence, by inequality (2.1),
Pr(X > (1 + ?)ij) ? Ektxl E[etX]
--H et(l+?)I ?A et(1+e)i
To see an application of this theorem, let us go back to the balls-and-bins ex-
periment, and suppose we are interested in estimating the number of empty bins
after the balls are thrown; this number is given by the random variable B = Zi Bj.
By linearity of expectation and by the fact that the balls are thrown at random
independently of each other,
E[Bi]= ?
iE[?			iE[?
We already saw that the Bi's are 1-correlated. Hence, by Theorem I,
Pr(B > (I + e)?/e) ? G(A/e, e)
A very different proof of this result also appears in ([8], page 91), using the theory
of martingales.
Remark. Jain has proved the following lemma [102]: Let al, a?a? be n random
trials (not necessarily independent) such that the probability that trial a? `succeeds
20
is bounded above by p regardless of the outcomes of the other trials. `Then J X is the
random variable that represents the number of `successes in tbese n trials, and Y is a
binomial variable with parameters (n,p), then: PrF? > ? Pr[Y > kj. 0 ? k < n.
The assumptions of Jain's lemma are strictly stronger than those of l-?correlation.
For instance, in the balls-and-bins example,
iE[?--H1]
which. for A > 3, is greater than Pr(B?			1) ( lIe).
2.2.2 The General Case
B1 =0 = A-i
A + 1
W'e now introduce the more general definition of A-correlation and prove the corre-
sponding extension of the CH-bounds.
The proof of Theorem 1 is based on the observation that if we can upper bound
each term E[xk] of the Taylor expansion of E[e?X] by A E[kk] where X is the
sum of independent random variables, and if E[etX] < B then E[etX] ? AB. This
motivates the following definition.
Definition 2 Let Xl, ?2X% be bounded random variables such that Yj E [aj, bi]
and let X = ZiE[n] ?j. The Xi `s are A-correlated if there exists a collection of
independent twin random variables Xi Ei [aj, bi] such that,
o+ E[X] < E[k], where -? = ?iE[nj Xi, and
0 for all nonempty I C [n] and positive integers 51
E[H X7.?] < A H E[k;?]
jEl			ici
If we apply this definition to the case of 0-1 random variables we see that to have
A--Hcorrelated 0-1 random variables, it suffices to find twin variables Xi such that
E[X] ? E[k] and, for all nonempty I C
21
Pr j$Xi=1 ??AHTh(Xi=l)
iEl
which reduces to Definition 1 when Pr(Xj = 1) = Pr(kj = 1). Notice that for this
to hold, it is not necessary that Fr(??? = 1) = Pr(?)(? = 1); in typical applications
we do not know the exact value of Pr(Xj = 1) but only an approximation (usually
an upper bound; we will see examples in later sections), but this is good enough to
apply the CH-bounds extension.
Theorem 2 Let ? be the sum of A--HcorreThted random variables Xl, X2
where )Cj Ei [aj, bi] and let ? be the sum of the n twin variables ?j. Then,
Pr(? > (1 t e) EF?]) < A H(?, e, ?ffl b).
PROOF. Let ? = E[X]. As in the proof of Theorem 1, we start with the
inequality
Pr(?? > (1 + e)?) = Pr(etX > et(l+c)tL)
? E[etX]
By the hypotheses of boundedness and of A--Hcorrelation it follows that
E[etX] = E  t%!k = j tk)$kl ? A			t?E[?k] = A EktA1,
k=o			k=o			.			?
k
By the already discussed result of Roeffding, when X is the sum of n independent
bounded random variables Xj E [aj, bi],
o+ E[etX]
min
t>t) et(l+c)I < H(n, e, ?ffl b).
In this chapter, we will use the special case of Theorem 2 where Xi ? [0,1],
i ? [n]. In this case, G(?, e) is also an upper bound for the upper tail of X.
22
Corollary 1 Let Y be the sum of n A--Hcorrelated random variables )(j C [0, 1] and
let Y be the 5U?n of the n twin variables A). Then
Pr(X > (1 + e)E??]) ? A O(ji,e).
PROOF. Let E[N'] ji. When k is the sum of n independent random variables
Xj E [0,1] Hoeffding shows that, for t Ei (0,1 --H ?/n) (cf. Theorem I of [50]),
Pr			_____			nt
k$? > ?			1 + ? --H --H nt n-?-nt
By setting e = nt7ii and by applying the standard inequality 1 + x ? er for x --H
nt/(n --H --H nt), we get
Pr(X > (1+ e)jt) < (1+?)?i+c? ?
2.3 The Edge Coloring Algorithm
We now present our randomized distributed edge coloring algorithm. The algorithm
uses an idea of Karloff & Shmoys to reduce the problem of edge coloring general
graphs to that of edge coloring a special class of bipartite graphs [57].
The Karloff & Shmoys algorithm was proposed for the PRAM model and is
as follows. The input to the algorithm is a graph O = (V, E) of maximum degree
1. Compute a random partition of V into black and white vertices (all vertices flip
a fair coin independently and in parallel). Let G[B] be the subgraph induced
by the black vertices, G[Wj be the subgraph induced by the white vertices.
and G[B, Wj be the bipartite subgraph formed by the edges having endpoints
of different colors.
2. Optimallyedge color G[B, W], i.e., with A(C[B, ?V]) colors, using the PRANI
algorithm of Lev, Pippenger & Valiant [70].
23
3. R?ecurse on G[B] arid G'[??] using the same set of fresh new colors on both
graphs.
The key fact is that as long as A is large enough, the maximum degree of the
three subgraphs is at most A/2 + A'12+? --H A/2 + o(A) w.h.p., for any fixed e > 0.
(This can be proved by applying the standard CH-bounds.) Note that the same set
of fresh new colors can be used on both G[R] and G[W? because G[R, l??] completely
separates the two graphs. The recurrence for the number of colors used is
C(A) <			+ Al/2+?) + c (?A2 + Al/2+?)
? A+o(A)
and holds w.b.p. The `high probability" statement holds as long as A = ?(log1+? n),
for any fixed ? > 0. In this case, the failure probability given by the CH-bounds is
the inverse of a superpolynomial function. For the case when the degree goes below
the ?(log1+6) threshold, Karloff & Shmoys give a PRAM algorithm to (A+ 1)??dge
color graphs of this low maximum degree.
The non--Hdistributed parts of the Karloff & Shmoys scheme are the subroutine
which colors the bipartite graph optimally (it can be shown that this is impossible
in the distributed model of computation [921), and the handling of the case A ?
log1+6 n. The latter can be handled by Luby's randomized distributed (A + 1)-
vertex coloring algorithm [76j. Luby's algorithm is simulated on the line graph of the
given graph G thus giving a 2A(G)--H 1 edge coloring. (When A(G) = O(polylog(n)),
we can compute a 2A(G) --H 1 coloring deterministicalty in O(polylog(n)) time via
an algorithm based on the idea of removing maximal matchings. We prefer to use
Luby's algorithm here for conciseness.) Hence, if we could get a good distributed
algorithm for bipartite edge coloring then we could use the Karloff & Shmoys scheme
to get a good coloring of any graph. We now show how this can be achieved.
24
2.3.1 Distributed Edge Coloring of Bipartite Graphs
We now present a simple Monte Carlo algorithm for edge coloring the special type
of bipartite graphs generated by the Narloff & Shmoys scheme.
Given a bipartite graph 0 = (A, R, E) we assume that each vertex knows whether
it belongs to A or B. This is an important assumption because such information
cannot be computed fast distributively [92j, but it holds for the bipartite graphs
generated by the Karloff & Shmoys scheme. From now on, we will refer to vertices
in A as the top vertices and to the vertices in B as the bottom vertices.
The algorithm takes O(log n) time and colors a bipartite graph 0 of maximum
degree ? with at most 1.6? + O(log2+?? n) colors w.b.p., for any fixed 6 > 0 (the
failure probability will depend on 6). In the algorithm we use a variable Ac that
is initialized to A(0). During any iteration of the algorithm, Ac is meant to be
an upper bound on the degree of the current graph; we will prove later that this
holds w.h.p. Recall that 6(u) denotes the set of edges incident at vertex v, and let
THRESHOLD = log2+6 n.
The algorithm is given two parameters e, 6 > 0, and is as follows:
1. PART I: Repeat until Ac <THRESHOLD:
Let 0c be the current graph. Pick a set ? of Ac fresh new colors,
o+ (Random proposal of bottom vertices) In parallel and independently of the
other vertices in B, each vertex v E B assigns a temporary color to each
edge in 6(v) with uniform probability without replacement, i.e. edge e? is
assigned color a C v with probability 1/Ac, C2 15 assigned 4K ? --H (at
with probability 1/(Ac --H 1) and so on.
o+ (Lottery of top vertices) (Comment: The coloring so far is consistent
around any vertex v ? R but can be inconsistent around a vertex v E A.)
For each v E A, let Cu(a) be the set of edges in 6(v) with temporary color
a. Each vertex v ? A selects a winner uniformly at random in C?(a),
25
for each nonempty C?(a). All other edges, the losers. are decolored and
assigned "color" ffi.
o+ Set ?? Ac(1 + e)7e, where e = 2.71?. is the base of natural log-
arithms. Oj, the subgraph of E;c induced by the losers (i.e., by the
i-edges), becomes the new current graph.
2 PART II: Let Ur be the remaining graph. Edge color Cr with 2A(Gr) --H 1 ?
2 THRESHOLD colors by executing Luby's vertex coloring algorithm on the line
graph of Gr [76].
The key claim is that in every iteration of part (I) above, the m&ximum degree of
the graph shrinks by a factor of at least (1 +e)/e wfi.p., as long as A > THRESHOLD.
That is,
A(Gc)
e
with high probability, for any fixed e > 0. The condition Ac > THRESHOLD ensures
that the failure probability given by the extension of the Cil-bounds is the inverse
of a superpolynomial function. The reason for setting THRESHOLD = l0g2+? n will
be apparent from the probabilistic analysis.
Once the key claim is established, we can bound both the total number of colors
used and the running time of the algorithm. To bound the number of colors used,
observe that if the degree of the graph shrinks at every iteration by at least an
(1 + e)/e factor w.h.p. then the maximum degree of Gr is at most log2+? n w.b.p.
Hence, w.h.p,, the number of colors used is at most
where k is the smallest integer such that A(l + e)k/ek < l0g2+? n. The total number
of colors used is at most (here, e' depends on e and can be made arbitrarily small)
C(A) ?			1 +			A + (2 --H			1 --H ?) log2+6 n
26
l.585A+04 log2+6 n
? t.59A
when A > SO log2+? n. The running time of the algorithm is 0(log n) because
part (I) takes 0(log A) time (by the key claim), and part (Il), i.e., Luby's algorithm,
takes 0(logn) time.
If A > 80 log2+6 n and we use our distributed subroutine for bipartite graphs in
the Karloff & Slimoys algorithm the total number of colors is given by the recurrence
TC(A) < 0			+ A'12+?) + TO			+ Al/2+()
? 1.59A+o(A)
< 1.6 A.
If A < 80 log2+? n, we apply Luby's algorithm directly to get an edge coloring with
2 A --H 1 ? 160 log2+6 n colors. Hence, the total number of colors to color any graph
is at most 1.6 A + 0(log2+6 n), for any ?> 0.
The rest of this chapter is devoted to establishing the key claim.
2.4 Probabilistic Analysis
This section is devoted to proving the key claim of the preceding section, namely
that given a graph G and A such that A> A(G) and A > THRESHOLD then, after
one iteration of Part (I) of the bipartite algorithm, the maximum degree of the new
graph, A(G1), is at most (1 + e)A/e w.b.p., for any fixed e > 0. It turns out that
the analysis is considerably easier for the top vertices than for the bottom vertices.
We begin with the easy part.
2.4.1 Analysis of Top Vertices
Let u be a generic top vertex with neighbors Vj, E [du], and incident edges e? =
(u, vi). Recall that a loser is an edge which, after having got a tentative color in the
27
random proposal, lost the lottery and got decolored. So., the new degree of v is given
by the number of losers incident at it.
From the point of view of a top vertex, the random proposal and the lottery
are equivalent to the following random experiment. For each edge e? incident at u
we introduce a ball i, and for each color k we introduce a bin k; the assignment
of a tentative color to an edge by the algorithm is equivalent to throwing each ball
into one of the A bins independently and uniformly at random, since the bottom
vertices assign tentative colors with uniform probability and independently of the
other bottom vertices. kecalling that we have at most A balls and exactly A bins:
?Thsers = :balls --H ;;winners
<			:`:bins --H ?nonempty bins
--H			?#empty bins.
Let X be a random variable denoting the number of losers. To estimate X and
its tail distribution we will study the random variable B = `?empty bins. For this
purpose, we introduce A many indicator random variables
1 bin k is empty
Bk =
0 otherwise
and hence, B = Zkc[A] Bk. Note that X < B always. The variable B was studied
in Section 2.2.1 where it was shown that E[B] ? A/e and that the Bk's are 1-
correlated, which implies that Pr(B > (1 + e)A/e) < G(A/e, E). Since E[Xj ? E?B]
and Pr(X > (1 + e)A/e) < Pr(B > (1 + e)A/e), we get
Theorem 3 Let u be any top vertex. For any iteration of Part (I) of our algorithm,
suppose the degree of v was at most Ac before that iteration, and let X be the random
variable denoting the number of losers incident at v afler the iteration, i.e., v's new
degree. Then, E[X] ? Acle and
Fr(X > (1 + e)Ac/e) ? O(Ac/e, e).
28
2.4.2 Analysis of Bottom Vertices
In this section we analyze what happens to the new degree of a generic bottom
vertex ?B This case is considerably harder to handle than the previous one because
of the way in which the random variables describing the process are correlated. For
a top vertex, the dependency among the variables was playing for us; given that
some edges incident at a top vertex are losers the probability of having another
loser decreases. For a bottom vertex the situation is reversed: having some edges
losing the lottery might even make the probability of having another loser !ncrease.
The problem can be seen in the following situation. Let xl = vB and r2 be bottom
vertices, and yi and Y2 be top vertices which induce a four-cycle, ?.e., there is an
edge e?? = (rj,yj) for ?,j = 1,2. Suppose we are given that eli got tentative color
? and lost the lottery, and that el,2 got tentative color i3; we will argue intuitively
that given this, the probability of ei?? losing the lottery has increased. Since ?i?i
lost, the probability of C2,i getting tentative color a increases. which implies that
the probability of e2?2 getting tentative color j3 also increases, and this increases the
probability of e1,2 losing the lottery.
For the sake of the analysis we modify the algorithm as follows: instead of per-
forming all random proposals in parallel, suppose that the bottom vertices perform
their random proposals sequentially one after the other and independently, with vB
being the last. This does not modify the probability distributions because the choices
are still done independently. ?`e want to focus our attention on the last vertex VB
performing the random proposal. We will use the fact that when vB performs its
random proposal, all edges not incident at vB already have a tentative color. By
symmetry, any upper bound on the probabilities we can find for vB will hold for all
bottom vertices.
Let e?, for i C [dvB], denote an edge incident at the bottom vertex ?B? with the
other endpoint of e? being Uj. We introduce the indicator random variables
29
e? loses the lottery
Xj =
0 otherwise
and want to study the expectation and tail probability distribution of X = ZiE[dvB] Xj.
Computing the expectation is easy.
Lemma 1 E[Xl ?
PROOF. It is sufficient to show that Pr(X? = 1) ? 1/e, for all i E [dv11]. From
the analysis of the top vertices, we know that the expected number of losers incident
at Uj is at most A/e and hence that ZejE6(ui) Pr(e? loses) ? A/?. By symmetry,
Pr(e5 loses) < 1/e for all j E ?(uj), and hence Pr(X? = 1) ? 1/e. E]
We now study the tail probability distribution of X. Our goal is to show that
X ? (1 + ?)?/e w.h.p., for any fixed ? > 0. Establishing this claim will take several
lemmas.
For technical reasons that will be clear later, we subdivide the edges incident at
vB into ? groups, each of at most ? edges. Let A be any such group and define
XA = ?e?EA X?; we would like to show that XA is the sum of A?correlated random
variables, so that
Ve > 0, PT(??A > (1 + e)?/e) ? A C(?/e, e).
R?ather than proving this claim, we will prove a weaker one namely that XA is the
sum of A--Hcorrelated random variables with high probability! More precisely, we will
define an event A such that A happens w.h.p., and such that, given that A occurs,
then XA is the sum of A--Hcorrelated random variables. That is,
Pr(X? > (1 + e)?/e I A) < A G(?/e, e).
Showing this weaker claim is sufficient because
Pr(X > (1 + e)?/e) ? Pr(?A XA > (1 +
30
--H Pr(?A XA > (1 + ?)??/e I A) Pr(A) +
Pr(BA XA > (1 + ?)?X/6 AC) Pr(AC)
? Pr(?A XA > (1 + ?)?/e A) + Pr(AC)
? MAPr(X?>(l+?)V'?/?IA) + Pr(AC)
? ?A G(???/?,?) + Pr(AC).
Another point in the analysis is that A will not be a constant term but an expo-
nentially growing function. However we will be able to show the rate of growth of
A is much slower than that of l/Cr(?/e, ?). In other words, we will show that
A G(#&/e, ?) goes to zero superpolynomially fast.
now turn to the task of showing that XA is the sum of A-correlated random
variables w.b.p. Recall from the definition that the Xj's, e? Ei A, are A-correlated if,
for all nonempty I C A,
?AHTh(Xi=l).
Pr e)iXi=l			jEl
Consider then a generic subset I C A of edges incident at the bottom vertex ?`B,
and let us see how to compute Pr (Aei?i Xi = i). Recall that we are analyzing the
situation where vB is the last vertex to perform its random proposal. This means
that prior to the assignment of a tentative color to edge e? = (vB, uj), all other edges
incident at Uj already have their tentative color. Using the balls-and-bins language,
we can say that prior to throwing ball ?? at random into one of the bins, all balls
coming from the other neighbors of Uj have been thrown. We will think of Cj as a red
ball and of the other edges at Uj as blue balls. Once the red ball is thrown in, say,
bin k Ei [?], a winner is selected uniformly at random among all (i.e., red and blue)
balls in bin k. All other balls, the losers, are discarded. Notice that the probability
of discarding the red ball is itself a random variable which depends on the particular
placement of the blue balls prior to throwing the red ball.
Given any placement of blue balls at Uj, we construct a vector of probabilities
31
Ci as follows. Let ?ik denote the m?mber of blue balls in bin k c [A] of vertex
Ui, and let Pik = aik/(l + aik) denote the probability that the red ball loses the
lottery given that it was thrown in bin k (equivalently, Pik is the probability that
edge i loses, given that it got tentative color k). For each neighbor Uj of our bottom
vertex vB, we construct the corresponding vector Ci = (Pil,Pi2p??); recall that
all this is with respect to the given placement of blue balls at each Uj. Given a set
I = fei1, ....... , ej?? of edges incident at vB, we construct the matrix Al1 whose
i-th column is the vector Cjj. The next lemma explains why this matrix is relevant
to us. Let k]?=??--H1)...(?--Hk+l).
Lemma 2 Let I = fej1, ej2, . . . , ej?) C A, and fix some placement PB of blue balls
at each Uj. Then,
Pr A Xi =1			=
PROOF. The random proposal of vB, when restricted to I, is equivalent to
choosing an one?to-one function ?: I H [A] uniformly at random among the set P
of all such functions. Recall that the entry iNIei of i'Ai is the conditional probability
that edge e?? loses, given that it has temporary color ?. Hence,
Pr( A Xi = 1 ?B) = ? Pr( A Xi = 1 iris selected, P!3) Pr(iris selected)
e?EI			?EP			e?EI
= ? (M?(1)?1 `l?(2),2 l??k)?k) )? A)j
perrn( Aij)
[A]k
We now want to find a good upper bound of perm(A11). The following lemma
gives a simple upper bound that is sufficient for our purposes. Given a matrix Al
let Si denote the sum of the elements of the i-th column of Al.
Lemma 3 Let Al be a matrix with c columns and r rows, c _ r and with non-
negative entries. Then, perm(Al) < HiE[c] 3i
32
PROOF. Let ? =			:			[r],			is on&to-one). Then,
perm(AI) =
= fi Si.
iE[c]
i?I7r(1),1 `?`?(2)?2 ? I'Vf?(c)?c
7rEP
(A111+ `?+A!ri)(Afi2+???+?Ir2)...(Ati,c+.. +Airc)
The next lemma relates the value of Si to that of 1/e > Pr(ei loses), e? E 6(v).
It is an application of the most general definition of A--Hcorrelation. Before the proof
of the lemma, we establish
Proposition 1 Jf 0 < p < 1, q = 1 --H p and Tn is a positive integer, then
PROOF. Let
pTqm?r_r			(t?qm+l)
r			r+1			--H1
Tn
r
r+1
= 1 --H qm --H			q
T=1
Tn			I
r			r+l
Integrating, with respect to x, both sides of the binomial expansion
between 0 and p, we get
Tn
(x + q)m = ?Tqm?r
r=O			r
1--Hqm+l
rn+1
from which the proposition follows.			c
?Ve now return to our scenario where vB is the last bottom vertex to pick tentative
colors for its edges, with its edges having been split into groups. Recall that we
33
are focusing on a particular group A, and that we want a good upper bound on
Pr' (Ai--Hi kXji = for any nonempty I = fe51,e? ejkt C A. Given any
placement ?B of blue balls at each Uj, Lemma 2 shows that
Pr
= 1 ?B = perm(Ati)/[A1?,
and Lemma 3 gives the upper bound
k
perm(?kIj) ? H 5?j.
i=1
So, a good upper bound on each Sjj will hopefully translate into a good upper bound
for Pr (Ai=1,.,kXji = i). The next lemma says that Sjj ? A(l + ei)/e w.b.p. for
any fixed ci > 0. Recall that we are analyzing the situation where vB is to give
tentative colors for its edges and where all other edges have already been assigned
tentative colors. If we focus on a particular edge e5? = (vB, Ujj) the situation is
equivalent to throwing a red ball e?? uniformly at random into ? bins where each
bin has some number of blue balls in it (possibly zero); the number of blue balls in
each bin is itself a random variable.
Leinma 4 Let e?? = (v, Ujj) be any edge in I = [e?1, e?2, .. ., e??J C A, and Sj? be
the sum of the elements of the i-th column of ?i. Them E[Sjjl < A/e = ? and
Ve1 >0, Pr(Sjj > (1 + ei)?) <
PROOF. For vertex Ujj, let Ze be the random variable denoting the number
of blue balls in bin ? and ? = Ze/(Ze + 1) be the random variable denoting the
probability that the red ball loses the lottery given that it ends in bin ?. Then,
Sjj = Y = ?? Ye. Note that the ?e's are bounded random variables with values in
[0,1]. ?Ve will show that E[Y] ? A/e and that the ?e'5 are 1-correlated (under the
most general definition of A--Hcorrelation), which will give our claim.
Firstly, we may assume that we have d = ?--H 1 blue balls (i.e., that the degree of
Uj is ?): Pr(Y > (l+ei)?/e) is maximizedat d = A--Hl, as dvaries from l to ?--Hl.
34
(To see this, assume d = A --H 1 --H ? A --H 1. Add ? yellow balls to the Uue balls. and
run two experiments. In one experiment throw the blue and red balls and compute
the probability that the red ball loses the lottery. In the other experiment, throw
blue, yellow and red balls and again compute the probability that the red ball loses.
In both experiments, let us look at the bin where the red ball fell. The probability
that the red ball loses is b7(b + 1) for the first experiment, and (b + y)/(b + y + 1)
for the second, where 6 and y are, respectively, the number of blue and yellow balls
in the bin. Since y > 0. 67(6 + 1) < (6 + y)/(b + y + 1). Thus, if Y(d) indicates the
variable Y when Ujj has degree d, then Y(d) ? Y(A) for all d ? [A].)
First, we will show that, for all r ? A, E[Yr] < l/e and then we will show that,
for any set of ? > 1 indices L C [A] and strictly positive integers Sr,
E[H yTSr] ? l/e?.			(2.3)
rEL
Given this we can apply Corollary 1 by introducing n independent twin 0-1 ran-
dom variables ? such that E[?] = Pr(Yr = 1) = 1/e. Since the ?`s are binary,
inequality (2.3) is the same as
E[H Y;r] ? ? EtYr] = II E[Y)?],
rEL			rEL			rEL
which is to say that the Yr's are 1?corrAated. Noting that 0 < Yr ? 1, it suffices to
show that
E[H Yr] < l/ee			(2.4)
rEL
We assume w.l.o.g. that L = k1? and prove inequality (2.4) by induction on > 1;
when ? = 1,
?--H? A --Hl			lrl??1?r r
r ?A1?Ar+l
--H (1 --H 1/A)? < 1/e,
E[Y1] =
35
where the second equality follows from Proposition I. Notice that for all r C [A],
E[Yr] = E[Y1] < l/e. ?hen i > 1, the law of conditional probabilities gives
E[ fi Yr] = F[Y1Y2 .. ?e--Hi E[Y? I Y1, ...... ,
Suppose we show that for all non-zero CT ? [0, 1] with r ? k --H 1],
(2.5)
I A Yr = cr] ? -;			(2.6)
then, since the product Y1Y2 ?e--Hi in equation (2.5) is zero when any Cr is zero,
we see by induction on ? that
F[HYr]
r=1
= EWY?...Ye?iEWeIYi,YiYe?i]]
`--H1
?e1E[H Yr]
T=1
Hence, the claim follows if we can show that inequality (2.6) holds.
If a? denotes the number of blue balls that fell into bin r, then Cr = ar/(ar + 1);
each a? is strictly positive since Cr >0. Let a = ?e?z11 ?? > ? --H 1, p= 1/(A--H?+ 1),
and q = 1 --H p. Then
e--H1			e--H1
E[YeI A Yr = cri = E[YeI A Zr =
r=1			r=1
--H			? t(r,a),
r=1
A--Hl--Ha			r
t(r,a)=			r			prq??l?a?r r+l
where
36
It is easy to check that t(r,a) > t(r.a + i). As a consequence, the maxim'im
value of E[YeI A??' Y --H cr] is attained at a (`--H 1, in which case we have
by Proposition 1.
?ffi??t(r,a) =
t(r,?--H 1)
r=1
r
A-e+1
q			< l/e
pq
Putting all the pieces together (recall that k = I ):
Pr( A Xj = 1) --H
e?EI
perm( Al1)
[Alk
HiEl3i
r
r+l
(1			+ ?`? k ?			[Alk
Ak
(1			+ ?1)k [Alk			7k'
E
for any el > 0. The first line follows from Lemma 2, the second from Lemma 3, and
the third from Lemma 4 and holds w.b.p. The last line is just a re-writing of the
third one. Given ci > 0, let A( be the event that, for all e? ? A, Si < (1 + ei)A/e.
Lemma 4 says that A?1 happens w.h.p., for any ci > 0. The last line of the above
chain of inequalities means that, given that Ac1 occurred, the Xj's are A-correlated
with A = (1 +ci)kAk/[A]k. (Alternatively, we can say that the Xj's are A--Hcorrelated
w.b.p.) We now want an estimate of A. The next lemma explains why we subdivided
the edges incident at vB into groups.
Lemma 5 For positive integers t and k, tk/[t]k ? ek2/t, j k ? t/2.
PROOF. W,e first note that ln(l --H x) > --H2x for 0 ? r ? 1/2. This is true,
since if we define g(x) = ln(l --H x) + 2x, then g(0) = 0 and g'(x) = `jffi2f, which is
37
non-negative for 0 <r ? 1/2. Now.
[1]k k--H1 2)
tk --H			fi			t
i=1
exp )??n (i --H
exp
_			(since k < t/2)
> e?T
For t = ? and k = 11 < VmA this lemma implies that Ak/[?]k < e and hence,
given A?1, the Vj are A--Hcorrelated with A = (1 + ?i)ke < via the standard
approximation 1 + x ? ex. Though A goes to infinity superpolynomially fast as
goes to infinity, we can still upper bound Pr(X? > (1 + ?)A'/e) by a term that goes
to zero superpolynomially fast, as follows. (Here, ii =
PT(XA > (1 + ?)ij) = Pr(X? > (l + h? A?1) Pr(A?1) +
Pr(X? > (1 + ?)? A?1) Pr(A?1)
? Pr(?? > (1 + ?)tj I A?1) + Pr(A?C1)
< A&(??) + Pr(BAX?>(1+?i)ti)
? e1+?'? &(?, ? + ? &(?, ?i)
< e1+?'? e??2?13 + VmA G(?, El)
< + ? 6'(?, El).
Since El can be chosen arbitrarily small, we can make the above term go to zero
exponentially fast by choosing El > 0 such that El --H E2/3 < 0.
:38
Remark. We can now see why the parameter THRESHOLD must be ?(log2+6 n):
the above failure probability goes to zero superpolynomially fast as long as --H
?(l0g2+? n), for any fixed ? > 0.
Hence, we have proven that XA ? (1 + e)#X'/e w.h.p., for any fixed e > 0. In
turn, this implies that the new degree of vB after one iteration of the algorithm is at
most (1 + e)?/e w.h.p. To see this.
Pr(X > (1 + e)?/e) ? Pr(?A XA > (1 +
_ MA Pr(X?> (1 + e)MA/e)
VmA (e1+((1??2/3)Vm?+ v'?G(??ei))
This inequality holds for any e and ci. Hence the new degree of all the vertices is at
most (1 + e)A/e w.h.p. and we get
Theorem 4 For any iteration of Part (i) of our algorithm, the maximum degree of
the graph afler that iteration is at most (1 + e)Ac/e w.h.p. for any fixed e > 0, if its
max?mum degree before the iteration was at most Ac.
In conjunction with Theorem 3, Theorem 4 proves the key claim, and hence, as
shown in Section 2.3.1, proves the stated bounds on the performance of our algorithm.
Thus, the probabilistic tools developed in Sections 2.2.1 and 2.2.2 have been crucial
in the analysis of our algorithm. As mentioned earlier, these results served partly
as the motivation for the work of the next chapter, and the work of Section 3.1
generalizes the results of Sections 2.2.1 and 2.2.2.
Chapter 3
Chernoff--HHoeffding Bounds for
Applications with Limited
Independence
As pointed out in Chapter 2, the most fundamental tools used in bounding the
tail probabilities, ?.e., the probabihties of deviation from the mean, of the s'ims of
bounded and independent random variables, are based on techniques of Chernoff [27]
and Roeffding [501. They are frequently used in the design and analysis of randomized
algorithms, derandomization, and in the probabilistic method. Motivated in part by
the work of Chapter 2, we present, in this chapter, a simple method which generalizes,
somewhat, the classical method for proving the Chernoff-Hoeffding bounds in the
case of bounded random variables confined to the interval [0,11. More importantly.
this approach requires only 1irnit?d independence among the random variables, and
thereby imports a variety of standard results to the case of limited independence
for free. This and related bounds lead to a variety of applications ranging from
improved bounds for tail probability distributions to new algorithmic results. This
chapter describes joint work with J. P. Schmidt and A. Siegel [los].
The "limited independence" result implies that sources of randomness that are
39
40
weaker than the standard model of unbiased and independent bits, are sufficient for
any algorithm whose analysis uses thc Chernoff-?Hoeffding bounds. It also provides
a better understanding of the proof techniques behind these bounds, and gives im-
proved bounds for various tail probability distributions. Via standard techniques. it
leads to algorithms requiring fewer random bits for such classical problems as ran-
dom sampling. The formulation also leads to approximation algorithms with better
approximation guarantees for certain problems.
Given n random variables Xl, X2`(n, suppose we want to upper bound the
?`upper tail" probability Pr(X > a). where X = Zt?i Xi, ii = E[X], a = a(1t?) and
? > 0. The classical idea behind the Chernoff-Hoeffding bounds (see, for instance,
Chernoff [271, Hoeffding [50], Raghavan [101] and Alon, Spencer, & Erd6s [8]) is as
follows; Hoeffding states that this was apparently first used by 5. N. Bernstein. For
any fixed t > 0, Pr(X > a) = pr(etX > 6at) < E[;'tX], by Markov's inequality.
Computing an upper bound u(t) on p[6tX] and minimizing U?tt) over t > 0 gives an
upper bound for Pr(X > a).
An important situation in computation is the one in which Xi ? fo, 1], i --H
1,2,. .. , n. For this case, we construct a class of functions of X that is as easy to
analyze, and which includes the class f6tX : t > 0] and do the above minimiza-
tion over this class. In the process. we discover that X1, X2?? need only be
h(n,u,?)-wise Independent for a suitably defined function h(.,.,.), which is typi-
cally much less than n for many algorithms; recall that a set of random variables V
exhibit k-wise independence if any subset of k or fewer random variables from V are
jointly independent, which is to say that their joint probability distribution function
is just the product of the individual distributions. One reason for the use of the 6tX
function in the classical methods is that E[etx] generates all higher moments of X;
using only a constant number of higher moments, for instance, gives weak bounds.
However, in the binary case, the first n moments are sufficient to generate all higher
moments, which motivates our method. Interestingly, this formulation also can be
41
applied to general Xi that take arbitrary values in the interval [0, 1], even though it
is not true that the first n moments of X = Z?j=1 Xi determine all higher ones.
The results have many applications to tail probability distributions. They imply
similar "limited independence" results when X1, ?2An take values in the inter-
val [0,11; this can be extended to bounded random variables, by scaling their ranges
to [0,1]. In the case of the hypergeornetric distribution (sampling without replace-
ment), it provides an elementary mechanism to attain slightly better bounds than
those implied in [50] and by Chva'tal [30]. The method also yields good upper bounds
for the tail probabilities of the sums of random variables with limited independena?.
These constructions also provide pointers to further improvement of the indepen-
dence bounds. For example, we redirect the basic method to obtain improved tools
for analyzing the behavior of the sum of k--Hwise independent random variables. The
results simplify and sharpen some of the analyses done in Schmidt & Siegel [109,
110]. In particular, we derive good upper bounds on E[((??1 Xi) --H E[?w?1 Xi])k],
where X1,X2,... , Xn are k--Hwise independent random variables, each of which lies in
the interval [0,1]; this leads to better independence bounds than our h(n, jL. ?) when
< l. It can be shown that the independence which we guarantee to be suffident for
the Chernoff--HHoeffding bounds cannot be improved in general by more than a factor
of O(log n). In fact, by combining our constructions with results of [85,6], we get
optimal independence (to within a constant factor) and optimal tail probabilities (to
within a constant factor in the exponent) simultaneously, for commonly occurring
situations such as the case where the Xi's are binary with Pr(Xi 1) = 1/2. ?Ve
also prove good bounds on the probability of exactty r successes in a sequence of k-
wise independent Bernoulli trials, which shows that even with modest independence,
probabilities and conditional probabilities are close to the fully independent case. in
situations like hashing.
The sufficiency of limited independence has several computational applications.
First, it means that any random process whose analysis uses the Chernoff-Hoeffding
42
bounds can be simulated with a weaker random source than one which outputs
unbiased and independent bits. Next via known constructions of random variables
with limited independence using fewer random bits (Joffe [54], Carter & Wegman
[24], Mehlhorn & Vishkin [79], Alon, Babai & Itai [5], Siegel [114]), we can reduce
the randomness required for certain algorithms. One example is that of random
sampling: given a universe U and a subset X c u the problem is to estimate the
fraction of objects of type X in U, such that the absolute error of the output is
at most ? with probability at least 1 --H ?, for given error parameters ? and ?. The
new constructions imply that if R independent samples are required to yield the
desired bound, then it in fact suffices for those R samples to be k*-wise independent,
for k* = O(log(1?)). These samples can be generated by O(log(-?)) random samples
from U, using standard methods. Since the best previously attained bound on R
is O(? log(1?)), this significantly reduces the amount of randomness needed for this
problem. This construction is not optimal with regards to the number of random
bits used?see Bellare, Goidreich & Goldwasser [14] for an optimal construction, but
it is extremely simple. It is also easily parallelizable, while it is not yet known how to
parallelize other schemes for reducing randomness, e.g., random walks on expanders.
Via weaker bounds on the kth moment, essentially the same bounds for the random
sampling problem have been obtained by Bellare & Rompel [15]. We believe that
there should be additional applications yielding reduced randomness ?for free'?. A
spectrum of explicit constructions of oblivious routing algorithms on the butterfly
with varying time--Hrandomness parameters is among the results of Peleg & Upfal [95];
our "limited independence" result directly matches these bounds on the hypercube
and, we believe, should extend to other interconnection networks.
Finally, we combine the method of conditional probabilities [100,117] with the
new construction to obtain two results. We get a much faster implementation of
the sequential jobshop scheduling algorithm of Shmoys, Stein & Wein [112]. It is
comparable in time complexity to the speedups due to Plotkin, Shmoys & Tardos
43
[97] and Stein [119] but importantly, the approximation bound it presents is better
than the ones of [97] and [119]. Here, we show that a problem can be derandomized
directly, thereby avoiding the bottleneck step of solving a huge linear program. We
also prove an ?exact partition" result for set discrepancy, and derive a polynomial-
time algorithm for it.
3.1 The Basic Method, and Applications to Tail
Probabilities
now introduce the method, discuss its implications to the tail probabilities of
various distributions, and analyze some related approaches. ?`e also prove proba-
bility bounds for exactly r successes in a sequence of Bernoulli trials under limited
independence. As discussed above, the basic idea used in the Chernoff-Hoeffding
(henceforth CII) bounds is as follows. Given n random variables (henceforth "r.v."s)
Xi,X2,... , Xn, we want to upper bound the upper tail probability Pr(X > a),
where X = Z:??i Xi, a = E[X], a = a(1 + ?) and 6 > 0. For any fixed t > 0,
Pr(X > a) = Pr(etX > eat) < E[etX],
eat
by computing an upper bound u(t) on E[etX] and minimizing ? over t > 0, we can
upper bound Pr(X > a). When Xi, X2, .. ,Xn are binary, we construct a class of
functions of X that includes the class fetX : t > 0) and do the minimization over this
class; in the process, we discover that X1, X2?? need only be h(n, a, 6)-wise
independent for a function h(.,.,.) that will be defined in equation 3.3 of the next
section.
Notation. If x is real and r is a positive integer, then (Xr) will denote, as usual,
with (zo) = 1.
44
3.1.1 Estimating tail probabilities of binary random
variables
The CII bounds are frequently used when the r.v.'s Xi,X2,... Xn are binary and
independent. In this section, we first assume that X1, X2Yn are 0--H1 indepen-
dent r.v.'s with Pr(Xi = 1) = p?, 1 ? i ? n; the independence assumption will
be relaxed later, and the results will be extended to r.v. )? Xi with 0 ?
Xj ? 1, in Section 3.1.2. Let X = Z?iXi, and ? = FLY] = ZW--Hipi The
reader is referred to Section 2.2 for a quick survey of the CII-bounds. In short, the
estimate derived there is
Pr(?Y > jt(l +?)) < F(n,jt,?) = (1+(n?tf(61+6))?n--Hii(1+?)
(1 + ?)a(1+6)
with
C(u,6) =
The basic idea behind these estimates is the simple fact that for any a > ? and
any t > 0,
and hence,
E[etX]
Pr(X > a) = Pr(etX > eat) --H			eat
E[etX]
Pr(X > a) <			mint eat
Hence, at the heart of such estimates are the simple calculations associated with
the multiplicative nature of E[e?Xi]. Recall that etX = z190=0 ?x? Consider X2,
for instance. X2 --H (X1 + X2 + . + Xn)2 = Z?i=iX:2 + 2Zi<i1<i2<nXiXj --H
?,?wiXi + 2?1<j1<j2<nXiXj, since X?2 = Xi for Xi ? f0,lt. Similarly, other
higher powers of the Xi's are unnecessary, implying that a form simpler and more
useful than functions of the form fetX : t > 0J might exist. There are many ways
to formalize this. ?7e define, for . = (zi,z2....,zn) c ??, a family of symmetric
multilinear polynomials Sj(z),j = 0,1,... ,n, where S0(z) = 1, and for 1 < j ?
Sj(z) = Zi<i1<i2'<ij<?nZi1Zi2			??j We start with the simple
45
Lemma 6 Suppose I, 2Z? take on binary values. Then for any positive integer
I, (zl+z2''' +zn)? = z7)1????? a?S?(z1 2,.... zn), for some non-negative integers
a1, a2,, . . , ?min(j,n).
The proof of Lemma 6 is trivial and is omitted. The converse of Lemma 6 also is
true; if z = (zi, Z2zn) e fO, lJfl, then for any j, j = 0,1
j			J
uj) E ??+? B(vu,..., vj) Ei : ? uiSi(z) =--H Z vi(z1 + Z2 + ``` + zn)?
i=O
So, the two forms: polynomial of zi + 2 + `,, + Z? and linear combination of
So(z), Si(z),... , Sn(z) are equivalent. Note that if the binary random variables
X1,X2,... , are independent, then EtSi(X1, ?2,.. Xn)] is explicitly available:
E[Sj(X1,X2= Si(Pl,P2,...,Pn), where p? = Pr(X? = 1). This explains
our preference for the Sj'5.
Since the expansion etZ = z?0 kz? converges for all t and Z and since all the
coefficients ? are positive if t > 0, we get
Corollary 2 Let Zl,Z2,... , z? take on binary values. Then for any t > 0, there exist
non-negative reals ao, al,. . , a? such that et(zI+z2+???zn) = Z%0 ajS?(zi,z2,... , zn).
One reason for the use of the mysterious etX function in the CII bounds is the need
for higher moments of X In particular, the moment generating function of X is
defined to be E[etXl = ??o ?E[X?1; its derivatives generate all higher moments of
X. Moreover, the use of moment generating functions embed the problem of attaining
probability estimates in a space rich with algebraic structure and convex inequalities.
(More about the computational aspects of such an alternative approach can be found
in [113].) The need for higher moments is due to the fact that a direct application
of Markov's or Chebyshev's inequality to upper bound Pr(X > E[X] (1 + ?))
leads to weak bounds. Higher moments and exponentials give dramatically better
estimates. But when X is the sum of random bits X1 , X2,Y?, Lemma 6 and
Corollary 2 imply that all the higher moments of X can be linearly generated by
46
(E[Si(A1,X2,... )(n)] : i = 0,1,..., n). Equivalently, they are also generated
linearly by any n higher moments of X.
So, we now consider functions of the form ??? yiS?(Ai,. . . , Xn) where y?,
0, instead of restricting ourselves to those of the form etX, for some t > 0. Indeed,
by Corollary 2, we will be considering a class of functions which includes the class
(etX : t > 0). For any y = (yo,yi,... ,yn) ? ?n++1 and z = (zl,z2 . . . Zn) ?
define fy(z) = Z:ffi--Ho y:-Si(z). With this notation, we can restate Corollary 2 as
Vt>0 ?y E : fy(Xi,X2,...,Xn) = etX. Let a = ?i $6) be assumed to be
integral. Note that for any non-negative integer m, X = Tn iff j)(X1. X2Yn) =
yj (m?) and hence,
a
Pr(X>a) = Pr fy(X1Xn)>?yj
i=O
? E[fy(Xi,...,Xn)]
?:%o yj(a?)
?3%--HoyiSi(pi,p2pn)
(3.1)
z?o yj(a?)
So, our goal now is to minimize this upper bound over (y?,y?,... yn) ? ?+n+1 To
do this, note that Ya+1, Ya+2,.'' , y? must all be set to 0 since they contribute non--H
negative terms to the numerator and nothing to the denominator. Next, note that
(3.1) is minimized by setting y? = 1 if i = j* and 0 otherwise, where j* is the integer
at which Si(p1,P2,. .. is minimized, over the range i = 0,1,... ,a. To get a
better handle on this minimum, we need
Lemma 7 For any i > 0 and s > 0) 5?(zi, z2...., zn) is rnar?rn?ed by setting
Zi =			=			= ?3, when subject to the constraints that (zi, 2,..., zn) E ??+ and
?;=1 zi = 5.
PROOF. Suppose Zp ? Zq for some vectorz satisfying the constraints (zizn) ?
?+? and ?j?i Zj = 5. Then, set 4 = Zp $ e and = Zq --H e for any e --H zp, and
set ??
= Zj for all indices j, j ? (p, q). It is easy to verify that z? satisfies the above
47
constraints, and that Si(z') > 8i(z). Hence Sj(z) is maximized at = z*, where
4 --H for j = 1,2			n.			E)
Inequality(3.1) and Lemma 7 imply that if p _ ZW1m --H ??, then for any y E
gi(flj) p?
Pr(X ? a) ?			.			(3.2)
Since (j+fl1)p?+1/(j+?1) = (n--Hi)p which is less than, equal to, or greater than 1 according
(flj)p?/(?j)
as i is less than, equal to, or greater than 7D$??n (n?)pi1(a?) is minimized at
= h(n,?,?) = ?1 %)?n1 = ?1??1			(3.3)
So, the r.h.s. of inequality(3.2) is minimized at y = y*, where Y:* = 1 if =
and 0 otherwise. Hence, we get
Pr(X > jt(l+?)) < U1(n,p'1,. . .,p'n,6) = 8i?(P1'P2? U?(n? ii.
(?( jI$?))
U1(n,p1,. . . ,pn, ?) is guaranteed to be better than any estimate based on the CH
method, since we have considered a larger class of functions. Also, the upper bound
on U1(n,p1,. . .,p?,?) is better than any such estimate which depends
only on ? and which is oblivious to the actual values of Pl,P2, ,pn; this includes
F(n,?,6) and G(?,6).
But most importantly, note that these new bounds will hold even if X1,X2
are only h(n, ij, ?) -wise independent. This is because each term in 5k ( Xl, X2,. .. , Xn)
is of the form Xj1Xj2... Xik for any integer k, and hence, E[Sk(Xl, X2?n)1 will
be the same for k--Hwise independent Xl, X2as for completely independent
X1,X2,... Xn. Since ?(l + ?) ? n, h(n, ?, ?) ? n; in typical algoritlimic situations,
h(n, tL? ?) ? n. This will be seen to be of great use later on.
Theorem 5 Let bits X1, ..... . , Xn be random with Pr(Xj = 1) = p?, X = ??1 Xi
and jt = h?i = pi. Suppose furTher? that the Xi are k-wise independent for
k> h(n,?,?). Then for any ?> 0,
(ifl)(?)i
48
Pr(X > ?(l +?)) ? Ui(n,pipn??) ?
Furthermore, U?(n,ti,?) ? F(n,?,6) ? G(u.,c). i.e., the CH upper bounds hold
even if the Xis are only h(n,ji,?)-wise independent.
Our results also imply upper tail bounds for r.v.'s with smaller independence than
h(n,tj,
Lemma 8 Let X1,X2X? be binary r.v.'s with X = Z?1Xi and ? = FLY].
Then for any 6>0, Pr(X > ?(l +6)) ? (n?)(ii/n)k7(tt(1t6)), f the Xi's are k--Hwise
independent for any k < h(n,jt,6).
PROOF. Set yi = 0 for i $ k and Yk = 1 in inequality (3.2).
It turns out that U?(n,jt,6) is almost the same as
Theorem 6 Given n random bits X1,X2,.. ,Xn, let X = ?s?i Xi ? = EFY], and
p = j/n. Then for any 6 > 0,
1. If the Xi's are are [ji6??wise independent, then Pr(X > `i(l +6)) ?
where:
=			e6			,			?`6 ? 1;
(1 + 6)1+6)? e ? ?--H??3, f 6 > 1
2. If the Xi's are [1???--Hwise independent, then Pr(X > ji(1 + 6)) ?
3. If the Xi's are = [n6i wke independent, then Pr(X ? jt( 1 --H 6)) ?
F(n, jt, --H6), where:
e???2/?2('??)), f p < 1/2;
--H e?2?'62			f p > 1/2.
PROOF. The first claim follows by setting k = [jt6? < h(n,j,6) in Lemma 8.
The only interesting case is that k < h(n, jt, 6). ?`e apply the inequality
? Q(n, k, a) = a			n???			a
49
valid for any a < n; this inequality follows by induction on k, combined with the
fact that the function (1 --H xl)X?l is flonincreasing for x > 1. Let a (1 t 6)?i and
= ?6. Then,
Qnk'a			= (n7(n--H
(1 t 6)?(1+6)
(ltl6)Ii(1+?) 1+n?$6
(1
= C(?, 6).
It is easy to verify that Q(n,x,a) is nonincreasing for x ? h(n,ji,6) and hence the
bound G(?, 6) established for k' also holds for k = [jt6?. The upper bounds for
G(i, 6) are either straightforward or have been established in [9,tol,s]
The second claim follows immediately from Theorem 5, while the third claim
follows by obtaining lower tail bounds from the upper tail of Z:%i(l --H Xi), and
importing the upper tail bounds established in [50]. By Theorem 5, these boi'nds
hold with independence h(n, jt, 6).			E]
As we will show in Section 3.1.3, bounds almost as good as C(?, 6) and F(n, ?, --H6)
hold with the much smaller independence k = [?62J, when 6 ? 1.
3.1.2 Tail probabilities of bounded random variables
We now show that almost the same results hold for arbitrary r.v.'s which take values
in [0,1]. Analogous bounds for bounded r.v.'s that are constrained to lie in other
intervals can be obtained by a linear transformation of their ranges to [0, 1]. Given
arbitrary r.v.'s Xj such that 0 < Xi ? 1, i = 1,2n, we wish to upper bound
Pr(X > ?(1 + 6)), where X = ??? Xi, tL = E[X] and 6> 0. Hoeffding has proved
upper tail bounds for bounded random variables, assuming full independence among
the Xi's; the main point of interest here again is that partial independence suffices,
giving almost as good bounds as Hoeffding's. Almost all of the work is done by
Lemma 9 Let Zj be real numbers, with 0 ? Zj < 1, i = 1, 2,.. n,. and suppose that
a > 0,1 < Ki and ??=1 Zj > a. Then, Sj(Zl,Z2.,. ..?zn) >
PROOF. ?Ve will just consider the case Z?1 Zj = a; then the upper bound will
directly follow if ???? Zj > a. If 0 < Zp ? Zq < 1 for p # q, then Sj(z) decreases
if we set Zp := Zp --H e and Zq := Zq + e, for any e ? min(z?, 1 --H Zq). So. if &j(z) is
minimized at z* in the domain [0, lin under the constraint that ??1 Zj = a., then
o ? 4 ? 1 for at most one 1 ? _
If a is integral, then 4 ? f0,1t, i = 1,2n, and hence ?.(?*) --H (?a). Other-
wise, suppose a is nonAntegral; let al = [aJ and a? = a --H al. Hence, Zp* = a? for
some indexp, and 4 E f0,lJ for i ? [1,2,n? --H (p?; so,
3j(z*)=			+?2			5)1
and we need to show that this is at least (?a); i.e. that
[aij? + a?j [ai]5?1 > [ai + a2ij = [alj.
where [X]r denotes x(x--Hl). .. (x--Hr+1) and [r]o = 1. This is easfly seen by induction
on j, as follows. Equality clearly holds for 5 = 1. For 5 > 1, [ai]j + a?j [ai]j?1 --H
[ai]j + (5 --H 1)a?[ai]j?i + a2ki1j?i. Since a? < 1, a?[a1]j?1 > a?[a1 + a? --H l]j--Hi and
hence,
[ai]? + a?j kili--H'			>
_ a1([a1 --H 11i--H' + (5 --H 1)a2[a1 --H 11j--H2) +
a?[a1 + ?2 --H 11j--Hi
ind)?P. ai[a1 + ?2 --H 1?j--H1 + a?[ai + ?2 --H 1]j--H1
--H [ai + a?]j.
El
By essentially the same analysis as before, we get
Theorem 7 Given n arbitrary r.v.'s X1, X2,...,Xn with 0 ? Xi ? 1 and E[Xj]
?, let Y			Zt?w--Hi Xi and a			ELNi. Then fXj,X2Xn are k?wise independent
fork> h(n,jt,?), then Pr(X > a(l +?)) ? Ui(n,p1n,?) ? U2(n,sj,c), for
any ?> 0.
PROOF, From Lemma 9, we have that for any a > 0 and for non-ne?ative
yo,yi
Pr(X > a) ? Pr(f?(X1,. . . ,Xn) >
and the rest of the proof follows as before. E)
Remark. The methods of Section 3.1.1 were motivated by the fact that if X is
the sum of n 0--H1 random variables, then any n higher moments of X linearly generate
all the higher moments of X. However, note that if random variables Xl' X2
take arbitrary values in the interval [0, 1] and if X = Zs?'--Hi Xi, then such a result is
not true: in fact, no bound can be put on the number of higher moments needed to
generate all the moments of X. However, the intuition gained from Section 3.1.1 has
helped us obtain a large deviation bound for X, which is as good as the known bound
[50j. This is despite the fact that we have not considered all the higher moments
of X; one of the original motivations for Chernoff to consider E[etX] was that it
generates all the higher moments of X. A possible interpretation of our result of this
subsection is that it pinpoints the "crucial" higher moments.
3.1.3 Redirecting the method
Recall that in Section 3.1.1 we introduced the class of functions So(z), Si(z)Sn(z)
and generalized Chernoff's idea by working with non-negative linear combinations
of these functions. A natural generalization of this is to allow arbitrary linear com-
binations, but the corresponding optimization problem, described below, seems hard
to analyze.
52
Suppose we have n binary random variables Xl, X2?n with Pr(X? = 1) =
and with X = ??? Xj, and we want good upper bounds on Pr( X > ci) where
ci > [??, when the Xj's are k--Hwise independent. As before let
fy(?i,A%X'n) = Zyi5i(Vi,?2
with the further restriction that yj = 0 for i > k + 1 to capture the idea of k--Hwise
independence; note that fy(Xi, X2Xn) is a function of X,
fy(X1,X2Xn)=g?(X)= ?S?)k0X)yj
Now, if g(t) > 0 for t = ....... , n (so that Markov's inequality can l)C applied) and
if g?(b) > g?(ci) for b > a, then
E[gy(X)] _ ZtkwcyiSi(pi,p2pn)
Pr(X > a) < Pr(gy(X) > gy(a)) ? 9y(ci)
can scale the ?i'5 50 that gy(a) = 1 and thus, we get the following linear program
with YO,Yl,.. , Yk being arbitrary real variables.
LP(a,k,p1,p2,...,p?):
Minimize ?2k?0 yiSi(p?,p?,... ,pn) subject to
o+ g?(5)>O,)=o,i
o+ gy(a) = 1, and
o+ > 1, b = a + 1, a + 2,..., n.
We unfortunately have been unable to analytically compute the optimum of this
linear program. However, we now consider an important case where some of the
multipliers are negative, and which is a feasible solution to the above LP; our results
generalize a result of [68,17,81]. ?`e use the kth moment inequality
Pr(IX --H E[X]I > ?EFx'i) < E[IX--HE[x]Ik]
53
which is attributable, in various formulations and generalizations, to Chebyshev,
Markov, and Loeve [51], and has been used to attain probability deviation estimates
for over a century. Note that if X = X1+X2t `tX? where the Xj?s are random bits,
then (x?EF?])k is alinearcombinationofSo(X1,X2Xn). S1(X1,X2
Sk(Xl, X2Xn), with some of the multipliers being negative. `Ae derive
good upper bounds on [IX --H E[x]?k], where X = ?2fl=1 Xi, with the Xi's being
k--Hwise independent random variables which satisfy Xi --H E[Xi] ? 1. yielding bounds
that are better than those given by Theorem 5 and Lemma 8, when k ? h(n, jt. 6)
and 6 < 1. Moreover, the large deviation bounds derived in Theorem 9 for k-wise
independent random variables agree with the simple exponential forms of the large
deviation bounds most often cited for sequences of fully independent Bernoulli trials.
Theorem 8 is similar in spirit and proof to Lemma 4.19 of [68] for identically
distributed Xj and constant k, but the present result is somewhat tighter even in the
case of identically distributed Xj, especially if X = ??1 Xi has small variance. A
slightly weaker form of a special case of one of the inequalities proven in Theorem 8
was also obtained in [17] and some related formulae were given in [81]. The proof
of Theorem 8, as well as related proofs presented elsewhere, is based upon estimates
for the k-th moment of X. Estimates related to ours, but for a more general class of
random variables, were established in [84]. That formulation however, is considerably
more complicated than ours, and is not as tight for the cases specifically considered
here. In particular, Theorem 9 cannot be derived from the bounds in [84] for the k-th
moment. Other related work was done by Gladkov ([43], with later improvements in
[44]). He shows that if Y1, ..... . , Yn are independent r.v.'s with Yj having the same
distribution as Xi and with Y = Y1 + Y2 + + Yn, then as n oc, the convergence
of Y to the normal distribution implies a comparable convergence for X, provided k
is sufficiently large.
Theorem 8 Let X1,... , X? be a sequence of k-wise independent random variables,
that satisfy IXi --H E[Xi]I ? 1. Let X = Zs?wi Xi with EF?] = ?, and let a2FY] denote
54
the variance of??, so that o'2[X] = Z:?=i 0'2 [Vi] (provided k > 2, Lvbich we require.)
Then the following hold for any even
(I)			FoA C > a2[X],			Pr(IX --H			> T) ? v??cosh
_			)6)??			4)
(11) For 2 ? k ? 3(o'2LV])'/3, Pr(IX --H > T) < 2 k$j?F2N'1 k/2,
(III) For C > ma?[k,2FV]t, Pr(IX --H jt > T) < e;i$2
PROOF, ?Ve use the kth moment inequality
E?Ix??Ik1
Pr(IX--HtiI >?T)<			k			(3.4)
For even k, E [ix --H ?k1 = E [(X --H jt Most of our effort will therefore be invested
in estimating the k-th moment of X --H ?. Let pj = E[Xi], for 1 < _ n. Then,
EwV?[t)k1 =
i1+'?n=k ?l,? ?? jki E[(Xj --H pj)?i], (3.5)
Clearly, E[Xj --H pi] = 0 and any term in (3.5) that has some = 1 must be zero.
More generally, IE[Xi --H pi]?I < o'2[Xi] for any ? > 2, since IXi --H pi! < 1 and therefore
jXj --H ? IXi --H pi 12, hence IE[Xi --H pi]?I ? lELVi --H pi]2 = a2)X)]. Thus,
Ekx?tt)k] =
k/2--H1			k/2--He
ii<4<ik			H E[xir?pir]ir
i1+?'?+ik/2??=k			51,			,			r=1
jj>2
k/2--H1			k			k/2--He
51,..			?Sk/2--He			?,			? o'2[Xir]
r=1
jj>2
k/2--H1
z			51?.			k,j?i2?e			(0'2FV])k/2?e			(3.6)
___			(k/2 --H
jj>2
`Recall that cosh() = e+2c??, where e denotes, as usual, the base of natural logarithms.
0.9
Estimate (3.6) comes about because the summation
k/2--He
Z			H a2)?]
r=1
is maximized when all the U2F?c] are equal, by Lemma 7, and hence is at most
______			(o,2[?1)kI2?e
k/2fl--H ?			a2?F?1 k/2?? ?
Let T0,T1,... Tk/2?1 denote the k/2 terms in summation (3.6), hence:
k			(a2Lv])k/2?e
Te =			j?p?e			(k/2
and
z
ii+i2+?+ikI2 e=k 51,j2,
jj>2
(a2[x])k/2
(3.7)
T0 =			_________
There are exactly (k/2$;?I) terms (i.e., sets of assignments for ji, . . ?Jk/2--He) in
T? since ?j? = k and ii > 2. For each such assignment of values, ?kk=I2Ff(jk --H
2) = 2?, hence H?k?127? jk! > 2k/2--He32e (and equality holds only when ? < k/6 and
exactly 2 of the k/2 --H  values equal 3, while the remaining values equal 2). Hence
? (2/9)'(2, 2k 2)' and
Te<			2 --Ht			?a????? C()$2)!,)! T0.
Since k/2+'--Hi (? 2			(k/2)3?			k			k/2--Hi			k3e
2e			k/2--He)! <			(2')! ` wehaveE(?--H?)			<?T0 )) (36a2[xl)'(2)!
Using Taylor's Series for cosh(x) --H e?+2e?' = Zi Jj?!' we see that
2e
_____			3
(2)!			--H cosh
E [(X --H ? )kj < cosh			T0.
Consequently
(3.9)
56
T0 is readily bounded by expanding (3.8) to get T0 =
2k12(k$2)!(a2[x?)k/2 ?`e may
apply a strong version of Stirling's Formula [105]:
(r/e)TvTh,?re1I(12T+1) < r! ? (r/e)TVmirrei/(12r)
which is valid for all r > 1, to bound both k! and (k/2)!. This yields T0 ?
$2 (ko;[xl) k/2 Substituting for T0 in (3.9) gives
E [(x --H ? )kj ? $2cosh 36$3FY]
(3.10)
which establishes the desired bound for E --H ij )kj
Now, estimate (3.6) is an increasing function of a2[X], and the estimate in (3,10)
exceeds (3.6). Therefore o'2[X] can be replaced by any C > o'2[X] in (3.10). The
proof of (I) is completed by applying this estimate to (3.4).
All other bounds are special cases of (I). When k < 3(o'2[x])'13, we use C = a2[X]
in Theorem 8.1 and overestimate cosh ()4ffi23[X] by cosh (fi) ? $2.
(III) is easily veri?ed for k = 2, by applying Chebyshev's inequality:
o'2[X]			2o'2h?]
Th(X-jtI>T)<
T2 --H e213T2
For k > 4 we may replace C by max(o'2[X], kt in (I) and overestimate cosh (47c)
by cosh(k/6). Since cosh(x) <J/$2 for x > 1/2, we get cosh(k/6) ? ek/6/$), and
hence
Pr(?--Hjt>?<
This concludes the proof of estimate (III).			c
We now combine the results of Theorem 8 and Theorem 6 to establish Chernoff-
like bounds [27,50], where the independence k might even be much smaller than the
deviation we wish to bound.
Theorem 9 If X is the sum of k-wise independent r. v. `s, each of which is confined
to the interval [0, 1] with a = E[X], then:
(I)			For? ? 1 and k = [?2?e?'/3J.			Pr(t?? --H >			< e?tkI2i < e
(11)			For? > 1 and k =			Pr(I?V --H ii >			? --H?k/2j < g?[643J
(III) For?>l andk= [??l			Pr(?V--Hjt?>?6ji)? G(?,6)<e-?
PROOF. (I). ?`e apply Theorem S.III with C = jt, and k = [?2?ie1/3J which is
permissible since jt > k and `i > a2[X] for variables in the range [0, 1]. If
is even, this gives a bound of
Pr(IY --H > < e2?3?2? k/2 < e?k/2 < e?[62?13i, since 2e113 <3.
If [62ii/e'/3J is odd, we follow a calculation similar to that above, but use inde-
pendence k =			--H 1. This gives
Pr(!X --H ii > < ?213?2? k/2
Since k > 2 e?k/2(k+1) < e?113 and hence,
k/2 < e?k/2e?k/2(k+1).
< e(k) 1)
Pr(I? --H			>			<			< e?[62'/3J
In part (11) we follow the same iteration as in part (I), but set C = ?? and k --H
or [??/e'/3j --H 1, depending on the parity of [6jt/e'/3J. In part (III) we
reiterate the result of Theorem 6, combined with Theorem 7.
Remark. The proofs of parts (I) and (I II) of Theorem 9 also point out the
relative merits of the basic method (Sections 3.1.1 and 3.1.2) versus its redirection
of this subsection. The basic method of using non?negative linear combinations
of the symmetric polynomials Si gives better probability bounds when ?, the rela-
tive deviation from the mean, is greater than 1: it yields the probability bound of
exp(--HO-(?ln(l +?)t)) in this case. On the other hand, the formulation involving the
kth moment inequality gives a much smaller bound on the amount of independence
needed, when ? < 1
3.1.4
Probability Bounds for Exactly ? Successes under
Limited Independence
Some applications require estimates for the probability that exactly r successes
occur in cases where the occurrence of at least r successes is not too improbable.
The following theorem shows how and when this can be done. It also provides
relative errors, which can be useful for estimating conditional probabilities.
Theorem 10 Let X1,X2,Xn and Y1,Y2,.. . ,Y? be Bernoulli trials with ptvba-
bilities of success E[Xjl = E[Yj] = pj. We let the Yj `s be independent, but only require
the Xj `s to be k-wise independent. Let p(r) = Pr(?? Yj = r), and p?(r) = Pr?(X --H
r), where the subscript k denotes the k-wise independent trials, Let P(r) = ?e>? p(?)
and P?(r) =
(I) Ifr<k then
(fi) ffk>ej+ln(i/p(0))+r+D, then
(III) If k > e? + ln(1/p(0)) + r + D, then
ip?(r) --Hp(r)I ?
1p?(r) --Hp(r)j < e?Dp(r),
--H P(r)i < (1 --H P(r))e?D
(IV) If r > (1 + ?)? + k, then			P?(r), P(r) ? (1 + 6)--Hk,
and hence			IP?(r) --H P(r)1 < (1 + ?)?k
Although (IV) holds for all values of k it is meant to be used for k ? [6j], indeed:
(V)			It'r>?(1+??+k andk> [?jt], then			P?(r),F(r)?G(ti,?),
and hence			IP?(r) --H P(r)1 ?
PROOF, For an arbitrary event A, we may use standard inclusion-exclusion to
estimate the probability of the event [A A [A???i1irj(Xe = 0)]] The probability
p(r) can be expressed in terms of events A = [AjE(iiir)(Xi = 1)], which admits a
simple estimation as follows.
59
= z
z%r
i1<i2<?'?<i
%rz
1=0 ?l<????ir+I
j?fi1 irt
A (Xj=l)
(--H1)1Prk
tr+1<???<ir+?
ir+i?fii ....?r J
Truncating the outer summation at 1 = k --H r introduces an error that is bounded
by the last term of the truncated sum. Let pf(r) and pT(r) denote these truncated
sums, in the respective cases of k-wise and full independence. Since the first k --H r
terms in the outer summation are the same for both fully and k-wise independent ran
dom variables, pf(T) = pT(r). Furthermore, Pr? (AjEtii ikJ(?j = 1)) = H?k=iPij
Hence,
= pT(r) --H (?1)k?r6?			Pij
<...<ikj=1
for some ? E [0, 1], and an identical inequality holds without the k subscripts.
Hence,
Ip?(r)--Hp(r)i?<			i?<..?<:k Hkpij
z
k
Z II pi, is maximized when all Pi, are equal (Lemma 7), and hence,
il<...<ik j=1
(pi+p2+...+pn)k _
(jt/?)k,
and (I) now follows.
Pie
To get multiplicative error bounds, let Qr =			1 --H Pie , and define
il<i2<???<ir e=i
the summation in the error estimate of equation (3.11) by Rk = ?iI<..?<ik H,k=i Pi,.
Observe that Rk is the expected number of size k sets of successes among n trials,
60
so that z successes total accounts for such sets. In the fully independent case.
p(r) = p(O) ii<i%..<ireki ?k$j, = p(O)Qr. Furthermore, Rr ? Qr. and (kT)1?k ?
Rr x 1?k--HT It follows that
p(r)ptk?r
Ip?(r) p(r)I <			Rk ? Rr X Rk?r ? Qrl?k--Hr ? p(0)(k --H
since ?k--Hr ? (kffir) ???? k--Hr k--Hr
--H<
For k > e? --H log(p(0)) + r + D. the factor (?)k?r is bounded by
p(O)(k--Hr)!
k+l0?(P?$))?D--Hr k?T6?10g(p(())) < 6--HD
which establishes (11).
Part (III) is immediate, since P?(r) = 1 --H ?e<rpk(?) and hence,
IP?(r) --H P(r)I < Z IPk(?) --H ?(?)I ? ?p(e)e?D = (1 --H P(r))e?D
Finally, suppose that r = (1 + ?)jt + k. Then by Lemma S,
k
pk(r) <
--H			--H r(r--H1)(r--H2)...(r--Hk+l)
<
(:3.12)
Part (IV) is completed by observing that (3.12) also holds with P(r) substituted
for P?(r). Part (V) is an immediate consequence of Theorem 6. This concludes the
proof of Theorem 10.			ED
It is worth pointing out that parts (I) through (III) of Theorem 10 are not strong
when ? > 7n, since it follows from the work of Linial & Njsan [71] that Pr(X --H
= Pr(Y = )(1 + o(e?2(k??)/v'?)) independently of ji, which gives a much sharper
bound in this case. Also, independent of our work, a result similar to part(I) of
Theorem 10 has been proven by Even, Goldreich, Luby. Nisan & Velickovic' [41]:
they show that tpk(0) --H p(0)I ?
61
Theorem 10, in fact. achieves its greatest strength when ? is small, say ? =
or even ? = 0(1). Such instances are not unusual when pseudorandom integers
are being generated uniformly in the range [0, n] and a successful trial corresponds
to just a few different values. This is precisely the usual circumstance in, for in-
stance, hashing [110,136]. As an example, consider the (uniformkv distributed) ran-
dom placement of n balls among n slots, The expected number of items in slot
1 is just 1. The probability p(O) that no item lands in a given slot is about 1
Theorem 10 shows that if the independence k is e + 1 + r + D, the probability
that exactly r items land in that slot is the same in the k-wise independent case
as in the fully independent case, up to a multiplicative factor of (1 + e?D) or less.
Suppose that, during the placements of balls 1 through m, exactly r balls fall into
slot j for 1 ? 1 ? ? n, r _ m --H 1 + 1. Let xT[t,m] denote this event (with
the dependence upon I understood). The conditional probability that, under k-
wise independence, ball rn + 1 also falls into slot j is Prk(?(m+i,m+i? N'[i?mj)' If we
use one degree of freedom for the Tn + 1-st ball, it will be uniformly distributed
while the previous rn balls will enjoy (k --H 1)-wise independence. We may estimate
the conditional probability as Prk()Q[Ti,m]Ix[lm+1?meli) ? PT(?['m+l?m+l]), Since both
PTk?1(?rUm])
Prk?1(?Tp,mi) and Prk(x[T1?rn]I,x(m+1?rne1]) can be estimated by Pr(X[)?m])(1 + err??i)
where errk?l is the relative error that results from the limited independence, we see
that Prk(x[lm+i,rn+i]Ix[?I,m]) is very close to 1/n, with a relative error that is approx-
imately 2erTk?1. For k > 1 + e + 1 + r + D, the resulting accuracy is about 1 + 2g?D
Thus even with modest independence, this process behaves "as expected" much of
the time; that is, the corresponding conditional probabilities for k-wise independence
are very close to the ones for full independence, in many cases.
3.1.5 How close to optimal are our results?
It is known that the standard Chernoff--HHoeffding bounds are optimal in general
to within a constant factor in the exponent, since we know by the Central Limit
62
Theorem that as n ?. the tail of the scaled sum of i.i.d. r.v. `s tends to the
tail of a normal distribution, and hence we cannot significantly improve the tail
probabilities presented by Theorem 9. However, what about the independence we
get? Can it be reduced further to get the same tail probabilities?
To answer this, we note that the tail probabilities presented by Theorem 9 for
k--Hwise independent r.v. `s are of the form e?C?k However, n k-wise independent r.v. 5
require a sample space of size at least
lk/2i
(o(n/k))tkA2i,
as shown for binary unbiased r.v. `s by Chor, Goldreich, Hastad, Friedman, Rudich
& Smolensky [29] and for general r.v.'s by Alon, Babai & Itai [5]. Noting next that
any nonzero probability in a sample space of size t is at least l/t, we see that to get
a tail probability of the form e?ck, we need at least Q(10g(kfl/?) )--Hwise independence.
Thus, the independence we get cannot in general be reduced by more than a factor
of 0(log n).
However, by using results from the newly developing theory of approxiinating
probabdity distributions (Naor & Naor [85], Azar, Motwani & Naor [11], Alon, Gol-
dreich, Hastad & Peralta [6], Even, Goldreich, Luby, Nisan & Velickovid [41] and
Chari, Rohatgi & Srinivasan [25]), we get optimal results in the case where the Xj's
are binary with Pr(Xj = 1) = 1/2. A sample space X for n-bit vectors was defined
to be k--Hwise &biased by Naor & Naor [85] (see also Vazirani [132]) if
V&ct1,2,...,n),l?Is <k, Pr(Oxi=l)--HPr(Qxi=O)I<?,
jES			iES
where ? denotes the XOR function and Xj denotes the ith bit of an n-bit string x
picked uniformly at random from X. One property of such a sample space is that
v??= 1,2,...,k,V[ii,i2,...it c fl,2,...,n?,Vbib2...be E [0,1)'
Pr(r?1 = bi,xi2 = b2,.. , Xj? = b,) --H
?2eI ?			(3.13)
63
Y is ?--Hbiased if it is n--Hwise ?--Hbiased. Constructions of k--Hwise ??biased sources of
size poly(k, log n, 1?) were presented in [85,61. Such sample spaces have been shown to
have several applications to explicit constructions and to derandomization, mainly
since probabilistic analyses may be expected to be robust under small perturbations
of the probabilities. Now, it is easy to see how our methods can be used to derive
large deviation bounds for xi + r2 + `. + x?; from inequality (3.13), it follows that
for a k--Hwise ?--Hbiased source
k E[Se(x1, ...... , xn)] <
and hence by picking ? ? 7)1, this quantity can at most be double its unbiased value
of (n?) 721 Thus, for a k--Hwise &biased random source with k = h(n, n/2, ?) = n6
and with e = 2--Hk,
Pr(xi > ??2(l +?)) ? 2.U2(n,n/2,?).			(3.14)
i=i
Since such a source can be generated using poly(k, log n) random bits, we see that
this result is optimal as long as k = ?(log n); if k = O(log n), then the probability
space is polylogarithmic in size and should in most cases be dispensable, via brute
force search of the sample space. Similar results hold when the Xj's are binary with
their probabilities of being one being the same negative power of two (not necessarily
1/2), using identical methods.
3.1.6 Upper Tail Bounds for some other Distributions
Suppose we have random bits X1, X2?n with some arbitrary distnbution. Let
X = Zs?w?iXi, and let jt = E[xl. Then, for any a> ji, the methods of Section 3.1.1
yield
Pr(X > a) < ?t%oyiSi(Xi,X2,
yi
The following theorem is immediate.
Xn) V(yo,y1,. ..ya) ?
64
Theorem 11 Given n random bits X1, X2,k'n with X = ??1 Xi and pt --H
ELk'], supposez? is an upper bound on E[Sj(Xi,X2,...,Xn)],j = 1.2Then.
a = ?(1 + ?) for ?> 0,
o+ Pr(X > a) ? Zt%oyizi
Z3%?cyi(?i) V(yo,yI,...ya)c??+I
o+ JfX1,X2k'n are k?wise independent, then Pr(X > a) ? min?=12 k zk
As an example of a distribution which benefits from the above, recall the A-
correlated random variables defined and used in Chapter 2: random bits X1
are defined to be A--Hcorrelated if for all j and for all distinct indices kij, Xj2 At)
E[H?eLiXie] ? AH?eLiE[Xie]; note that Zj ? A(flj)(?In)? in this case. Hence, Theorem 11
directly implies one of the main theorems of Chapter 2, Theorem 1 which states that
if X1, X2,. . . ,Xn are A--Hcorrelated random bits with X = Xj and a =
then for any ?> 0, Pr(X > jt(1+?)) is at most A times any Chernoff?Hoeffding type
upper bound on the corresponding probability had the Xj been independent, with
the same individual distributions, Indeed, it was the work described in Chapter 2
which mainly motivated the methods of Section 3.1.1. Further, the applications
sketched in Section 3.2.2 use Theorem 11.
Theorem 11 helps improve the known upper tail probability bounds for the
hypergeometric distribution, an important source of A-correlated random vari-
ables. Suppose n balls are picked at random without replacement from an urn
containing i? red balls and JV --H i? balls of other colors. Let X be the number of
red balls picked in the random sample, and let p = AI/X. Then for 6 > 0, a special
case of a result of Hoeffding [50] (see Chvatal [30] for another proof) implies that
Pr(X > np(l +6)) ? F(n,nm6).
We prove the following strengthened version of inequality (3.15).
(3.15)
65
Lemma 10 Suppose a random set of n balls is picked from an urn containing Al red
balls and A --H `I balls of other colors. Jf X denotes the number of red baJs picked
p=?ItV,?>Oandfk=h(n,nTh?)=o(X) then
Pr(X > np(1 + ?)) < U2(n.np,6)e??(k2/M) < F(n,np,c)e?O(k2Wti).
PROOF. (SKETcll) Number the balls picked as 1,2n, and let Xi be the
indicator r.v. for the event that ball i was red. Then X = ??1 Xi and
Pr(X > a) ? U3(n,nm?)			E[Sk(Xl,X2,...
where a = [np(1 + ?)]. For distinct indices i1, ?22,k
hence,
E[fljffliXij] = Pr( k			k-1
p\ Xij = 1) = II \-
j=1			i=O
U3(n,np,?)7			A kk?lAl_--H?
U2(n,np,?)			=			?I?i=oN			.)
= \tl(i?(4V 1?x) i) < e?(M??')?ffi1' ?:
which is e?0(k2t?), if k = o(X).			c
Remark. Note that sampling without replacement produces r.v.'s which are
1--Hcorrelated. Lemma 10 gives good improvements over inequality (3.15) in many
interesting cases, e.g., consider the case p = constant, ? = constant, and ?(M0?5+?) ?
n = o(N), for any fixed e> 0.
Also, the CH bounds [27,50,101.8] depend only on jt and not on the actual
values of pi, and give the upper bound F(n, ?, 6) > U2(n, ij, 6). We know for any
6>0 that Pr(X > jt(1 +6)) < &Ti(n,pi,...,pn,6) = S?(pi?...pn)/(??'?6?), where
k = h(n, ?, 6). By Lemma 7, this is maximized by setting pi = p/n Vi, subject to the
constraint that E[X] = ti. But if the values p? are rather different, we get bounds
formally superior to U2(n, ?, 6) and F(n, a, 6); suppose, for example, that a = n/2,
66
pi = ?, i 1,2,... ,n/2, and pj = 1 --H ?. i = n/2 + 1n, where 0 ? ? 1/2.
Then,
U1(n,p1,p2,p?,6)			k
U2(n, ji, 6)			?f(?)=2k(:Z n;2			712? ?i(1 --H
where k = h(n,?,6). Note that f(0) ? e??(k2Ifl) by Lemma 10 and that f(?) can
get arbitrarily close to f(0) since f(.) is continuous.
Remark. This particular result can only increase the constant factor in the
exponent of U2(n,?,6). But, it is a small step towards better understanding of the
dependence of Pr(X > ?(1 + 6)) on n,p?,. . . ,p?, and 6. Similar improvements can
also be made in the case of non?binary r.v. `s. An alternative approach might be to
derive Chernoff-Hoeffding bounds for a sum of Bernoulli trials as a function of the
variance as well as ji, a, and n, as in Siegel [1131.
A final application is to the sem??random source introduced by Santha & Vazi-
rani [107]. A random source which outputs bits Xl, X2Xn is defined to be
&semirandom in [107] if
V? 1/2--H?? Pr(X? = 1I?i,x2,?i--Hi) ?
z.e., the random bits can be correlated, but only to a limited extent, independent of
the past history. Despite its seemingly weak nature, such a model has been shown to
be able to simulate complexity classes such as RP (Vazirani & Vazirani [133]), and
the study of a generalization of this model due to Zuckerman [140] has led to rich
results recently (Nisan & Zuckerman [90], Wigderson & Zuckerman [138]). Noting
that for such a source,
k
E[H Xi?] <(1/2 +
for all k > 1 and for all distinct indices ?i,?2,.. ,?k, we see that
Pr(? Xj > n(1/2 + ?)(1 + 6)) < U9(n n(1/2 + c). 6), V6> 0,
i=1
for an &semirandom source.
Inequality (3.14) shows another application of our techniques.
67
3.2 Applications to Computation
The most striking point of Theorems ? and 9 in our opinion is that bounds as good
as the CH bounds can be obtained with small independence. This implies, for any
analysis that relies on the CH bounds, much weaker requirements on the random
sources used. We now present some further computational applications of the new
results.
3.2.1 Reduced randomness for randomized algorithms
There are known constructions of r.v.'s with limited independence using a small
number of random bits; for example, the construction of [54] and the use of universal
hash functions [24] to generate IF many k-wise independent random elements from
a finite field F using O(klog Fl) random bits, and the result of [5] using coding
techniques [78], which gives a polynomial (in n) time algorithm to construct n
wise independent and unbiased random bits, given O(k log n) independent unbiased
bits for any k, k <n. Combining these with our result on reduced independence for
the CH bounds, we get a major reduction in the amount of randomness needed for
several randomized algorithms.
3.2.1.1 Reduced randomness for random sampling
In random sampling, we have a huge finite universe L? and a subset ?? C U, and we
want to estimate the fraction f* = WI/?UI. Given error parameters ? > 0, the
method used is to pick a random sample S of size N(? ?) from U and output the
fraction f of type W elements in S; N(? ?) must be such that Pr( If* --H fi > 6) ?
This is required, for instance, in PAC learning [129] and in running BPP algorithms.
What falls out of the CII bounds is that N(6, ?) = O(? log(-])) with all the samples
being independent. We can improve this to
68
Theorem 12 Given a un?verse L?. a subset ? C U, and error parameters 6 e > 0
suppose a set S of O(?log(-?)) random samples with O(log(--H?))--Hwise independence
are picked from U. Then, if f* and f are the respective fractions of type Hi elements
in U and S) Pr(If* --H f > 6) < e wiU hold.
PROOF. Consider a randomized algorithm which looks at a random set of sam-
ples S from U, and outputs the ratio f of type Hi elements in S. We now look at
the random properties of S which are required for the claim Pr( If* --H fi > 6) ? e to
hold.
Let SJ = n. In the notation of Theorem 9, we want to claim that Pr( X--HE[X]f >
6'Eh?1) < e where X = nf and 6' = 6/f*. Parts (I) and (Il) of Theorem 9 show that
for a suitable of choice of independence k among the elements of S, Pr( IX --H EF)(] I
can be bounded by e?tk/2i. Hence, it suffices to choose k > 2fln(-?1)i, for
the above probability to be bounded by e. To compute the required sample size
n, we distinguish between the two cases 6' ? 1 and 6' > 1. Then, from part(I)
of Theorem 9, the probability that IX --H FLY] I is greater than 6'F[X] is bounded
by e, if the samples are [6'2E[X]/e113J--Hwise independent and if e?t612E[Xi/3J ?
i.e., if 6'2E[X]/3 > [In(--H?)? This will hold if n62/(3f*) > [ln(1r)?, i.e., if n _
N1(6,e) = %6?fln(--H?)]. If 6' > 1, then a similar analysis using Theorem 9.11 implies
that V2(&, e) = --H63Fn(-?)1 many samples with 2[ln(1?)]?wise independence suffice to
satisfy the error bounds.
Note that since both f* and 6 are clearly bounded by 1, the number of samples
and independence needed in both the above cases can be upper bounded by iV3(6. e) --H
4[ln(?1)] and k* = 2[In(-?1)). Note further that though by Theorem 9 the choice for
the independence that minimizes the error bound is an increasing function of the
sample size, increasing the sample space size when given a fixed independence will
reduce the error probability; a proof of this claim follows. In the proof of Theorem 9,
parts(I) and (II) were derived from part(III) of Theorem 8 with C = E[X]. Note
that in the current problem, C = nf* and T = n6 where n is the number of random
69
samples picked, in the notation of Theorem 8. When the independence k is fixed,
the bound given by part(III) of Theorem 8 decreases with n for these values of C
and T, and the claim follows.
Theorem 8 also allows an estimate for the required size of a sample space with
k--Hwise independent variables, in case k ? 3 [ln( 1c)1
Theorem 13 Given a un'verse U, a subset ?V C U, and error parameters ? e > 0.
suppose that 5 is a sample space of U whose elements are k-wise independent, for
some even k. Then, f f* and f are the respective fractions of of type ?r elements ?n
L)T and 5, then for Pr(f*--Hf > ?) ? e to hold, it is sufficient to chooses > 625k2/k)
for some constant c.
PROOF. Use part(III) of Theorem 8 by setting C = n, where n = S?. c
Theorems 12 and 13 imply "reduced randomness" results for random sampling,
if the universe U has some properties. For instance, if &r is a finite field and if the
field operations can he done in polynomial (in 6' and log(71)) time, then any number
of k--Hwise independent samples from U can be generated from k independent randoni
samples [54,24].
We note, however, that the results of Theorem 12 are not optimal; see [141 for
an optimal construction. The main advantages of our construction are its simplicity
and direct parallelizability: it is not yet known how to parallelize other schemes for
reducing randomness, e.g., random walks on expanders (which is the method used
in [14]). Via weaker bounds on the kth moment, essentially the same bounds for the
random sampling problem have been obtained by Bellare & R?ompel [15].
3.2.1.2 Reduced randomness for oblivious permutation routing
We now show how our results directly imply bounds that match the explicit con-
structions of algorithms with reduced randomness due to Peleg & Upfal [95], for
oblivious permutation routin? on fixed interconnection networks (see also [56.101,
104,128,130]).
70
Given some interconnection network with A' nodes and a permutation 0' : (1
.1..., ,iV], the problem is to route a packet Vj residing at each node i, to its des-
tination o'(i) so that the total time taken is small, Further, the routing must be
oblivious in that the path F?(x) chosen for a packet x must be "`independent" of the
path P?(y) chosen for any other packet J? (see [95] for a precise definition when ran-
domized routing protocols are allowed). Explicit constructions of algorithms with a
spectrum of time--Hrandomness parameters are among the results proved in [95] for the
degree--H4 butterfly network; these are also extendible to other networks (see Karloff
& Raghavan [56] for a protocol for the hypercube with slightly weaker bounds)
Here, we show how our results of Section 3.1.1 directly imply the bounds of [95] for
the hypercube; we believe that similar results should hold for other interconnection
networks.
Consider the implementation of Valiant's two--Hphase scheme [128] (see also Valiant
and Brebner [130]) on a hypercube with V = 2? nodes: (I) Each vertex i picks a
random p(i) E f1, 2iV? as an intermediate destination for Vj, and routes Vj
there; (11) Each packet ?j is routed to its final destination o'(i). We now follow
the discussion of the standard aspects of this from [101]. Assume FIFO queues at
each edge, and that phase(I) routes Vj from i to p(i) by "correcting" its bits from
left to right assuming that the nodes of the hypercube are indexed by n bits, and
that phase(II) "corrects" bits right to left. So, phase(II) is like "running phase(I)
backwards", and so we consider phase(I) alone here. It is shown in [101] that the
time taken for packet Vj in phase(I) is at most
N
nt?
j=1
Hij,			(3.16)
where Hi> = 1 if the paths ? i,p(i) > and < j,p(j) > share an edge in phase(I),
and 0 otherwise (recall that n = log2 N). It is also shown in [101] that if each p(i)
is uniformly distributed in f1, 2,... ,N], then Vi, E[?,N.=1 Hi>] ? n. Here is the
theorem that matches the explicit construction of [95].
71
Theorem 14 There are explicit constructions of oblivious routing algorithms on the
hypercube which, for any T clog X ? T < Vmw (c > 4 is a constant):
1. use O(10gIO??))0)Q)? log f\r) random bits and terminate in T steps with probability
at least 1 --H Q for any 0 < Q < I;
2. use 0(log($0$!o)x)) random bits and terminate in expected time at most T?
PROOF. (SKETCH,) Consider any packet v?; the probability that it takes more
than T/2 steps in phase(I) is at most Pr(?fffi1 Hij > T/2 --H log Y). If the p(i)s
are picked uniformly and in k--Hwise independent fashion, then the I1?? are k --H 1-way
independent, while E[zfJ1 Hiji < log ]V, as before. It follows from our discussion
of Section 3.1.1 and from Lemma S that, if T > E[zfL1 Hq], (i.e., if T > 4log N),
then
Pr( Hij > T/2 --H log V) ? (kffiYl)(IO\tN)k?I _ (log1v)k--H1
(T/27jo1? N)			113k=02(T/2 --Hlog N --H
By picking k = O(?0g\??fflj)/g%???), we can ensure that Pr(z;Y=i Hij > T/2 log N) ?
Q/(2iV) holds. Arguing similarly for phase(JI) and summing up over all i, we get (1-)
above.
For (2) above, we set Q = l/(2iV) and replace T by T --H 1 in (1). Let Tmaz be
a random variable denoting the time taken by the protocol, i.e., the maximum, over
all packets i, of the number of steps taken by packet Vj to reach a(i). Note that
Tmaz < log N + N, from the upper bound (3.16). Also,
Pr(Trnaz > (T - 1)) ?
from (1) above. Hence,
E[Tmarj ? (T--Hl)'Pr(Trnaz <(T--H1))+UogN+N).Pr(Trna>(T?1))
1			1
< (T--H l)(1 --H 21?)+(logX+1V)2?
72
Note that for any k, k-wise independent p(i)s can be generated from k log X random
bits using hash functions [241, since the p(i)s can be thought of as belonging to the
field &F(2?). Hence, we get bounds that match those of [95]
The above example typifies the type of application we expect our methods to
find, i.e., as direct "plugAn's in analyses where the CH bounds are normally used.
3.2.2
The New Formulation and the Method of
Conditional Probabilities
The method of conditional probabilities [100,117] is an important technique
for the derandomization of algorithms; the reader is referred to [101] for details. ?Ve
now show how this method can be combined with the formulation of Section 3,1.6.
This will enable us to derive simple and efficient deterministic polynomial--Htime al-
gorithms from randomized algorithms which can be analyzed using our formulation,
zn a un?ed way.
Given n random bits X1, X2??, can the conditional expectation
E[Sk(X1,?2,.. .,Xn)IXi = &1,??2 = b2,. . .,?Vj = bi]
be evaluated, or at least be given a `?reasonable" upper bound, for any k, any i, ? --H
0,1,2,... , n --H 1, and any ....... bi ? [0, 1t?? If the Xj's are identically distributed,
then it is reasonable to assume that an upper bound Ut on E[fl?r?i??r X1 = b1, ?2 --H
b2= bi] is known for all C, ? = 1,2,... ,n --H i, and for all distinct indices
Xj1, Xj2,. .. , Xje E f? + 1, i +2,..., n?; this is sufficient for the two applications
shown below. Then, if
we can see that
fi I (1<5<i) A (bj= 1)tI=ii,
E[Sk(?I,X2?n)IXi = b1,X2 = b2,. . .??? = bi] ?
min(i1 k
Uk?r.
(3.17)
73
now present two applications where the combination of our formulation and
the method of conditional probabilities leads to fast polynomial--Htime algorithms. via
upper bound(3.17). The first application, to jobshop scheduling, is a "`natural" de-
randomization of the randomized algorithm of [112], faster than the derandomization
techniques of [112] and [97]; this is shown in Section 3.2.2.1. The second application
is to discrepancy theory, and is discussed in Section 3.2.2.2. The ?usual" method of
conditional probabilities for these problems frequently calls for independence among
the random variables corresponding to the bits X1, X2seen above; this is
not the case for these two problems and in general for many other problems,
3.2.2.1 Improved algorithms for packet routing and jobshop scheduling
We now present simpler approximation algorithms for packet routing (Leighton,
Maggs & Rao [69]) and jobshop scheduling (Shmoys, Stein & Wein [112]) which
provide improved approximation guarantees, by using ideas from above. The non-
preemptive jobshop scheduling problem is as follows: given n jobs, n? machines and
a sequence of operations for each job where each operation is assigned to a specific
machine, construct a schedule to run the jobs so that the time taken to process all
the jobs is minimized, subject to: (i) the operations of each job must be done in
sequence; (ii) no operation of any job running on any machine can be preempted till
it is completed, and (iii) a machine can process at most one operation at a time.
One of the results of [69] tackles a special case of this problem; the general case is
handled in [112]. Both these papers give polynomial--Htime algorithms to produce
good approximations to an optimal schedule.
Let Pi be the total time needed for job Jj, and let Pmaz = ntaXic[i?n]Pi. Let H?
be the total time for which machine I?? is needed, and let fimax = maxic[i?m]Hi.
Before an actual schedUe is constructed in [112], a pseudo--Hschedule S is constructed
which temporarily assumes that each machine can work on upto D operations simul-
taneously, where D > 1 depends on the input instance. The pseudo--Hschedule is later
74
used to construct an actual schedule. The only step where randomization is used in
[1121 is during the construction of the pseudo--Hschedule and is the following.
An initial random delay dj E [1,2,..., firnax) is assigned for each job ii. Suppose
that the sequence of operations of job Ji are 0i?1,?? 0??Tj? and that operation O???
takes time t???; then, in the pseudo--Hschedule S, job Ji is scheduled to start at time
di and runs to completion without interruption, i.e., operation 0i?r starts at time
dj + zersil tj e We denote the offset zeril tie by r(Oir). As shown in [69,1121, if the
dj'5 are generated uniformly and independently, then with high probability, every
machine at every unit of time will have (a congestion of) at most D(n, mrnax)
? Iog(n.rnmar)
Ioglog(n.rnmar) jobs scheduled on it for some constant c, where mrnax is the maximum
number of operations in any job. This step is then derandomized to deterministically
compute initial delays leading to a congestion bound of O(log(n. Tnmax)). Linear
programming is used for the derandomization, making this step the bottleneck. This
step is sped up in [97,1191. Here, we get a better congestion bound of D(n, rnrnax) as
opposed to the previously known O(log(n. rnrnax)) bound, with an algorithm which
is more direct than the ones of [97,119], while having time complexities comparable
to theirs.
We assign random initial delays [di E [1,2,..., fl?ax?? uniformly and inde-
pendently to the jobs. Suppose that the operations scheduled on machine iNli are
..... 0rnj, which respectively belong to jobs Ji1, ....... Jimj and take ..... . , trni
units of time. For any machine Ali and time instance t E [1, 2Hrnax + Prnax), we
define ?C[1,rni] te = Hi many indicator r.v.'s Xj(t), I = 1, 2Hi, to analyze the
congestion on machine Mi at time t in S; each of these r.v.'s is an indicator for the
event that a particular unit of time of some operation gets scheduled on ?i at time
tin S, as follows. The index j encodes the time unit and operation: let Ir = ?tT=il te
for r = ....... , m?; then if IT < I ? Ir+i, I represents the I --H Irth time unit of Or
75
as follows.
xj(t) --H
1 if) = Jr + m 1 ? p < tr, and the pth time unit of Or is scheduled for
time t? je., if dir + r(Or) + p --H 1 =
0 otherwise.
It is easy to see that E[X)(t)] ? llm'ar' and that for any positive integer k, the
probability that machine jWi has congestion at least k at time unit t is
pr(z11??Yi(t) > k) ? E[sk(xli(t),.. ?V11?j(t)) ? (nrriax)k
r=1
In addition, if Zr11=?i Xr?(t) > k holds for some time t, then Z11r=?i Xr?(t!) 1> k also
holds for some time t', where t' is one of the starting times of the operations scheduled
on iNli. Further, the starting time of each operation Or is uniformly distributed in
[r(Or), fimax --H 1 + T(Or)]. Hence, for any k, Pr(some machine has congestion at
least k at some time instance) is at most
m mi flmar--Hl+r(Or)
t=r(Or)
which is at most
E[sk?l()(li(t), . . . , YnS?(t))],			(3.18)
Hmaz			1'
m			fimax			Hi			1k--H1
7?1mi flmaz k--H1			fimax
 (k$?1)!
Clearly ??1 rflj < ???maz, hence for k--H1 > k* = !og(nmmar) tor some suitable
Cl1og??g(fl?7fl???)
constant ci, the above probability estimate is less than one. We may now use the
above form as a pessimistic estimator [100] to deterministically set the delays
dj for the jobs one--Hby??ne by the method of conditional probabilities [100,117], to
achieve the congestion bound of D(n rnmaz).
Assume inductively that initial delays d1 = d*?, d2 = ...... , ds = ds* have been set
deterministicallyfor the jobs Ji? J2, . . , J3; the aim is to computed?+1 now. Consider
any machine Mi on which Js+i has at least one operation; let these operations be
As+i,i, A3+1,2,. . . , As+i,aj. Let B3,1, Bs,2,... , Bs,bj be the operations which belong to
76
somejob in fJ1,J2JsJ and which are scheduled on I?j? and let t31,t?2.,. ,t?bi
be the times at which they are scheduled to start on 1I?; these times are known,
since we know the values of d1, J2ds. Let Oi, O2O?? be the operations on
machine ?j that belong to jobs from the set fJs+2 Js+3,' , Jn?. We define, for any
t E ?1, 2Pmar + Hrnaz? and r E (1,2fimart
g(s + 1, it,r) = E[&k?(Xi?(t), ?2?(t)?lii(t))Idl = d*1, .., = d?, d?+1 = ri,
and num(i,t,< Xi,X2Xj >) to be the number of operations from jobs JiJj
that are scheduled on machine iktj at time t, given that d1 = xi, ...dj = xi.
When conditioned on the event d1 = d*1,d2 = d9*,...,ds = d?* d?+1 = A for any
A E (1,2fimazi, upper bound(3.18) becomes
f(s + 1, A) = ?			(? g(s + 1, i,r(As+i4) + A, A) +
i=1			j=1
bj
? g(s + 1,?, tsj,A) +
Cj r(Oj)+llmax--Hl
filg(s+l,i,t,A)).
j=1			t=r(Oj)
Recall that the method of conditional probabilities will set ds+1 = d?+1, where
is the index at which f(s + 1,.) is minimized. Note that f(? + 1, A) can be readily
computed for any A and hence, so can d*sei. To make the computation more efficient,
we use the following observations.
1. Suppose we need to compute f(s + 1, A) for some A, using upper bound
(3.17). Then, for any machine ]Wj which has some operation from job Js+i,
the term n --H in upper bound (3.17) corresponds to the number of oper-
ations of jobs J1,J2,Js+i on machine iklj, i.e., a1 + b1. Hence, for any
t Ei (1,2,...,Pmaz +fimazi, g(s+ l,j,t,A) can be computed in 0(k*) time if
num(j,t,< d*1,d2*,...,?s,A >) is known, since upper bound(3.17) involves a
sum over at most k* terms (recall that k* is an upper bound on the number of
operations scheduled at the same time).
77
2. Suppose inductively that nurn(i,t,? d*1,J*24?* >) is known, for all ma-
chines Mi and for all times t. Let ??s+1 be the number of operations of job J3+1.
We will consider only those machines that have some operation of J?+i; the
number of such machines is clearly at most Ws+1. Hence, given the nurn(?,t,?
d*1 d* d3* >) values, the nurn(., ,< d*1,d9*,. .,ds*,1 >) values need to be
updated for at most Ws+1 machines and for Pmaz + H,nax time units. and can be
done in O(ws+i(Pmaz + Hrnax)) time. Given the num(.,.,? d*1,d9*....,d*3, I >)
values, computation of f(s + 1, 1) takes O(ws+ik*(Prnax + fimar)) time, since
1) can be computed in 0(k*) time, given these values.
3. Suppose we have computed the nurn(., ,< d*1,d2*,...,d*?,A>)valuesandf(s+
1, A) for some value A, and that we need to compute f(s + 1, A + 1). ?? can
proceed to first compute the num(.,.,<4*1,4*2,..., d*s, A+l >)valuesandthen
f(s + 1, A + 1), as follows. Suppose some operation a of ?s+i is scheduled to
run on some machine iwj from time t1 to time t2, when we set 4s+1 = A. Then,
note that
num(i,t',< 4*?, ,4s*,A+1 >) = num(i,t',< 4*?, , 4*s, A >),Vt' ? [ti+1, t2].
Hence, a leaves at most two of the nurn(.,.,< 4*1,42*4?*, A + 1 >) values
different from the corresponding num(.,`,< ..... , 4*s,A >) values. Hence,
num(,.,< 4*1,4?,...,4*s,A+ I >) can be updated in O(?+i) time. It now
follows from arguments similar to those used above that J(s + 1, A + 1) can be
computed in O(k*ws+1) time.
The above observations imply an efficient algorithm to compute 4?+1: induc-
tively maintain the num(,.,< ?1,42*,...,43*,r >) values as r goes from l,2?..
to flmaz, and compute f(? + 1,1), f(s + 1,2),..., f(s + 1, Hrnaz) in that order by
sequentially updating the corresponding nurn values. Since computing 4s*+1 takes
O(ws+ik*(Pmax + flmaz)) time, the total time complexity is O(wk*(Pmar + fimar)),
where w is the total number of operations. Hence, we have
78
Theorem 15 Jnitial delays (di :1 ? i < n1 in the range [1,2,.. Hma?1 for each
job Jj can be sec ?n O((PmaxtHma? )w??J$1?0(g)??????X???) i?me where w is the total number
of operations, such that in the (infeasible) schedule in which every job Jj starts at
time di and runs without interruption, every machine has at most
jobs scheduled on it at any time.
We feet that the above is a natural derandomization of the randomized algorithm
since it sets the delays one--Hby?ne, as opposed to the more complex ways used before.
3.2.2.2 Exact Partitions in Set Discrepancy
Set discrepancy problems [81 are combinatorially important, special cases of which
can model divide--Hand--Hconquer situations; see, e.g., the R?NC edge coloring algo-
rithm of Karloff & Shmoys [57]. Given a finite set X and a family of subsets ? --H
[S1, S2Sn1 of X, the goal is to come up with a ?2--Hcolonng" v : X [0,11 such
that the discrepancy dis?x) max? discj(,v), where disci(v) = [1(ZiESi k(j)) --H
5il/2Il, is minimized. It is known that a 2--Hcoloring v with disc(v) = O(ffNThgn)
exists and can be computed in polynomial (in XI and n) time [8], and that a 2-
coloring ? with disc(v) --H O(A0?5+??ogn) for any fixed e > 0 can be computed in
NC [17,81,85], where A = maxilsil. Using the ideas of Section 3.1.1, we can prove
Theorem 16 Given a finite set X with X even and a family of subsets ? =
[Si,S2,...,5n1 of X such that A = maxilsil, there exists a 2?coThring v* : H
[0,11 computable in polynomial (in IX and n) time such that: (i) disc(v*) --H
O(?/AThm, and (ii) [y ? X : vo+*(y) = 011 = ?[y E X : ?*(y) = 111
PROOF. For the existence proof, we can show that if we pick a random subset
Z C X with Z = IX 1/2 uniformly from the set of all size IXI/2 subsets of X and
set vz(?) = 1 iff y E Z, then Prz(disc(vz) = O(?Aogn)) > 0, as follows. It is
well--Hknown [8] and easily checked via the CII bounds that there is a constant c> 0
such that if v(y) is picked uniformly and independently from [0,11, then for any Si,
79
Fr(disci(,y) > c#Th?Aogn) ? --H?`; hence.
Pr(d?s??) > c)mlogn) = Pr(B? : d?sci(N) > cA-Alogn) ? n			= 1.
fl
Note that f?z(y) : y E X? is a set of seli--Hweakening random bits with parameter
1, i.e., if z is picked uniformly at random from the set of !xI/2 sized subsets of X,
then for any distinct Y1,Y2,???,Yi ?
[Hj?=i,yz(w)]			=			Pr(?z(yi) = Yz(Y2) =			= kz(Yi) = 1)
<			Hj?'=iPr(,kz(Th) = 1)
=			Hj??iE[yz(yj)]
Hence, it follows from Section 3.1.6 that for any Si, Pr(disci(,?z) > c??ogn) ? I
still h0lds, concluding the existence proof.
Assume that jX = m and that X = fi, 2,... , rnJ. To use the method of condi-
tional probabilities to derandomize the above construction, note that for fit, i2ij? C
fi+ 1,i +2,...,ml,
E[U?e?i,yz(?e)i,yz(l) =xz(r) = br] = undefined, if r1 > rn/2 or r2 > m/2
where
0, if r? +j > rn/2
(mmiA?iJ$j)
(mtfflffiri)			otherwise
jf (1 ?			? r) A (be = 1)11 =
and r2 = r --H r1. This can now be derandomized, by our initial discussion in Sec-
tion 3.2.2.			E)
3.3 Conclusions and Open Problems
In this chapter, we have presented a simple method which shows that the CH upper
bounds hold even under (a suitable amount of) limited independence, via a new
80
technique. \?\?e then extend this method somewhat, and our combined results yield
good tail probabilities for some other distributions too. The main computational
application of this work, in our opinion, is the reduction in randomness we get
for any analysis which requires the CH bounds, due to the sufficiency of limited
independence. We expect to see more such applications in future.
An interesting open problem is to analytically solve or approximate well, the
linear program introduced in Section 3.?.3. Techniques and tools from approximation
theory have recently been put to good use to solve (or at least to approximate quite
well) similar problems (Linial & Nisan [71], Paturi [94]). Can similar methods help
here? Another open problem is to resolve the optimality of our results, though we
cannot expect to get significant returns here as shown in Section 3.1.5.
Chapter 4
Randomness-Optimal Unique
Element Isolation, with
Applications to Perfect Matching
and Related Problems
Given a graph G, a matching in 0 is a subset of the edges such that no two edges
in the subset have a common endpoint. A max?mum matching in 0 is a matching
in 0 of maximum cardinality, and a perfect matching is a special type of maximum
matching, in which each vertex is covered by an edge, ?.e., each vertex is matched
Finding a perfect matching, if any, in a graph is a fundamental problem in combina-
torial optimization. While its sequential complexity has been well-studied (Lovasz
& Plummer [74]), it is a major open question whether a perfect matching can be
found or even detected in NC. Considerable effort has been devoted towards devel-
oping parallel algorithms for the perfect matching problem. For instance, sublinear
time parallel algorithms for general graphs (Goldberg, Plotkin & Vaidya [46], Vaidya
[127], Grover [48]) and for bipartite graphs (Goldberg, Plotkin, Shmoys & Tardos
[45]) are known. NC algorithms for special classes of graphs: cocomparability graphs
81
82
(Kozen, Vazirani & Vazirani [67]), strongly chordal graphs (Dahlhaus & Narpinski
[33]), graphs with polynomially maQv perfect matchings (Grigoriev & Karpinski [47]),
and planar bipartite graphs (Miller & Naor [80]) have also been developed. A suc-
cessful approach for the general problem has been to use randomness: both Monte
Carlo RNC algorithms (Karp, Upfal & Wigderson [61], Mulmuley, Vazirani & Vazi-
rani [83]) and Las Vegas RNC algorithms (Karloff [55]) have been developed. This
chapter describes joint work with 5. Chari and P. Rohatgi [26]
In this chapter, we investigate the parallet randornness co?piexity of perfect
matching and related problems, i.e., the amount of randomness required to solve
them in parallel. For perfect matching we present an RNC2 algorithm which uses
O(log Z) random bits where Z is any given upper bound on IMat(C)I, the number of
perfect matchings in the graph C. This improves and generalizes the result of Grig-
oriev & Karpinski [47] who give an NC3 algorithm when Z is polynomially bounded.
Since 1Mat(C)i is at most (2m/n)?, the worst case randomness complexity of our
algorithm is O(n log(rn/n)) which improves on the previous bound of O(rn log n) due
to Mulmuley, Vazirani & Vazirani [83]. Thus by linking the randomness complexity
of this problem to IMat(G)I, we unify and generalize previous results, while also im-
proving on them. In special cases, e.g., K3,3--Hfree graphs, IMat(G)I can be computed
in NC2 (see Vaziraui [134], whose work was motivated by that of Nastelyn [63] and
Little [73]); in general, if no good upper bound on Mat(G)1 is known, our results yield
an 1?NC? algoritlim to find a perfect matching which uses at most 0(log iMat(G)
random bits with high probability, if IMat(G)I > 0. This is a significant reduction
if 0 < IMat(U)I ? (2rn/n)?. Our results are obtained by a randomness?optima?
generalization of the Isolating Lemma of [83]. Since the Isolating Lemma has several
applications such as to basic problems on linearly representable matroids, variants of
matching and structural complexity theory, our generalization reduces the number
of random bits needed in these settings as well.
Given a set S = [xi,x?,. . . ,xNt and an unknown family ? C 2?. the Isolating
83
Lemma of [83] shows that if the Xj'5 are independently assigned weights uniformly
at random from the range [I, 2,. . ,2vt, then with probability at least 1/2 the
minimum weight set in ? is unique. This clearly requires O(N log N) random bits
and it was left open whether this independence is necessary. We generalize this as
follows: given an upper bound Z on ??, we assign polynomially bounded weights
to the Xj'5 using only 0(log Z + log V) random bits and achieve the same result.
In the worst case = 2N), our scheme needs 0(N) random bits as compared to
O-(ivlog N); for smaller Z, we get better randomness complexity. This settles the
open question of [83] by showing that independence is not necessary. Our scheme
provides an explicit construction of a collection of (Yz)0(') weight assignments such
that, for every family ? of size at most Z, at least one (in fact, at least 50%) of these
assignments makes the minimum weight subset in ? unique. Such a collection of size
O(NZ) is guaranteed to exist by applying Adleman's technique [1] to the original
Isolating Lemma, and thus this is one of the few instances where Adleman`s result
can be made constructive with only a polynomial blowup in size. Furthermore, we
prove a matching lower bound for our generalization, i.e., any scheme which assigns
polynomial weights to the Xj'5 must use ?(log Z+log iv) random bits even to achieve
isolation with only nonzero probability.
As mentioned earlier, our Isolating Lemma directly yields randomness?efflcient
algorithms for perfect matching. We get an RNC2 algorithm to find a perfect match-
ing in a graph G which uses O(log Z) random bits, where Z is a given upper bound
on 1Mat(G)1. We also obtain randomness?ffident algorithms for the exact matching
problem. Matroid intersection and matroid matching are generalizations of bipartite
and general graph matching, and the Isolating Lemma has been used to obtain RiVC2
algorithms for these basic problems on linearly representable matroids (Narayanan.
Saran & Vazirani [86]). Here, by obtaining good bounds on the size of the family on
which the Isolating Lemma operates, we obtain significant savings in the number of
random bits required. The randomness complexities of our algorithms for matroids
84
when specialized to the case of matching, yield precisely our results on matching.
In particular, we give A?C2 algorithms for these problems on matroids under certain
restrictions which in the case of graph matching correspond exactly to a polynomial
bound on iMat(G)?. This generalizes the results of [47] to matroids.
Using the generalized Isolating Lemma we obtain an alternative to the Valiant &
Vazirani random reduction [131] from 5AT to USAT. The randomness complexity
of our reduction is logarithmic in the number of satisf,7ing assignments arid in the
worst case matches the best-known bound of 0(n) (Tarui [121]). This directly gives
new proofs of the results that FewP, where the number of satisfying assignments is
polynomially bounded, is contained in ?F (Cai & Hemachandra [22]) and in GP
(Nobler, Sch6ning, Toda & Toran [64]).
4.1 The Generalized Isolating Lemma
?Ve now present a randomness??fficient generalization of the Isolating Lemma [83]
and prove a matching lower bound for this problem.
Definition A set system (S, ?) consists of a finite set S = [x1,... , xyj and a
family ? of subsets of S, i.e., ? = ....... , 8k? where Sj C 5. A weight assignment
w = (w1w?? to the elements xI,. . , xN, extends naturally to sets in F with
= ?XjESj Wj.
Isolating Lemma [83] Let (5, ?) be any set system. If the elements Xj of 5
are assigned random weights uniformly and independently from [1,22AJ then,
with probability at least 1/2, there will be a unique minimum weight set in ?
Two features make the isolating lemma widely applicable. First, the isolation
process works for arbitrary and unknown families F and second, isolation is achieved
by assigning only polynomial weights. For example, in the bipartite perfect matching
85
problem where S is the set of edges and ? is the collection of perfect matchings,
the first feature allows isolation to be done in RA?C without any knowledge of the
matchings and the second ensures that a matching can be found by inverting a matrix
with polynomial sized entries [83].
4.1.1 The New Isolation Scheme
\?`e prove a randomness?fficient generalization of the Isolating Lemma which also
assigns polynomial weights and which depends only on any given upper bound Z
on the size of the unknown family ?. The motivation for this generalization is that
in several applications, e.g., perfect matching and matroid intersection, Z ?
The original lemma uses O-(N log iV) random bits since weights are assigned inde-
pendently and it was left open in [83] whether independence was necessary. Since our
generalized lemma uses O(log Z + log N) random bits, this shows that independence
Is unnecessary, even when ? = 2N
Lemma 11 [Generalized Isolating Lemma] Let (5, ?) be any set system and
let Z be a given upper bound on the size of the unknown family . There ? a s'mple
scheme which uses 0(log Z + log N) random bits to assign integer weights to the
Xj 5 in the range [0, ?`V?) such that with probability at least 1/4, there is a unique
m?n?mum weight set in ?. (This probabillity of 1/4 can be boosted to any constant
lesser than 1 using results of de Bruijn [34], with a constant factor blowup in the
number of random bits used.)
PROOF. We outline a four-step process for assigning weights to the Xj'5 and prove
that this scheme has the desired properties. At step j, we assign an intermediate
weight w(?).
Since ? is unknown we deterministically assign very large weights in Step 1 so
that every set in ? gets a distinct weight.
Step 1: For each i, set ?`?			2?
Under w?'?, sets in ` have distinct weights in fl,2 ,2X+1?. Clearly, if Z ?
2Nel, the same property should be obtainable with much lower weights. Since ? is
unknown, a deterministic strategy for reducing weights may fail; instead, we use a
randomized strategy.
Step 2: Choose rn uniformly at random from fl,2,. .. ,(2AZ2)2?. For each i.
define w?2? = mod m.
Step 2 requires O(log Z+log 1V) random bits, Under wt2), the Z or fewer sets in,,c'
have weights in the interval [0,min(A'(2NZ2)2,2?+')i, which is a big improvement
for small values of Z. ?`e now claim that with good probability. these weights are
also distinct.
Claim 1 With probability at least 1/2, all sets in ? have distinct weights under ?vt2)
PROOF. Suppose ? = f3iSkt where k ? Z. Consider the (unknown) integer
I = fi ( w(')(5?) w?')(55) ).
1<i<j<k
Clearly the properties of wt') ensure that I $ 0 and that I ? 22NZ2, The following
number-theoretic proposition, together with the Chinese Remainder Theorem estab-
lishes that when m is chosen uniformly from fi, 2,..., (2Nz2)2J, the probability that
m)(I is at least 1/2. Several versions of this proposition appear in the literature; this
version is from [125] and the constant 89 has been improved to 3 by Thrash [122].
Proposition 2 Let L > 89 and let S be any subset of .1..... L2t such that 5 _
-21L2. Then, the least common multiple of the elements of 5 exceeds 2L
We claim that if m/I, then all sets in ? have distinct weights under u?2). if not
then w(2)(5?) --H w(2)(Sj) for some i < j. Then we have m (w(1)(Sj) w(')(Sj)).
S7
which contradicts the fact that m
E]
Remark. We say that Step 2 succeeds if sets in ? get distinct weights under g,(2)
The success probability of this step can be boosted to any constant lesser than 1, since
it follows from the results of de Bruijn [34] that for any fixed p < 1. there exist con-
stants ci and C2 such that if m is picked uniformly at random from f 1, 2,.. .,?vc1 zc2 1
in Step 2, then Step 2 will succeed with probability at least p.
Note that if z ? NC, then w:(2? ? y4c+3 and with probability at least 1/2, all
sets in ? have distinct weights. Later, we use this strong condition to design an
NC2 algorithm to find all perfect matchings in graphs with at most y0(') perfect
matchings. For larger Z, the weights are still too big and in Steps 3 and 4 we reduce
them further. Step 2 assigns q-bit weights to the Xj'5 where q = min(N,log?) ?
rnin(At,4log Z + 2log(2N)). Let t = [q/logN]
Step 3: For each i, write as q-bit number. Split these bits into t blocks of
size log N bits each as shown. Let bi,j be the number in [0 N --H 1] formed by the
bits in block j.
Let be the linear form
bi,t--Hi			bi,1			bi,o
log N
bi4 yj over the variables Yo , Yt--Hi
j=O
Claim 2 If 5tep 2 succeeds then the linear forms wt3)(Sj) where Sj ? are all
distinct.
Proof Assume that Step 2 succeeds, i.e., that all the weights w(2)(S5) are distinct,
where Sj ? ?. Note that each w?? evaluated at Yk --H 9kiogY 0 ? k ? t --H 1, is
exactly w:?2? This implies that each w(3)(Sj) evaluates to the distinct value w(2)(5j)
at Yk --H 9kiogN 0 < k < t --H 1 which implies that the forms w?3)(Sj)'s must be
distinct.			I
Note that each u(3)(Sj) is a linear forms with coefficients in the range [0 X(A --H
1)]. ?Ve will use this property in a crucial way in the analysis of the next and final
step.
Step 4: Choose ro,..., r??1 uniformly and independently at random from
?t,2,... ,1vsJ. For each i, set wf4) to be the evaluation of wt?3? at Yk = rk,
o <k < t --H 1.
We claim that w?4? achieves the requirements of the Generalized Isolating Lemma.
Clearly, Step 4 requires Slog J\? x t = 0(log Z + log N) random bits and since Step 2
has a similar randomness complexity, the overall procedure requires 0(log Z + log V)
random bits. It is easy to check that each w:?4? is in [0, A?7). Since Step 2 succeeds
with probability at least 1/2, by using C --H ?wt3)(S5) S5 E ?? and k = 2 in the
following proposition, we obtain that the weight assignment w?4) achieves isolation
with probability at least 1/4.
Proposition 3 Let C be any collection of distinct linear forms over at most IV
vanables y = yo,yi,??.,yN--Hi with coefficients in fo,l,...,?Vk --H 1?. Choose a
random r = r0,... , rN?1 by assigning each r? uniformly and independently from
[1,2,... ,j'V2k+lJ. Then in the assignment y= r there will be a unique linear form
with minimum value, with probability at least 1/2.
Proof of Proposition: Our proof parallels that of [S3j; we define a variable y? to be
singular under an assignment r to the variables y iff under this assignment there
exist two minimum valued linear forms in C having different coefficients of y?.
For each y?, we upper bound the probability that it is singular. Fix i and
partition C into Nk classes C0,C1,. . . ,CNk?l based on the coefficient of y?, i.e.,
forms in Cj have j as the coefficient of y?. By definition, y? is singular under
an assignment r iff all the minimum valued forms do not come from a unique
class Cj. Assume that r?i) = ..... .,r??1, rielrN?1 have been assigned val
ues a?i) = aoa??1,a??i,Gy?1 and that rj is not yet chosen. Consider the
probability that the choice of rj makes y? singular, conditioned on i'(i) = a?i). Under
this partial assignment, each form f E Cj has weight b + 5r? where I) is completely
determined. Call b the partial weight of f. For each j, let p? be the minimum partial
weight among all forms in Cj. Since any form with minimum weight after the choice
of r? must have had the minimum partial weight in its class, it follows that the prob-
ability that rj makes gi singular is exactly the probability that the minimum valued
element in the list D = [pc,pi + r?,p2 + 2r?PNk?1 + (Ak --H l)rj] is riot unique.
This is at most the probability that under a random r? the values of elements in D
are not all distinct. For each pair of elements in D, at most one choice of r? makes
their values equal. Thus the probability that yj becomes singular by choice of r?
conditioned on ff(i) = a(i) is at most (4k) /N2k+1 ? l/2N, since r? is independent
of r?i).
This upper bound does not depend on the particular choice of J(i) and hence we
get Prob[yj is singular] ? l/2N. Since this holds for each i, Prob[sorne y? is singular]
_ x l/2AT = 1/2. Thus with probability at least 1/2, all ?j'5 are non-singular and
hence there is a unique minimum valued form in C.
Note that the Generalized Isolating Lemma can be implemented in NC2. Finally,
we note that the Generalized Isolating Lemma provides an explicit construction of
many weight assignment functions for the ground set S = fxi, r2
such that for any set system (S, F) with ?i ? Z, at least half of these functions will
produce a unique minimum weight set in ?. For a given ground set S, the number
of set systems (S,?) with ? ? Z is zz. i (27); hence, the technique of Adleman
[1] combined with the original isolating lemma shows nonconstructively that there
exists a set of O(log(?Z. 1 (27))) = O(NZ) many weight assignment functions such
that for any set system (5, ?) with ? < Z, at least one of these functions will
90
produce a unique minimum weight set in ?.
4.1.2 Lower bound for the Isolation problem
We now establish a matching lower bound of ?(log Z+log N) random bits on the ran-
domness complexity of the generalized Isolating Lemma, and in fact for the following
weaker problem, thus showing that the randomness complexity of our Generalized
Isolating Lemma is optimal to within a constant factor.
Generalized Isolation Let S = fxixNl and let k be a constant. Assign
weights to the elements of S in the range [1, 1Vk] such that for any family ? C 2? of
size at most Z, the minimum weight set in ? is unique with nonzero probability.
We first prove a lower bound of ?(log 7) random bits for any randomized scheme
for the above problem. The following theorem shows that at least t + 1 different
weight assignments are needed in order to achieve isolation in all possible families of
size at most 7, where t = min( [?Z2J, [fffi+3iJ) Hence any randomized scheme must
use ?(log t) = ?(log Z) random bits.
Theorem 17 Let fi,f2,... , ft be any collection of weight assignments which assign
weights in the range [1, Bj to the elements of S. If t ? min(?Z2 %??)?3), then there
exists a family F of subsets of S with 1? < 2t < 7, such that the minimum weight
subset of? is not unique under any of these assignments.
PROOF. Given any t weight assignments we explicitly construct the family ? as
follows. First, for each f?, we compute the ?histogram" of the weights of the subsets
of S, i.e., for each nonempty subset X we place a mark above the weight fi(X) in
the histogram for fi; one distinct mark is made for each such subset X.
The weight of any subset is in the range [1, VB]. Initialize, for each i, a pointer
p? to 1 in the histogram for fi. The pointers pi advance according to the following
rules:
91
o+ If there are no marks on weight pi in the histogram of L. then advance p? to
p? + 1.
o+ If there is exactly one mark on weight pi which corresponds to some subset X,
remove the marks corresponding to X from alt the histograms and advance p?
to p? + 1
If more than one pointer can advance, we choose one among them arbitrarily. WQ
continue this process until the pointers cannot move any further. First we argue that
not all marks can be removed since every time a mark is removed, one of the pointers
advances and hence the potential function ? = ?i(pi--Hl) always increases. Since each
pi is bounded by NB +1, ? is bounded by tiVB. By assumption tiVB ? 2N --H3, and
hence when the pointers come to a halt there are at least two marks corresponding
to nonempty subsets Xii and Xj2 on weight pi in the histogram of fi, for each i. The
family ? = tXi1,Xi2I 1 < i ? t? has size at most 2t < Z and for each ?, there are
two nonempty minimum weight subsets, Xj1 and Xi2, of minimum weight pi, under
the assignment f?.			E)
Also when B = N0?'? and t = o(N/log iV), we can construct two sets which
have the same weight under all the assignments fi,f2,... , ft, as follows. Since fi
maps the nonempty subsets of S to the range [1, ]VB], there exists a family A1 of
nonempty subsets of S, all of which get the same weight under fi and such that
1A11 > ffiv?B' Similarly, there exists a family A2 C A1 with A2 > ?%m'?, such that
fj(X1) = fi(X2) VXi, X2 Ei A2, for i = 1, 2. Repeating this, we end up with a set At
of nonempty subsets of S with lAti > (%NBA, such that the minimum weight element
of At is not unique under none of fi,f2,. . . ,ft; since 2N?1 >2 fort = o(iV/log N),
(NB)t --H
our claim holds. Thus, any scheme for Generalized Isolation requires fl(log Z+log Ar)
random bits.
Theorem 17 also gives lower bounds when the elements are assigned superpolyno-
mial weights. In fact, we can prove the same lower bound even when the assignments
92
map subsets of S to an arbitrary linearly ordered set of size B. since the above proofs
use no property of the integers apart from their total ordering. Thus we have
Theorem 18 Let fi,, ... ?ft be a collection of assignments which are functions frnm
2? to a linearly ordered range D. If t ? min([#J, DJ ) or if t = o(A7log(ATD)),
then there exists a family ? of subsets of ? with ??? < Z, such that the minimum
weight subset of? is not unique under any of the assignments.
4.2 Applications to Matching Problems
The Generalized Isolating Lemma immediately yields randomness-efficient algorithms
for several problems related to matching. Here we only consider matchings in bipar-
tite graphs; with more work, identical results can be obtained for matchings in general
graphs [74,83]. The failure probabilities of our algorithms can be made inverse poly-
nomial with only a constant blowup in the time and randomness complexities and
a polynomial blowup in the number of processors using results of Chor & Goldreich
[28]. Our isolating lemma is a general tool to bound the randomness complexity of
a problem in terms of the number of its solutions. However, due to the large poly-
nomial weights used by our lemma, algorithms based directly on the lemma usually
suffer a processor penalty.
Notation \?re denote a bipartitegraph G by 0 = (U, V, ), where U = fui,. . . , u?12)
v = (vi,..., v?12J, and El = m. The phrase `? perfect matching" is abbreviated
p.m.". The determinant of a matrix A is denoted by det(A). The number of p.m. 5
in 0 is denoted by IMat(G)I.
4.2.1 Algorithms for matching
To apply the Isolating Lemma to perfect matching, we let the ground set be the set
of edges and let the family of subsets be the set of perfect matchings of 0, as in [83].
93
By the Generalized Isolating Lemma, given an upper bound Z on IMat((;)I, we can
assign polynomially bounded weights to the edges of G using 0(log Z+log n) random
bits such that there is a unique p.m. of minimum weight, with good probability.
We construct a matrix `I[l..n/2, l..n/2] with M[i,j] = 2?? if (uj, Lj) ? E and
M[i,5] = 0 otherwise, where Wij is the weight assigned to edge (ui,v5); a pm. in
G can now be found by inverting Al via the iVC2 algorithm of Csanky [32], as
shown in [83]. Note further that the additive O(log n) factor in the randomness
complexity can be absorbed by a processor penalty. Also for any graph C = (V. E),
1?Iat(G)1 ? HvEV degree(v), which is at most (2m7n)? since ?vEV degree(v) = 2m.
In the worst case, by letting Z = (2m/n)? we obtain a randomness complexity of
O(n log(m/n)) which improves the previous bound of O(rn log n) [83]. Thus, we get
Theorem 19 There is an RNC2 algorithm to find a perfect matching in a graph 0
(if one exists) which uses O(log Z) random bits, given an upper bound Z on Mat(G)i.
In the worst case, this algorithm uses O(nlog(m/n)) random bits.
A careful analysis of the proof of our isolating scheme when applied to the worst
case where Z = (2m/n)? and m --H 0(n2), shows that the following algorithm also
achieves the same randomness complexity: the edges of each vertex u ? U are
assigned weights pairwise independently in the range [1,rn7] This idea was, in fact,
the starting point for this research.
Perfect matching is a special case of the Exact Matching Problem: given a graph
O with edges colored red and blue arbitrarily and an integer k, an exact matching
is a p.m. of 0 with exactly k red edges (Papadimitrion & Nannakakis [93]). Even
though the problem of finding an exact matching is not known to be in P, the
Isolating Lemma has been used to solve it in RiVO2 [83]! By observing that isolation
isneeded only among the exact matchings, we can link the randomness complexity of
this problem to a bound X on the number of exact matchings and not on IMat(0)I;
here again, we follow the lead of [83]. Suppose we assign a nonnegative integral
94
weight Wij to every edge (uj, Vj) c E. Let y be an indeterminate; construct a
symbolic matrix Afy[l..n/2, l..n/21 such that
i?Jy[i,j1 = y Wjj if (ui,vj) E E and is colored red,
= wq if (Uj, vj) Ei E and is colored blue,
= 0, otherwise.
Then, it is easy to see that the contribution to the coefficient of yk in det(Ai??)
comes precisely from the exact matchings in G and that if there is a unique mini-
mum weight exact matching M*, then its weight equals the highest power of 2, say
p*, which divides this coefficient of yk Furthermore, any edge can be tested for
membership in ?I? by increasing its weight by 1 and testing if the new value of p*
is higher than its old value or not. Hence, this problem can be solved by using our
Generalized Isolating Lemma over the ground set E and the (unknown) family of
exact matchings of G. Since the determinant of a matrix in one variable can be
computed in NC2 (Borodin, Cook & Pippenger [21]), we get
Theorem 20 Given an upper bound X on the number of exact matchings in a graph
an exact matching (if any) in 0 can be found in RNC2 using O(log X) random
bits.
To find a maximum matching in 0, we proceed as follows. Note that 0 has a
maximum matching of size at least k iff the graph 0k = (Vk, Ek), got by adding to 0
an independent set of n --H 2k vertices, each of which is adjacent to every vertex in V
has a perfect matching; it is also straightforward to extract a matching in 0 of size
k, from a perfect matching in 0k ?Ve first make the failure probability of our perfect
matching algorithm at most n+12 and make parallel calls to find a perfect matching
in 0k for k = 0,1,... , n/2, using the same sequence of random bits for each call.
Then, with probability at least 1/2, all the calls will return the correct value and
we can then find a maximum matching in 0. ?`e require O(n log n) random bits for
this, since V?jlog(IE?I/IV??) = O(nlogn) for each k.
95
Theorem 21 A maximum matching in 0 can be found in RNC2 using 0(niog n)
random bits.
In contrast, the best-known randomness complexity for maximum matching so
far is O(n2 log n) [83].
4.2.2
Graphs with a polynomial number of perfect
matchings
Grigoriev & Karpinski [47] consider the problem of finding all the perfect matchings
in graphs with a polynomial bound on Mat(C)i. Using techniques from algebra,
they give an NC2 algorithm for this problem if Mat(O)1 ? l?g()5?? n, and an NC3
algorithm when IMat(C)I < nc for constant c. However, their matching extraction
aIgorithm is limited by their matching detection algorithm and hence they suggest
that log0?5?? n may be a limiting upper bound on IMat(&)I to find all (or even one)
of the perfect matchings in NC2via parallel algebra. We present a simple NC2
algorithm for the detection problem which provides enough information to find all
matchings in NC2 even when 1Mat(C)j < = nc
As before, we let S and ? be E and Mat(C) respectively, Running steps 1 and
2 of the generalized Isolating scheme, we use 0(log n) random bits to assign each
edge a polynomially bounded weight such that all the perfect matchings of G have
distinct weights with probability at least 1/2. Let Wi,j be the weight assigned to edge
(ui,vj). Construct a matrix A with A[i,j] --H 22wij if (ui,vj) e E, and 0 otherwise.
Since every matching has a distinct weight, det(A) gives information about every
p.m., as follows.
Assume that all the perfect matchings indeed get distinct weights. Recall that if
we denote the matrix obtained from Al by removing its ith row and jth column by
Mi1, then
(--H1)?+?det(Mjj)
= det(M)
96
Now, if there is a perfect matching of weight k containing the edge (uj, cj), then
det(AIq) is of the form
k--H1
+22k--H2wij + Z ae22??2wij + ? ac22??2wij,
e=o			e>k+1
where each ac is in f--Hl,o,lJ. Since
this is of the form
k--H1
? ae22e?2wijI ? 22k?1?2wij
+22k--H2wq + ?2k+2?2Wq?0 + 22k?1?2wijx
where flo is a nonnegative integer and 0 ? x ? 1. Similarly, if there is no perfect
matching of weight k containing the edge (uj, vj), then det(AI?5) is of the form
+?2k+2?2Wq?0 + 22k?1?2wijx.
Hence, there is a perfect matching of weight k containing the edge (uj, vj) iff
I(i'vI?')[5,i1det(N1).22wijI
22k?1
is of the form 8ni + y, where ni is an integer 1 < y ? 3. Thus, we have a way to
generate all the perfect matchings of C, since k is polynomially bounded. Since we
use O(log Z + log n) = O(log n) random bits, we can derandomize this by searching
over all points in the sample space in parallel to get
Theorem 22 There is an NC2 atgorithm to detect f a perfect matching exists and
to find alt perfect matchings in a graph 0, when given that INIat(0)I ? nc for some
constant c.
Grigoriev & Karpinski also give a Las Vegas RNC3 algorithm for the prob-
lem of testing if a graph has at most nc perfect matchings, which uses expected
O(n2c+8.5 log n) random bits. ?`e can, using the algorithm of Theorem 22 and by
97
extending ideas of [47,55], improve on the running time and substantially on the
number of random bits used, as follows. Narloff presents an NC2 algoritlim which
calls a perfect matching algorithm (oracle) on n + 1 graphs (each with n --H 1 or n ver-
tices and at most Tn edges) in parallel, and which, assuming that these calls worked
correctly, will announce correctly if 0 has a p.m. or not; in any case, if it ever says
that IMat(G)I = 0, then IMat(G)I is indeed zero. (Actually, Karloff's result is more
powerful--H it works for maximum matchings.) Note that in conjunction with any
Monte Carlo RNC2 algorithm to find a perfect matching (in particular, ours) this
yields a Las Vegas RiVC2 algorithm to decide if 0 has a perfect matching or not. and
if so, to produce one. We first make the error probability of our perfect matching al-
gorithm at most 2(n?+1) (as mentioned in Section 4.1, this can be done in RNC2), and
we now invoke Karloff's algorithm, which will make calls in parallel to our algorithm,
using the same sequence of random bits for each call; the probability that at least one
of these calls produced a wrong result is at most = 1/2. Hence, we can decide
if IMat(0)I > 0 or not correctly, using an expected O(nlog(m/n)) many random
bits. If we conclude that Mat(G) > 0, then we assume that INIat(G)I ? nc + 1,
and run the algorithm of Section 4.2.2 to generate all the perfect matchings of 0;
if 0 has at most nc perfect matchings, then at most nc perfect matchings will be
generated. Thus, we get
Theorem 23 There exists a Las Vegas RIVC2 algorithm to verify if a given graph
0 has IMat(0)I ? nC whose expected randomness complexity is O(nlog(m/n)).
4.2.3 Algorithms with no information on JMat(G)j
The algorithms of Theorem 19 can be used to obtain randomness?ffident algorithms
for perfect matching even when no upper bound on iMat(0)! is given, by using the
worst case upper bound of Z = (2m/n)?. Consider the following improved algorithm:
for i := 0 to [log2log2((2m/n)fl)? do
98
Let Z = 22? and run the algorithm of Section 4.2.1 to find a perfect matching;
exit if a perfect matching is found.
If iMat(O) > 0 then this will find a p.m. with good probabihty when 22? >
1Mat(G)j. If Al = IMat(G)f, the number of random bits used and the time taken
will be O(log `I) and O(log2 n log log V) respectively with good probability, and so
we get
Theorem 24 Let Al = IMat(C)I. There is an RNC algoHthrn to find a perfect
matching in G with no given upper bound on Al, which uses 0(log ?V) random bits
and runs in time O(log2 n log log M) = O(log3 n) with high probability, if i'A > 0; in
the worst case, it uses O(nlog(m/n)) random bits and 0(log3 n) time.
4.3 Other applications
Due to the wide applicability of the Isolating Lemma, our generalized Isolating
Lemma leads to randomness?fflcient solutions to various problems, which we show
below.
4.3.1 Randomized reductions from SAT to USAT
A seminal result in complexity theory is Valiant & Vazirani's randomized reduction
from SAT to &rSAT, the language of uniquely satisfiable Boolean formulae [131].
This reduction is at the core of several fundamental results in complexity theory,
most notably in the results of Toda on the unexpected power of counting classes [124].
Using the Isolating Lemma, an alternative randomized reduction for this problem was
derived in [83]. We apply the Isolating Lemma to get a slightly different reduction
and here, our generalization yields better randomness complexity.
GivenaformulaF(x1,x2,...,x?) we let the ground set Sbe fxi,x2,...,xnl and
let the family ? be the satisfying assignments of F, where a satisfying assignment
is represented by the subset of variables that are true in it. Using our generalized
99
Isolating Lemma, we construct a formula F' such that if F c SAT, then F' E USAT
with probability at least and if F ? SAT. then F' is not satisfiable. as follows,
We construct an iVP machine AI(f(x1, x2xn), ? Wi, LL'2W? >, y), whose
inputs are a boolean formula f over the variables Xl, X2X?, weights Wj for the
Xj'5, and an integer y. Given these inputs, AJ will nondeterministically guess a
satisfying assignment S? for f and accept the input iff the weight of the subset
corresponding to S? is y, under w. Now given F, we first generate random weights
Wi, W2W? as specified by the Generalized Isolating Lemma for (5, ?), and also
pick an integer r uniformly at random from [0,?.... , n8 --H 1]. F' is defined to be
the boolean formula corresponding to M(F(x1,x2,. .. ,xn),< Wl,W2,. ..,W? >,r),
as given by the Cook-Levin reduction. If P ? SAT, then with probability at least
1/2, the Generalized Isolating Lemma will succeed and given this, r is the weight of
the unique minimum weight satisfying assignment with probability at least and
hence, F' ? USAT with probability at least 4. On the other hand, if F ? SAT,
then F' is clearly not satisfiable.
This requires O(log Z + log n) random bits, where Z an upper bound on the
number of satisfying assignments of F. In the worst case this randomness complexity
is 0(n), which improves on the O(nlogn) bound of [83], the O(n2) bound of [131]
and matches the result of Tarui [121]. If the number of satisfying assignments of F
is polynomially bounded, as in the class FewP, we can derandomize this to get a
polynomial number of formulas such that F is satisfiable iff one of these formulas is
uniquely satisfiable. This directly yields simple new proofs of the results FewP C ?P
[22] and FewP C ?P [64]. ?P and G,,? are counting versions of NP where
acceptance is based on the number of accepting paths being odd and exactly half of
the total respectively.
Theorem 25 There is a randomized reduction from SAT to USAT which uses
O(log Z + log n) random bits, where n and Z are the number of variabtes and an up-
per bound on the number of satisfying assignments respectively. Hence, FewP 0 ?P
loo
and FewP C ?P.
4.3.2 Improved parallel randomness complexity for
problems on matroids
The Isolating Lemma has been used to obtain RNC algorithms for basic prob-
lems on linearly representable matroids such as matroid intersection and matching
(Narayanan, Saran & Vazirani [86]). These problems are generalizations of bipartite
and general graph matching. A set system ??I = (S,?) is a matroid if:
9 E
o+ if A C I? C 5 and B ? ?, then A ? ?, and
o+ if A1 ? ? and A2 E ?, then a? ? A2 --H A1: A1 U fx) ?
Every element of ? is called an independent set, A basic consequence of the matroid
axioms is that every maximal independent set is also a maximum independent set,
and the cardinality of any maximum independent set of ?Ni is also called the rank
of jki. A matroid M of rank r over a ground set 5 of cardinality n is tinearly
representable over a field F if there exists a matrix C ? FTxn whose columns are
indexed by the elements of 5, such that a subset of 5 is independent in M iff the
corresponding set of columns of C are linearly independent over F. ]? is linearly
represented if it is presented as the matrix B. Henceforth, the field F is assumed
to be the field of rationals, Q. See, for example, ?`elsh [137] for an introduction
to matroid theory. All the results of [86] and of this section hold onty for linearly
represented matroids. See Camerini, Galbiati & Maffioli [23] for related work on
randomized algorithms for matroid problems.
The matroid intersection problem is to find a maximum cardinality independent
set in both of two given matroids ?jj and 112, each of rank r and over the same
ground set 5 of cardinality n. In this case, we observe that the size of the family
on which the Isolating Lemma operates, as presented in [86], is at most J( (rffih) )2
101
where h is the size of the largest set in Ali fl Al2 and I is the number of sets of size
h in Aj? n 1w2. Since this can be bounded by (nr2)h ? (nr2)r ? n3T, we can use
generalized isolation to reduce the randomness complexity from the previous bound
of O((n + r2)logn) [S6] to O(rlogn), improving by a factor of at least ?(7ii) in all
cases, and by more in general. if h > r --H O(log? n) and there is a polynomial (in
n) bound on I then since I( (rLh) )2 --H n0?'?, we get the first NC algorithms. in the
case of bipartite matching which is a special case of matroid intersection, h = r and
a polynomial bound on I corresponds exactly to a polynomial bound on IMat(G)I.
Thus, this generalizes of the results of [47] to matroids.
Theorem 26 i'latroid intersection for linearly represented matroids of rank r can
be solved in RNC2 using O(r log n) random bits. Jf the cardinality of the maxim urn
intersection is given to be at least r --H O(log? n) and if the number of maximum
cardinality intersections is given to be bounded by a given potynomial of n, then it
can be solved in NC2.
For linearly represented vectors, the matroid matching problem is the following:
given m pairs of vectors over Q2fl, pick the largest number of pairs so that the
vectors picked are linearly independent. For this problem, the RNC2 algorithm of
[86] uses the isolating Lemma on a set family over a ground set of m + (22fl) elements
and at most
m+?(22fl) < (m + 2n2)n
subsets and thus by invoking the Generalized isolating Lemma, we improve the
randomness complexity from O((m + n2)log(m + n2)) [86] to O(nlog(m + n2)).
Tiwari, using sparse interpolation techniques, gives NC algorithms for this problem
when the number of full dimensional solutions (i. e., those in which n pairs get picked)
is given to be polynomially bounded in n and m [123].
Theorem 27 Matroid matching can be solved in RNC2 using O(nlog(m + n2))
random bits.
102
4.3.3 Randomness--Hefficient hash functions
Given finite sets A and B, Carter & W?gman [24] define a family of functions H from
A to B to be k-universal if when a random function f is picked from H, then the
random variables ?f(a) : a ? A? are k--Hwise independent. Their original construction
requires O-(k(log Al + log IBI)) random bits to sample a function from H, and this
is also optimal in general. Universal hash functions have wide-ranging applications.
However, the known constructions have the drawback that even in cases where we
only have some small but arbitrary subset of A to be hashed, they still require
?(k(log IA + log IBI)) random bits. This is rectified by using ideas from step 2 of
the Generalized Isolating Lemma. The following theorem has already been proved
earlier (Dietzfelbinger, Gil, Niatias & Pippenger [35]), but we present here a proof
for conciseness.
Theorem 28 Given finite sets A and B an unknown subset C C A and an upper
bound iW on we can construct, using O(log log Al +log `I) random bits a family
of hash functions H from C to B which is k--Huniversal with probability at least 1/2
and which is samplable with O(k(log log lAl + log IBI + log Al)) random bits.
PROOF. Since each element of A can be indexed by a subset of [1, 2,..., [log2 lAli 1,
any C c A with ?Cj < 1W can be thought of as a family of at most 1W subsets of
the ground set [1,2,..., [log2 lAl? t. By applying steps 1 and 2 of our General-
ized Isolating Lemma to this set system, we effectively get weights in the range
to, .... . , (Miog2 lAl)?? to each element of C (where a is a constant) such that no
two of these weights are the same, with probability at least 1/2. \?`e can thus index
each element of C by a distinct element of [0,1,..., (Al log2 lAl)?? and then pick a
k--Huniversal family from [0,1,..., (iWlog2 Al)?? to B.
As an application, consider the situation where any n jobs from a large (2n0(I)
sized) universe may compete for a resource at a given time and one of them must be
selected at random. The jobs are isolated from each other but can read from a shared
103
random source. Since the Ifis of the n currently competing jobs are unknown, the
random source cannot be used to sample a global process ID because it is unlikely
that any competingjob will have that ID. The usage of existing schemes (e.g., Rabin?s
lemma [99]) requires n independent random sources and ?(n) random bits. Vsing our
hash fundions with k = 2, we get an optimal solution which uses 0-(log n) random
bits and succeeds with probability at least 1/4 --H e for any fixed e > 0, as follows.
Theorem 29 Given any n objects from a un?verse with upto 2?0(1) objects, a unique
wlnner among the objects can be picked with probability at least 1/4 --H ? for any fixed
e > 0, using only 0-(log n) random bits from a common random source and with no
other means of communication.
PROOF. In the notation of Theorem 28, let A = U, B = f1,2,. . .,2n], C be
the (unknown) set of competing processes P1?P2p?, and k = 2. Pick a hash
function h : C H B as outlined in Theorem 28, and define random bit Xi to be 1
iff h(pi) = 1, for i = 1,2n. We now claim that assuming that the constructed
family H : H BJ is indeed 2?universal, then exadly one Xi, i = 1,2n, will
be 1 with probability at least 1/4; once this event occurs, the corresponding process
can be allotted the resource. Assume that H is indeed 2--Huniversal. Then given this
Pr(?! i : Xi = 1) =
1A1 Pr(Xi =1)Pr AXj=OIXi=1
j$i
(due to 2?umversality)
A1 Pr(X? = 1) 1 --H ? Pr(Xj = liXi = 1)
j$i
Z Pr(Xi
1$D)?Y1)
1/4.
Now since step 2 of the Generalized Isolating Lemma can be made to succeed with
probability arbitrarily close to 1, this whole scheme can be made to succeed with
104
probability at least 1/4 --H ?, for a?v fixed ? > 0. The randomness complexity of this
algoritlim is 0-(log n), by Theorem 28; this is clearly optimal to within a constant
factor since [log2 n] random bits are necessary to index all of the n processes. E)
The motivation for this algorithm came from Rabin's probabilistic lemma [99].
4.3.4 Conclusions and open problems
The main contribution of this chapter is our Generalized Isolating Lemma. For
various problems solvable via randomness in general [83,86,131] and deterministically
when the number of solutions is small [67,47,22]., this lemmabounds their randomness
complexity in terms of the number of solutions. Our results span the spectrum of the
number of solutions, imply, and sometimes improve the previously known results at
the extremes. On the other hand, our lower bound for isolation implies that attempts
to solve these problems deterministically based on isolation must also exploit their
structure, rather than view them merely as abstract unknown collections of sets.
A direct open problem falling out of the Generalized Isolating Lemma is to reduce
the magnitude of the weights assigned by it to the elements of the ground set; cur-
rently, they can be as high as AT7, In the context of matching, this wifl lead to more
processor??fficieut RNC algorithms. Another interesting problem is to see if there
is a result analogous to what we get for perfect matching, for maximum matching-
given an upper bound ?? on the number of maximum matchings in a graph, can we
find a maximum matching in it in RiVC, using 0(log M) random bits?
Chapter 5
Conclusions and Open Problems
In this work, we have added a few tiny drops to the ocean of Large Deviation Theory,
in Chapters 2 and 3, It is our hope that the methods of Chapter 3, in particular,
will be understood better and/or put to more use in future. In Chapter 4, we have
studied the amount of randomness needed for a versatile probabilistic lemma. the
Isolating Lemma.
Several interesting open problems remain, in the areas of randomness-efficient
computation and derandomization. Of the many on which this author's attempts
have been unsuccessful, here are a few.
(i) There are some classical problems for which "almost polynomial-time" solutions
are known: randomized polynomial-time algorithms for them, which require just
poly(log n)--Hwise independence (n is the problem size) are known, e.g., explicit con-
structions of Ramsey graphs (Erdo's [38]) and set-discrepancy in parallel [17,81]. Can
these algorithms be derandomized?
(ii) Our understanding of ??biased random variables is far from satisfactory; see
Section 3.1.5 for the relevant definitions. For instance, if X1, X2,. . . , are binary ?-
biased random variables, then the only known upper bound on F[((Z1ffi--H1 Xi)?n/2)k]
105
106
is
--H ?72)k] ?
i=1
k
tn ?,			(5.1)
where * is the value of E[((Z:?=i Xi) --H n/2)k1 when the Xis are unbiased and
independent. This follows from the simple fact that E[((zw1 Xi) --H n/2)k] is the
sum of ?k terms, each of the form
c Pr(Xi1 = Xi2 =			= Xie = 1)
Can we explicitly construct small ?--Hbiased sample spaces for which we can get much
better guarantees than (5.1)?
(iii) R?ecent breakthrough results on pseudorandom generators for space bounded
computation have shown, in particular, that testing if a given ?nd?rectedgraph on n
vertices is connected, is achievable deterministicatly in 0(log'?5 n) space [87,s8,s9].
Note that this problem is in randomized Logspace (Aleliunas Karp Lipton, Lovasz
& Rackoff [41), and in fact, in zero-error probabilistic Logspace and polynomial time.
ZPP (Borodin, Cook, Dymond, Ruzzo & Tompa [20]). Can the amount of random-
ness needed for this problem be further reduced, making it solvable deterministically
in o(log'?5 n) space? Indeed, is this problem in Logspace?
Bibliography
[1] L. Adleman. Two theorems on random polynomial time. In Proc. JEFF Sym-
posium on Foundations of Computer Science, pages 75--H83, 1978.
[2] L. Adleman and M.-D. A. Huang. Recognizing primes in random polynomial
time. In Proc. ACIl Symposium on Theory of Computing. pages 462--H469,
1987.
[3]
[4]
[5]
[6]
M. Ajtai, J.
LOGSPAC E.
132--H140, 1987.
Koml6s, and E. Szemere'di. Deterministic simulation in
In Proc. AC?? Symposium on Theory of Computing, pages
R. Aleliunas, R. NI. Karp, R. J. Lipton, L. Lovasz, and C. Rackoff. Random
walks, universal traversal sequences, and the complexity of maze problems. In
Proc. JEFF Symposium on Foundations of Computer Science, pages 218--H223,
1979.
N. Alon, L. Babai, and A. Itai. A fast and simple randomized parallel algoritlim
for the maximal independent set problem. Journal of Algorithms, 7:567--H583,
1986.
N. Alon, 0. Goldreich, J. Hastad, and R. Peralta. Simple constructions of
almost k--Hwise independent random variables. Random Structures and Algo-
rithms, 3(3):289--H303, 1992.
[7] N. Alon and J. Spencer. Personal communication. February 1993.
[8] N. Alon, J. Spencer, and P. Erdo's. The Probabilistic Ajethod. ?Viley-
Interscience Series, John Wiley & Sons, Inc., New York, 1992.
[9] D. Angluin and L.G. Valiant. Fast probabilistic algorithms for Hamiltonian
circuits and matchings. Journal of Computer and System Sciences, 18:155--H
193, 1979.
107
108
[10] B. Awerbuch, A. V. Goldberg, XI. Luby, and 5. A. Plotkin. Network decom-
position and locality in distribt'ted computation. In Proc. JEBE Symposium
on Foundations of Computer Science, pages 364--H369, 1989.
[11] Y. Azar, R. Motwani, and J. Naor. Approximating arbitrary probability dis-
tributions using small sample spaces. -Manuscnpt, 1990.
[12] E. Bach. Realistic analysis of some randomized algoritlimsnal of Corn-
puter and 5ystem Sciences, 42:30--H53,1991.
[13] I. Barany and Z. Fiiredi. Computing the volume is difficult. Discrete and
Computational Geometry, 2:319--H326,1987.
[14] M. Bellare, 0. Goldreich, and 5. Goldwasser. Randomness in interactive proofs.
In Proc. JEEF Symposium on Foundations of Computer Science, pages 563--H
573,1990.
[15] M. Bellare and J. Rompel. Randomness efficient sampling of arbitrary func-
tions. Technical Report MIT/LCS/TM-433.b, Laboratory for Computer Sci-
ence, Massachusetts Institute of Technology, July 1990.
[16] B. Berger and L. Cowen. Fast deterministic constructions of low-diameter
network decompositions. MIT-LCS Technical Memo #460, April 1991.
[17] B. Berger and J. Rompel. Simulating (logc n)-wise independence in NC. J.
Assoc. Comput. :?ach., 38(4):1026--H1046, 1991.
[18] M. Blum and 5. Micali. How to generate cryptographically strong sequences
of pseudo-random bits. SIA1'J J. Comput., 13:850--H864,1984.
[19] B. Bolloba's. Graph Theory. Springer Verlag, New York, 1979.
[20] A. Borodin, 5. A. Cook, P. W. Dymond, W. L. Ruzzo, and M. Tompa. Two
applications of inductive counting for complementation problems. SLA?? J.
Comput., 18(3):559--H578, 1989. See also Volume 18, Number 6, page 1283,
1989.
[21] A. Borodin, 5. A. Cook, and N. Pippenger. Parallel computation for well-
endowed rings and space-bounded probabilistic machines. Information and
Control, 58:113--H136,1983.
[22] J-Y. Cai and L. Hemachandra. On the Power of Parity Polynomial Time.
?IaThematical Systems Theory, 23(2):95--H106, 1990.
[23] P. M. Camerini, G. Galbiati, and F. Maffioli. Random pseudopolynomial al-
gorithms for exact matroid problems. Journal of Algonthms, 13(2):258--H273,
1992.
109
[24] J. L. Carter and M. N. ?Vegman. Universal classes of hash functions. Journal
of Computer and Syste?n Sciences, 18:143--H154, 1979.
[25] 5. Chari, P. Rohatgi, and A. Srinivasan. Improved approximations of indepen-
dent probability distributions, Manuscript in preparation. 1993.
[26]
5. Chari, P. Rohatgi, and A. Srinivasan. Randomness-optimal unique element
isolation, with applications to perfect matching and related problems. In Proc.
ACM Symposium on Theory of Computing, pages 458--H467, 1993.
[27] H. Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based
on the sum of observations. Annals of Mathematical Statistics, 23:493--H509,
1952.
[28] B. Chor and 0. Goldreich. On the power of two--Hpoint sampling. Journal of
Complexity, 5:96--H106,1989.
[29] B. Chor, 0. Goldreich, J. Hastad, J. Friedman, 5. Rudich, and R. Smolensky.
The bit extraction problem or t--resilient functions. In Proc. JEFF Symposium
on Foundations of Computer Science, pages 396--H407,1985.
[30] V. Chva'tal. The tail of the hypergeometric distribution. Discrete Mathematics,
25:285--H287, 1979.
[31] A. Cohen and A. Wigderson. Multigraph amplification. Manuscript, 1989.
[32] L. Csanky. Fast parallel matrix inversion algorithms. SL4M J. Comput., 5:618-
623,1976.
[33] E. Dahlhaus and NI. Karpinski. The matching problem for strongly chordal
graphs is in iVC. Technical Report 855-CS, University of Bonn, 1986.
[34] N. de Bruijn. On the number of positive integers ? x and free of prime factors
> y. Proc. Kon. Ved. Akad. PVet.(Indag. Math. 13), A54:50--H60. 1951.
[35]
[36]
M. Dietzfelbinger, J. Gil, Y. Matias, and N. Pippenger. Polynomial hash
functions are reliable. In Automata, Languages, and Programming (JCALP),
Lecture Notes in Computer Science # 623, pages 235--H246. Springer-Verlag,
Berlin, 1992.
NI. Dyer, A. Frieze, and R. Kannan. A random polynomial time algorithm
for approximating the volume of convex bodies. J. Assoc. Comput. Mach..
38(1):1--H17, 1991
[37] G. Elekes. A geometric inequality and the complexity of computing volume.
Discrete and Computational Geometry, 1:289--H292,1986.
110
[38] P. Erd6s. Some remarks on the theory of graphs. Bulletin of the American
i?athematics Society, 53:292294, 1947.
[39] P. Erdo's and M. Kac. The Gaussian law of errors in the theory of additive
number theoretic functions. American Journal of Ilathematic& 62:73S--H742,
1940.
[40] P. Erd6s and J. L. Seifridge. On a combinatorial game. Journal of Combina-
torial Theory, Series A, 14:298--H301,1973.
[41]
G. Even, 0. Goldreich, M. Luby, N. Nisan, and B. Velickovic'. Approximations
of general independent distributions. In Proc. ACM Symposium on Theory of
Computing, pages 10-16,1992.
[42] J. Gill. Computational complexity of probabilistic Turing machines. S4A1 J.
Comput., 6:675--H695, 1977
[43] B.V. Gladkov. Sums of random variables, any r of which are independent.
?at. Zametkii, 32:385--H399, 1982.
[44]
B.V. Gladkov. A central limit theorem for sums of random variables, any r of
which are independent. Diskretnaia ALat., 1:22-28, 1989. English translation
by VA. Vatutin, in Discrete Mathematics and Applications, No. 1 (1991),
pages 73--H79.
[45] A. V. Goldberg, 5. A, Plotkin, D. B. Slimoys, and E. Tardos. Using interior-
point methods for fast parallel algorithms for bipartite matching and related
problems. SLAilI J. Comput., 21(1):140--H150, 1992.
[46] A. V. Goldberg, 5. A. Plotkin, and P. M. Vaidya. SubhnearAime parallel algo-
rithms for matching and related problems. Journal of Algorithms, 14(2):180--H
213,1993.
[47]
D. Yu. Grigoriev and M. Karpinski. The matching problem for bipartite graphs
with polynomially bounded permanents is in ??C. In Proc. JEFF Symposium
on Foundations of Computer Science, pages 166--H172, 1987. See also D. Yu.
Grigoriev, M. Karpinski and M. F. Singer, Fast parallel algorithms for multi-
variate polynomial interpolation over finite fields, SLAM J. Comput., Vol. 19,
1990,1059--H1063.
[48] L. K. Grover. Fast parallel algorithms for bipartite matching. In Proceedings of
the Jnteger Programming and Combinatorial Optimization Conference, pages
367--H384,1992.
[49] J. Hastad. Pseudo-random generators under uniform assumptions. In Proc.
ACM Symposium on Theory of Computing, pages 395--H404,1990.
ill
[50] W. Hoeffding. Probability inequalities for sums of bounded random variables,
American Statistical Association Journal, pages 13--H30,1963.
[51] M. Hofri. Probabilistic Analysis of Algorithms. Springer--HVerlag, 1987.
[52] A. Levin, and M. Luby. Pseudorandom generation from
In Proc. ACM Symposium on Theory of Computing, pages
R. Impagliazzo, L.
one-way functions.
12--H24, 1989.
[53] R. Impagliazzo and D. Zuckerman. How to recycle random bits. In Proc. JEFF
Symposium on Foundations of Computer Science, pages 248--H253, 1989.
[54] A. Joffe. On a set of almost deterministic k-independent random variables.
The Annals of Probability, 2(1):161--H162, 1974.
[55] H. J. Karloff. A Las Vegas RNC algoritlim for maximum matching. Combi-
natorica, 6(4):387--H391, 1986.
[56] H. J. Karloff and P. Raghavan. Randomized algorithms and pseudorandom
numbers. In Proc. AC? Symposium on Theory of Computing, pages 310-321,
1988.
[57] H. J. Karloff and D. B. Shmoys. Efficient parallel algoritlims for edge coloring
problems. Journal of Algorithms, 8:39--H52, 1987
[58] R. NI. Karp. An introduction to randomized algorithms. Discrete Applied
Mathematics, 34:165--H201,1991.
[59] R. M. Karp, N. Pippenger, and M. Sipser. A time-randomness trade-off. In
AMS Conference on Probabilistic Computational Compiexity? Durham, NH,
1985.
[60] R. NI. Karp and V. Ramachandran. Parallel algorithms for shared-memory ma-
chines. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science,
Volume A, pages 871--H941. Elsevier, New York, 1990.
[61] R. M. Karp, E. Upfal, and A. ?Vigd&son. Constructing a perfect matching is
in Random NC. Combinatorica, 6(1):35--H48, 1986.
[62] R. M. Karp and A. Wigderson. A fast parallel algorithm for the maximal
independent set problem. J. Assoc. Comput. Mach., 32:762--H773,1985.
[63] P. W. Kastelyn. Graph theory and crystal physics. In F. Harary, editor.
Graph Theory and Theoretical Physics, pages 43--H110. Academic Press, New
York, 1967.
112
[64]
J. Ko?bler, U. Sch5ning, 5. Toda. and A. Toran. Turing machines with few
accepting computations and low sets for PP. Journal of Computer and Syst?m
Sciences, 44:272--H286, 1992.
[65] D. koller and N. Megiddo. Constructing small sample spaces satisfying given
constraints. In Proc. ACAl 5ymposium on Theory of Computing, pages 268-
267, 1993.
[66] D. C. Kozen. Semantics of probahilistic programs. Journal of Computer and
System Sciences, 22:328--H350, 1981.
[67] D. C. Kozen, U V. Vazirani, and V. V. Vazirani. NC algorithms for compa-
rability graphs, interval graphs, and testing for unique perfect matching. In
Proceedings, FST & TCS Conference, Lecture Notes in Computer Science #
206, pages 496--H503. Springer-Verlag, Berlin, 1985.
[68] C. Kruskal, L. Rudolph, and M. Snir. A complexity theory of efficient parallel
algorithms. Theoretical Computer Science, 71:95--H132,1990.
[69] F. T. Leighton, B. Maggs, and 5. Rao. Universal packet routing algorithms. In
Proc. JEFF Symposium on Foundations of Computer Science, pages 256--H269,
1988.
[70] G. F. Lev, N. Pippenger, and L. G. Valiant. A fast parallel algoritlim for
routing in permutation networks. JEFF `Transactions on Computers, 30:93--H
100,1981.
[71] N. Linial and N. Nisan. Approximate inclusion-exclusion. Combinatorica,
10(4):349--H365, 1990.
[72] N. Linial and NI. Saks. Decomposing graphs into regions of small diameter. In
Proc. ACM/SIAM Symposium on Discrete Algorithms, pages 320-330,1991.
[73] C. II. C. Little. An extension of Kastelyn's method of enumerating the 1--H
factors of planar graphs. In D. Holton, editor, Combinatorial Afathematics,
Proc. Second Australian Conference, pages 63--H72. Lecture Notes in Mathe-
matics 403, Springer--HVerlag, Berlin, 1974.
[74] L. Lova'sz and M. Plummer. AThtching Theory. North--HHolland, Amsterdam,
1986.
[75] M. Luby. A fast parallel algorithm for the maximal independent set problem.
SJA?'I J. Comput., 15(4):1036--H1053, 1986.
M. Luby. Removing randomness in parallel computation without a processor
penalty. In Proc. JEFF Symposium on Foundations of Computer Science, pages
162--H173,1988.
[76]
113
[77] N. A. Lynch. Upper bounds for static resource allocation in a distributed
system. Journal of Computer and System Sciences, 23:254--H278, 1981.
[78] F. J. Mac?Tifliams and N. J. A. Sloane. The Theory of Error Correcting Codes.
North--HHolland, Amsterdam, 1977.
[79]
[80]
[81]
[82]
iK. Niehihorn and U. Vislikin. kandomized and deterministic simulations of
PRAMs by parallel machines with restricted granularity of parallel memories.
Acta Informatica. 21:339--H374, 1984.
G. L. Miller and J. N.aor. Flow in planar graphs with multiple sources and
sinks. In Proc. JEEF Symposium on Foundations of Computer Science, pages
112--H117,1989.
R. Motwani, J. Naor, and M. Naor. The probabilistic method yields deter-
ministic parallel algorithms. In Proc. IEEE Symposium on Foundations of
Computer Science, pages 8--H13,1989.
N. Mulmuley. Randomized geometric algorithms and pseudo-random genera-
tors. In Proc. IFEE Symposium on Foundations of Computer Science, pages
90--H100,1992.
[83] K. Mulmuley, U. V. Vazirani, and V. V. Vazirani. Matching is as easy as
matrix inversion. Combinatorica, 7( 1):105--H1 13,1987.
[84]
S.V. Nagaev and I.F. Pinelis. Some inequalities for the distribution of the sums
of independent random variables. Theory of Probability and its Applications,
22:248--H256,1977.
[85] J. Naor and M. Naor. Small--Hbias probability spaces: efficient constructions
and applications. In Proc. AC? Symposium on Theory of Computing, pages
213--H223,1990.
[86]
H. Narayanan, H. Saran, and V. V Vazirani. Randomized parallel algorithms
for matroid union and intersection, with applications to arborescences and
edge--Hdisjoint spanning trees. In Proc. ACM7SIAAf Symposium on Discrete
Algorithms, pages 357--H366,1992.
[87] N. Nisan. Pseudorandom generators for space--Hbounded computation. In Proc.
ACM Symposium on Theory of Computing, pages 204--H212,1990.
[88] N. Nisan. RL c SC. In Proc. AC1? Symposium on Theory of Computing,
pages 619--H623,1992.
[89] N. Nisan, E. Szem&?di, and A. Wigderson. Undirected connectivity in
O(log1'5 n) space. In Proc. IEEE Symposium on Foundations of Computer
Science, pages 24--H29,1992.
114
[90] N. Nisan and D. Zuckerman. More deterministic simulation in Logspace. In
Proc, ACNI Symposium on Theory of Computing, pages 235--H244, 1993.
[91] A. Panconesi and A. Srinivasan. Fast randomized algorithms for distributed
edge coloring. In Proc. AC? Symposium on Principles of Distributed Comput-
ing, pages 251--H262,1992.
[92]
A. Panconesi and A. Srinivasan. Improved distributed algorithms for coloring
and network decomposition problems. In Proc. AC?'J Symposium on Theory
of Computing, pages 581--H592, 1992.
[93] C. II. Papadimitriou and M. Yannakakis. The complexity of restricted spanning
tree problems. J. Assoc. Comput. i?ach., 29(2):285--H309, 1982.
[94]			that approximate symmetric boolean
Theory of Computing, pages 468--H474,
R. Paturi. On the degree of polynomials
functions. In Proc. ACi? Symposium on
1992.
[95] D. Peleg and E. Upfal. A time--Hrandomness tradeoff for oblivious routing.
SIA1? J. Comput., 19:256--H266,1990.
[96] N. Pippenger and J. Spencer. Asymptotic behavior of the chromatic index for
hypergraphs. Journal of Combinatorial Theory, Series A, 51:24--H42, 1989.
[97] 5. A. Plotkin, D. 13. Shmoys, and E. Tardos. Fast approximation algorithms
for fractional packing and covering problems. In Proc. JEFF Symposium on
Foundations of Computer Science, pages 495--H504, 1991. Expanded version
available as Technical Report 999, School of Operations Research and Indus-
trial Engineering, Cornell University, 1992.
[98] M. 0. Rabin. Probabilistic algorithm for testing primality. Journal of iVumber
Theory, 12:128--H138,1980.
[99]
M. 0. Rabin. N-process mutual exclusion with bounded waiting by 4 log2 N-
valued shared variable. Journal of Computer and System Sciences, 25(1) :66--H75,
1982.
[100] P. Raghavan. Probabilistic construction of deterministic algorithms: approxi-
mating packing integer programs. Journal of Computer and System Sciences,
37:130--H143,1988.
P. Raghavan. Lecture notes on randomized algorithms. Technical Report
RC 15340 (#68237), IBNI T.J.Watson Research Center, January 1990. Also
available as C5661 Lecture Notes, Technical report YALE/DCS/RR-757, De-
partment of Computer Science, Yale University, January 1990.
[101]
115
[102]
R. Raman. The power of Collision: Randomized parallel algorithms for chain
ing and integer sorting. In Proceedings, 10th Annual FST ? TCS Conference,
Lecture Notes in Computer Science # 472, pages 161--H175. Springer-Verlag,
Berlin, December 1990. Also available as University of Rochester CS Dept.
TR 336, Nlarch 1990 (Revised January 1991).
[103] 5. Ramanujan. Collected papers, 1927.
[104] A. Ranade. How to emulate shared memory. Journal of Computer and System
Sciences, 41:307--H326,1991.
[105] H. Robbins. A remark on Stirling's formula. Amer. i?ath. Alonthly, 62:26-29,
1955.
[106] V. R6dl. On a packing and covering problem. European Journal of Combina-
torics, 5:69--H78,1985.
[107] NI. Santha and U. V. Vazirani. Generating quasi-random sequences from semi-
random sources. Journal of Computer and System Sciences, 33:?5--H87, 1986.
[108]
[109]
J. P. Schmidt, A. Siegel, and A. Srinivasan. Chernoff-Hoeffding bounds for
applications with limited independence. In Proc. ACM/SIAAi Symposium on
Discrete Algorithms, pages 331--H340,1993. Expanded version available as Tech-
nical Report TR 92-1305, Department of Computer Science, Cornell University,
October 1992.
J.P. Schmidt and A. Siegel. On aspects of universality and performance for
closed hashing. In Proc. AC?? Symposium on Theory of Computing, pages
355--H366, 1989.
[110] J.P. Schmidt and A. Siegd. The analysis of closed hashing under limited
randomness. In Proc. ACM Symposium on Theory of Computing, pages 224--H
234, 1990.
[111] L. J. Schulman. Small spaces uniform on neighborhoods. In Proc. ACAf Sym-
posium on Theory of Computing, pages 17--H25, 1992.
[112] D. B. Shmoys, C. Stein, and J. ?`ein. Improved approximation algorithms
for shop scheduling problems. In Proc. AC?/SIA?? Symposium on Discrete
Algorithms, pages 131--H140, 1991.
[113] A. Siegel. Toward a usable theory of Chernoff Bounds for heterogeneous and
partially dependent random variables. Nlanuscript, September 1992.
A. Siegel. On universal classes of fast hash functions, their time-space tradeoff,
and their applications. In Proc. IEEE Symposium on Foundations of Computer
Science, pages 20--H25,1989.
[114]
116
[115] NI. Sipser. Expanders, randomness or time vs. space. Journal of Computer
and System Sciences, 36:379--H383, 1988.
[116] R. Solovay and V. Strassen. A fast Monte Carlo test for primality. SL4M J.
Comput., 6(1):84--H85, 1977. See also SlAM J. Comput., Volume 7, 1978.
[117] J. Spencer. Ten Lectures on the Probabilistic AThthod. SlAM, Philadelphia,
1987.
[118] A. Srinivasan. On the distribution of the number of prime factors of integers.
In preparation, 1993.
[119]
C. Stein. Approximation algorithmsfor multicommodityflow and shop scheduP
ing problems. Ph.D. dissertation, Laboratory for Computer Science, Mas-
sachusetts Institute of Technology, 1992. Available as A?IT/LCS/TR --H 550.
[120] E. Styer and G. L. Peterson. Improved algorithms for distributed resource
allocation. In Proc. AC? Symposium on Principles of Distributed Computing,
pages 105--H116,1988.
[121]
[122]
J. Tarni. Randomized polynomials, Threshold circuits and the Polynomial
hierarchy. In Annual Symposium on Theoretical Aspects of Computer Science,
Lecture Notes in Computer Science #480, pages 238--H250,1991.
W. Thrash. A note on the least common multiples of dense sets of integers.
Technical Report #93-02-04, Department of Computer Science & Engineering,
University of Washington, Seattle, February 1993.
[123] P. Tiwari. Parallel algorithms for instances of linear matroid parity with small
number of solutions. Technical Report PC 12766, IBM T.J.Watson Research
Center, May 1987.
[124] 5. Toda. PP is as hard as the Polynomial-time Hierarchy. SL4?NI J. Comput.,
20(5):865--H877, 1991.
[125]
M. Tompa. Lecture notes on probabilistic algorithms and pseudorandom gen-
erators. Technical Report #91-07-05, Department of Computer Science &
Engineering, University of Washington, Seattle, July 1991
[126] P. Turan. On a theorem of Hardy and Ramanujan. Journal of the London
?athematics Society, 9:274--H276,1934.
[127] P. M. Vaidya. Reducing the parallel complexity of certain linear programming
problems. In Proc. JEFE Symposium on Foundations of Computer Science,
pages 583--H589,1990.
117
[128] L. G. Valiant. A scheme for fast parallel communication. SJAAi J. Comput.,
11:350--H361,1982.
[129] L. G. Valiant. A theory of the learnable. Communications of the ACAL
27(11):1134--H1142, 1984.
[130] L. G. Valiant and G. J. Brebner. Universal schemes for parallel communication.
In Proc. AC?I Symposium on Theory of Computing pages 263--H277, 1981.
[131] L. G. Valiant and V. V. Vazirani. NP is as easy as detecting unique sohitions.
Theoretical Computer Science9 47(1):85--H93, 1986.
[132] U. V. Vazirani. Randomness, Adversaries and Computation. Ph.D. disserta-
tion, EECS, University of California at I3erkeley, 1986.
[133] U. V. Vazirani and V. V. Vazirani. Random polynomial time is equal to
slightly-random polynomial time. In Proc. IFEE Symposium on Foundations
of Computer Science, pages 417--H428,1985. See also U. V. Vazirani and V. V.
Vazirani, Random polynomial time is equal to semi-random polynomial time,
Technical Report 88-959, Department of Computer Science, Cornell University,
1988.
[134]
V. V. Vazirani. NC algorithms for computing the number of perfect match-
ings in K???4ree graphs and related problems. Information and Computation.
80:152--H164,1989.
[135] V. G. Vizing. On an estimate of the chromatic class of a pgraph. Diskret.
Anal., 3:25--H30,1964. In Russian.
[136]
M.N. Wegman and J.L. Carter. New hash functions and their use in authenti-
cation and set equality. Journal of Computer and System Sciences, 22:265--H279,
1981.
[137] D. J. A. Welsh. 4Jatroid Theory. Academic Press, New York, 1976.
[138]
A. Wigderson and D. Zuckerman. Expanders that beat the eigenvalue bound:
Explicit construction and applications. In Proc. ACM Symposium on Theory
of Computing, pages 245--H251, 1993.
[139] A. C. Yao. Theory and applications of trapdoor functions. In Proc. JEFE
Symposium on Foundations of Computer Science, pages 80--H91,1982.
[140] D. Zuckerman. Simulating BPP using a general weak random source. In Proc.
IFEE Symposium on Foundations of Computer Science, pages 79--H89,1991.
