\documentclass{article}
\usepackage{fullpage}
\usepackage{enumerate}
\usepackage{amsmath,amssymb, amsthm}
\usepackage{graphicx}
\newtheorem{definition}{Definition}
\newtheorem{lemma}[definition]{Lemma}
\newtheorem{theorem}[definition]{Theorem}
\newcommand{\E}{\mathcal{E}}
%\newcommand{\Pr}{\mathrm{Pr}}
\begin{document}
\noindent\framebox[\textwidth]{\parbox{.98\textwidth}{\textbf{CS
683: Advanced Design \& Analysis of Algorithms} \hfill January 23, 2008 \\
\centerline{\LARGE Lecture 2} \\[12pt]
\textit {Lecturer: John Hopcroft} \hfill \textit{Scribe: Hu Fu, June}}}\\\\
\begin{section}{Giant components in real world graphs}
\begin{itemize}
\item \textbf{Graph of Protein Interactions} \\
Using data from a paper (\emph{Science}, July 30, 1999, 285, pp751--753)
that recorded 3602 pairwise interactions among 2735 proteins, a
graph was formulated by representing each protein with a vertex and
then connecting two vertices with an edge if the corresponding
proteins interact with each other. The number of connected
components of each size is shown in the table below:
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|}
\hline Size & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 &
$\cdots$ & 15 & 16 & $\cdots$ & 1850 & 1851 \\
\hline Number & 48 & 179 & 50 & 25 & 14 & 6 & 4 & 6 & 1 & 1 & 1 & 0
& $\cdots$ & 0 & 1
& $\cdots$ & 0 & 1 \\
\hline
\end{tabular} \\[5pt]
It can be seen that a giant component dominates the graph, while all
other connected components are considerably smaller.
\item \textbf{Graph of Papers that Share Authors} \\
A database of papers was used to construct a graph, where each paper
is represented by a vertex, and two vertices are linked if the
papers they represent share an author. A count of the
connected components in this graph shows that, except for some small
components of sizes up to~14, the remaining vertices all belong to
a giant component of size~27488 --- a phenomenon similar to the one
found in the graph of protein interactions.
\item \textbf{Graph of Synonyms} \\ Another study was performed on a
large number of words, and a graph of synonyms was obtained. Each word
is present in the graph as a vertex, and two vertices are linked by an
edge if the corresponding words can be used as synonyms in some
context. The sizes of the connected components in this graph are as
follows: 1, 2, 3, 4, 5, 14, 16, 18, 48, 117, 125, 1128, 30242. This
time, there is again a giant component, but unlike the above cases,
there are other components of sizes that are not negligible. \\[5pt]
However, if we view these graphs as growing, with the number of edges
steadily increasing, then we can imagine that the giant component
emerges by ``swallowing up'' smaller ones, and that this last graph of
synonyms is on the verge of the appearance of a more dominant giant
component, whose intermediate-sized components are yet to be merged
into the giant one. \\[5pt] Experiments and theoretical analysis with
$G(n, p)$ (introduced in the last lecture, where $n$ is the number of
vertices and $p$ is the probability that each possible edge is
present) support this view.
\end{itemize}
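The component counts in the tables above can be reproduced from raw pairwise data with a short breadth-first search. The following sketch (the edge list here is a toy stand-in, not the actual protein or paper data) computes the multiset of component sizes:

```python
from collections import defaultdict, deque

def component_sizes(edges):
    """Return the sorted list of connected-component sizes of the graph
    given as a list of (u, v) pairs, e.g. interacting protein pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search to collect one component.
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sorted(sizes)

# Toy example: two components, of sizes 3 and 2.
print(component_sizes([("a", "b"), ("b", "c"), ("d", "e")]))  # [2, 3]
```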
\end{section}
\begin{section}{Behavior of $G(n, p)$ as $p$ goes up}
In a random graph $G(n, p)$ where $n$ is arbitrarily large, we
increase $p$ from~$0$, and the following phenomena can be
sequentially observed:
\begin{itemize}
\item $p = 0$: the graph consists of $n$ isolated vertices.
\item $p = {1 \over n^2}$: the expected number of edges in the graph
is on the order of one (more precisely, $\binom{n}{2} \cdot \frac{1}{n^2} \approx \frac{1}{2}$).
\item $p = {d \over n^2},\ (d > 1)$: the expected number of edges
is on the order of~$d$, and almost surely, all the components in the
graph are of size one or two.
\item $p = {\log n \over n^2}, \ \ p = {1 \over n^{3 / 2}},
\cdots$: as long as $p = o({1 \over n})$, there are (almost
surely) only trees, of size at most~$\log n$.
\item $p = {d \over n}, \ (d < 1)$: there is a constant number of
components containing a cycle, where ``constant'' means
\emph{independent of~$n$}. Almost surely, all components are trees
or unicyclic and are of size at most~$\log n$.
\item $p = {1 \over n}$: a component of size~$n^{2 / 3}$ emerges, and
it is almost surely a tree, because new edges are more likely to appear
in smaller components.
\item $p = {d \over n},\ (d > 1)$: a giant component containing a
constant fraction of the vertices appears, and all other components are
of size at most~$\log n$ (so there are no two giant components). This
is because two large components would almost surely be joined by an edge.
\item $p = \frac{1}{4} \frac{\log n}{n}$: the giant component
swallows up the components of intermediate sizes, and the graph is
left with only the giant component and isolated vertices.
\item $p = \frac{\log n}{n}$: all isolated vertices have been
swallowed up by the giant component, and the graph becomes
connected.
\item $p$ is a constant: almost surely the diameter of the graph is two.
\end{itemize}
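The emergence of the giant component at $p = d/n$, $d > 1$, is easy to see empirically. The following rough simulation sketch (not part of the lecture) samples $G(n, p)$ on both sides of the threshold and reports the size of the largest component:

```python
import random
from collections import deque

def largest_component(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected component."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    seen, best = [False] * n, 0
    for s in range(n):
        if seen[s]:
            continue
        # BFS over one component, tracking its size.
        queue, size = deque([s]), 0
        seen[s] = True
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        best = max(best, size)
    return best

rng = random.Random(0)
n = 1000
for d in (0.5, 1.0, 2.0):   # p = d/n below, at, and above the threshold
    print(d, largest_component(n, d / n, rng))
```

For $d = 0.5$ the largest component stays logarithmically small, while for $d = 2$ it contains a constant fraction of the $n$ vertices.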
\end{section}
\begin{section}{Exemplary properties of~$G(n, p)$}
\begin{subsection}{Finding cliques in~$G(n, 1/2)$}
If we look at~$G(n, 1/2)$, it is a very dense graph --- the
expected number of edges in it is $\frac{1}{2}\binom{n}{2} \approx n^2/4$,
half of the $\binom{n}{2} = \frac{n(n - 1)}{2}$ edges of the complete
graph on $n$~vertices. So how large a
clique can we find in~$G(n, 1/2)$? \\[5pt]
Finding a clique of size~$\log n$ in~$G(n, 1/2)$ is easy. We
arbitrarily pick a vertex~$v_1$; with high probability, it has
about $n/2$~neighbors. We arbitrarily pick one of them,
say~$v_2$. With high probability, $v_2$ has about
$n/4$~neighbors that are also neighbors of~$v_1$. We
arbitrarily pick a common neighbor and continue this process. With high
probability we can pick $\log n$ mutually adjacent vertices,
and they constitute a clique of size~$\log n$. \\[5pt]
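This greedy process can be sketched in a few lines; the simulation below (an illustrative sketch, not the lecture's own code) samples $G(n, 1/2)$ and halves the candidate set at each step:

```python
import random
from itertools import combinations

def greedy_clique(n, seed=None):
    """Greedily grow a clique in one sample of G(n, 1/2): repeatedly pick a
    candidate vertex and keep only candidates adjacent to everything chosen."""
    rng = random.Random(seed)
    # Each of the C(n, 2) possible edges is present with probability 1/2.
    edges = {frozenset(pair) for pair in combinations(range(n), 2)
             if rng.random() < 0.5}
    clique, candidates = [], set(range(n))
    while candidates:
        v = candidates.pop()
        clique.append(v)
        # On average, half of the remaining candidates survive each step.
        candidates = {u for u in candidates if frozenset((u, v)) in edges}
    return clique, edges

clique, edges = greedy_clique(512, seed=0)
print(len(clique))   # typically close to log2(512) = 9
```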
There is a similar but much harder problem: it can be proved that,
given any $\epsilon > 0$, with high probability there is a clique of
size $(2 - \epsilon) \log n$, but finding such a clique turns out to
be a very hard problem and has implications for P vs.\ NP. \\[7pt]
The phenomena that we observed in Section 2 were obtained by
increasing~$p$ while taking arbitrarily large~$n$. A simulation
that, while increasing~$p$, fixes $n$ at a fairly large number (say,
$100,000$) produces results that do not agree with the theoretical
analysis on the values of~$p$ at which certain phenomena occur. This
is suggested as a problem for a project.
\end{subsection}
\begin{subsection}{Expected number of triangles in~$G(n, d/n)$}
\textsl{Claim:} The expected number of triangles in~$G(n, d/n)$ is
constant as $n$ goes to infinity. \\[6pt]
Two observations give some intuition of the claim: As $n$ increases,
\begin{enumerate}
\item the probability that there is an edge between two fixed vertices $u$
and~$v$ decreases.
\item the number of triples of vertices increases.
\end{enumerate}
The effects of these two observations counteract each other and
result in a constant expected number of triangles as $n$ increases.\\[5pt]
\begin{proof} Given any three vertices in~$G(n, d/n)$, the
probability that there are edges between each pair of them is $(d /
n)^3$. Therefore,
$$\text{The expected number of triangles} = \binom{n}{3} \left({d \over n}\right)^3
= \frac{n(n-1)(n-2)}{6} \cdot \frac{d^3}{n^3} \to \frac{d^3}{6}, \
(n \to \infty)$$
\end{proof}
Note that this proof is not confined to triangles, but applies
to any pattern with $k$~vertices and $k$~edges (e.g., cycles of
length four). Additionally, these asymptotic effects are already
visible in graphs of 1000 vertices.
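The claim is easy to check by direct simulation. The sketch below (an illustration under the stated parameters, not part of the lecture) averages the triangle count over many samples of $G(n, d/n)$:

```python
import random

def count_triangles(n, d, rng):
    """Sample G(n, d/n) and count triangles: for each edge (u, v) with u < v,
    count common neighbors w with w > v, so each triangle is counted once."""
    p = d / n
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)          # store only the higher endpoint
    return sum(len(adj[u] & adj[v]) for u in range(n) for v in adj[u])

rng = random.Random(0)
n, d, trials = 300, 3.0, 50
avg = sum(count_triangles(n, d, rng) for _ in range(trials)) / trials
print(avg)   # should hover around d^3 / 6 = 4.5
```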
\end{subsection}
\begin{subsection}{Diameter of~$G(n, p)$ ($p$ a constant)}
\textsl{Claim:} When $p$ is a constant and $0 < p < 1$, the diameter
of~$G(n, p)$ is almost surely two.
\begin{proof} If $G$ has diameter at least three, then there
exist non-adjacent vertices $u$ and~$v$ such that no vertex $w \in
V$ is adjacent to both $u$ and~$v$. We call such a $u, v$ a
bad pair, and show that the expected number of bad pairs tends to
zero, which proves the claim.
Let $X$ be the number of bad pairs. We label all pairs of vertices
by $1, 2, \cdots, \binom{n}{2}$, and let
$$ X_i = \left\{ \begin{array}{ll}
1, & \text{if the $i^{\text{th}}$ pair is bad;} \\
0, & \text{otherwise.}
\end{array} \right. $$
Then $$X = \sum_{i = 1}^{\binom{n}{2}} X_i.$$
$$E[X_1] = \Pr\:(\text{a pair $(u, v)$ is bad}) = (1 - p) (1 - p^2)^{n - 2}, $$
since $u$ and~$v$ must be non-adjacent (probability $1 - p$), and each of the other $n - 2$ vertices must independently fail to be adjacent to both (probability $1 - p^2$ each). Hence
$$E[X] = \binom{n}{2} E[X_1] = \binom{n}{2} (1 - p) (1 - p^2)^{n -
2} \to 0, \ (n \to \infty),$$
because $(1 - p^2)^{n - 2}$ decreases exponentially in~$n$ while $\binom{n}{2}$ grows only polynomially.
\end{proof}
Note: the fraction of graphs that fail to have diameter two is on the order of~$1/n$, which tends to~$0$ as $n \to \infty$.
Also, in~$G(n, p)$ the vertex degrees are evenly spread around the average, whereas real-world graphs exhibit clustering.
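The diameter-two property can be observed directly in samples. The sketch below (illustrative parameters, not from the lecture) verifies that every pair of vertices in a sample of $G(n, 1/2)$ is adjacent or shares a common neighbor:

```python
import random

def has_diameter_at_most_two(n, p, rng):
    """Sample G(n, p) and check that every pair of vertices is adjacent
    or has a common neighbor."""
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    # A bad pair is non-adjacent with no common neighbor.
    return all(v in adj[u] or adj[u] & adj[v]
               for u in range(n) for v in range(u + 1, n))

rng = random.Random(0)
trials = 20
print(sum(has_diameter_at_most_two(200, 0.5, rng) for _ in range(trials)))
```

At $n = 200$ and $p = 1/2$, the bound $\binom{n}{2}(1-p)(1-p^2)^{n-2}$ is already astronomically small, so all trials should pass.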
\end{subsection}
\end{section}
\begin{section}{Phase transitions}
\textsl{Definition:} If there exists a function $p(n)$ such that,
for every $p_1(n)$ with $\lim_{n \to \infty} \frac{p_1(n)}{p(n)} = 0$,
almost surely $G(n, p_1(n))$ does not have a property, and for every
$p_2(n)$ with $\lim_{n \to \infty} \frac{p_2(n)}{p(n)} = \infty$,
almost surely $G(n, p_2(n))$ has the property, we say that $p(n)$ is
a threshold for the property.
\\[6pt]
\textsl{Definition:} If there exists a function $p(n)$ such that
$G(n, c\,p(n))$ almost surely does not have a property for every
constant $c < 1$, and $G(n, c\,p(n))$ almost surely has the property
for every constant $c > 1$, then we say that $p(n)$ is a sharp
threshold for the property. \\[7pt]
\begin{figure}[htp]
\centering
\includegraphics[height=4.5in]{stepfunction.jpg}
\caption{Phase transition}\label{fig:stepfunction.jpg}
\end{figure}
It is interesting to ask which properties have thresholds and which
have sharp thresholds. Is there a necessary and sufficient condition
for each? This is suggested for a project, and is perhaps an open problem.
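A sharp threshold is easy to observe empirically for connectivity, whose threshold is $p(n) = \frac{\log n}{n}$ (Section 2). The sketch below (illustrative parameters of my choosing, not from the lecture) estimates the probability of connectivity for $p = c \log n / n$ on both sides of $c = 1$:

```python
import math
import random
from collections import deque

def is_connected(n, p, rng):
    """Sample G(n, p) and test connectivity with a BFS from vertex 0."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    seen = [False] * n
    seen[0] = True
    queue, reached = deque([0]), 0
    while queue:
        u = queue.popleft()
        reached += 1
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                queue.append(v)
    return reached == n

rng = random.Random(0)
n, trials = 500, 10
for c in (0.5, 2.0):           # p = c * log(n) / n straddles the threshold
    p = c * math.log(n) / n
    print(c, sum(is_connected(n, p, rng) for _ in range(trials)) / trials)
```

Below the threshold isolated vertices persist and the empirical probability is near $0$; above it, the probability is near $1$.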
\end{section}
\end{document}