CS410, Summer 1998
Lecture 24 Outline
Dan Grossman
Goals:
Applications of DFS:
topological sort
strongly-connected components.
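Directed graphs -- after a DFS, all edges are one of 4 kinds: tree, forward,
back, cross. (In an undirected graph, DFS produces no cross edges; in a
directed graph it can, but a cross edge always points to a vertex that has
already been left.)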
TOPOLOGICAL SORT
Dags often represent dependencies (a cycle would make the dependencies
ill-defined). E.g., getting dressed.
Lemma: G is a dag iff its depth-first forest has no back edges.
Proof:
Assume a back edge (u,v). Tree edges make a path from v to u.
So there is a cycle.
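Assume a cycle c. Let v be the first vertex on the cycle discovered, and
let u be the vertex just before v on the cycle, so (u,v) is an edge. Every
vertex on the cycle is reachable from v, so u will become a descendant of
v. Then (u,v) goes from a descendant to its ancestor: it is a back edge.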
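The lemma gives a cycle test: run DFS and watch for an edge into a vertex that has been visited but not yet left. A minimal Python sketch, assuming the graph is a dict mapping every vertex to a list of its out-neighbors (the name has_back_edge and this representation are illustrative, not from the lecture):

```python
def has_back_edge(graph):
    """Return True iff DFS on `graph` finds a back edge, i.e. graph is not a dag.

    Assumed format: every vertex is a key, mapped to a list of out-neighbors.
    Recursive, so only suitable for small graphs.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / visited but not left / left
    color = {u: WHITE for u in graph}

    def visit(u):
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:
                # v is still on the DFS stack, so it is an ancestor of u.
                return True
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and visit(u) for u in graph)
```

An edge into a BLACK vertex is a forward or cross edge and is harmless; only the GRAY case signals a cycle.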
Topological Sort algorithm:
Do DFS, but change
leave[t] = time++
to
sorted[--i] = t (with i initialized to n)
That is, run DFS and sort the vertices in reverse-leave
time (highest leave time first).
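A minimal Python sketch of this change, assuming the graph is a dict mapping every vertex to a list of its out-neighbors and that it is a dag (the function name and representation are illustrative):

```python
def topological_sort(graph):
    """Return the vertices of a dag in reverse order of DFS leave time.

    Assumed format: every vertex is a key, mapped to a list of out-neighbors.
    The input is assumed to be a dag; cycles are not detected here.
    """
    sorted_vertices = [None] * len(graph)
    i = len(graph)          # i initialized to n
    visited = set()

    def visit(t):
        nonlocal i
        visited.add(t)
        for v in graph[t]:
            if v not in visited:
                visit(v)
        # Where plain DFS would record leave[t] = time++:
        i -= 1
        sorted_vertices[i] = t   # sorted[--i] = t

    for root in graph:
        if root not in visited:
            visit(root)
    return sorted_vertices
```

For the getting-dressed example one might call topological_sort({"socks": ["shoes"], "pants": ["shoes", "belt"], "shirt": ["belt"], "shoes": [], "belt": []}), where an edge (u,v) means u goes on before v. Any order in which every edge points forward is a valid answer; which one comes out depends on the order DFS picks roots.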
Running time: Same as DFS, i.e. O(V + E) with adjacency lists. See how
proving the running time of a general approach is useful.
Correctness: We just need to show that if (u,v) in graph, then leave[v] <
leave[u]. Basically, this is true because there are no back edges:
In DFS, we look at each edge exactly once. When we look at (u,v) one of
the following cases applies:
* v has already been left (in an earlier tree, or earlier in the current
tree). Then leave[v] is already set and therefore will be less than
leave[u].
* v has not been visited. Then we will visit it. By DFS, we will leave
it before leaving u.
* v has been visited but not yet left. Then v is an ancestor of u, which
would make (u,v) a back edge, which is impossible in a dag.
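The property being proved can be checked directly on small inputs. The sketch below, under the same assumed dict-of-lists representation (names are illustrative), computes leave times and tests leave[v] < leave[u] on every edge:

```python
def leave_times(graph):
    """Run DFS and return a dict mapping each vertex to its leave time."""
    leave, visited, clock = {}, set(), 0

    def visit(u):
        nonlocal clock
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        leave[u] = clock      # leave[u] = time++
        clock += 1

    for root in graph:
        if root not in visited:
            visit(root)
    return leave


def edges_respect_leave_times(graph):
    """True iff leave[v] < leave[u] for every edge (u, v) of the graph."""
    leave = leave_times(graph)
    return all(leave[v] < leave[u] for u in graph for v in graph[u])
```

On a dag the check passes for every edge. On a graph with a cycle it fails on a back edge, since an ancestor is always left after its descendants.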
STRONGLY-CONNECTED COMPONENTS
Recall a SCC is a maximal set of vertices such that there is a path
from any vertex in the set to any other vertex in the set. Notice "u
and v are in the same SCC" is an equivalence relation.
Finding SCCs is useful for "collapsing graphs" -- see homework 8.
We need one more definition before giving the algorithm. The
transpose of a directed graph is the graph with "all the edges turned
around". That is, G and G^T (the transpose graph of G) have the same
vertices and (u,v) is an edge in G^T iff (v,u) is an edge in G. We
can compute the transpose easily or just maintain it as we build G.
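Computing the transpose takes one pass over the edges. A minimal Python sketch, assuming a dict mapping every vertex to a list of its out-neighbors (an illustrative representation, not one fixed by the lecture):

```python
def transpose(graph):
    """Return G^T: same vertices, with every edge (u, v) turned into (v, u).

    Assumed format: every vertex is a key, mapped to a list of out-neighbors.
    """
    reversed_graph = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            reversed_graph[v].append(u)
    return reversed_graph
```

This runs in O(V + E) time, so it does not change the overall cost of an algorithm built on DFS.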
Algorithm for finding SCC of G:
(1) call DFS(G) and compute leave times
(2) call DFS(G^T) picking as next root the unvisited node with the
largest leave time in (1).
The nodes in each tree of the DFS forest of (2) are exactly the SCCs of
G!
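The two steps can be sketched in Python as follows, assuming a dict mapping every vertex to a list of its out-neighbors (the function name and representation are illustrative). Instead of storing leave times, the sketch appends each vertex to a list when it is left, so reading the list backwards gives "largest leave time first":

```python
def strongly_connected_components(graph):
    """Return a list of sets, one per tree of the step (2) DFS forest.

    Assumed format: every vertex is a key, mapped to a list of out-neighbors.
    Recursive, so only suitable for small graphs.
    """
    # Step (1): DFS on G, recording vertices in increasing leave time.
    visited, by_leave_time = set(), []

    def visit(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        by_leave_time.append(u)

    for root in graph:
        if root not in visited:
            visit(root)

    # Build the transpose G^T.
    transposed = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            transposed[v].append(u)

    # Step (2): DFS on G^T, next root = unvisited vertex with largest leave time.
    visited = set()
    components = []

    def collect(u, tree):
        visited.add(u)
        tree.add(u)
        for v in transposed[u]:
            if v not in visited:
                collect(v, tree)

    for root in reversed(by_leave_time):
        if root not in visited:
            tree = set()
            collect(root, tree)
            components.append(tree)
    return components
```

For example, on the graph with edges u->v, v->w, w->u, w->x, the sketch yields the two components {u, v, w} and {x}.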
We will prove this for the rest of lecture. First let's get some
intuition as to why this works:
Intuition 1: Let our whole graph be u--->v. If (1) starts with u,
we'll have one tree in its forest, else two. But either way, leave[u]
> leave[v]. So in (2), we start at u first and it has no out-going
edges in the transpose graph. So u and v end up in separate trees in
(2) which is correct -- they are not in the same SCC.
Intuition 2: simple cycle   u--->v--->w
                            ^         /
                             \_______/
No matter how (1) assigns leave times to these three nodes, whichever
is reached first in (2) will cause the others to be put in its same
tree.
Let us now proceed to prove the correctness of the algorithm.
Lemma 1: If u,v are in the same SCC, no path between them ever leaves
the SCC.
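Proof easy: if w is on a u to v path, then we can get from u to w (follow
the path) and from w to u (continue along the path to v, then go from v
back to u, which is possible since u,v share an SCC). So w must be in the
SCC.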
Lemma 2: In _any_ DFS forest, all vertices of SCC are in the same tree.
Proof: During the DFS, some element of the SCC must be visited first.
Since we can reach all other elements from this one, they will all be
put in the current tree.
Let us now define for all vertices u the forefather of u in a DFS
forest: the forefather of u in a DFS forest is the vertex w that is
reachable from u with the largest leave time. (This is well-defined
for all vertices because at the very least u is reachable from u.)
Lemma 3: For all u, forefather(u) is an ancestor of u.
Proof: When u is first visited, we have cases to consider:
* forefather(u) has already been left. But then leave(u) >
leave(forefather(u)), and u is reachable from u, so it wasn't really the
forefather(u).
* forefather(u) is not yet visited. But it is reachable from u, so it
will become a descendant of u or of a visited-but-not-yet-left vertex t on
the path from u to it. That vertex (u or t) is reachable from u and is
left after forefather(u), so it wasn't really the forefather(u).
* forefather(u) is visited but not yet left, so it is an ancestor of u
(counting u as its own ancestor). That's what we said to begin with. :-)
Corollary: u and forefather(u) are in the same SCC.
Proof: By definition forefather(u) is reachable from u.
By Lemma 3 and a sequence of tree edges, u is reachable
from forefather(u).
Lemma 4: u,v are in the same SCC iff they have the same forefather.
Proof:
Assume they have the same forefather. Then to go from u to v, go from
u to the common forefather and then to v. To go from v to u, go from
v to the common forefather and then to u.
Assume they're in the same SCC. Then they can reach the exact same
vertices. So whichever one of these exact same vertices has the
largest leave time is the forefather for both.
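Lemma 4 can be checked by brute force on small graphs: compute leave times, compute forefather(u) straight from the definition, and group vertices by forefather. This Python sketch assumes a dict mapping every vertex to a list of its out-neighbors. All names are illustrative, and this is a check of the lemma rather than an efficient algorithm:

```python
def forefathers(graph):
    """Map each vertex u to the vertex reachable from u with largest leave time."""
    leave, visited, clock = {}, set(), 0

    def visit(u):
        nonlocal clock
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                visit(v)
        leave[u] = clock
        clock += 1

    for root in graph:
        if root not in visited:
            visit(root)

    def reachable(u):
        # All vertices reachable from u, including u itself.
        seen, stack = {u}, [u]
        while stack:
            x = stack.pop()
            for y in graph[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return seen

    return {u: max(reachable(u), key=lambda w: leave[w]) for u in graph}


def group_by_forefather(graph):
    """Group vertices sharing a forefather; by Lemma 4 these are the SCCs."""
    groups = {}
    for u, f in forefathers(graph).items():
        groups.setdefault(f, set()).add(u)
    return list(groups.values())
```

On the graph with edges u->v, v->w, w->u, w->x, the vertices u, v and w all get forefather u, while x is its own forefather.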
Theorem: SCC algorithm is correct.
Proof: By induction on number of trees in step (2) of the algorithm.
I.H. is that all trees already in forest are SCCs.
Base: Trivial for i (the number of trees) = 0. There are no trees, so
it is true of all of them.
Inductive: Say we are building a tree T, starting with vertex r. By
Lemma 2 (applied to G^T, which has the same SCCs as G), T will have
everything in r's SCC. So we just need to show T
won't have any other vertices: Suppose v is not in r's SCC. By Lemma
4, forefather(v) != forefather(r). We have two cases:
* leave(forefather(v)) > leave(forefather(r))
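Since leave(forefather(r)) >= leave(r) (r is reachable from itself), we
have leave(forefather(v)) > leave(r). We started T with r, the unvisited
vertex with the largest leave time, so it must
be that forefather(v) is already in some tree. By the corollary, v and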
forefather(v) are in the same SCC. By the I.H., v is already in
forefather(v)'s tree in the forest. So v will not be added to r's
tree.
* leave(forefather(v)) < leave(forefather(r))
Then by definition of forefather, there is no path from v to
forefather(r) in the original graph. By the corollary, r and
forefather(r) are in the same SCC. So there is no path from v to r in
the original graph. So there is no path from r to v in the transpose graph.
So the DFS will not reach v while building r's tree.