CS410, Summer 1998
Lecture 24 Outline
Dan Grossman

Goals:
  Applications of DFS:
    * topological sort
    * strongly-connected components

Directed graphs -- all edges one of 4 kinds: tree, forward, back, cross.
Depth-first search: cross edges only point to vertices that have already been left.

TOPOLOGICAL SORT

Dags often represent dependencies (a cycle would make the dependencies ill-defined). E.g., getting dressed.

Lemma: G is a dag iff its depth-first forest has no back edges.

Proof:
  Assume a back edge (u,v). Tree edges make a path from v to u. So there is a cycle.
  Assume a cycle c. Let v be the first vertex on the cycle to be discovered. All of the cycle's other vertices will be reached as descendants of v, including the vertex u that precedes v on the cycle, so (u,v) is an edge. This will be a back edge.

Topological Sort algorithm:
  Do DFS, but change
    leave[t] = time++
  to
    sorted[--i] = t   (with i initialized to n)
  That is, run DFS and sort the vertices in reverse-leave time (highest leave time first).

Running time: Same as DFS, of course. See how proving the running time of a general approach is useful.

Correctness: We just need to show that if (u,v) is in the graph, then leave[v] < leave[u]. Basically, this is true because there are no back edges. In DFS, we look at each edge exactly once. When we look at (u,v), one of the following cases applies:
  * v has already been left (it is in an earlier tree or in a finished part of the current tree). Then leave[v] is already set and therefore will be less than leave[u].
  * v has not been visited. Then we will visit it. By DFS, we will leave it before leaving u.
  * v has been visited but not yet left. Then v is an ancestor of u, which would make (u,v) a back edge. That is impossible in a dag.

STRONGLY-CONNECTED COMPONENTS

Recall that an SCC is a maximal set of vertices such that there is a path from any vertex in the set to any other vertex in the set. Notice "u and v are in the same SCC" is an equivalence relation.

Finding SCCs is useful for "collapsing graphs" -- see homework 8.

We need one more definition before giving the algorithm. The transpose of a directed graph is the graph with "all the edges turned around".
That is, G and G^T (the transpose graph of G) have the same vertices, and (u,v) is an edge in G^T iff (v,u) is an edge in G. We can compute the transpose easily or just maintain it as we build G.

Algorithm for finding the SCCs of G:
  (1) call DFS(G) and compute leave times
  (2) call DFS(G^T), picking as next root the unvisited node with the largest leave time in (1).
The nodes in each tree of the DFS forest of (2) are exactly the SCCs of G! We will prove this for the rest of lecture.

First let's get some intuition as to why this works:

Intuition 1: Let our whole graph be u--->v. If (1) starts with u, we'll have one tree in its forest, else two. But either way, leave[u] > leave[v]. So in (2), we start at u first, and it has no out-going edges in the transpose graph. So u and v end up in separate trees in (2), which is correct -- they are not in the same SCC.

Intuition 2: simple cycle

    u--->v--->w
    ^        /
     \______/

No matter how (1) assigns leave times to these three nodes, whichever is reached first in (2) will cause the others to be put in its same tree.

Let us now proceed to prove the correctness of the algorithm.

Lemma 1: If u,v are in the same SCC, no path between them ever leaves the SCC.
Proof: Easy. If w is on a u to v path, then we can get from u to w, and from w to u (go on to v, then from v back to u). So w must be in the SCC.

Lemma 2: In _any_ DFS forest, all vertices of an SCC are in the same tree.
Proof: During the DFS, some element of the SCC must be visited first. Since we can reach all other elements from this one (by Lemma 1, along paths whose vertices are all still unvisited), they will all be put in the current tree.

Let us now define, for all vertices u, the forefather of u in a DFS forest: the forefather of u is the vertex w reachable from u that has the largest leave time. (This is well-defined for all vertices because at the very least u is reachable from u.)

Lemma 3: For all u, forefather(u) is an ancestor of u.
Proof: When u is first visited, we have cases to consider:
  * forefather(u) has already been left (in an earlier tree or in a finished part of the current tree).
    But then leave(u) > leave(forefather(u)), so it wasn't really forefather(u).
  * forefather(u) is not yet visited. But it is reachable from u, so it will become a descendant of u or of some ancestor of u lying on the path from u to it. Either way a vertex reachable from u is left after it, so it wasn't really forefather(u).
  * forefather(u) has been visited but not yet left. Then it is an ancestor of u (possibly u itself). That's what we said to begin with. :-)

Corollary: u and forefather(u) are in the same SCC.
Proof: By definition, forefather(u) is reachable from u. By Lemma 3 and a sequence of tree edges, u is reachable from forefather(u).

Lemma 4: u,v are in the same SCC iff they have the same forefather.
Proof:
  Assume they have the same forefather. Then to go from u to v, go from u to the common forefather and then to v. To go from v to u, go from v to the common forefather and then to u. (Both legs exist by the Corollary.)
  Assume they're in the same SCC. Then they can reach the exact same vertices. So whichever one of these exact same vertices has the largest leave time is the forefather for both.

Theorem: The SCC algorithm is correct.
Proof: By induction on the number of trees in step (2) of the algorithm. The I.H. is that all trees already in the forest are SCCs.
  Base: Trivial for i (the number of trees) = 0. There are no trees, so it is true of all of them.
  Inductive: Say we are building a tree T, starting with vertex r. By Lemma 2, T will have everything in r's SCC. So we just need to show T won't have any other vertices. Suppose v is not in r's SCC. By Lemma 4, forefather(v) != forefather(r). We have two cases:
  * leave(forefather(v)) > leave(forefather(r))
    Since we started T with r (the unvisited vertex with the largest leave time) and leave(forefather(r)) >= leave(r), it must be that forefather(v) is already in some tree. By the corollary, v and forefather(v) are in the same SCC. By the I.H., v is already in forefather(v)'s tree in the forest. So v will not be added to r's tree.
  * leave(forefather(v)) < leave(forefather(r))
    Then by definition of forefather, there is no path from v to forefather(r) in the original graph. By the corollary, r and forefather(r) are in the same SCC.
    So there is no path from v to r in the original graph. So there is no path from r to v in the transpose graph. So the DFS of G^T will not reach v while building r's tree.
  In both cases v stays out of T, so T is exactly r's SCC.
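
Sketch (not from the lecture): both algorithms in this outline can be written on top of one DFS routine. The Python below is only an illustration under stated assumptions. The graph is assumed to be a dict that maps every vertex to a list of its successors, and every vertex must appear as a key. The names dfs_forest, topological_sort, transpose, and strongly_connected_components are made up for this sketch. Instead of storing leave[t] = time++, it appends each vertex to a list when it is left, so the list is in increasing leave time.

```python
def dfs_forest(graph, roots):
    """DFS over `graph`, trying the candidate roots in the given order.

    Returns (trees, leave_order): trees[k] lists the vertices of the k-th
    DFS tree, and leave_order lists every vertex by increasing leave time.
    Assumption: `graph` maps each vertex to a list of its successors.
    """
    visited = set()
    trees, leave_order = [], []

    def visit(u, tree):
        visited.add(u)
        tree.append(u)
        for v in graph[u]:
            if v not in visited:
                visit(v, tree)
        leave_order.append(u)  # stands in for leave[u] = time++

    for r in roots:
        if r not in visited:
            tree = []
            visit(r, tree)
            trees.append(tree)
    return trees, leave_order


def topological_sort(graph):
    """Vertices of a dag, highest leave time first."""
    _, leave_order = dfs_forest(graph, list(graph))
    return leave_order[::-1]


def transpose(graph):
    """Same vertices, every edge (u, v) turned around to (v, u)."""
    gt = {u: [] for u in graph}
    for u, successors in graph.items():
        for v in successors:
            gt[v].append(u)
    return gt


def strongly_connected_components(graph):
    """Two-pass SCC algorithm from the outline; returns a list of SCCs."""
    # Step (1): DFS on G to get the leave times.
    _, leave_order = dfs_forest(graph, list(graph))
    # Step (2): DFS on G^T, taking roots by decreasing leave time.
    trees, _ = dfs_forest(transpose(graph), leave_order[::-1])
    return trees


if __name__ == "__main__":
    # Hypothetical "getting dressed" dag: an edge means "put on before".
    dressing = {"socks": ["shoes"], "pants": ["shoes", "belt"],
                "shirt": ["belt"], "shoes": [], "belt": []}
    print(topological_sort(dressing))

    # The cycle u -> v -> w -> u plus one extra vertex x hanging off w.
    cycle = {"u": ["v"], "v": ["w"], "w": ["u", "x"], "x": []}
    print(strongly_connected_components(cycle))
```

Passing the reversed leave order as the root list for step (2) is what implements "pick the unvisited node with the largest leave time": dfs_forest skips any candidate root that is already visited. The recursion keeps the sketch short, but Python's default recursion limit makes it suitable only for small graphs. An explicit stack would be needed for large ones.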