DFS and BFS (Depth-first and Breadth-first search)

Reading: CLR secs. 23.2-23.3, pp. 469-485.

G = (V,E) a directed graph

The DFS and BFS algorithms given here differ from those in CLR mainly in
that they push and pop edges rather than vertices. There are some
advantages to this approach:
 - the algorithms come out a little cleaner
 - we don't need the vertex coloring scheme used in CLR
 - most interestingly, DFS and BFS turn out to be essentially the same
   algorithm, except that DFS explores the edges in a LIFO (stack) fashion
   and BFS in a FIFO (queue) fashion.

DFS: edges will be either
  tree edges     to a child -- these edges will form a forest
  back edges     to an ancestor
  forward edges  to a descendant, but not a tree edge
  cross edges    to a vertex that is neither an ancestor nor a descendant

In addition, we will assign each vertex a number when it is expanded,
which will be its preorder number.

DFS() {
    mark all vertices initially unexpanded;
    d = 0;                          //next preorder number
    while (there exists an unexpanded vertex) {
A:      r = next unexpanded vertex;
        expand(r);                  //subroutine below
        while (stack nonempty) {
            (u,v) = pop stack;      //next edge to explore
B:          if (v not expanded yet) {
C:              make (u,v) a tree edge;
D:              expand(v);
            } else {
                //(u,v) will be some other kind of edge
                //to be determined later
            }
        }
    }
}

expand(Vertex u) {
    DFSnumber(u) = d++;             //assign a preorder number to u
    push all edges (u,v) onto stack;
}

Example: vertices a,b,c,d,e
         edges (a,b),(a,d),(b,c),(b,d),(c,a),(c,e),(d,c),(d,e)

Vertex a chosen as the first r in line A gives a DFS tree with

  preorder numbers  a=0, b=1, c=2, e=3, d=4
  tree edges        (a,b),(b,c),(b,d),(c,e)
  forward edge      (a,d)
  back edge         (c,a)
  cross edges       (d,c),(d,e)

To get BFS from this, simply change
 - "stack" to "queue"
 - "push" to "enqueue"
 - "pop" to "dequeue"

Correctness:
 - The tree edges really do form a forest:
    - they form a dag, since every tree edge goes from a vertex with a lower
      DFS number to one with a higher DFS number, so there can be no cycles;
    - they form a forest, since no vertex can have more than one incoming
      tree edge: (u,v) can become a tree edge only at line C, v is expanded
      immediately afterward at D, and thereafter no other edge coming into v
      can become a tree edge because of the test at line B.
 - Every vertex reachable from the vertex r assigned at line A is reachable
   from r by a path of tree edges (induction on the length of the path).

Complexity: O(n+m), since there is at most a constant amount of work for
each vertex and edge.

We know what the tree edges are. How do we tell the other kinds of edges
apart? The DFS number is the preorder number. Walk the forest, computing
postorder numbers as well. Then (u,v) is the following kind of edge if the
following relations hold between the preorder and postorder numbers:

  tree      pre(u) < pre(v) and post(u) > post(v)
  forward   pre(u) < pre(v) and post(u) > post(v)
  back      pre(u) > pre(v) and post(u) < post(v)
  cross     pre(u) > pre(v) and post(u) > post(v)

(Tree and forward edges satisfy the same relations; they are distinguished
because we already know which edges are tree edges.)

1st application of DFS: acyclicity.

Theorem. G is a dag iff the DFS forest has no back edges.

Proof. If there are no back edges, then every edge goes from a higher to a
lower postorder number, so there can be no cycles. Conversely, if there is
a back edge (u,v), then there is a cycle consisting of the back edge (u,v)
and a sequence of tree edges v -> ... -> u.

This gives an O(n+m) algorithm for acyclicity: form the DFS forest and
check whether there are any back edges.
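To make the classification concrete, here is a small sketch in Python (an
illustration added here, not part of CLR or the pseudocode above). It runs
DFS from each still-unexpanded vertex, records preorder and postorder
numbers and the tree edges, and then classifies every edge by the table
above. It uses recursion instead of the explicit edge stack, so it does not
double as BFS the way the pseudocode does; the graph representation and all
names (dfs_classify, expand, etc.) are choices made for this sketch.

from collections import defaultdict

def dfs_classify(vertices, edges):
    adj = defaultdict(list)                    # adjacency lists
    for u, v in edges:
        adj[u].append(v)

    pre, post, tree = {}, {}, set()
    pre_clock = post_clock = 0

    def expand(u):
        nonlocal pre_clock, post_clock
        pre[u] = pre_clock; pre_clock += 1     # preorder number, assigned on entry
        for v in adj[u]:
            if v not in pre:                   # v not expanded yet: (u,v) is a tree edge
                tree.add((u, v))
                expand(v)
        post[u] = post_clock; post_clock += 1  # postorder number, assigned on the way out

    for r in vertices:                         # line A: next unexpanded vertex
        if r not in pre:
            expand(r)

    kind = {}
    for u, v in edges:                         # classify by the preorder/postorder table
        if (u, v) in tree:
            kind[(u, v)] = "tree"
        elif pre[u] < pre[v]:                  # and post[u] > post[v]: forward
            kind[(u, v)] = "forward"
        elif post[u] < post[v]:                # and pre[u] > pre[v]: back
            kind[(u, v)] = "back"
        else:                                  # pre[u] > pre[v] and post[u] > post[v]: cross
            kind[(u, v)] = "cross"
    return kind

vertices = ["a", "b", "c", "d", "e"]
edges = [("a","b"), ("a","d"), ("b","c"), ("b","d"),
         ("c","a"), ("c","e"), ("d","c"), ("d","e")]
kind = dfs_classify(vertices, edges)
print(kind)                                    # reproduces the classification in the example
print("acyclic:", not any(k == "back" for k in kind.values()))   # dag iff no back edges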
Strongly connected components

We can use the DFS computation to compute the strong components of a graph
in linear time. We want a linear-time algorithm to mark the graph so that
we can answer the query "are u and v in the same strong component?" in
constant time. Notation: we write u == v to mean that u and v are in the
same strong component.

Simplifying step: delete all forward edges, since they are not needed for
computing strong components; we can use tree edges instead.

Def. Let low(u) be the ancestor of u with the lowest preorder number (i.e.,
highest in the tree) reachable by a path from u. (Here "ancestor" is taken
to be reflexive, so low(u) could be u itself.) Note u == low(u), since u is
also reachable from low(u) by tree edges.

Order the vertices by preorder number; thus we say u < v if the preorder
number of u is less than the preorder number of v. The symbol ^ refers to
min in this order.

Lemma. If u == v, then there is a common ancestor of u and v in the strong
component of u and v.

Proof. Suppose v < u. It suffices to show that an ancestor of u is
reachable from v. Let

  A = {s | s < u}
  B = {ancestors of u}.

Then B is a subset of A. If v is in B, we are done. Otherwise v is in
A - B. The set A - B is closed under tree edges; i.e., if s is in A - B and
(s,t) is a tree edge, then t is in A - B. But since v < u, on a path from v
to u there must be an edge (s,t) such that s < u and t >= u. Since s < t,
(s,t) must be a tree edge (forward edges have been deleted, and back and
cross edges go to vertices with lower preorder numbers); since t >= u, s is
not in A - B; and since s < u, s is in A. Thus s must be an ancestor of u.
QED

Theorem. u == v iff low(u) = low(v).

Proof. If low(u) = low(v), then u == low(u) = low(v) == v, so by
transitivity u == v. Conversely, if u == v, then by transitivity
low(u) == low(v). By the lemma, low(u) and low(v) must have a common
ancestor w in their strong component; but then w = low(u) = low(v). QED

It follows from the theorem that every strong component has a unique root,
i.e. a vertex that is an ancestor of every other vertex in the strong
component. A vertex u is the root of its strong component iff it satisfies
u = low(u). There may be some descendants of u that are not in the strong
component of u, but the first such vertex encountered on any path of tree
edges starting at u must be the root of a strong component.
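The point of the theorem is that once every vertex is labeled with the root
of its strong component (the common value of low), the query from the start
of this section becomes a single comparison. A minimal sketch in Python,
with the labels for the example graph filled in by hand here (the algorithm
described next is what actually computes them):

def same_component(scc_root, u, v):
    # constant-time query: u and v are in the same strong component
    # iff they have the same root, i.e. the same low() value
    return scc_root[u] == scc_root[v]

# Hand-worked labels for the example graph: a, b, c, d all lie on cycles
# through a, while e has no outgoing edges, so it is a component by itself.
scc_root = {"a": "a", "b": "a", "c": "a", "d": "a", "e": "e"}
print(same_component(scc_root, "b", "d"))   # True
print(same_component(scc_root, "c", "e"))   # False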
Now we give a linear-time algorithm to compute the strong components. We
will walk the tree in DFS order, doing some calculations that will allow us
to determine, when we leave a vertex u for the last time, whether u is the
root of its strong component. If that is the case, then we delete u and all
its remaining descendants from the graph; these will be exactly the
vertices in the strong component of u.

As we walk the tree, we calculate for each vertex u the lowest
preorder-numbered vertex C(u) reachable from u through a path of tree edges
followed by a single cross or back edge. If there is no such vertex with a
lower preorder number than u, we take C(u) = u. This information can be
calculated inductively: at each vertex u, take C(u) to be the vertex of
minimum preorder number among C(v) for all children v of u, the vertices
reachable from u directly by a cross or back edge, and u itself.

Now suppose we have explored the subtree under u and are about to leave u
for the last time on our way back up the tree. We can assume by the
induction hypothesis (induction on postorder number) that any vertex v
satisfying v = low(v) that has already been visited for the last time
(i.e., that has a lower postorder number) has been deleted, along with all
members of its strong component. Now the claim is:

  u = low(u) iff u = C(u).

Suppose first that u = C(u). Then no path starting at u (or at any vertex
in the subtree rooted at u) can escape that subtree. That is because the
only way to escape the subtree is via a cross or back edge to a vertex with
a lower preorder number than u, and that edge would be reachable from u by
a path of tree edges; so if this were possible, then C(u) would have a
lower preorder number than u. Thus for any descendant v of u, low(v) is in
the subtree rooted at u. If low(v) were a proper descendant of u, then
low(v) = low(low(v)) and v would have already been deleted. Thus for every
remaining descendant v of u, low(v) = u, and all the remaining descendants
of u are in the strong component of u.

Now suppose u != C(u). Then there is a path of tree edges from u to some
descendant v of u and a cross or back edge (v,C(u)) to the vertex C(u),
which has a smaller preorder number than u and is thus outside the subtree
rooted at u. Then low(C(u)) must be an ancestor of u; if not, then
low(C(u)) would have a lower postorder number than u and therefore would
have been deleted, since low(low(C(u))) = low(C(u)). Since low(C(u)) is an
ancestor of u, low(u) = low(C(u)) != u.

The algorithm runs in linear time, since there is a constant amount of work
for each vertex and edge to walk the tree, plus a constant amount of work
per vertex and edge to do the deletions.
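Putting the pieces together, here is a sketch of the whole computation in
Python (an illustration in the spirit of the description above, which is
essentially Tarjan's algorithm; names such as strong_components and live
are chosen here). Recursion replaces the explicit edge stack, C(u) is kept
as a preorder number, and "deleting" a finished component is realized by
popping its vertices off a list of live vertices, so that cross and back
edges into deleted vertices are ignored.

def strong_components(vertices, adj):
    pre = {}            # preorder numbers
    C = {}              # C(u), stored as a preorder number
    clock = 0
    live = []           # not-yet-deleted vertices, in the order they were expanded
    on_live = set()
    components = []

    def expand(u):
        nonlocal clock
        pre[u] = C[u] = clock
        clock += 1
        live.append(u)
        on_live.add(u)
        for v in adj.get(u, []):
            if v not in pre:             # tree edge: recurse, then fold in C(v)
                expand(v)
                C[u] = min(C[u], C[v])
            elif v in on_live:           # back or cross edge to an undeleted vertex
                C[u] = min(C[u], pre[v])
        if C[u] == pre[u]:               # leaving u for the last time with u = C(u):
            comp = []                    # u is the root of its strong component
            while True:                  # delete u and its remaining descendants
                w = live.pop()
                on_live.discard(w)
                comp.append(w)
                if w == u:
                    break
            components.append(comp)

    for r in vertices:                   # next unexpanded vertex, as at line A
        if r not in pre:
            expand(r)
    return components

adj = {"a": ["b", "d"], "b": ["c", "d"], "c": ["a", "e"],
       "d": ["c", "e"], "e": []}
print(strong_components(["a", "b", "c", "d", "e"], adj))
# prints [['e'], ['d', 'c', 'b', 'a']]: the components {e} and {a,b,c,d}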