DFS and BFS (Depth-first and Breadth-first search)

Reading: CLR secs. 23.2-23.3, pp. 469-485.

G = (V,E) a directed graph

The DFS and BFS algorithms given here differ from those in CLR mainly in
that they push and pop edges rather than vertices. There are some
advantages to this approach:
 - the algorithms come out a little cleaner
 - we don't need the vertex coloring scheme used in CLR
 - most interestingly, DFS and BFS turn out to be essentially the same
   algorithm, except that DFS explores the edges in a LIFO (stack) fashion
   and BFS in a FIFO (queue) fashion.

DFS: edges will be either
  tree edges     to a child -- these edges will form a forest
  back edges     to an ancestor
  forward edges  to a descendant, but not a tree edge
  cross edges    to a vertex that is neither an ancestor nor a descendant

In addition, we will assign each vertex a number when it is expanded,
which will be its preorder number.

DFS() {
    mark all vertices initially unexpanded;
    d = 0;                          //next preorder number
    while (there exists an unexpanded vertex) {
A:      r = next unexpanded vertex;
        expand(r);                  //subroutine below
        while (stack nonempty) {
            (u,v) = pop stack;      //next edge to explore
B:          if (v not expanded yet) {
C:              make (u,v) a tree edge;
D:              expand(v);
            } else {
                //(u,v) will be some other kind of edge
                //to be determined later
            }
        }
    }
}

expand(Vertex u) {
    DFSnumber(u) = d++;             //assign a preorder number to u
    push all edges (u,v) onto stack;
}

Example: vertices a,b,c,d,e
         edges (a,b),(a,d),(b,c),(b,d),(c,a),(c,e),(d,c),(d,e)

Vertex a chosen as the first r in line A gives a DFS tree with

  preorder numbers  a=0, b=1, c=2, e=3, d=4
  tree edges        (a,b),(b,c),(b,d),(c,e)
  forward edge      (a,d)
  back edge         (c,a)
  cross edges       (d,c),(d,e)

To get BFS from this, simply change
 - "stack" to "queue"
 - "push" to "enqueue"
 - "pop" to "dequeue"

Correctness:
 - The tree edges really do form a forest:
    - they form a dag, since every tree edge goes from a vertex with a lower
      DFS number to one with a higher DFS number, so there can be no cycles;
    - they form a forest, since no vertex can have more than one incoming
      tree edge: (u,v) can become a tree edge only at line C, v is expanded
      immediately afterward at D, and thereafter no other edge coming into v
      can become a tree edge because of the test at line B.
 - Every vertex reachable from the vertex r assigned at line A is reachable
   from r by a path of tree edges (induction on the length of the path).

Complexity: O(n+m), since there is at most a constant amount of work for
each vertex and edge.

We know what the tree edges are. How do we tell the other kinds of edges
apart? The DFS number is the preorder number. Walk the forest, computing
postorder numbers as well. Then (u,v) is the following kind of edge if the
following relations hold between the preorder and postorder numbers:

  tree      pre(u) < pre(v) and post(u) > post(v)
  forward   pre(u) < pre(v) and post(u) > post(v)
  back      pre(u) > pre(v) and post(u) < post(v)
  cross     pre(u) > pre(v) and post(u) > post(v)

(Tree and forward edges satisfy the same relations; they are distinguished
because we already know which edges are tree edges.)

1st application of DFS: acyclicity.

Theorem. G is a dag iff the DFS forest has no back edges.

Proof. If there are no back edges, then every edge goes from a higher to a
lower postorder number, so there can be no cycles. Conversely, if there is
a back edge (u,v), then there is a cycle consisting of the back edge (u,v)
and a sequence of tree edges v -> ... -> u.

This gives an O(n+m) algorithm for acyclicity: form the DFS forest and
check whether there are any back edges.
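To make the classification concrete, here is a small sketch in Python (an
illustration added here, not part of CLR or the pseudocode above). It runs
DFS from each still-unexpanded vertex, records preorder and postorder
numbers and the tree edges, and then classifies every edge by the table
above. It uses recursion instead of the explicit edge stack, so it does not
double as BFS the way the pseudocode does; the graph representation and all
names (dfs_classify, expand, etc.) are choices made for this sketch.

from collections import defaultdict

def dfs_classify(vertices, edges):
    adj = defaultdict(list)                    # adjacency lists
    for u, v in edges:
        adj[u].append(v)

    pre, post, tree = {}, {}, set()
    pre_clock = post_clock = 0

    def expand(u):
        nonlocal pre_clock, post_clock
        pre[u] = pre_clock; pre_clock += 1     # preorder number, assigned on entry
        for v in adj[u]:
            if v not in pre:                   # v not expanded yet: (u,v) is a tree edge
                tree.add((u, v))
                expand(v)
        post[u] = post_clock; post_clock += 1  # postorder number, assigned on the way out

    for r in vertices:                         # line A: next unexpanded vertex
        if r not in pre:
            expand(r)

    kind = {}
    for u, v in edges:                         # classify by the preorder/postorder table
        if (u, v) in tree:
            kind[(u, v)] = "tree"
        elif pre[u] < pre[v]:                  # and post[u] > post[v]: forward
            kind[(u, v)] = "forward"
        elif post[u] < post[v]:                # and pre[u] > pre[v]: back
            kind[(u, v)] = "back"
        else:                                  # pre[u] > pre[v] and post[u] > post[v]: cross
            kind[(u, v)] = "cross"
    return kind

vertices = ["a", "b", "c", "d", "e"]
edges = [("a","b"), ("a","d"), ("b","c"), ("b","d"),
         ("c","a"), ("c","e"), ("d","c"), ("d","e")]
kind = dfs_classify(vertices, edges)
print(kind)                                    # reproduces the classification in the example
print("acyclic:", not any(k == "back" for k in kind.values()))   # dag iff no back edges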
Strongly connected components

We can use the DFS computation to compute the strong components of a graph
in linear time. We want a linear-time algorithm to mark the graph so that
we can answer the query "are u and v in the same strong component?" in
constant time. Notation: we write u == v to mean that u and v are in the
same strong component.

Simplifying step: delete all forward edges, since they are not needed for
computing strong components; we can use tree edges instead.

Def. Let low(u) be the ancestor of u with the lowest preorder number (i.e.,
highest in the tree) reachable by a path from u. (Here "ancestor" is taken
to be reflexive, so low(u) could be u itself.) Note u == low(u), since u is
also reachable from low(u) by tree edges.

Order the vertices by preorder number; thus we say u < v if the preorder
number of u is less than the preorder number of v. The symbol ^ refers to
min in this order.

Lemma. If u == v, then there is a common ancestor of u and v in the strong
component of u and v.

Proof. Suppose v < u. It suffices to show that an ancestor of u is
reachable from v. Let

  A = {s | s < u}
  B = {ancestors of u}.

Then B is a subset of A. If v is in B, we are done. Otherwise v is in
A - B. The set A - B is closed under tree edges; i.e., if s is in A - B and
(s,t) is a tree edge, then t is in A - B. But since v < u, on a path from v
to u there must be an edge (s,t) such that s < u and t >= u. Since s < t,
(s,t) must be a tree edge (forward edges have been deleted, and back and
cross edges go to vertices with lower preorder numbers); since t >= u, s is
not in A - B; and since s < u, s is in A. Thus s must be an ancestor of u.
QED

Theorem. u == v iff low(u) = low(v).

Proof. If low(u) = low(v), then u == low(u) = low(v) == v, so by
transitivity u == v. Conversely, if u == v, then by transitivity
low(u) == low(v). By the lemma, low(u) and low(v) must have a common
ancestor w in their strong component; but then w = low(u) = low(v). QED

It follows from the theorem that every strong component has a unique root,
i.e. a vertex that is an ancestor of every other vertex in the strong
component. A vertex u is the root of its strong component iff it satisfies
u = low(u). There may be some descendants of u that are not in the strong
component of u, but the first such vertex encountered on any path of tree
edges starting at u must be the root of a strong component.
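The point of the theorem is that once every vertex is labeled with the root
of its strong component (the common value of low), the query from the start
of this section becomes a single comparison. A minimal sketch in Python,
with the labels for the example graph filled in by hand here (the algorithm
described next is what actually computes them):

def same_component(scc_root, u, v):
    # constant-time query: u and v are in the same strong component
    # iff they have the same root, i.e. the same low() value
    return scc_root[u] == scc_root[v]

# Hand-worked labels for the example graph: a, b, c, d all lie on cycles
# through a, while e has no outgoing edges, so it is a component by itself.
scc_root = {"a": "a", "b": "a", "c": "a", "d": "a", "e": "e"}
print(same_component(scc_root, "b", "d"))   # True
print(same_component(scc_root, "c", "e"))   # False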
Now we give a linear-time algorithm to compute the strong components. We
will walk the tree in DFS order, doing some calculations that will allow us
to determine, when we leave a vertex u for the last time, whether u is the
root of its strong component. If that is the case, then we delete u and all
its remaining descendants from the graph; these will be exactly the
vertices in the strong component of u.

As we walk the tree, we calculate for each vertex u the lowest
preorder-numbered vertex C(u) reachable from u through a path of tree edges
followed by a single cross or back edge. If there is no such vertex with a
lower preorder number than u, we take C(u) = u. This information can be
calculated inductively: at each vertex u, take C(u) to be the vertex of
minimum preorder number among C(v) for all children v of u, the vertices
reachable from u directly by a cross or back edge, and u itself.

Now suppose we have explored the subtree under u and are about to leave u
for the last time on our way back up the tree. We can assume by the
induction hypothesis (induction on postorder number) that any vertex v
satisfying v = low(v) that has already been visited for the last time
(i.e., that has a lower postorder number) has been deleted, along with all
members of its strong component. Now the claim is:

  u = low(u) iff u = C(u).

Suppose first that u = C(u). Then no path starting at u (or at any vertex
in the subtree rooted at u) can escape that subtree. That is because the
only way to escape the subtree is via a cross or back edge to a vertex with
a lower preorder number than u, and that edge would be reachable from u by
a path of tree edges; so if this were possible, then C(u) would have a
lower preorder number than u. Thus for any descendant v of u, low(v) is in
the subtree rooted at u. If low(v) were a proper descendant of u, then
low(v) = low(low(v)) and v would have already been deleted. Thus for every
remaining descendant v of u, low(v) = u, and all the remaining descendants
of u are in the strong component of u.

Now suppose u != C(u). Then there is a path of tree edges from u to some
descendant v of u and a cross or back edge (v,C(u)) to the vertex C(u),
which has a smaller preorder number than u and is thus outside the subtree
rooted at u. Then low(C(u)) must be an ancestor of u; if not, then
low(C(u)) would have a lower postorder number than u and therefore would
have been deleted, since low(low(C(u))) = low(C(u)). Since low(C(u)) is an
ancestor of u, low(u) = low(C(u)) != u.

The algorithm runs in linear time, since there is a constant amount of work
for each vertex and edge to walk the tree, plus a constant amount of work
per vertex and edge to do the deletions.
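Putting the pieces together, here is a sketch of the whole computation in
Python (an illustration in the spirit of the description above, which is
essentially Tarjan's algorithm; names such as strong_components and live
are chosen here). Recursion replaces the explicit edge stack, C(u) is kept
as a preorder number, and "deleting" a finished component is realized by
popping its vertices off a list of live vertices, so that cross and back
edges into deleted vertices are ignored.

def strong_components(vertices, adj):
    pre = {}            # preorder numbers
    C = {}              # C(u), stored as a preorder number
    clock = 0
    live = []           # not-yet-deleted vertices, in the order they were expanded
    on_live = set()
    components = []

    def expand(u):
        nonlocal clock
        pre[u] = C[u] = clock
        clock += 1
        live.append(u)
        on_live.add(u)
        for v in adj.get(u, []):
            if v not in pre:             # tree edge: recurse, then fold in C(v)
                expand(v)
                C[u] = min(C[u], C[v])
            elif v in on_live:           # back or cross edge to an undeleted vertex
                C[u] = min(C[u], pre[v])
        if C[u] == pre[u]:               # leaving u for the last time with u = C(u):
            comp = []                    # u is the root of its strong component
            while True:                  # delete u and its remaining descendants
                w = live.pop()
                on_live.discard(w)
                comp.append(w)
                if w == u:
                    break
            components.append(comp)

    for r in vertices:                   # next unexpanded vertex, as at line A
        if r not in pre:
            expand(r)
    return components

adj = {"a": ["b", "d"], "b": ["c", "d"], "c": ["a", "e"],
       "d": ["c", "e"], "e": []}
print(strong_components(["a", "b", "c", "d", "e"], adj))
# prints [['e'], ['d', 'c', 'b', 'a']]: the components {e} and {a,b,c,d}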