22. Graph Traversals
Many questions we wish to answer about graphs require us to perform a traversal, visiting the vertices in a graph one at a time to learn about the graph’s structure. While we could simply iterate over one of the private data structures that model the graph’s state (e.g., the vertices map in our graph class definitions from the previous lecture), this does not guarantee that the vertices are visited in an order conducive to understanding the graph’s structure. Instead, we’d like our traversal to “follow the structure” of the graph, using edges to discover neighbors and chaining together these discoveries to build up paths. These “structured” traversals will allow us to answer questions about the graph, such as detecting the presence of cycles, ordering the vertices according to some desired properties, or locating optimal paths from one vertex to another. We’ll focus primarily on this last objective over the next two lectures. Today, we’ll discuss two traversal strategies, depth-first (DFS) and breadth-first (BFS) searches, for visiting the vertices reachable from a given source vertex. In the next lecture, we’ll build on the ideas of BFS to develop Dijkstra’s algorithm for finding shortest paths in a directed weighted graph.
Throughout today’s lecture, we’ll work with a version of the map-based adjacency list graph representation that we developed in the previous lecture. We will restrict our attention to unweighted edges today, which we’ll model using an AdjListEdge record class storing just the tail and head vertices of each edge. The full source code of the AdjListGraph implementation that we’ll use is shown below and provided with the lecture release code.
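Since the listing itself is not reproduced in these notes, the sketch below illustrates the shape of this representation. It is deliberately simplified: vertices are identified directly by their String labels rather than by the generic vertex type of the release code, and the method names addVertex(), addEdge(), and outgoingEdges() are assumptions chosen for illustration (only the names AdjListGraph and AdjListEdge come from the lecture). The later code sketches in these notes build on this simplified class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** An unweighted, directed edge, storing the labels of its tail and head vertices. */
record AdjListEdge(String tail, String head) {}

/** A simplified adjacency-list graph whose vertices are identified by String labels. */
class AdjListGraph {
    /** Maps each vertex label to the list of that vertex's outgoing edges. */
    private final Map<String, List<AdjListEdge>> vertices = new HashMap<>();

    /** Add a vertex with the given `label` (no effect if it is already present). */
    public void addVertex(String label) {
        vertices.putIfAbsent(label, new ArrayList<>());
    }

    /** Add a directed edge from `tail` to `head`; both vertices must already exist. */
    public void addEdge(String tail, String head) {
        vertices.get(tail).add(new AdjListEdge(tail, head));
    }

    /** Return the outgoing edges of the vertex with the given `label`. */
    public List<AdjListEdge> outgoingEdges(String label) {
        return vertices.get(label);
    }
}
```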
In today's lecture, when we start to use graphs from the client side, you'll hopefully begin to appreciate the complicated generic code that we wrote in the previous lecture. It keeps our interactions with graphs simple while enabling the code that we write to adapt naturally to graphs with other types of vertices and edges. In short, our implementation makes good use of parametric polymorphism.
Depth-First Search
Imagine that we wanted to write code to solve a maze (or even to perform the simpler task of confirming to us that the maze is solvable in the first place, before we devote time to trying to find the solution ourselves). How might we do this?
To start, we’ll need a way to model the maze in our program. We can do this using a graph. We can view each of the square cells of the maze as a vertex in the graph (labeled with its coordinates), and we can draw edges (in both directions, since travel in a maze is not direction-specific) between adjacent vertices that do not have a wall between them.
Our maze-solving problem has been transformed into a question about a graph:
In the language of graph theory, we are asking whether the vertices (0,0) and (4,4) are connected.
Two vertices \(u\) and \(v\) in a directed graph are connected if the graph contains a path from \(u\) to \(v\).
Now that we have a familiar object, a graph, on which our code can operate, let’s think about how we can solve this problem. If we were truly standing inside the maze, unable to view it from above, all we could do is start to walk around. We’ll move between adjacent squares, repeating this process until we either find the end or are confident that we have explored all of the possible paths. At certain points of our exploration, we may hit a dead end and need to turn around, retracing our steps until we reach an intersection with a path that we have not yet explored.
Translating this to our graph, we’ll begin at the starting vertex (which we’ll call the source). From there, we’ll follow an outgoing edge from the source vertex to reach a new vertex, repeating this process to work our way deeper into the maze. If we ever reach a dead end, we’ll need a way to backtrack and choose a new path. If we ever reach the ending vertex (which we’ll call the destination), we will have discovered the solution and would like a way to “back-calculate” the path we took to get there.
This graph traversal strategy is referred to as a depth-first search since it follows one path as deep into the graph (or maze) as it can before considering alternate paths.
In a depth-first search, we follow one outgoing edge from each vertex beginning at the source until we can no longer make progress. Then, we backtrack to the most recent decision point, revise our choice, and explore an alternate path, continuing this process until all edges have been followed.
Note that the DFS procedure is underspecified; it does not tell us which particular outgoing edge to choose in each step of the algorithm. Different choices will lead to different traversals, all of which can be classified as depth-first searches.
Recursive Implementation
DFS admits a natural recursive algorithm. Each time we reach a vertex, we can check whether it is the destination (the base case). If not, we can launch a new depth-first search (the recursive call) for the destination from each of the current vertex’s neighbors. Since one of these recursive calls will fully evaluate before the next one begins, we will deeply explore one path before starting to consider an alternate path.
An initial attempt to code up this approach is shown below.
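A sketch of this first attempt, written as a method of the simplified AdjListGraph above (the parameter names are assumptions), might look like the following.

```java
/** Return whether there is a path from the vertex labeled `source` to the vertex
 *  labeled `destination` in this graph. */
public boolean dfs(String source, String destination) {
    if (source.equals(destination)) {   // base case: we have reached the destination
        return true;
    }
    for (AdjListEdge edge : outgoingEdges(source)) {
        // launch a new depth-first search for the destination from each neighbor
        if (dfs(edge.head(), destination)) {
            return true;
        }
    }
    return false;   // no outgoing edge leads to the destination
}
```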
Can you find an issue with this implementation? It may help to trace through a smaller example than our larger maze. For example, consider calling dfs() with source vertex “s” and destination vertex “t” in the following graph:
What’s the problem?
To circumvent this issue, we must find a way to keep track of the vertices that we have already discovered so that we do not get caught looping around the graph without making any progress. To do this, we’ll need to augment our search with a set of discovered vertices. Before we can formalize this, let’s settle on some terminology that we’ll use to describe the state of the vertices during our search.
- Initially, all of the vertices except the source vertex are undiscovered.
- We discover a vertex (i.e., it becomes discovered) the first time that we identify it as the head() of an edge we are considering.
- Over the course of our search, we visit one vertex (i.e., it becomes visited) in each step (recursive call or, later, iteration). During this visit, we consider all of its outgoing edges and potentially take other actions.
- After we finish visiting a vertex, it becomes settled; we have extracted all useful information from it to aid in our search process.
In DFS, we visit a vertex as soon as it is discovered; since we are prioritizing the depth of our search, we pause our visit of the current vertex to go visit its newly discovered neighbor. To avoid the aforementioned problem, we never want to discover (or visit) a vertex more than once. Thus, we’ll add a vertex to a discovered set just before its (first) visit, and we’ll add logic to never revisit a discovered vertex.
Since we need to pass this discovered set through the recursive calls, we’ll need to modify the method signature. We’ll use our (hopefully familiar) technique of delegating to a separate recursive helper method. The corrected DFS code is given below.
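A sketch of the corrected code, again as methods of the simplified AdjListGraph above (with java.util.Set and java.util.HashSet imported). The names dfs() and dfsRecursive() follow the surrounding discussion; the parameter names are assumptions.

```java
/** Return whether there is a path from the vertex labeled `source` to the vertex
 *  labeled `destination` in this graph. */
public boolean dfs(String source, String destination) {
    Set<String> discovered = new HashSet<>();
    discovered.add(source);   // the source is discovered (and visited) first
    return dfsRecursive(source, destination, discovered);
}

/** Recursive helper: search for `destination` from `current`, never revisiting any
 *  vertex in `discovered`. */
private boolean dfsRecursive(String current, String destination, Set<String> discovered) {
    if (current.equals(destination)) {   // base case: we have reached the destination
        return true;
    }
    for (AdjListEdge edge : outgoingEdges(current)) {
        String neighbor = edge.head();
        if (!discovered.contains(neighbor)) {
            discovered.add(neighbor);   // discover the neighbor just before visiting it
            if (dfsRecursive(neighbor, destination, discovered)) {
                return true;
            }
        }
    }
    return false;   // all paths out of `current` are exhausted; backtrack
}
```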
Visualizing the Search
Step through the following animation to visualize the execution of DFS on a small example graph. To introduce some visual vocabulary that we will use throughout the rest of this and the next lecture,
- We’ll (continue to) use green shading to indicate the vertex that we are currently visiting. From our earlier terminology, we know that this vertex is discovered but not yet settled.
- We’ll use light red shading to indicate all other vertices that have been discovered but not yet settled. Later, we will refer to these as the frontier vertices, since one of them will always be the next vertex that we visit.
- We’ll use dark red shading to indicate that a vertex has been settled.
- All unshaded vertices are undiscovered.
From this animation, we can observe some invariants of our recursive DFS implementation. These particularly relate to the frontier vertices, the vertices that have been discovered but not yet settled (including the vertex that is currently being visited). Notice that throughout the algorithm, there is one active stack frame for each frontier vertex. Moreover, these frontier vertices form a single path from the source vertex to the vertex that we are currently visiting, and the order of the vertices along this path is their order (bottom to top) on the runtime stack. Together, these observations tell us that at the point where we hit the base case by reaching the destination vertex, the active stack frames correspond to a path from the source to the destination. We can use this insight to modify the specifications of our DFS method to produce this path (see Exercise 22.4).
Complexity Analysis
Now that we have a working DFS implementation and better understand how it works, we can carry out a complexity analysis. We’ll need to reason about the total time and space usage across all the recursive calls of dfsRecursive().
Let’s start with the time complexity. We’ll focus on the visitation loop in the dfsRecursive() method, as this will dominate the runtime; all of the other operations in both methods run in (expected) \(O(1)\) time, and there are a total of \(O(|V|)\) calls to dfsRecursive() (one per vertex that we visit), giving a total time complexity of \(O(|V|)\) outside of the visitation loop.
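The original listing is not reproduced here, so the following excerpt of dfsRecursive()’s visitation loop (taken from the sketch above) marks one plausible correspondence with the line numbers cited in the analysis below; the numbering in the original listing may differ.

```java
for (AdjListEdge edge : outgoingEdges(current)) {                // line 1: enhanced-for loop
    String neighbor = edge.head();                               // line 2
    if (!discovered.contains(neighbor)) {                        // line 3: outer if-statement
        discovered.add(neighbor);                                // line 4
        if (dfsRecursive(neighbor, destination, discovered)) {   // line 5: recursive call
            return true;                                         // line 6
        }
    }
}
```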
The analysis of DFS is a bit subtle. Similar to Merge Sort, we cannot use our usual strategy of separately bounding the non-recursive work in each call and then adding these bounds over all the recursive calls; doing so would give too loose a bound. Instead, we’ll reason about the total number of executions of each line across all of the recursive calls at once. Read the following very carefully.
- The work done to advance the iterators in this enhanced-for loop is \(O(1)\) per iteration. Across all of the recursive calls, there are \(O(|E|)\) iterations of these loops: there is one iteration per outgoing edge from each current vertex, and each vertex can be the current vertex only once over the course of the search. Thus, each edge corresponds to at most one loop iteration.
- The method calls in lines 2 and 3 each require \(O(1)\) time and are run at most \(O(|E|)\) times across all the recursive calls, for an overall \(O(|E|)\) contribution to the runtime.
- We can enter the body of the outer if-statement at most \(O(|V|)\) times, once per vertex that we discover. The non-recursive work on lines 4 and 6 requires \(O(1)\) time per execution, for an overall \(O(|V|)\) contribution to the runtime.
- The work to set up the \(O(|V|)\) call frames over the algorithm’s execution contributes \(O(|V|)\) to the runtime.
Adding all these contributions, we find that the overall runtime of our recursive DFS implementation is \(O(|V| + |E|) = O(|E|) \). Here, the latter simplification follows since we can only reach a new vertex (and do work as we visit that vertex) by following an edge, meaning the number of vertices we visit is asymptotically upper-bounded by the number of edges we traverse.
For the space complexity, the dfs() method allocates a HashSet on the heap that can grow to include \(O(|V|)\) elements, requiring \(O(|V|)\) space. Each of the \(O(|V|)\) dfsRecursive() calls utilizes \(O(1)\) stack space, for an overall space complexity of \(O(|V|)\).
DFS Traversals
The DFS procedure has many use cases beyond identifying the existence of a path between two vertices in a graph. Since it guarantees to visit each vertex in a (strongly connected) graph exactly once, it provides us with a systematic way to traverse the graph and perform some “action” at each of its vertices. By carefully choosing these actions, we can solve many different graph-theoretic tasks (that you’ll likely discuss more in a discrete math or algorithms class), such as detecting cycles (see Exercise 22.5), determining whether a graph is bipartite (see Exercise 22.6), or computing a topological order of a graph’s vertices (see Exercise 22.7). Next, we’ll see how we can generalize our DFS code to accommodate these more general actions.
An “action” is a behavior that we wish for our code to perform during the traversal. Recall that functional interfaces provide a mechanism in Java to package and pass behaviors to a method. We’ll use the Consumer functional interface, instantiating its type parameter with T = String, to model a function acting on each vertex label during the traversal. When we visit a vertex v during our traversal, we will call the accept() method, passing in v.label(), to carry out the action requested by the client.
There are two possible times when a client may wish for an action to be performed during the visitation of a vertex.
- An action can be performed at the beginning of the visit, before any of the vertex’s outgoing edges are explored. We call this a “pre” action, since it is analogous to how a pre-order traversal produces the root of a subtree before traversing either of its child subtrees.
- An action can be performed at the end of the visit, after the outgoing edges are explored and just before the vertex becomes settled. We call this a “post” action, since it is analogous to how a post-order traversal produces the root of a subtree after traversing both of its child subtrees.
The recursive DFS traversal method that we’ll write will be parameterized on both of these, allowing its client to specify both a “pre” and a “post” action. Similar to our search, our public dfsTraverse() method will delegate most of its work to its private dfsVisit() helper.
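A sketch of these two methods for the simplified String-labeled graph above (with java.util.function.Consumer imported). Because vertices are plain String labels in this sketch, the actions receive the label directly rather than via v.label().

```java
/** Perform a DFS traversal from the vertex labeled `source`, applying `pre` to each
 *  vertex label at the start of its visit and `post` at the end of its visit. */
public void dfsTraverse(String source, Consumer<String> pre, Consumer<String> post) {
    Set<String> discovered = new HashSet<>();
    discovered.add(source);
    dfsVisit(source, pre, post, discovered);
}

/** Recursive helper: visit `current`, then recursively visit its undiscovered neighbors. */
private void dfsVisit(String current, Consumer<String> pre, Consumer<String> post,
        Set<String> discovered) {
    pre.accept(current);    // "pre" action: before any outgoing edges are explored
    for (AdjListEdge edge : outgoingEdges(current)) {
        String neighbor = edge.head();
        if (!discovered.contains(neighbor)) {
            discovered.add(neighbor);
            dfsVisit(neighbor, pre, post, discovered);
        }
    }
    post.accept(current);   // "post" action: just before `current` becomes settled
}
```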
We can use this dfsTraverse() method to print out two different traversal orders of our graph.
DFS Visitation Order
First, we’ll consider the DFS visitation order.
Given a directed graph \(G\) with a designated source vertex \(s\), a DFS visitation order lists the vertices of \(G\) (that are reachable via paths from \(s\)) in the order that they are first visited by a DFS traversal beginning at \(s\).
We can also call this order a DFS discovery order; in DFS, vertices are visited as soon as they are discovered. To print the vertices in visitation order, the “pre” action of our traversal should print the vertex label, and the “post” action should do nothing. We can achieve this in client code with a call like the one shown below.
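In terms of the simplified dfsTraverse() sketch above (the exact signature in the release code may differ):

```java
// Print the vertices in DFS visitation order: the "pre" action prints each label,
// and the "post" action does nothing.
g.dfsTraverse("s", v -> System.out.print(v + " "), v -> {});
```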
Here, we assume that g references an AdjListGraph object with a source vertex labeled “s”. The lambda expression v -> System.out.print(v + " ") is instantiated as a Consumer whose accept() method prints out the vertex label. The lambda expression v -> {} is instantiated as a Consumer whose accept() method does nothing.
Step through the following animation to trace a DFS traversal of the directed graph below and see how a DFS visitation order is computed.
As we noted earlier, DFS is an underspecified procedure since it does not specify in which order the (undiscovered) neighbors of a vertex should be visited. Therefore, the DFS visitation order (as well as the DFS settlement order that we'll introduce next) is not unique. Exercise 22.2(a) asks you to determine the other possible DFS orders of this graph.
DFS Settlement Order
Next, we’ll consider the DFS settlement order.
Given a directed graph \(G\) with a designated source vertex \(s\), a DFS settlement order lists the vertices of \(G\) (that are reachable via paths from \(s\)) in the order that they are settled by a DFS traversal beginning at \(s\).
To print the vertices in settlement order, the “pre” action of our traversal should do nothing, and the “post” action should print the vertex label.
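A sketch of this call, again in terms of the simplified dfsTraverse() above:

```java
// Print the vertices in DFS settlement order: the "pre" action does nothing, and the
// "post" action prints each label as its vertex is settled.
g.dfsTraverse("s", v -> {}, v -> System.out.print(v + " "));
```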
Step through the following animation to trace a DFS traversal of the directed graph below and see how a DFS settlement order is computed.
BFS Traversals
Depth-first searches work by considering one path from the source for as long as possible before backtracking to consider alternate paths. This can work well when our neighbor choices allow us to quickly near the destination. However, if we are unlucky with our initial choices then it may take a while before we backtrack all the way to the start to consider the correct path. An alternate search method can “simultaneously” advance along all of the search paths, fanning out the search from the source to give equal consideration to all paths. This approach is known as a breadth-first search since it prioritizes “broadening” the search at the expense of quickly advancing deeply along any one particular path.
In a breadth-first search, we proceed in "levels" of the graph, systematically discovering all neighboring vertices in one search level before proceeding to the next level.
In a BFS, our search no longer conforms to our physical intuition of stepping between adjacent nodes. Rather, we will be jumping from one area of the graph to another as we “fan out” our search. We’ll need a data structure to keep track of which vertex to visit next; in this case, a queue is a natural choice. As we visit the vertices in one “level” of the graph, we can enqueue() the (undiscovered) neighbors of these vertices (which comprise the next “level”) to visit afterward. Overall, this gives rise to the following BFS traversal procedure:
- Initialize an empty queue of vertices to visit and an empty set of discovered vertices.
- Initiate the traversal by add()ing the source vertex to the queue and marking it as discovered.
- While the queue is not empty, remove() the first vertex from the queue and visit it. During the visit, perform the desired action, and then iterate over the neighbors of this vertex. For any that are undiscovered, discover them and then add() them to the queue to visit later. At this point, our vertex has been settled.
- Once the queue is empty, all (reachable) vertices have been settled, so the traversal is complete.
The code for this procedure is given below.
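A sketch of this procedure for the simplified String-labeled graph above (with java.util.Queue and java.util.LinkedList imported). The name bfsQueue() follows the later discussion; the parameter names, and the choice to accept a single visitation action, are assumptions.

```java
/** Perform a BFS traversal from the vertex labeled `source`, applying `action` to each
 *  vertex label as it is visited. */
public void bfsQueue(String source, Consumer<String> action) {
    Queue<String> frontier = new LinkedList<>();   // discovered but not yet visited
    Set<String> discovered = new HashSet<>();
    frontier.add(source);         // initiate the traversal at the source vertex
    discovered.add(source);
    while (!frontier.isEmpty()) {
        String current = frontier.remove();   // visit the next frontier vertex
        action.accept(current);               // perform the desired action
        for (AdjListEdge edge : outgoingEdges(current)) {
            String neighbor = edge.head();
            if (!discovered.contains(neighbor)) {
                discovered.add(neighbor);     // discover each undiscovered neighbor ...
                frontier.add(neighbor);       // ... and queue it to be visited later
            }
        }
        // at this point, `current` is settled
    }
}
```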
We refer to the queue as a frontier since it contains all of the vertices that have been discovered but have not yet been visited. Step through the following animation to visualize a BFS traversal of our graph that prints the vertices in the order they are visited.
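Using the bfsQueue() sketch above, such a client call might look like this:

```java
// Print the vertices in the order that they are visited by the BFS traversal,
// assuming g has a vertex labeled "s".
g.bfsQueue("s", v -> System.out.print(v + " "));
```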
We sometimes refer to a BFS traversal of a graph as a level-order traversal, since it visits the vertices in increasing order of level.
In a graph \(G\) with a designated source vertex \(s\), the level of a vertex \(v\) is the length of the shortest path from \(s\) to \(v\). By convention, \(s\) has level 0 and all vertices that cannot be reached on a path from \(s\) have level \(\infty\).
The levels of the vertices in our example graph are visualized below:
Since our BFS traversal proceeds systematically by level, it will never need to visit vertices that are farther from the source (i.e., have a greater level) than the destination. In this way, BFS avoids the pitfall of DFS of spending a long time exploring a deep path that ultimately proves to be a dead end. However, the large “fan out” of BFS means that a lot of work can be done exploring many paths in the initial levels besides the correct path to the destination.
We can uncover a lot of additional structure by considering the levels of the vertices in a BFS traversal and also the set of edges that are used to add new vertices to the frontier. We will pick up from here at the start of the next lecture as motivation for Dijkstra’s shortest path algorithm. If you’d like to explore some of these ideas yourself, take a look at Exercises 22.10 and 22.11. To conclude today’s lecture, we’ll analyze the time and space complexities of our iterative BFS implementation.
Complexity Analysis
Let’s begin by analyzing the space complexity of our BFS implementation. Since this is an iterative method, we do not need to worry about the call stack. Instead, we need only to consider the space occupied by our discovered set and frontier queue. At most, each of these structures can contain one copy of each vertex, for an overall \(O(|V|)\) size. Thus, our BFS implementation has an \(O(|V|)\) space complexity.
For the time complexity, we’ll consider the method line by line.
- The initializations before the while-loop run in \(O(1)\) time. The guard on the while-loop is an \(O(1)\) check.
- In each iteration of the loop, one vertex is removed from the frontier queue. Over the course of the loop, each vertex is added to the frontier queue at most once. Thus, the loop runs for \(O(|V|)\) iterations.
- Within the loop, the queue removal is an \(O(1)\) operation.
- We’ll assume that the call to action.accept() is an \(O(1)\) operation, since processing one vertex will likely be a “local” operation whose cost does not depend on the overall size of the graph.
- Similar to our analysis of DFS, over the course of all of the outer-loop iterations, the inner loops iterate over each graph edge at most once (since each edge is an outgoing edge of exactly one tail vertex, and each vertex is visited at most once). Thus, over the course of the bfsQueue() method, the inner for-loop runs for \(O(|E|)\) iterations.
- Within this inner for-loop, all of the lines are \(O(1)\) operations.
Adding all these contributions, we find that the overall runtime of our iterative BFS implementation is \(O(|V| + |E|) = O(|E|) \), the same runtime as our DFS implementation.
Main Takeaways:
- A graph traversal is a method that guarantees to visit each (reachable) vertex in a graph exactly once and perform a specified action during this visit.
- A depth-first search traverses a graph by following a path of undiscovered vertices until it reaches a dead end, and then backtracking to the most recent branching point to continue the traversal.
- The most natural implementation of a DFS is recursive, since this allows us to keep track of its progress by pushing and popping frames on the runtime stack.
- A breadth-first search branches out to simultaneously explore all paths from the source vertex at the same rate. It visits the vertices in level order.
- The most natural implementation of a BFS is iterative, using a queue to keep track of the frontier vertices that have been discovered but not yet visited.
- Both DFS and BFS have a worst-case \(O(|V|+|E|) = O(|E|)\) time complexity and a worst-case \(O(|V|)\) space complexity.
- Graph traversals are an important primitive for many graph calculations and algorithms.
Exercises
You are reading an implementation of a graph search algorithm. As new nodes are discovered, they are added to a singly linked list. On each iteration of the loop,
- A node is removed from the end of the list.
- All of the node's undiscovered neighbors are appended to the beginning of the list.
Implement a method to determine if a graph is strongly connected. What is the runtime complexity of your method?
Implement a method to count the number of connected components in an undirected graph. What is the runtime complexity of your method?
V destination. Modify the return type to be a boolean that indicates whether destination is reachable from source.
CS 1110.
visiting s
visiting a
visiting b
visiting t
settling t
visiting c
settling c
settling b
settling a
settling s
Modify bfsQueue() to use a Stack instead of a Queue (using push() and pop() instead of add() and remove()). What is the visitation order of the vertices when running this modified version on the 6-vertex graph starting at vertex \(s\)? Is this a valid DFS?
Implement a method getLayers() that returns a Map from each vertex’s label to its level when running BFS starting at some source vertex using bfsQueue().
Use getLayers() to return the length of the shortest path from \(s\) to \(t\) in a graph.
What relationship can we state about the levels of the tail and head vertices of a highlighted edge? What relationship can we state about the levels of the tail and head vertices of a non-highlighted edge?