22. Graph Traversals
Many questions we wish to answer about graphs require us to perform a traversal, visiting the vertices in a graph one at a time to learn about the graph’s structure. While we could simply iterate over one of the private data structures that model the graph’s state (e.g., the vertices map in our graph class definitions from the previous lecture), this does not guarantee that the vertices are visited in an order conducive to understanding the graph’s structure. Instead, we’d like our traversal to “follow the structure” of the graph, using edges to discover neighbors and chaining together these discoveries to build up paths. These “structured” traversals will allow us to answer questions about the graph, such as detecting the presence of cycles, ordering the vertices according to some desired properties, or locating optimal paths from one vertex to another. We’ll focus primarily on this last objective over the next two lectures. Today, we’ll discuss two traversal strategies, depth-first (DFS) and breadth-first (BFS) searches, for visiting the vertices reachable from a given source vertex. In the next lecture, we’ll build on the ideas of BFS to develop Dijkstra’s algorithm for finding shortest paths in a directed weighted graph.
Throughout today’s lecture, we’ll work with a version of the map-based adjacency list graph representation that we developed in the previous lecture. We will restrict our attention to unweighted edges today, which we’ll model using an AdjListEdge record class storing just the tail and head vertices of each edge. The full source code of the AdjListGraph implementation that we’ll use is shown below and provided with the lecture release code.
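Since the listing itself is not reproduced in these notes, the sketch below illustrates the shape of this representation. It is deliberately simplified: vertices are identified directly by their String labels rather than by the generic vertex type of the release code, and the method names addVertex(), addEdge(), and outgoingEdges() are assumptions chosen for illustration (only the names AdjListGraph and AdjListEdge come from the lecture). The later code sketches in these notes build on this simplified class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** An unweighted, directed edge, storing the labels of its tail and head vertices. */
record AdjListEdge(String tail, String head) {}

/** A simplified adjacency-list graph whose vertices are identified by String labels. */
class AdjListGraph {
    /** Maps each vertex label to the list of that vertex's outgoing edges. */
    private final Map<String, List<AdjListEdge>> vertices = new HashMap<>();

    /** Add a vertex with the given `label` (no effect if it is already present). */
    public void addVertex(String label) {
        vertices.putIfAbsent(label, new ArrayList<>());
    }

    /** Add a directed edge from `tail` to `head`; both vertices must already exist. */
    public void addEdge(String tail, String head) {
        vertices.get(tail).add(new AdjListEdge(tail, head));
    }

    /** Return the outgoing edges of the vertex with the given `label`. */
    public List<AdjListEdge> outgoingEdges(String label) {
        return vertices.get(label);
    }
}
```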
In today's lecture, when we start to use graphs from the client side, you'll hopefully begin to appreciate the complicated generic code that we wrote in the previous lecture. It keeps our interactions with graphs simple while enabling the code that we write to adapt naturally to graphs with other types of vertices and edges. In short, our implementation makes good use of parametric polymorphism.
Depth-First Search
Imagine that we wanted to write code to solve a maze (or even to perform the simpler task of confirming to us that the maze is solvable in the first place, before we devote time to trying to find the solution ourselves). How might we do this?
To start, we’ll need a way to model the maze in our program. We can do this using a graph. We can view each of the square cells of the maze as a vertex in the graph (labeled with its coordinates), and we can draw edges (in both directions, since travel in a maze is not direction-specific) between adjacent vertices that do not have a wall between them.
Our maze-solving problem has been transformed into a question about a graph:
In the language of graph theory, we are asking whether the vertices (0,0) and (4,4) are connected.
Two vertices \(u\) and \(v\) in a directed graph are connected if the graph contains a path from \(u\) to \(v\).
Now that we have a familiar object, a graph, on which our code can operate, let’s think about how we can solve this problem. If we were truly standing inside the maze, unable to view it from above, all we could do is start to walk around. We’ll move between adjacent squares, repeating this process until we either find the end or are confident that we have explored all of the possible paths. At certain points of our exploration, we may hit a dead end and need to turn around, retracing our steps until we reach an intersection with a path that we have not yet explored.
Translating this to our graph, we’ll begin at the starting vertex (which we’ll call the source). From there, we’ll follow an outgoing edge from the source vertex to reach a new vertex, repeating this process to work our way deeper into the maze. If we ever reach a dead end, we’ll need a way to backtrack and choose a new path. If we ever reach the ending vertex (which we’ll call the destination), we will have discovered the solution and would like a way to “back-calculate” the path we took to get there.
This graph traversal strategy is referred to as a depth-first search since it follows one path as deep into the graph (or maze) as it can before considering alternate paths.
In a depth-first search, we follow one outgoing edge from each vertex beginning at the source until we can no longer make progress. Then, we backtrack to the most recent decision point, revise our choice, and explore an alternate path, continuing this process until all edges have been followed.
Note that the DFS procedure is underspecified; it does not tell us which particular outgoing edge to choose in each step of the algorithm. Different choices will lead to different traversals, all of which can be classified as depth-first searches.
Recursive Implementation
DFS admits a natural recursive algorithm. Each time we reach a vertex, we can check whether it is the destination (the base case). If not, we can launch a new depth-first search (the recursive call) for the destination from each of the current vertex’s neighbors. Since one of these recursive calls will fully evaluate before the next one begins, we will deeply explore one path before starting to consider an alternate path.
An initial attempt to code up this approach is shown below.
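A sketch of this first attempt, written as a method of the simplified AdjListGraph above (the parameter names are assumptions), might look like the following.

```java
/** Return whether there is a path from the vertex labeled `source` to the vertex
 *  labeled `destination` in this graph. */
public boolean dfs(String source, String destination) {
    if (source.equals(destination)) {   // base case: we have reached the destination
        return true;
    }
    for (AdjListEdge edge : outgoingEdges(source)) {
        // launch a new depth-first search for the destination from each neighbor
        if (dfs(edge.head(), destination)) {
            return true;
        }
    }
    return false;   // no outgoing edge leads to the destination
}
```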
Can you find an issue with this implementation? It may help to trace through a smaller example than our larger maze. For example, consider calling dfs() with source vertex “s” and destination vertex “t” in the following graph:
What’s the problem?
To circumvent this issue, we must find a way to keep track of the vertices that we have already discovered so that we do not get caught looping around the graph without making any progress. To do this, we’ll need to augment our search with a set of discovered vertices. Before we can formalize this, let’s settle on some terminology that we’ll use to describe the state of the vertices during our search.
- Initially, all of the vertices except the source vertex are undiscovered.
- We discover a vertex (i.e., it becomes discovered) the first time that we identify it as the head() of an edge we are considering.
- Over the course of our search, we visit one vertex (i.e., it becomes visited) in each step (recursive call or, later, iteration). During this visit, we consider all of its outgoing edges and potentially take other actions.
- After we finish visiting a vertex, it becomes settled; we have extracted all useful information from it to aid in our search process.
In DFS, we visit a vertex as soon as it is discovered; since we are prioritizing the depth of our search, we pause our visit of the current vertex to go visit its newly discovered neighbor. To avoid the aforementioned problem, we never want to discover (or visit) a vertex more than once. Thus, we’ll add a vertex to a discovered set just before its (first) visit, and we’ll add logic to never revisit a discovered vertex.
Since we need to pass this discovered set through the recursive calls, we’ll need to modify the method signature. We’ll use our (hopefully familiar) technique of delegating to a separate recursive helper method. The corrected DFS code is given below.
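A sketch of the corrected code, again as methods of the simplified AdjListGraph above (with java.util.Set and java.util.HashSet imported). The names dfs() and dfsRecursive() follow the surrounding discussion; the parameter names are assumptions.

```java
/** Return whether there is a path from the vertex labeled `source` to the vertex
 *  labeled `destination` in this graph. */
public boolean dfs(String source, String destination) {
    Set<String> discovered = new HashSet<>();
    discovered.add(source);   // the source is discovered (and visited) first
    return dfsRecursive(source, destination, discovered);
}

/** Recursive helper: search for `destination` from `current`, never revisiting any
 *  vertex in `discovered`. */
private boolean dfsRecursive(String current, String destination, Set<String> discovered) {
    if (current.equals(destination)) {   // base case: we have reached the destination
        return true;
    }
    for (AdjListEdge edge : outgoingEdges(current)) {
        String neighbor = edge.head();
        if (!discovered.contains(neighbor)) {
            discovered.add(neighbor);   // discover the neighbor just before visiting it
            if (dfsRecursive(neighbor, destination, discovered)) {
                return true;
            }
        }
    }
    return false;   // all paths out of `current` are exhausted; backtrack
}
```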
Visualizing the Search
Step through the following animation to visualize the execution of DFS on a small example graph. To introduce some visual vocabulary that we will use throughout the rest of this and the next lecture,
- We’ll (continue to) use green shading to indicate the vertex that we are currently visiting. From our earlier terminology, we know that this vertex is discovered but not yet settled.
- We’ll use light red shading to indicate all other vertices that have been discovered but not yet settled. Later, we will refer to these as the frontier vertices, since one of them will always be the next vertex that we visit.
- We’ll use dark red shading to indicate that a vertex has been settled.
- All unshaded vertices are undiscovered.
From this animation, we can observe some invariants of our recursive DFS implementation. These particularly relate to the frontier vertices, the vertices that have been discovered but not yet settled (including the vertex that is currently being visited). Notice that throughout the algorithm, there is one active stack frame for each frontier vertex. Moreover, these frontier vertices form a single path from the source vertex to the vertex that we are currently visiting, and the order of the vertices along this path is their order (bottom to top) on the runtime stack. Together, these observations tell us that at the point where we hit the base case by reaching the destination vertex, the active stack frames correspond to a path from the source to the destination. We can use this insight to modify the specifications of our DFS method to produce this path (see Exercise 22.4).
Complexity Analysis
Now that we have a working DFS implementation and better understand how it works, we can carry out a complexity analysis. We’ll need to reason about the total time and space usage across all the recursive calls of dfsRecursive().
Let’s start with the time complexity. We’ll focus on the visitation loop in the dfsRecursive() method, as this will dominate the runtime; all of the other operations in both methods run in (expected) \(O(1)\) time, and there are a total of \(O(|V|)\) calls to dfsRecursive() (one per vertex that we visit), giving a total time complexity of \(O(|V|)\) outside of the visitation loop.
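The original listing is not reproduced here, so the following excerpt of dfsRecursive()’s visitation loop (taken from the sketch above) marks one plausible correspondence with the line numbers cited in the analysis below; the numbering in the original listing may differ.

```java
for (AdjListEdge edge : outgoingEdges(current)) {                // line 1: enhanced-for loop
    String neighbor = edge.head();                               // line 2
    if (!discovered.contains(neighbor)) {                        // line 3: outer if-statement
        discovered.add(neighbor);                                // line 4
        if (dfsRecursive(neighbor, destination, discovered)) {   // line 5: recursive call
            return true;                                         // line 6
        }
    }
}
```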
The analysis of DFS is a bit subtle. Similar to Merge Sort, we cannot use our usual strategy of separately bounding the non-recursive work in each call and then adding these bounds over all the recursive calls; doing so would give too loose a bound. Instead, we’ll reason about the total number of executions of each line across all of the recursive calls at once. Read the following very carefully.
- The work done to advance the iterators in this enhanced-for loop is \(O(1)\) per iteration. Across all of the recursive calls, there are \(O(|E|)\) iterations of these loops: there is one iteration per outgoing edge from each current vertex, and each vertex can be the current vertex only once over the course of the search. Thus, each edge corresponds to at most one loop iteration.
- The method calls in lines 2 and 3 each require \(O(1)\) time and are run at most \(O(|E|)\) times across all the recursive calls, for an overall \(O(|E|)\) contribution to the runtime.
- We can enter the body of the outer if-statement at most \(O(|V|)\) times, once per vertex that we discover. The non-recursive work on lines 4 and 6 requires \(O(1)\) time per execution, for an overall \(O(|V|)\) contribution to the runtime.
- The work to set up the \(O(|V|)\) call frames over the algorithm’s execution contributes \(O(|V|)\) to the runtime.
Adding all these contributions, we find that the overall runtime of our recursive DFS implementation is \(O(|V| + |E|) = O(|E|) \). Here, the latter simplification follows since we can only reach a new vertex (and do work as we visit that vertex) by following an edge, meaning the number of vertices we visit is asymptotically upper-bounded by the number of edges we traverse.
For the space complexity, the dfs() method allocates a HashSet on the heap that can grow to include \(O(|V|)\) elements, requiring \(O(|V|)\) space. Each of the \(O(|V|)\) dfsRecursive() calls utilizes \(O(1)\) stack space, for an overall space complexity of \(O(|V|)\).
DFS Traversals
The DFS procedure has many use cases beyond identifying the existence of a path between two vertices in a graph. Since it guarantees to visit each vertex in a (strongly connected) graph exactly once, it provides us with a systematic way to traverse the graph and perform some “action” at each of its vertices. By carefully choosing these actions, we can solve many different graph-theoretic tasks (that you’ll likely discuss more in a discrete math or algorithms class), such as detecting cycles (see Exercise 22.5), determining whether a graph is bipartite (see Exercise 22.6), or computing a topological order of a graph’s vertices (see Exercise 22.7). Next, we’ll see how we can generalize our DFS code to accommodate these more general actions.
An “action” is a behavior that we wish for our code to perform during the traversal. Recall that functional interfaces provide a mechanism in Java to package and pass behaviors to a method. We’ll use the Consumer functional interface, instantiating its type parameter with T = String, to model a function acting on each vertex label during the traversal. When we visit a vertex v during our traversal, we will call the accept() method, passing in v.label(), to carry out the action requested by the client.
There are two possible times when a client may wish for an action to be performed during the visitation of a vertex.
- An action can be performed at the beginning of the visit, before any of the vertex’s outgoing edges are explored. We call this a “pre” action, since it is analogous to how a pre-order traversal produces the root of a subtree before traversing either of its child subtrees.
- An action can be performed at the end of the visit, after the outgoing edges are explored and just before the vertex becomes settled. We call this a “post” action, since it is analogous to how a post-order traversal produces the root of a subtree after traversing both of its child subtrees.
The recursive DFS traversal method that we’ll write will be parameterized on both of these, allowing its client to specify both a “pre” and a “post” action. Similar to our search, our public dfsTraverse() method will delegate most of its work to its private dfsVisit() helper.
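A sketch of these two methods for the simplified String-labeled graph above (with java.util.function.Consumer imported). Because vertices are plain String labels in this sketch, the actions receive the label directly rather than via v.label().

```java
/** Perform a DFS traversal from the vertex labeled `source`, applying `pre` to each
 *  vertex label at the start of its visit and `post` at the end of its visit. */
public void dfsTraverse(String source, Consumer<String> pre, Consumer<String> post) {
    Set<String> discovered = new HashSet<>();
    discovered.add(source);
    dfsVisit(source, pre, post, discovered);
}

/** Recursive helper: visit `current`, then recursively visit its undiscovered neighbors. */
private void dfsVisit(String current, Consumer<String> pre, Consumer<String> post,
        Set<String> discovered) {
    pre.accept(current);    // "pre" action: before any outgoing edges are explored
    for (AdjListEdge edge : outgoingEdges(current)) {
        String neighbor = edge.head();
        if (!discovered.contains(neighbor)) {
            discovered.add(neighbor);
            dfsVisit(neighbor, pre, post, discovered);
        }
    }
    post.accept(current);   // "post" action: just before `current` becomes settled
}
```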
We can use this dfsTraverse() method to print out two different traversal orders of our graph.
DFS Visitation Order
First, we’ll consider the DFS visitation order.
Given a directed graph \(G\) with a designated source vertex \(s\), a DFS visitation order lists the vertices of \(G\) (that are reachable via paths from \(s\)) in the order that they are first visited by a DFS traversal beginning at \(s\).
We can also call this order a DFS discovery order; in DFS, vertices are visited as soon as they are discovered. To print the vertices in visitation order, the “pre” action of our traversal should print the vertex label, and the “post” action should do nothing. We can achieve this in client code with a call like the one shown below.
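In terms of the simplified dfsTraverse() sketch above (the exact signature in the release code may differ):

```java
// Print the vertices in DFS visitation order: the "pre" action prints each label,
// and the "post" action does nothing.
g.dfsTraverse("s", v -> System.out.print(v + " "), v -> {});
```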
Here, we assume that g references an AdjListGraph object with a source vertex labeled “s”. The lambda expression v -> System.out.print(v + " ") is instantiated as a Consumer whose accept() method prints out the vertex label. The lambda expression v -> {} is instantiated as a Consumer whose accept() method does nothing.
Step through the following animation to trace a DFS traversal of the directed graph below and see how a DFS visitation order is computed.
As we noted earlier, DFS is an underspecified procedure since it does not specify in which order the (undiscovered) neighbors of a vertex should be visited. Therefore, the DFS visitation order (as well as the DFS settlement order that we'll introduce next) is not unique. Exercise 22.2(a) asks you to determine the other possible DFS orders of this graph.
DFS Settlement Order
Next, we’ll consider the DFS settlement order.
Given a directed graph \(G\) with a designated source vertex \(s\), a DFS settlement order lists the vertices of \(G\) (that are reachable via paths from \(s\)) in the order that they are settled by a DFS traversal beginning at \(s\).
To print the vertices in settlement order, the “pre” action of our traversal should do nothing, and the “post” action should print the vertex label.
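A sketch of this call, again in terms of the simplified dfsTraverse() above:

```java
// Print the vertices in DFS settlement order: the "pre" action does nothing, and the
// "post" action prints each label as its vertex is settled.
g.dfsTraverse("s", v -> {}, v -> System.out.print(v + " "));
```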
Step through the following animation to trace a DFS traversal of the directed graph below and see how a DFS settlement order is computed.
BFS Traversals
Depth-first searches work by considering one path from the source for as long as possible before backtracking to consider alternate paths. This can work well when our neighbor choices allow us to quickly near the destination. However, if we are unlucky with our initial choices then it may take a while before we backtrack all the way to the start to consider the correct path. An alternate search method can “simultaneously” advance along all of the search paths, fanning out the search from the source to give equal consideration to all paths. This approach is known as a breadth-first search since it prioritizes “broadening” the search at the expense of quickly advancing deeply along any one particular path.
In a breadth-first search, we proceed in "levels" of the graph, systematically discovering all neighboring vertices in one search level before proceeding to the next level.
In a BFS, our search no longer conforms to our physical intuition of stepping between adjacent nodes. Rather, we will be jumping from one area of the graph to another as we “fan out” our search. We’ll need a data structure to keep track of which vertex to visit next; in this case, a queue is a natural choice. As we visit the vertices in one “level” of the graph, we can enqueue() the (undiscovered) neighbors of these vertices (which comprise the next “level”) to visit afterward. Overall, this gives rise to the following BFS traversal procedure:
- Initialize an empty queue of vertices to visit and an empty set of discovered vertices.
- Initiate the traversal by add()ing the source vertex to the queue and marking it as discovered.
- While the queue is not empty, remove() the first vertex from the queue and visit it. During the visit, perform the desired action, and then iterate over the neighbors of this vertex. For any that are undiscovered, discover them and then add() them to the queue to visit later. At this point, our vertex has been settled.
- Once the queue is empty, all (reachable) vertices have been settled, so the traversal is complete.
The code for this procedure is given below.
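A sketch of this procedure for the simplified String-labeled graph above (with java.util.Queue and java.util.LinkedList imported). The name bfsQueue() follows the later discussion; the parameter names, and the choice to accept a single visitation action, are assumptions.

```java
/** Perform a BFS traversal from the vertex labeled `source`, applying `action` to each
 *  vertex label as it is visited. */
public void bfsQueue(String source, Consumer<String> action) {
    Queue<String> frontier = new LinkedList<>();   // discovered but not yet visited
    Set<String> discovered = new HashSet<>();
    frontier.add(source);         // initiate the traversal at the source vertex
    discovered.add(source);
    while (!frontier.isEmpty()) {
        String current = frontier.remove();   // visit the next frontier vertex
        action.accept(current);               // perform the desired action
        for (AdjListEdge edge : outgoingEdges(current)) {
            String neighbor = edge.head();
            if (!discovered.contains(neighbor)) {
                discovered.add(neighbor);     // discover each undiscovered neighbor ...
                frontier.add(neighbor);       // ... and queue it to be visited later
            }
        }
        // at this point, `current` is settled
    }
}
```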
We refer to the queue as a frontier since it contains all of the vertices that have been discovered but have not yet been visited. Step through the following animation to visualize a BFS traversal of our graph that prints the vertices in the order they are visited.
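Using the bfsQueue() sketch above, such a client call might look like this:

```java
// Print the vertices in the order that they are visited by the BFS traversal,
// assuming g has a vertex labeled "s".
g.bfsQueue("s", v -> System.out.print(v + " "));
```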
We sometimes refer to a BFS traversal of a graph as a level-order traversal, since it visits the vertices in increasing order of level.
In a graph \(G\) with a designated source vertex \(s\), the level of a vertex \(v\) is the length of the shortest path from \(s\) to \(v\). By convention, \(s\) has level 0 and all vertices that cannot be reached on a path from \(s\) have level \(\infty\).
The levels of the vertices in our example graph are visualized below:
Since our BFS traversal proceeds systematically by level, it will never need to visit vertices that are farther from the source (i.e., have a greater level) than the destination. In this way, BFS avoids the pitfall of DFS of spending a long time exploring a deep path that ultimately proves to be a dead end. However, the large “fan out” of BFS means that a lot of work can be done exploring many paths in the initial levels besides the correct path to the destination.
We can uncover a lot of additional structure by considering the levels of the vertices in a BFS traversal and also the set of edges that are used to add new vertices to the frontier. We will pick up from here at the start of the next lecture as motivation for Dijkstra’s shortest path algorithm. If you’d like to explore some of these ideas yourself, take a look at Exercises 22.10 and 22.11. To conclude today’s lecture, we’ll analyze the time and space complexities of our iterative BFS implementation.
Complexity Analysis
Let’s begin by analyzing the space complexity of our BFS implementation. Since this is an iterative method, we do not need to worry about the call stack. Instead, we need only to consider the space occupied by our discovered set and frontier queue. At most, each of these structures can contain one copy of each vertex, for an overall \(O(|V|)\) size. Thus, our BFS implementation has an \(O(|V|)\) space complexity.
For the time complexity, we’ll consider the method line by line.
- The initializations before the while-loop run in \(O(1)\) time. The guard on the while-loop is an \(O(1)\) check.
- In each iteration of the loop, one vertex is removed from the frontier queue. Over the course of the loop, each vertex is added to the frontier queue at most once. Thus, the loop runs for \(O(|V|)\) iterations.
- Within the loop, the queue removal is an \(O(1)\) operation.
- We’ll assume that the call to action.accept() is an \(O(1)\) operation, since processing one vertex will likely be a “local” operation whose cost does not depend on the overall size of the graph.
- Similar to our analysis of DFS, over the course of all of the outer-loop iterations, the inner loops iterate over each graph edge at most once (since each edge is an outgoing edge of exactly one tail vertex, and each vertex is visited at most once). Thus, over the course of the bfsQueue() method, the inner for-loop runs for \(O(|E|)\) iterations.
- Within this inner for-loop, all of the lines are \(O(1)\) operations.
Adding all these contributions, we find that the overall runtime of our iterative BFS implementation is \(O(|V| + |E|) = O(|E|) \), the same runtime as our DFS implementation.
Main Takeaways:
- A graph traversal is a method that guarantees to visit each (reachable) vertex in a graph exactly once and perform a specified action during this visit.
- A depth-first search traverses a graph by following a path of undiscovered vertices until it reaches a dead end, and then backtracking to the most recent branching point to continue the traversal.
- The most natural implementation of a DFS is recursive, since this allows us to keep track of its progress by pushing and popping frames on the runtime stack.
- A breadth-first search branches out to simultaneously explore all paths from the source vertex at the same rate. It visits the vertices in level order.
- The most natural implementation of a BFS is iterative, using a queue to keep track of the frontier vertices that have been discovered but not yet visited.
- Both DFS and BFS have a worst-case \(O(|V|+|E|) = O(|E|)\) time complexity and a worst-case \(O(|V|)\) space complexity.
- Graph traversals are an important primitive for many graph calculations and algorithms.
Exercises
You are reading an implementation of a graph search algorithm. As new nodes are discovered, they are added to a singly linked list. On each iteration of the loop,
- A node is removed from the end of the list.
- All of the node's undiscovered neighbors are appended to the beginning of the list.
Implement a method to determine if a graph is strongly connected. What is the runtime complexity of your method?
Implement a method to count the number of connected components in an undirected graph. What is the runtime complexity of your method?
V destination. Modify the return type to be a boolean that indicates whether destination is reachable from source.
CS 1110.
visiting s
visiting a
visiting b
visiting t
settling t
visiting c
settling c
settling b
settling a
settling s
Modify bfsQueue() to use a Stack instead of a Queue (using push() and pop() instead of add() and remove()). What is the visitation order of the vertices when running this modified version on the 6-vertex graph starting at vertex \(s\)? Is this a valid DFS?
Implement a method getLayers() that returns a Map from each vertex’s label to its level when running BFS starting at some source vertex using bfsQueue().
Use getLayers() to return the length of the shortest path from \(s\) to \(t\) in a graph.
What relationship can we state about the levels of the tail and head vertices of a highlighted edge? What relationship can we state about the levels of the tail and head vertices of a non-highlighted edge?