CS 3110 Lecture 22
Directed graph representations and traversals

A directed graph G is an ordered pair (V,E) of a set of vertices V={v1,...,vn} and a set of edges E, where each edge is an ordered pair (vi,vj) with vi,vj ∈ V. Generally we denote the number of vertices by |V| and the number of edges by |E|.

Directed graphs are commonly represented as an adjacency list, which comprises an array or list of vertices, where each vertex vi stores a list of all the adjacent vertices (i.e., a list of all the vertices for which there is an edge originating at vi). Another common representation is an adjacency matrix, which is a two-dimensional array, where A[i,j] is non-zero when there is an edge from vi to vj. In practice many graphs are sparse, in the sense that most of the possible edges between pairs of vertices do not exist, i.e., |E| << |V|^2. In such cases an adjacency list is generally preferable to an adjacency matrix representation.

In this lecture we will consider an adjacency list representation for directed graphs, where each edge can additionally have an integer weight (which for now we will always take to be 1).

Here is a simple directed graph of four vertices (or nodes), V={v1,v2,v3,v4}, and four edges, E={(v1,v2), (v2,v3), (v3,v1), (v3,v4)}:

[1] <-- [3]     
 |      >|     
 |   /   |   
 v /     v 
[2]     [4]  
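
As a minimal sketch, the four-vertex graph above can be written down in both representations. The names adj, mat and edges_of_adj are hypothetical, and the sketch assumes that v1..v4 are stored at indices 0..3; it is not the representation used later in these notes.

```ocaml
(* Assumption of this sketch: vertices v1..v4 live at indices 0..3. *)

(* Adjacency list: adj.(i) lists the indices j with an edge i -> j. *)
let adj : int list array = [| [1]; [2]; [0; 3]; [] |]

(* Adjacency matrix: mat.(i).(j) = 1 exactly when there is an edge i -> j. *)
let mat : int array array =
  let m = Array.make_matrix 4 4 0 in
  Array.iteri (fun i outs -> List.iter (fun j -> m.(i).(j) <- 1) outs) adj;
  m

(* All edges as (src, dst) pairs. Each list entry is visited once, so
   this takes O(|V|+|E|) time on the adjacency list. *)
let edges_of_adj (a : int list array) : (int * int) list =
  a
  |> Array.mapi (fun i outs -> List.map (fun j -> (i, j)) outs)
  |> Array.to_list
  |> List.concat
```

The matrix always occupies |V|^2 entries (16 here, for 4 edges), while the adjacency list stores one entry per edge.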

Consider the following abstract data type for a directed graph with weighted edges. Note that while this specification does not explicitly require any particular implementation, the required running times of some of these functions constrain the implementation in various ways. For instance, a naive adjacency matrix implementation would take Θ(|V|^2) time to consider every array entry in producing a list of all the edges. However, the edges function is required to do this in O(|V|+|E|) time.

(* A signature for directed graphs.*)
module type WGRAPH = sig
  type graph  (* A directed graph consisting of a set of vertices
               * V and directed edges E with integer weights. *)
  type vertex (* A vertex, or node, of the graph *)
  type edge   (* An edge of the graph *)

  (* create an empty graph *)
  val create: unit -> graph

  (* return the id of the specified vertex *)
  val vertex_id: vertex -> int

  (* compare two vertices, returning 0 if their id's are equal, -1 if
     the first has a smaller id and +1 if first has a larger id.
     Suitable for comparators in sets and maps. *)
  val compare: vertex -> vertex -> int

  (* For an edge return (src,dst,w) where src is the source vertex of
   * the edge, dst is the destination vertex, and w is the edge
   * weight. *)
  val edge_info: edge -> vertex * vertex * int

  (* True if the given graph is empty (has no vertices).  
   * Run time O(1). *)
  val is_empty: graph -> bool

  (* A list of all vertices in the graph, without duplicates, in the order
   * they were added.
   * Run time: O(|V|). *)
  val vertices: graph -> vertex list

  (* The number of vertices in the graph.
   * Run time: O(1). *)
  val num_vertices: graph -> int

  (* A list of all edges in the graph, without duplicates.
   * Run time: O(|V|+|E|). *)
  val edges: graph -> edge list

  (* A list of the edges leaving the vertex v.
   * Run time: linear in the length of the result. *)
  val outgoing: vertex -> edge list

  (* A list of the edges coming in to the vertex v, where each edge is
   * reversed to go from v back to the other vertex.  
   * Run time: linear in the length of the result. *)
  val incoming_rev: vertex -> edge list

  (* The number of incoming edges for the specified vertex.
   * Run time: O(1). *)
  val in_degree: vertex -> int

  (* The number of outgoing edges for the specified vertex. 
   * Run time: O(1). *)
  val out_degree: vertex -> int

  (* Effect: adds a new singleton vertex (a vertex with no incident
   * edges) to the specified graph, and returns that vertex. 
   * Run time: O(1).*)
  val add_vertex: graph -> vertex

  (* Effect: removes specified vertex from the specified graph, and
   * all edges that are incident on that vertex, i.e., all edges that
   * have the vertex as a source or destination. 
   * Run time: O(|E|). 
   * Note that the total time for removing all |V| vertices in the 
   * graph is also O(|E|), and not O(|V||E|)*)
  val remove_vertex: graph -> vertex -> unit

  (* Effect: add_edge(src,dst,w) adds an edge from src vertex to dst
   * vertex, with weight w. 
   * Run time: O(1).*)
  val add_edge: vertex * vertex * int -> unit

  (* Effect: remove_edge(src,dst) removes the edge from src vertex to
   * dst vertex, has no effect if the edge does not exist. 
   * Run time: O(|E|).*)
  val remove_edge: vertex * vertex -> unit

  (* Creates and returns a copy of the graph. 
   * Run time: O(|V|+|E|). *)
  val copy: graph -> graph

end

Note that the create function creates and returns an empty graph. The add_vertex function takes a graph as argument, adds a new singleton vertex to that graph (a vertex with no edges starting or ending at that vertex), and returns the new vertex. The function add_edge takes two vertices (and a weight) and joins the vertices together by an edge. These are all O(1) time operations.

The abstraction also provides operations for getting the list of vertices and edges of a graph, as well as the list of outgoing edges from a given vertex. There is also a function incoming_rev which returns a list of reversed edges, by taking all the incoming edges for a given vertex and turning them into edges that go in the opposite direction. This function is useful for exploring the back-edges in a graph (i.e., exploring the reversed graph where all the edge directions are swapped).

The Graph module in lec22.ml implements this WGRAPH abstraction using adjacency lists of vertices, where both the outgoing edge list and the incoming edge list for each vertex are explicitly represented. That is, each edge is stored twice, once at its source vertex and once at its destination vertex. This makes it easy to traverse the edges of the graph in reverse, which is useful for things like computing connected components (described below).

In that implementation, a graph is represented as a pair of a counter of the number of vertices and a list of vertices. A vertex is represented as a triple of a unique integer ID, an outgoing list and an incoming list. The outgoing list is a list of pairs of destination vertex and weight. The incoming list is a list of pairs of source vertex and weight. Note that as each edge is stored twice, in the incoming list of its destination and the outgoing list of its source, the two entries must be kept consistent.

In that implementation an edge is a triple of a source vertex, destination vertex and weight, and is constructed on the fly from the vertex out lists rather than being stored in the data structure.
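
The description above can be sketched with the following types. This is only an illustration under stated assumptions: it uses records rather than bare pairs and triples for readability, the field names are hypothetical, the counter doubles as the source of fresh IDs, and the actual definitions in lec22.ml may differ.

```ocaml
(* Hypothetical types following the prose description; not lec22.ml. *)
type vertex = {
  id : int;                                (* unique integer ID *)
  mutable out_list : (vertex * int) list;  (* (destination, weight) pairs *)
  mutable in_list : (vertex * int) list;   (* (source, weight) pairs *)
}

type graph = {
  mutable count : int;          (* number of vertices; also the next ID here *)
  mutable verts : vertex list;  (* most recently added vertex first *)
}

type edge = vertex * vertex * int  (* source, destination, weight *)

let create () : graph = { count = 0; verts = [] }

(* O(1): allocate a vertex with no incident edges. *)
let add_vertex (g : graph) : vertex =
  let v = { id = g.count; out_list = []; in_list = [] } in
  g.count <- g.count + 1;
  g.verts <- v :: g.verts;
  v

(* O(1): record the edge at both endpoints so the two lists agree. *)
let add_edge ((src, dst, w) : edge) : unit =
  src.out_list <- (dst, w) :: src.out_list;
  dst.in_list <- (src, w) :: dst.in_list

(* Edges are built on the fly from the stored lists. *)
let outgoing (v : vertex) : edge list =
  List.map (fun (dst, w) -> (v, dst, w)) v.out_list

(* Incoming edges, reversed so that each one starts at v. *)
let incoming_rev (v : vertex) : edge list =
  List.map (fun (src, w) -> (v, src, w)) v.in_list
```

Because vertices refer to each other through their lists, values of these types can be cyclic, so vertices should be compared by ID rather than with the polymorphic = operator.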

Traversals and Connected Components

One of the most basic graph operations is to traverse a graph, finding the nodes accessible by following edges from a particular node. You have already seen this kind of operation in courses such as CS2110, at least for undirected graphs. The key idea in graph traversal is to mark the vertices as we visit them and to keep track of what we have not yet explored. We will consider each vertex to be in one of two states, undiscovered or discovered. We'll start with a single discovered vertex, and consider each outgoing edge. If an edge connects to an undiscovered vertex, we'll mark the vertex as discovered and add it to our set of vertices to process. After we've looked at all the edges of a vertex, we'll grab another vertex from the set of discovered vertices. The order in which we explore the vertices depends on how we maintain the collection of discovered vertices.

If we use a queue for the above traversal process it is known as a breadth-first search, or BFS. If we use a stack to store the discovered vertices instead, we venture along a single path away from the starting vertex until there are no more undiscovered vertices in front of us. This is known as a depth-first search, or DFS.

Of course, such a traversal will only find nodes where there is some path connecting them to the start node. If there are multiple disconnected components in the graph, DFS or BFS from a single node will not reach all of the nodes in the graph.
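
To make the difference between the queue and stack variants concrete, here is a small self-contained sketch that records the order in which vertices are removed from the collection. It works on a plain int list array rather than the graph abstraction, and the names traverse_order, bfs_order, dfs_order and the five-vertex example graph are all hypothetical.

```ocaml
(* Generic traversal: push, pop and is_empty decide the exploration order. *)
let traverse_order (push : int -> unit) (pop : unit -> int)
    (is_empty : unit -> bool) (adj : int list array) (start : int) : int list =
  let discovered = Array.make (Array.length adj) false in
  let order = ref [] in
  discovered.(start) <- true;
  push start;
  while not (is_empty ()) do
    let v = pop () in
    order := v :: !order;
    (* Mark each undiscovered neighbor and schedule it for processing. *)
    List.iter
      (fun w ->
        if not discovered.(w) then begin
          discovered.(w) <- true;
          push w
        end)
      adj.(v)
  done;
  List.rev !order

(* A queue gives breadth-first order. *)
let bfs_order adj start =
  let q = Queue.create () in
  traverse_order (fun v -> Queue.push v q) (fun () -> Queue.pop q)
    (fun () -> Queue.is_empty q) adj start

(* A stack gives depth-first order. *)
let dfs_order adj start =
  let s = Stack.create () in
  traverse_order (fun v -> Stack.push v s) (fun () -> Stack.pop s)
    (fun () -> Stack.is_empty s) adj start

(* Example graph (an assumption): 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 4. *)
let example = [| [1; 2]; [3]; [4]; []; [] |]
```

On this example, bfs_order example 0 visits the vertices level by level as [0; 1; 2; 3; 4], while dfs_order example 0 follows one path to its end first, giving [0; 2; 4; 1; 3].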

Here is an implementation of traversal in a directed graph, using the above abstraction. This implementation makes use of a set of vertices, of type VSet, to keep track of the visited vertices. It performs a BFS or DFS depending on whether the Queue or Stack module is opened. It can also traverse either the edges of the graph or of the "reverse" graph (in which all the edges have been reversed), based on the parameter dir: a negative dir selects the reverse graph.

module VSet = Set.Make(struct type t = Graph.vertex 
                       let compare = Graph.compare end);;

open Queue;; (* Queue for BFS, Stack for DFS *)

let traverse v0 dir =
  let disc = create ()
  and visited = ref VSet.empty in
  (* Mark every undiscovered neighbor of v as visited and add it
   * to the stack/queue. *)
  let expand v =
    let handle_edge e =
      let (_, v', _) = Graph.edge_info e in
      if not (VSet.mem v' !visited) then begin
        visited := VSet.add v' !visited;
        push v' disc
      end
    in
    (* dir < 0 follows the edges of the reverse graph *)
    List.iter handle_edge
      (if dir < 0 then Graph.incoming_rev v else Graph.outgoing v)
  in
  visited := VSet.add v0 !visited;
  push v0 disc;
  while not (is_empty disc) do
    expand (pop disc)
  done;
  !visited

Recall that in an undirected graph, a connected component is the set of nodes that are reachable by traversal from some node. The connected components of an undirected graph have the property that all nodes in the component are reachable from all other nodes in the component. In a directed graph, however, this is not the case. For instance, for the graph above the component starting from any of nodes v1, v2 or v3 is the set {v1, v2, v3, v4}, whereas the component starting from node v4 is simply the singleton {v4}. These are termed weakly connected components (or WCC) of a directed graph.

The strongly connected components (or SCC) in a directed graph are defined in terms of the set of nodes that are mutually accessible from one another. In other words, when searching from some source node, not all reachable nodes are included, but rather only those for which there is a path back to the source node. It is relatively straightforward to show (left as an exercise) that the strongly connected component from a node vi can be found by searching for nodes that are accessible from vi both in G and in Grev, where Grev has the same set of vertices as G, and has the reverse of each edge in G. Thus the following simple algorithm finds the strongly connected components (which like the connected components in an undirected graph form a partition of the vertices):

let strong_component v0 =
  VSet.inter (traverse v0 1) (traverse v0 (-1))

let strong_components g =
  let vs = ref VSet.empty
  and cs = ref [] in
    (List.iter (function (v) -> vs := VSet.add v !vs) (Graph.vertices g);
     while (not (VSet.is_empty !vs)) do
       let c = strong_component (VSet.choose !vs) in
	 (vs := VSet.diff !vs c;
	  cs := c::!cs)
     done;
     !cs)

For instance, for the graph above the strongly connected components are {v1, v2, v3} and {v4}.
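
The same forward/backward intersection idea can be checked on the four-vertex example without the Graph module. The following is a minimal sketch, assuming vertices v1..v4 are stored at indices 0..3 of an int list array; ISet, reverse, reachable and the _idx function names are hypothetical helpers introduced only for this illustration.

```ocaml
module ISet = Set.Make (struct
  type t = int
  let compare = compare
end)

(* Build the reverse graph: an edge i -> j becomes j -> i. *)
let reverse (adj : int list array) : int list array =
  let rev = Array.make (Array.length adj) [] in
  Array.iteri
    (fun i outs -> List.iter (fun j -> rev.(j) <- i :: rev.(j)) outs)
    adj;
  rev

(* Set of vertices reachable from start, including start itself. *)
let reachable (adj : int list array) (start : int) : ISet.t =
  let rec go seen = function
    | [] -> seen
    | v :: rest ->
      if ISet.mem v seen then go seen rest
      else go (ISet.add v seen) (adj.(v) @ rest)
  in
  go ISet.empty [start]

(* Vertices reachable from v in both the graph and its reverse. *)
let strong_component_idx adj v =
  ISet.inter (reachable adj v) (reachable (reverse adj) v)

(* Repeatedly peel off the component of the smallest remaining vertex. *)
let strong_components_idx adj =
  let rec loop remaining acc =
    if ISet.is_empty remaining then List.rev acc
    else
      let c = strong_component_idx adj (ISet.min_elt remaining) in
      loop (ISet.diff remaining c) (c :: acc)
  in
  loop (ISet.of_list (List.init (Array.length adj) (fun i -> i))) []

(* The four-vertex graph from the start of these notes. *)
let example_graph = [| [1]; [2]; [0; 3]; [] |]
```

Applied to example_graph, this yields the index sets {0, 1, 2} and {3}, matching the components {v1, v2, v3} and {v4}.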

Topological Ordering

In a directed acyclic graph (DAG) the nodes can be ordered such that each node in the ordering comes before all the other nodes to which it has outbound edges. This is called a topological sort of the graph. In general there is not a unique topological order for a DAG. As soon as there are cycles in the graph, a topological ordering is no longer defined. Topological orderings have many uses for problems ranging from job scheduling to determining the order in which to compute quantities that depend on one another (e.g., in spreadsheets). The following figure shows a DAG and a topological ordering for the graph.

Here is a simple recursive function for computing a topological ordering, which operates by choosing a vertex with no incoming edges as the first node in the ordering, and then placing it in front of the result of recursively computing the ordering of the graph with that node removed. If in this process there ever is a graph where all the nodes have incoming edges, then the graph is cyclic and an error is signaled. The running time of this method is O(|V|^2), whereas the asymptotically fastest methods are O(|V|+|E|).

let topological_rec g =
  let rec topological_destr gr =
    let vl = Graph.vertices(gr) in
      if vl = [] then []
      else
	let sl = List.filter (function (v) -> Graph.in_degree(v) = 0) vl in
	  if sl = [] (* No vertices without incoming edges, have a cycle *)
	  then raise(Failure "Graph cyclic")
	  else
	    let v = List.hd sl in
	      (Graph.remove_vertex gr v;
	       v::topological_destr gr) in
    topological_destr (Graph.copy g)

Here is an iterative version of topological sort which has O(|V|+|E|) running time. Note that while remove_vertex is O(|E|) time for a single vertex, it is also O(|E|) time when all O(|V|) vertices of the graph are removed, because each edge is considered a constant number of times overall in the process of removing all the vertices.

let topological_iter g =
  let gr = Graph.copy g in
  let sl = ref (List.filter
		  (function (v) -> Graph.in_degree(v) = 0) (Graph.vertices gr))
  and revorder = ref [] in
    while !sl <> [] do
      let v = List.hd !sl in
	(sl := List.tl !sl;
	 List.iter
	   (function (e) -> 
	      match Graph.edge_info(e) with (_,dst,_) ->
		if Graph.in_degree dst = 1
		then sl := dst::!sl else ())
	   (Graph.outgoing v);
	 Graph.remove_vertex gr v;
	 revorder := v::!revorder)
    done;
    if Graph.num_vertices gr = 0
    then List.rev !revorder
    (* Remaining vertices all with incoming edges, graph is cyclic *)
    else raise(Failure "Graph cyclic")
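
As a minimal self-contained sketch of the same in-degree idea, the following version works on a plain int list array and keeps a counter of remaining incoming edges per vertex instead of removing vertices from a copy of the graph. The name topological_order, the array representation, and the example DAG are assumptions of this sketch and are not part of lec22.ml.

```ocaml
(* Returns a topological order of vertices 0..n-1, or raises Failure
   if the graph has a cycle. Runs in O(|V|+|E|) time. *)
let topological_order (adj : int list array) : int list =
  let n = Array.length adj in
  (* indeg.(v) = number of incoming edges from vertices not yet output. *)
  let indeg = Array.make n 0 in
  Array.iter (List.iter (fun j -> indeg.(j) <- indeg.(j) + 1)) adj;
  (* Vertices with no remaining incoming edges are ready to be output. *)
  let ready = Queue.create () in
  Array.iteri (fun i d -> if d = 0 then Queue.push i ready) indeg;
  let order = ref [] in
  while not (Queue.is_empty ready) do
    let v = Queue.pop ready in
    order := v :: !order;
    (* "Removing" v lowers the in-degree of each of its successors. *)
    List.iter
      (fun w ->
        indeg.(w) <- indeg.(w) - 1;
        if indeg.(w) = 0 then Queue.push w ready)
      adj.(v)
  done;
  (* Any vertex never output still has an incoming edge: a cycle. *)
  if List.length !order = n then List.rev !order
  else failwith "Graph cyclic"

(* Example DAG (an assumption): 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3. *)
let dag = [| [1; 2]; [3]; [3]; [] |]
```

For dag, topological_order returns [0; 1; 2; 3]. On the cyclic four-vertex graph from the start of these notes every vertex has an incoming edge, so no vertex is ever ready and the function raises Failure "Graph cyclic".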