CS410, Summer 1998
Lecture 22 Outline
Dan Grossman

Goals: Intro. to Graphs

CLR Chapter 5.4, 23

Lots and lots of definitions:

* A graph is a set of vertices (nodes) and edges.  We usually take the vertices
  to be the numbers 0, 1, ... n-1.  The edges are pairs of vertices.
  We write G=(V,E) where E is a subset of V x V.

* Directed and undirected -- in an undirected graph, if (u,v) is an
  edge then (v,u) is an edge.  In a directed graph, this isn't necessarily true.
  In a directed graph the edge (u,v) means there is an edge _from_ u
  _to_ v.

* u is adjacent to v if (u,v) is an edge or (v,u) is an edge.

* The in-degree of a node is the number of edges going to it.  The out-degree
  is the number of edges coming out of it.  For directed graphs, the
  degree is the in-degree plus the out-degree.  For undirected graphs,
  in-degree, out-degree, and degree are all the same thing.

* A path from u to v is a sequence of vertices v_1, v_2, ..., v_(k-1) such
  that (u, v_1), (v_1, v_2), ..., (v_(k-1), v) are all edges.  If such a path
  exists then v is reachable from u.  If no vertex is repeated along the path
  (counting u and v as well), then the path is simple.  The length of the
  path is the number of edges on it (here, k).

Claim: If v is reachable from u, then there is a simple path from u to v.

* A cycle is a path from u to u.  If the path without the second u is
  a simple path, then the cycle is a simple cycle.  (That is, a simple
  cycle has no cycles "inside of it".)  A graph with no cycles is acyclic.  
  A graph with cycles is cyclic.  A directed acyclic graph is called a dag.

* The connected components of an undirected graph are the sets of vertices such
  that:
    * every vertex in the set is reachable from every other
    * no proper superset of the set has this reachability property.
  For directed graphs, these are called the strongly connected components.
  A graph with one (strongly) connected component is called connected.

* In a weighted graph, every edge also has a real number assigned to
  it.  Sometimes we will stipulate that the numbers are non-negative,
  but not always.

* A forest is a collection of trees.

* A directed graph is bipartite if no node has both incoming and
  outgoing edges.
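
To make the degree and bipartite definitions above concrete, here is a
minimal Python sketch.  The edge-list input and the function names
(degrees, is_bipartite_directed) are assumptions made for illustration and
are not part of the lecture.  The second function checks the definition
used in these notes (no node has both incoming and outgoing edges).

```python
def degrees(n, edges):
    """Return (in_deg, out_deg) for a directed graph on vertices 0..n-1.

    `edges` is assumed to be a list of (u, v) pairs meaning "from u to v".
    """
    in_deg = [0] * n
    out_deg = [0] * n
    for u, v in edges:
        out_deg[u] += 1  # the edge (u, v) comes out of u
        in_deg[v] += 1   # and goes into v
    return in_deg, out_deg


def is_bipartite_directed(n, edges):
    """Check the notes' definition: no node has both in- and out-edges."""
    in_deg, out_deg = degrees(n, edges)
    return all(in_deg[v] == 0 or out_deg[v] == 0 for v in range(n))
```

For a directed graph, the degree of vertex v is then in_deg[v] + out_deg[v].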

We can compare many of the data structures we have learned.  Let i ->
j mean "if something is an i then it is a j":
 
                                bipartite graphs
                                         \
                                          V
    lists -> trees -> connected dags -> dags -> graphs -> weighted graphs
                  \        _____________/^
                   V      /
                   forests

Graphs can be used to represent all sorts of things.  Here's a small list:
* street maps
* computer network connections
* which functions call which other functions in a program
* which objects act on which others in a physical simulation
* mathematical relations
* the diagram above (expressing "all i's are j's") is a dag!
* much, much more

Unfortunately, computer memory is not made out of nodes and edges, so
we need representations for graphs:

* Adjacency Lists
  Keep an array of lists.  At index i, store a linked list of all vertices j
  for which there exists an edge from i to j.  (For undirected graphs, put the
  edge in both lists.)

* Adjacency Matrix
  Keep a 2-dimensional array.  At row i column j put a 1 if (i,j) is an edge
  else put a 0.  For an undirected graph, half the 2-D array is redundant and
  need not be stored.
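
The two representations above can be sketched in Python as follows.  This is
only an illustration: Python lists stand in for the linked lists, the full
n-by-n matrix is stored even for undirected graphs, and the function names
and the `directed` flag are assumptions rather than anything from the
lecture.

```python
def adjacency_list(n, edges, directed=True):
    """Build adj where adj[i] lists every j with an edge from i to j."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        if not directed and u != v:
            adj[v].append(u)  # undirected: put the edge in both lists
    return adj


def adjacency_matrix(n, edges, directed=True):
    """Build mat where mat[i][j] is 1 if (i, j) is an edge, else 0."""
    mat = [[0] * n for _ in range(n)]
    for u, v in edges:
        mat[u][v] = 1
        if not directed:
            mat[v][u] = 1  # undirected: the matrix is symmetric
    return mat
```

With the edges (0,1), (0,2), (2,1) on three vertices, the list form is
[[1, 2], [], [1]] and the matrix form is [[0, 1, 1], [0, 0, 0], [0, 1, 0]].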

Let n be the number of vertices and m the number of edges.  Clearly, m
<= n^2.  Also let d_i be the out-degree of vertex i. Let's compare
the representations:

                                 Adjacency List   Adjacency Matrix
Space                            O(n+m)           O(n^2)
Time for "is (i,j) an edge?"     O(d_i)           O(1)
Time for all edges out of i      O(d_i)           O(n)
Insert an edge                   O(1)             O(1)

Generally an adjacency matrix takes more space unless the graph is
dense (most possible edges are there) in which case it takes less
because each entry is one bit rather than a linked list node.  The
trade-off between the second and third line above must be resolved per
application.  If necessary, we could maintain both representations and
get the best-of-both-worlds time bounds in exchange for the increased
space.

The bottom line is that for our algorithms we will assume we have
whichever representation is more convenient to our present purpose.

It is straightforward to convert from one representation to the other
in time O(n^2).
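
A sketch of both conversions, assuming the adjacency list is a Python list
of lists and the matrix is a list of rows of 0/1 entries (the function names
are made up for illustration).  Each direction touches at most n^2 matrix
entries, which is where the O(n^2) bound comes from.

```python
def list_to_matrix(adj):
    """Convert an adjacency list to an adjacency matrix."""
    n = len(adj)
    mat = [[0] * n for _ in range(n)]  # O(n^2) to allocate and zero
    for i, neighbors in enumerate(adj):
        for j in neighbors:            # O(m) in total, and m <= n^2
            mat[i][j] = 1
    return mat


def matrix_to_list(mat):
    """Convert an adjacency matrix to an adjacency list."""
    n = len(mat)
    # Scanning every row and every column is O(n^2).
    return [[j for j in range(n) if mat[i][j]] for i in range(n)]
```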

Question: In an unweighted graph, what is the shortest distance from
node s to every other node, and what is some path with that distance?

Answer: Use breadth-first search (BFS).

mark each node unvisited (generally use a separate array)
initialize dist of each node to infinity // will hold shortest distance
initialize pred of each node to null     // will point to the node before it
                                         // in some shortest path
q = new Queue();

visited[s] = true;
dist[s] = 0;
q.enqueue(s);

while (!q.isempty())
  t = q.dequeue();
  for all edges out of t and into v
    if (!visited[v])
      visited[v] = true;
      dist[v] = dist[t] + 1;
      pred[v] = t;
      q.enqueue(v);
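
For readers who want to run the algorithm, here is one possible Python
rendering of the pseudocode.  It assumes the graph is given as an adjacency
list (a list of lists) and uses collections.deque as the queue.  The helper
shortest_path, which follows pred pointers back from the target, is an
addition for illustration and was not part of the lecture.

```python
from collections import deque
import math


def bfs(adj, s):
    """Breadth-first search from s; returns (dist, pred).

    dist[v] is the number of edges on a shortest path from s to v
    (math.inf if v is unreachable); pred[v] is the node before v on
    some shortest path (None for s and for unreachable nodes).
    """
    n = len(adj)
    visited = [False] * n
    dist = [math.inf] * n
    pred = [None] * n

    visited[s] = True
    dist[s] = 0
    q = deque([s])

    while q:
        t = q.popleft()
        for v in adj[t]:  # all edges out of t and into v
            if not visited[v]:
                visited[v] = True
                dist[v] = dist[t] + 1
                pred[v] = t
                q.append(v)
    return dist, pred


def shortest_path(pred, s, v):
    """Return the vertices on a shortest path from s to v, or None."""
    path = [v]
    while path[-1] != s:
        p = pred[path[-1]]
        if p is None:  # ran out of predecessors: v is not reachable
            return None
        path.append(p)
    path.reverse()
    return path
```

For example, with adj = [[1, 2], [3], [3], [4], [], []] and s = 0, bfs
gives dist = [0, 1, 1, 2, 3, inf], and shortest_path(pred, 0, 4) returns
[0, 1, 3, 4].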

We worked through an example in class.  Next time we will examine the
running time and prove that the algorithm is correct.