CS410, Summer 1998
Lecture 16 Outline
Dan Grossman

Goals:

* Finish the proof of the running time for union-find with
  union-by-rank and path compression.
* Begin our study of sorting algorithms.

Theorem: The total running time of union-find with union-by-rank and
path compression over m finds and n-1 unions is O((m+n) log* n), where
log* is the inverse tower function.

Proof: Recall the implementation, definition of rank, and the 6 facts
from yesterday's lecture.

Total running time = initialization time + link time + find time
                     ^^^^^^^^^^^^^^^^^^^   ^^^^^^^^^   ^^^^^^^^^
                            O(n)              O(n)        ???

O(n) for initialization is obvious. O(n) link time is just n-1 links
times O(1) per link. So it suffices to show the total find time is
O((m+n) log* n).

The total time for m finds is the total number of edges followed over
those finds. We will count them in a very weird way in order to prove
the bound.

First we divide the possible ranks into groups, for reasons that will
only become apparent later:

  If the rank is between ___ and ___    then we put it in group
           0                  1                    0
           2                  2                    1
           3                  4                    2
           5                 16                    3
          17              65536                    4
       65537          2^(65536)                    5
         ...                ...                  ...

(Although the ... is unnecessary here in reality.) So if rank(x) is i,
then rank(x) is in group log* i.

Now when following a path to the root during a find, we put each edge
followed in one of three categories:

* The last edge followed -- i.e., the one right below the root.
* Edges crossing group boundaries -- i.e., edges from x to y where
  rank(x) and rank(y) are in different groups.
* Edges within group boundaries -- i.e., all the others.

(For the last two categories, we mean edges "that are not last edges".
So every edge is in exactly one of the three categories.)
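Before counting edges, it may help to see a sketch of the implementation the proof analyzes. This is a minimal version in Python (not the lecture's own code; the class and method names here are illustrative): union-by-rank at link time, path compression at find time.

```python
# Sketch of union-find with union-by-rank and path compression.
# Names (UnionFind, find, union) are illustrative, not from the lecture.

class UnionFind:
    def __init__(self, n):
        # Initialization: n singleton trees, all ranks 0 -- the O(n) term.
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # First pass: follow parent edges to the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass (path compression): point every node on the
        # path directly at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Link by rank: the root of smaller rank becomes a child of
        # the root of larger rank; on a tie, the winner's rank grows.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
```

Note that find never changes any rank, and union only ever increases the rank of a node that is currently a root -- the two facts the rest of the proof leans on.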
find time = sum over all finds of (all edges traversed)

Using our categories, we have:

find time = sum over all finds of (last edges traversed
                                   + boundary edges traversed
                                   + within group edges traversed)

We can distribute the summation across the plus:

find time = (sum over all finds of last edges traversed)
          + (sum over all finds of boundary edges traversed)
          + (sum over all finds of within group edges traversed)

          = m + O(m log* n)
              + (sum over all finds of within group edges traversed)

The first term is m because every find has exactly one last edge. The
second term is O(m log* n) because fact 4 guarantees that ranks
strictly increase along every find path, and fact 5 guarantees that
ranks can't get very big. So in fact, on each find we cannot cross
more than log* n group boundaries, and the total over m finds is
O(m log* n).

So it suffices to prove:

  (sum over all finds of within group edges traversed) is O(n log* n)

Here we will use another piece of accounting creativity. Instead of
summing the "within group edges" over finds, we will sum them over
nodes. That is,

  (sum over all finds of "within group edges" traversed)
  = (sum over all nodes x of the number of times, during all finds,
     that a "within group edge" from x to another node is traversed)

All we have done is rearrange our summation to sum over nodes instead
of finds -- we are counting the exact same set of things.

Fact A: Let x be a particular node in group g. The number of times
during all finds that a "within group edge" leaving x is traversed is
<= F(g), where F is the tower function.

Proof: An edge only leaves x if x is not a root. By Fact 2, x's rank
never changes once it is not a root. So it is legal to talk about its
group g -- g will not change. Now, the edges we are talking about are,
by definition, not last edges -- so every time one is traversed we
know PATH COMPRESSION will occur. So by Fact 4, x will get a new
parent with an even greater rank.
So once x's parent is in another group, it will always be in another
group, and no more of the "within group" edges will occur. So the
question is just how many times the rank of x's parent can increase
before the parent must be in another group. This is bounded by the
number of ranks in x's group, which is less than F(g).

Fact B: The number of nodes in group g is <= n/F(g), where F is the
tower function.

Proof:

  nodes in group g
  <= sum over all ranks r in the group of the
     maximum number of nodes with rank r            // by defn of group
  =  sum from r = F(g-1)+1 to F(g) of the
     maximum number of nodes with rank r            // by defn of group
  =  sum from r = F(g-1)+1 to F(g) of n/(2^r)       // by fact 6
  =  n/(2^F(g-1)) * (1/2 + 1/4 + 1/8 + ... + 1/a big number)  // by math
  <  n/(2^F(g-1)) * 1                               // by math
  =  n/F(g)                         // by defn of the tower function

Putting Facts A and B together, the total number of "within group"
edge traversals for group g is:

  (within group traversals per node) * (number of nodes in the group)
  <= F(g) * n/F(g)     // by facts A and B
  =  n                 // by math

Fact C: There are at most log* n groups.

Proof: Explained earlier.

Putting A, B, and C together, the grand total is at most log* n groups
times n "within group" edge traversals per group, i.e., n log* n in
total. As explained above, this is all we needed to prove the theorem.

===============

SORTING

Since we are going to spend several lectures on sorting, we ought to
specify what it means to sort and what other sorts of specifications
we are concerned with.

Let a_1, a_2, ..., a_n be a collection of objects such that for any i
and j, a_i <= a_j or a_i > a_j. Then to sort the collection means to
give a permutation of the objects such that if a_i < a_j then a_i
"appears before" a_j in the permutation.

Stability -- we call a sorting method stable if ties between elements
are always resolved by their original order; otherwise we call the
sort unstable. Formally, stability means that if a_i == a_j, then a_i
appears before a_j if and only if i < j.

We will learn stable and unstable methods.
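The stability definition above can be illustrated with a tiny example in Python (the records here are made up for illustration). Python's built-in sort happens to be stable, so equal keys keep their original relative order:

```python
# Illustration of stability: records carry (key, original index).
# A stable sort must output equal keys in order of original index.
records = [("b", 0), ("a", 1), ("b", 2), ("a", 3)]
by_key = sorted(records, key=lambda r: r[0])
# The two "a" records keep order 1, 3 and the two "b" records keep
# order 0, 2 -- exactly what the formal definition requires.
assert by_key == [("a", 1), ("a", 3), ("b", 0), ("b", 2)]
```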
The unstable methods may be faster, but sometimes we need stability
for our application. We can always make an unstable method stable by
putting an "original position" field on each object and using it to
resolve what would otherwise be ties. However, in practice this slows
down our originally faster unstable method enough that it is usually
wiser just to use a stable method in the first place.

We will also consider whether various sorting methods are appropriate
for linked lists. Of course, we could always:

* convert the linked list to an array in O(n) time, where n is the
  length of the list,
* sort the array, and
* convert the array back to a linked list in O(n) time.

This incurs overhead that could be avoided if we could simply sort the
linked list. The array methods may be faster, though; resolving the
trade-off may require empirical data.

Next time we will begin discussing particular methods.
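The "original position" trick described above can be sketched as follows (a minimal illustration in Python; the function name stable_sort and the use of sorted() as a stand-in for an arbitrary comparison sort are my own, not the lecture's):

```python
# Sketch of the "original position" trick: attach each object's index,
# then sort by (key, index). Since no two indices are equal, there are
# no ties left for the underlying sort to break arbitrarily, so the
# result is stable regardless of whether the underlying sort is.
def stable_sort(items, key, underlying_sort=sorted):
    # underlying_sort stands in for any (possibly unstable) routine
    # that accepts a key function; sorted() is just a placeholder.
    decorated = [(key(x), i, x) for i, x in enumerate(items)]
    ordered = underlying_sort(decorated, key=lambda t: (t[0], t[1]))
    return [x for _, _, x in ordered]
```

The extra tuple per element is the overhead the lecture mentions: it costs memory and comparison time, which is why a natively stable method is often preferable.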