CS410, Summer 1998
Lecture 20 Outline
Dan Grossman

Goals:
* Finish median algorithms
* Linear-Time Sorting
* Radix Sorting

At the end of last class, we were inspired by quicksort to come up with
the following algorithm for finding the element that would be kth were
the array sorted:

quick-select(array, k, l, r) {
  p = partition(array, l, r)
  if (p == k)
    return array[p]
  else if (p > k)
    return quick-select(array, k, l, p-1)
  else
    return quick-select(array, k, p+1, r)  // k is an absolute index, so it is unchanged
}

This is just quicksort where we only recur on the side that matters!
This lowers the expected running time from O(n log n) to O(n). The
rigorous proof is in the text. For intuition, compare T(n/2) + O(n)
with 2T(n/2) + O(n).

This is expected linear time, and with randomization that's probably
sufficient. There is a way to get guaranteed linear time which you
should see once. Basically, we partition around a "median of medians"
in order to guarantee a good split:

Select:
* Divide the elements into groups of 5 and put the median of each
  group in a pile.
* Find the median of the pile (recursively).
* Partition around this median of medians.
* Recur on the correct side.

Claim: The element picked by the above algorithm guarantees that the
split is at worst 3/10 : 7/10. That is, the element is at least the
(3n/10)th and at most the (7n/10)th.

Proof: Half the medians in the pile are greater than the median of
medians, and each such median has two elements in its own group of 5
that are greater still. So at least 3(1/2)(n/5) = 3n/10 elements are
greater. Similarly, 3n/10 are less.

Running time:

  T(n) = T(n/5) + T(7n/10) + O(n)

where T(n/5) is the recursive call to find the median of the pile,
T(7n/10) is recursing on the worst possible split, and O(n) covers
doing the partition and computing the n/5 medians (each O(1)).

This is O(n) -- see text. Note: a little sloppy; it's actually
T(7n/10 + 6). The constant factors are way too high -- use
quick-select in practice. In fact, quicksort can be modified to do
this, but I would call that slowsort. :-)

Linear-Time Sorting

The generic way to sort is with lessThan. And we proved yesterday that
with just this, you cannot sort in better than Omega(n log n) time.
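The two selection routines above can be sketched in runnable form. This is a minimal illustrative sketch with my own naming (`partition` here is a randomized Lomuto-style partition, and ranks are 0-based), not the text's exact presentation:

```python
import random

def partition(a, l, r):
    """Partition a[l..r] around a randomly chosen pivot; return the pivot's final index."""
    i = random.randint(l, r)
    a[i], a[r] = a[r], a[i]          # move pivot to the end
    pivot = a[r]
    store = l
    for j in range(l, r):
        if a[j] < pivot:
            a[store], a[j] = a[j], a[store]
            store += 1
    a[store], a[r] = a[r], a[store]  # pivot lands at its sorted position
    return store

def quick_select(a, k, l, r):
    """Return the rank-k element (0-based) of a[l..r]; expected O(n)."""
    p = partition(a, l, r)
    if p == k:
        return a[p]
    elif p > k:
        return quick_select(a, k, l, p - 1)
    else:
        return quick_select(a, k, p + 1, r)  # k is absolute, so it is unchanged

def select(a, k):
    """Deterministic O(n) selection: partition around the median of medians."""
    if len(a) <= 5:
        return sorted(a)[k]
    # Medians of groups of 5 go in a "pile"; recursively find the pile's median.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    mom = select(medians, len(medians) // 2)
    # Partition into three parts and recur only on the side containing rank k.
    less = [x for x in a if x < mom]
    equal = [x for x in a if x == mom]
    if k < len(less):
        return select(less, k)
    elif k < len(less) + len(equal):
        return mom
    else:
        return select([x for x in a if x > mom], k - len(less) - len(equal))
```

The three-way split in `select` (rather than an in-place partition) keeps the sketch short; the guaranteed 3/10 : 7/10 split argument applies the same way.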
But when we do know things about the data, we can leave the comparison
model and do better. For the rest of today we'll examine a few such
methods.

Suppose we have n objects and there are k possible keys. When
convenient, we assume the possible keys are the integers between 0 and
k-1.

If k is much larger than n, then we're usually best off with a
comparison sort. (An exception is bucket sort, described below.) But k
being smaller than or near n is not uncommon. It can happen with a lot
of repeated values; I have run across this in my own programming. When
analyzing your data in homework 6, you may want to organize by fields
that have only a few values. (Probably a spreadsheet will do your
sorting, but it could do it efficiently if it determined that k is
small.) It also happens during radix sort.

If k is small, we might sort this way:
* Make an array of k linked lists (one bucket per key).       O(k)
* Put each element in the correct bucket.                     O(n)
* Walk through the buckets in order to get the final array.   O(n+k)
Total: O(n+k)

If we insert at the front of each list, we keep the sort stable by
putting the elements in in reverse order.

It turns out counting sort (below) has lower constant factors because
it doesn't build up linked lists. But this is the inspiration for
"bucket sort," where each bucket can actually hold a range of keys
which we then sort:

* Divide the key space into m equal-size regions and make one bucket
  for each.
* Put each element in the correct bucket. (The correct bucket is
  key / (k/m).)
* Sort each bucket.
* Concatenate the buckets.

If we assume that the keys are uniformly distributed over the possible
key values, then we expect each bucket to have n/m elements, and
having significantly more is very unlikely. We're using the same math
we did with hash tables, only we're not hashing; we're just supposing
a uniform distribution of the keys already. (We can't hash -- it mixes
things up, and we're trying to sort!) So we expect n/m elements in
each bucket, and it gets exponentially less likely that any bucket
holds more than a constant factor times n/m.
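Both bucket schemes above can be sketched as follows. This is an illustrative sketch with my own function names; it appends at the tail of each bucket (which achieves the same stability as the reverse-order front insertion described above), and for simplicity assumes m divides k:

```python
def key_bucket_sort(items, key, k):
    """Stable sort of items whose key(x) is in 0..k-1, one bucket per key.  O(n+k)."""
    buckets = [[] for _ in range(k)]          # O(k): array of k lists
    for x in items:                           # O(n): appending at the tail keeps
        buckets[key(x)].append(x)             #       equal keys in input order (stable)
    return [x for b in buckets for x in b]    # O(n+k): walk buckets in order

def bucket_sort(keys, k, m):
    """Sort integer keys in 0..k-1 using m equal-size key ranges.

    Expected O(n) when the keys are uniformly distributed."""
    buckets = [[] for _ in range(m)]
    width = k // m                            # each bucket covers `width` key values
    for x in keys:
        buckets[x // width].append(x)         # the correct bucket is key / (k/m)
    out = []
    for b in buckets:
        out.extend(sorted(b))                 # each bucket expects ~n/m elements
    return out
```

Using the built-in `sorted` inside each bucket stands in for any comparison sort; with uniformly distributed keys each bucket's sort is O((n/m) log(n/m)) as in the analysis below.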
So we roughly expect:

  O(n) + m * O((n/m) log(n/m))

which for a good "load factor" (a term not actually used in sorting,
to my knowledge) is O(n). This would do great on your homework,
because we're using random keys. To repeat: bucket sort works fine
with large k if we make the additional assumption of uniformly
distributed keys.

Counting sort: good when k is approximately n or less.

* Sort A, an array of size n with keys from 0 to k-1.
* Make B, an array of size n, to hold the output.
* Make C, an array of size k, initialized with zeros.
* For i from 0 to n-1:
    C[A[i]]++            // make C[i] the number of times key i appears
* For i from 1 to k-1:
    C[i] += C[i-1]       // make C[i] the number of elements with key <= i
* For i from n-1 down to 0:
    B[C[A[i]]-1] = A[i]  // copy each element directly to the right place
    C[A[i]]--            // walk backwards through A for stability

Total running time is O(n+k). And the constant factors are low -- just
array indexing, additions, and copying.

Radix Sort

Split keys into parts and stable sort on the less significant parts
first:

Radix Sort:
  for i = least significant part to most significant part
    stable sort all elements on part i

The intuition is that previous iterations resolve ties correctly, so
we use a stable sort so as not to mess up the previous work. An
example to keep in mind is sorting decimal numbers using one digit for
each part. Notice that for each stable sort in this example k=10, so a
counting sort works well.

The running time is O(d*m), where d is the number of parts (think
digits) and m is the time for one stable sort. If the number of
possible values per part is at most n (where n is the number of
elements), then m = O(n) by using counting sort, so we have O(dn).

Now d is really log k, since we can express k different keys using
log k bits or digits. If k is roughly n, then the running time is
really O(n log n). So radix sort is asymptotically the same as our
optimal comparison sorts.
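The counting sort steps and the radix sort loop above can be sketched together. This is a minimal sketch with my own function names, sorting non-negative integers one decimal digit (so k=10 per pass):

```python
def counting_sort(a, key, k):
    """Stable sort of a by key(x) in 0..k-1, following the steps above.  O(n+k)."""
    c = [0] * k
    for x in a:
        c[key(x)] += 1              # c[i] = number of times key i appears
    for i in range(1, k):
        c[i] += c[i - 1]            # c[i] = number of elements with key <= i
    b = [None] * len(a)
    for x in reversed(a):           # walk backwards for stability
        c[key(x)] -= 1
        b[c[key(x)]] = x            # copy each element directly to the right place
    return b

def radix_sort(a, digits=None):
    """Sort non-negative integers, one decimal digit per stable-sort pass.  O(d*n)."""
    if digits is None:
        digits = len(str(max(a))) if a else 1
    for d in range(digits):         # least significant digit first
        a = counting_sort(a, lambda x: (x // 10 ** d) % 10, 10)
    return a
```

Each pass is a stable counting sort with k=10, so earlier passes' work on less significant digits is preserved, exactly as the intuition above describes.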