Lecture 23: Priority Queues and Heaps

Today:
  - priority queues (a better way) and heaps

Priority Queues and Heaps
-------------------------

  * each element has a PRIORITY, an element
      of a totally ordered set (usually a number)
  * more important things come out *first*, 
     even if they were added later
  * convention for today: smaller number = higher priority


  (* imperative priority queues -- operations destructively update
   * the data structure *)
  signature IMP_PRIOQ =
    sig
        type 'a prioq

        (* creates a new, empty priority queue *)
        val empty : ('a * 'a -> order) -> 'a prioq

        (* insert an element into the queue *)
        val insert : 'a prioq -> 'a -> unit

        (* remove and return the minimum element, if any *)
        val extract_min : 'a prioq -> 'a option
    end

    - maintain the list ordered by key, min element at head
    - insert: O(n) (bubble the new element into its
        rightful place in the sorted list)
    - extract_min: O(1) (just remove the first element of the list)

  structure ListPrioq : IMP_PRIOQ =
    struct
      type 'a prioq = {compare: 'a * 'a -> order,
                       elements: 'a list ref}

      fun empty (c:'a*'a->order) = {compare=c, elements=ref []}

      fun insert ({compare,elements}: 'a prioq) (x:'a) : unit =
          let fun ins [] = [x]
                | ins (hd::tl) =
                    (case compare(hd,x) of
                       LESS => hd::(ins tl)
                     | _ => x::hd::tl)
          in
              elements := ins(!elements)
          end

      fun extract_min ({compare,elements}:'a prioq) : 'a option =
          case (!elements) of
            [] => NONE
          | hd::tl => (elements := tl; SOME hd)
    end

Alternative list implementation:
    - do not maintain a sorted list
    - just add new elements at head
    - search list to find minimum and extract it
    - insert: O(1)
    - extract_min: O(n)
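A sketch of this unsorted-list variant, written to match the IMP_PRIOQ operations above (the structure name UnsortedListPrioq is an assumption, not from the handout):

```sml
(* Unsorted-list variant: elements are kept in arbitrary order.
 * insert is O(1); extract_min scans the whole list, so it is O(n). *)
structure UnsortedListPrioq =
  struct
    type 'a prioq = {compare: 'a * 'a -> order,
                     elements: 'a list ref}

    fun empty (c : 'a * 'a -> order) : 'a prioq =
        {compare = c, elements = ref []}

    (* O(1): just cons onto the head *)
    fun insert ({elements, ...} : 'a prioq) (x : 'a) : unit =
        elements := x :: (!elements)

    (* O(n): find the minimum, then rebuild the list without it *)
    fun extract_min ({compare, elements} : 'a prioq) : 'a option =
        case !elements of
          [] => NONE
        | hd :: tl =>
            let
              val m = List.foldl (fn (x, m) =>
                        case compare (x, m) of LESS => x | _ => m) hd tl
              fun remove [] = []
                | remove (y :: ys) =
                    (case compare (y, m) of
                       EQUAL => ys
                     | _ => y :: remove ys)
            in
              elements := remove (!elements);
              SOME m
            end
  end
```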

Which is better?  Not much difference: with one you
win on inserts but lose on extract_mins; with the
other, vice versa.

BUT today we will see:
    - Heap implementation
    - insert: O(log n)
    - extract_min: O(log n)

------------------------------------------------------------------

We can use a prioq to sort n numbers
  * Insert them in the queue, with the number as both priority and 
    data
  * Then take them out in priority (= numerical) order.
    
  Time:  O(n) insertions, taking O(n) each, for O(n^2)
         O(n) deletions, taking O(1) each.
  Total: O(n^2)
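This recipe can be sketched as a small self-contained function.  The sorted-list queue is inlined (same idea as the ListPrioq structure above), and the name prioqSort is an assumption for illustration:

```sml
(* Sort via a priority queue: insert everything, then drain in
 * priority order.  With a sorted-list queue each insert is O(n)
 * and each extract is O(1), so the whole sort is O(n^2). *)
fun prioqSort (xs : int list) : int list =
    let
      val q : int list ref = ref []          (* sorted, min at head *)
      fun insert x =
          let fun ins [] = [x]
                | ins (h :: t) = if h <= x then h :: ins t
                                 else x :: h :: t
          in q := ins (!q) end
      fun drain () =
          case !q of
            [] => []
          | h :: t => (q := t; h :: drain ())
    in
      List.app insert xs;
      drain ()
    end
(* prioqSort [9,3,5,12,6,10,15] evaluates to [3,5,6,9,10,12,15] *)
```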

------------------------------------------------------------------

This is more expensive than it needs to be.  
We can implement priority queues more efficiently with a HEAP:
A tree in which each node has a PRIORITY
    - Priority of each node no larger than priorities of its children
    - So the node with minimum priority is on top (root) of the tree.

This will give
insert: O(log n)
extract_min: O(log n)

Thus sorting n numbers using this implementation of
priority queues can be done in O(n log n).

<<< Let's ignore data for a bit.  Numbers shown are just priorities >>>

              3
             / \
            /   \
           5     9
          / \   / \
         12  6 10  15

Heaps are easily represented as arrays

The root of the tree is at location 0 in the array and the
children of the node stored at position i are at
locations 2i+1 and 2i+2.

[3 5 9 12 6 10 15]

Read across the tree, row by row.  

Partial Ordering Property for heaps (n elements, 0-indexed)
  A[i] <= A[2i+1] and A[i] <= A[2i+2]
whenever those child indices are in range,
i.e., for 0 <= i <= floor(n/2) - 1
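The index arithmetic can be sketched in a few lines (the names left, right, and parent are illustrative):

```sml
(* Index arithmetic for a 0-indexed array heap *)
fun left i   = 2 * i + 1
fun right i  = 2 * i + 2
fun parent i = (i - 1) div 2

(* In [3,5,9,12,6,10,15]: the children of 9 (index 2) are at
 * indices 5 and 6, i.e. 10 and 15, and parent 5 = 2. *)
```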

------------------------------------------------------------------
insert: 

  Put the element at a *leaf*
  Switch it with its parent if its parent is larger, etc

              3
             / \
            /   \
           5     9        [3 5 9 12 6 10 15 4]
          / \   / \
         12  6 10  15
        /
       4
              3
             / \
            /   \
           5     9        [3 5 9 4 6 10 15 12]
          / \   / \
         4   6 10  15
        /
       12
              3
             / \
            /   \
           4     9        [3 4 9 5 6 10 15 12]
          / \   / \
         5   6 10  15
        /
       12

This operation requires only O(log n) time -- the tree has depth
floor(log n), and we do a bounded amount of work on each level.

  * Finding your parent is easy:
    If you're node i >= 1, then your parent is ((i-1) div 2)

So the code does the following:
  * Check for a full queue -- !next_avail >= Array.length(values)
  * Store the new element in values[!next_avail]
  * Increment next_avail
  * Bubble it up 'til (prio parent) <= (prio child)
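These steps can be sketched as follows.  This is a sketch, not the handout's actual code: the representation (an int array of priorities plus a fill counter) and the names heap, values, and next_avail are assumptions based on the step descriptions above.

```sml
(* Assumed representation: fixed-size array plus a fill pointer. *)
type heap = {values : int array, next_avail : int ref}

fun swap (a : int array, i, j) =
    let val t = Array.sub (a, i)
    in  Array.update (a, i, Array.sub (a, j));
        Array.update (a, j, t)
    end

fun insert ({values, next_avail} : heap) (x : int) : unit =
    if !next_avail >= Array.length values
    then raise Fail "priority queue full"
    else
      let
        (* store at the next free leaf, then bump the counter *)
        val () = Array.update (values, !next_avail, x)
        val () = next_avail := !next_avail + 1
        (* bubble up while the parent has a larger (less important) key *)
        fun up 0 = ()
          | up i =
              let val p = (i - 1) div 2
              in  if Array.sub (values, p) > Array.sub (values, i)
                  then (swap (values, p, i); up p)
                  else ()
              end
      in
        up (!next_avail - 1)
      end

(* Rebuilding the running example: *)
val h : heap = {values = Array.array (8, 0), next_avail = ref 0}
val () = List.app (insert h) [3, 5, 9, 12, 6, 10, 15, 4]
(* #values h now reads [3,4,9,5,6,10,15,12], as in the last picture *)
```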
    
extract_min works by returning the element at the root.
  * Guaranteed to be the most important (smallest value) by the
    partial ordering property.
  * Now we have the two subtrees to put right, though.

The trick is:
  * Copy a leaf (the last element) to the root (first element)
  * If it's larger (less important) than one of its children,
    bubble it down.
    - Swap with the more important child, to make sure the parent
      is always more important than both children.

Here's what the code does (see handout):
  * Save the minimum element; it's the return value
  * Move the last element to the first position
  * Decrement the next_avail counter
  * Bubble the new top down the tree 'til it stops.

original heap, to delete top element from (leaves two subheaps)

              3
             / \
            /   \
           4     9        [3 4 9 5 6 10 15 12]
          / \   / \
         5   6 10  15
        /
       12

copy last leaf to root

              12
             / \
            /   \
           4     9        [12 4 9 5 6 10 15]
          / \   / \
         5   6 10  15

"push down"

              4
             / \
            /   \
           12     9        [4 12 9 5 6 10 15]
          / \   / \
         5   6 10  15

              4
             / \
            /   \
           5     9        [4 5 9 12 6 10 15]
          / \   / \
         12  6 10  15

Again an O(log n) operation.
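The push-down shown in the pictures can be sketched as follows, again under an assumed representation (an int array of priorities plus a fill counter); the names are illustrative, not the handout's.

```sml
(* Assumed representation, as for insert: array plus fill pointer. *)
type heap = {values : int array, next_avail : int ref}

fun swap (a : int array, i, j) =
    let val t = Array.sub (a, i)
    in  Array.update (a, i, Array.sub (a, j));
        Array.update (a, j, t)
    end

fun extract_min ({values, next_avail} : heap) : int option =
    if !next_avail = 0 then NONE
    else
      let
        val min = Array.sub (values, 0)   (* root = smallest priority *)
        (* move the last leaf to the root and shrink the heap *)
        val () = next_avail := !next_avail - 1
        val () = Array.update (values, 0,
                               Array.sub (values, !next_avail))
        val n = !next_avail
        (* push the new root down, swapping with the smaller child *)
        fun down i =
            let
              val l = 2 * i + 1
              val r = 2 * i + 2
              val s = if l < n andalso
                         Array.sub (values, l) < Array.sub (values, i)
                      then l else i
              val s = if r < n andalso
                         Array.sub (values, r) < Array.sub (values, s)
                      then r else s
            in
              if s <> i then (swap (values, i, s); down s) else ()
            end
      in
        down 0; SOME min
      end

(* Deleting the top of the example heap: *)
val h : heap = {values = Array.fromList [3, 4, 9, 5, 6, 10, 15, 12],
                next_avail = ref 8}
val r = extract_min h
(* r = SOME 3; the live prefix becomes [4,5,9,12,6,10,15],
 * matching the pictures above *)
```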

We can sort using this implementation of priority queues.
How expensive is the sorting function built from this?

  n insertions, at O(log n) cost, for O(n log n) total
  n deletions, at O(log n) cost, for O(n log n) total.

  Thus, O(n log n) total cost.

It's called HEAPSORT, and it's a reliable, standard sorting algorithm.

If you have to sort by doing comparisons only, this is as fast as
possible (up to a constant factor).
  * There are plenty of other O(n log n) algorithms with different
    properties
    - smaller constant factor
    - very fast if the list is already sorted 

Some special cases will let you sort in O(n) time, but they're 
rare (can anyone tell me one?)
------------------------------------------------------------------
One last comment -- you might be worried about the fixed size
of the array of values.  There are two possible ways around
this:

(1) We could make the values an array ref.  When we insert
too many elements, we allocate a new, larger array, copy the
old array into the new one, and use the new array from then on.
But how much should we grow the array?  A standard trick is to
double its size.  That way, the cost of the copy (an O(n)
operation) is amortized across a larger number of inserts, and
if you are really dealing with a lot of data, the array grows
quickly.  The drawback, of course, is that you might be wasting
a lot of space.
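A sketch of option (1): making values an array ref so the heap can grow by doubling.  The names (heap, values, next_avail, grow) are assumptions for illustration.

```sml
(* values is now an array ref, so it can be replaced wholesale. *)
type heap = {values : int array ref, next_avail : int ref}

fun grow ({values, ...} : heap) : unit =
    let
      val old = !values
      val n = Array.length old
      (* double the capacity (at least 1), then copy the old contents *)
      val new = Array.array (Int.max (1, 2 * n), 0)
      val () = Array.copy {src = old, dst = new, di = 0}
    in
      values := new
    end

(* insert would call grow, instead of failing, when
 * !next_avail >= Array.length (!values) *)
```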

(2) don't use arrays -- rather, use pointers to heap-allocated
objects.  However, you'll have to be able to find your "parent"
somehow -- this means that children should have links to their
parents.  Also, you'll need to be able to get to the last element
quickly -- so you'll need a pointer to the last element inserted.
A good homework problem is to try to figure out how to do a
heap without embedding it in an array...
------------------------------------------------------------------