Lecture 23: Priority Queues and Heaps
Today:
- priority queues (a better way) and heaps
Priority Queues and Heaps
-------------------------
* each element has a PRIORITY, an element
of a totally ordered set (usually a number)
* more important things come out *first*,
even if they were added later
* convention for today: smaller number = higher priority
(* imperative priority queues -- operations destructively update
* the data structure *)
signature IMP_PRIOQ =
sig
type 'a prioq
(* creates a new, empty priority queue *)
val empty : ('a * 'a -> order) -> 'a prioq
(* insert the 'a element into the queue *)
val insert : 'a prioq -> 'a -> unit
(* remove and return a minimum element, or NONE if the queue is empty *)
val extract_min : 'a prioq -> 'a option
end
- maintain the list ordered by key, min element at head
- insert: O(n) (bubble new element in to its
rightful place in the sorted list)
- extract_min: O(1) (just remove first element of list)
structure ListPrioq : IMP_PRIOQ =
struct
type 'a prioq = {compare: 'a * 'a -> order,
                 elements: 'a list ref}
fun empty (c:'a*'a->order) = {compare=c, elements=ref []}
fun insert ({compare,elements}: 'a prioq) (x:'a) : unit =
  let fun ins [] = [x]
        | ins (hd::tl) =
            (case compare(hd,x) of
               LESS => hd::(ins tl)
             | _ => x::hd::tl)
  in
    elements := ins(!elements)
  end
fun extract_min ({compare,elements}:'a prioq) : 'a option =
case (!elements) of
[] => NONE
| hd::tl => (elements := tl; SOME hd)
end
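For example, here is how the structure above would be used (a sketch,
assuming the structure compiles as shown and using the built-in
Int.compare):

```sml
(* Build a queue of ints, smallest number = highest priority. *)
val q = ListPrioq.empty Int.compare
val () = ListPrioq.insert q 5
val () = ListPrioq.insert q 3
val () = ListPrioq.insert q 9
(* Successive calls to extract_min now yield
   SOME 3, SOME 5, SOME 9, and finally NONE. *)
```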
Alternative list implementation:
- do not maintain a sorted list
- just add new elements at head
- search list to find minimum and extract it
- insert: O(1)
- extract_min: O(n)
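A sketch of this unsorted variant (the name UnsortedListPrioq is mine,
not from the handout), matching the same IMP_PRIOQ signature:

```sml
(* Unsorted variant: insert at the head in O(1); extract_min scans
   the whole list in O(n) to find and remove a minimum element. *)
structure UnsortedListPrioq : IMP_PRIOQ =
struct
  type 'a prioq = {compare: 'a * 'a -> order, elements: 'a list ref}
  fun empty (c : 'a * 'a -> order) = {compare = c, elements = ref []}
  fun insert ({elements, ...} : 'a prioq) (x : 'a) : unit =
      elements := x :: !elements
  fun extract_min ({compare, elements} : 'a prioq) : 'a option =
      case !elements of
          [] => NONE
        | hd :: tl =>
          let
            (* find a minimum element m ... *)
            val m = List.foldl (fn (x, m) =>
                        case compare (x, m) of LESS => x | _ => m) hd tl
            (* ... then rebuild the list without one copy of it *)
            fun remove [] = []
              | remove (y :: ys) =
                  (case compare (y, m) of
                       EQUAL => ys
                     | _ => y :: remove ys)
          in
            elements := remove (!elements);
            SOME m
          end
end
```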
Which is better?  Not much difference overall--with one you
win on inserts but lose on extract_min's; with the other,
vice versa.
BUT today we will see:
- Heap implementation
- insert: O(log n)
- extract_min: O(log n)
------------------------------------------------------------------
We can use a prioq to sort n numbers
* Insert them in the queue, with the number as both priority and
data
* Then take them out in priority (= numerical) order.
Time (with the sorted-list implementation):
      n insertions, taking O(n) each, for O(n^2)
      n deletions, taking O(1) each.
Total: O(n^2)
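The sorting recipe above can be sketched as follows (the name psort is
mine; it works for any IMP_PRIOQ implementation you plug in):

```sml
(* Sort by inserting everything, then draining in priority order.
   With ListPrioq this costs O(n^2); with a heap, O(n log n). *)
fun psort (cmp : 'a * 'a -> order) (xs : 'a list) : 'a list =
    let
      val q = ListPrioq.empty cmp
      val () = List.app (ListPrioq.insert q) xs
      (* pull elements out until the queue is empty *)
      fun drain () =
          case ListPrioq.extract_min q of
              NONE => []
            | SOME x => x :: drain ()
    in
      drain ()
    end
```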
------------------------------------------------------------------
This is more expensive than it needs to be.
We can implement priority queues more efficiently with a HEAP:
A tree in which each node has a PRIORITY
- Priority of each node no larger than priorities of its children
- So the node with minimum priority is on top (root) of the tree.
This will give
insert: O(log n)
extract_min: O(log n)
Thus sorting n numbers using this implementation of
priority queues can be done in O(n log n).
<<< Let's ignore data for a bit. Numbers shown are just priorities >>>
3
/ \
/ \
5 9
/ \ / \
12 6 10 15
Heaps are easily represented as arrays:
The root of the tree is at location 0 in the array and the
children of the node stored at position i are at
locations 2i+1 and 2i+2.
[3 5 9 12 6 10 15]
Read across the tree, row by row.
Partial Ordering Property for heaps (n elements, root at index 0):
      A[i] <= A[2i+1]  whenever 2i+1 < n
      A[i] <= A[2i+2]  whenever 2i+2 < n
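The index arithmetic, and a checker for the property above, can be
sketched like this (helper names left/right/parent/isHeap are mine):

```sml
(* Index arithmetic for a heap stored in an array, root at index 0. *)
fun left i   = 2 * i + 1
fun right i  = 2 * i + 2
fun parent i = (i - 1) div 2

(* Check the partial ordering property for the first n slots of a. *)
fun isHeap (cmp : 'a * 'a -> order) (a : 'a array) (n : int) : bool =
    let
      (* node i is fine w.r.t. child j if j is off the end, or A[i] <= A[j] *)
      fun ok i j =
          j >= n orelse cmp (Array.sub (a, i), Array.sub (a, j)) <> GREATER
      fun go i =
          i >= n orelse (ok i (left i) andalso ok i (right i) andalso go (i + 1))
    in
      go 0
    end
```

For the example array [3 5 9 12 6 10 15], isHeap Int.compare returns true.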
------------------------------------------------------------------
insert:
Put the element at a *leaf*
Switch it with its parent if its parent is larger, etc
3
/ \
/ \
5 9 [3 5 9 12 6 10 15 4]
/ \ / \
12 6 10 15
/
4
3
/ \
/ \
5 9 [3 5 9 4 6 10 15 12]
/ \ / \
4 6 10 15
/
12
3
/ \
/ \
4 9 [3 4 9 5 6 10 15 12]
/ \ / \
5 6 10 15
/
12
This operation requires only O(log n) time -- the tree is depth
ceil(log n), and we do a bounded amount of work on each level.
* Finding your parent is easy:
If you're node i > 0, then your parent is ((i-1) div 2)
So the code does the following:
* Check for full queue -- !next_avail >= Array.length(values)
* Store new element in values[!next_avail]
* Increment next_avail
* Bubble it up 'til (prio parent) <= (prio child)
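Those steps can be sketched as follows (this is my sketch, not the
handout's code; the record field names values and next_avail follow
the prose above):

```sml
(* A heap of bounded capacity, stored in an array. *)
type 'a heap = {compare : 'a * 'a -> order,
                values : 'a array,
                next_avail : int ref}

fun insert ({compare, values, next_avail} : 'a heap) (x : 'a) : unit =
    if !next_avail >= Array.length values
    then raise Fail "heap full"
    else
      let
        fun swap i j =
            let val t = Array.sub (values, i)
            in Array.update (values, i, Array.sub (values, j));
               Array.update (values, j, t)
            end
        (* bubble up until the parent is no larger than the child *)
        fun up 0 = ()
          | up i =
              let val p = (i - 1) div 2
              in case compare (Array.sub (values, p), Array.sub (values, i)) of
                     GREATER => (swap p i; up p)
                   | _ => ()
              end
      in
        Array.update (values, !next_avail, x);
        next_avail := !next_avail + 1;
        up (!next_avail - 1)
      end
```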
extract_min works by returning the element at the root.
* Guaranteed to be the most important (smallest value) by the
partial ordering property.
* Now we have the two subtrees to put right, though.
Trick is,
* Copy a leaf (last element) to the root (first element)
* If it's larger (less important) than one of the children,
bubble it down.
- Swap with the more important child, to make sure the parent
is always more important than both children.
Here's what the code does (see handout):
* Save minimum element, it's the return value
* put last element to first position,
* decrement next_avail counter
* bubble the new top down the tree 'til it stops.
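A sketch of these steps, for the same heap record as in the insert
sketch (again my code, not the handout's):

```sml
(* Save the root, move the last element to the top, shrink the heap,
   then push the new top down until the ordering property holds. *)
fun extract_min ({compare, values, next_avail} : 'a heap) : 'a option =
    if !next_avail = 0 then NONE
    else
      let
        val min = Array.sub (values, 0)
        val n = !next_avail - 1        (* size after removal *)
        fun swap i j =
            let val t = Array.sub (values, i)
            in Array.update (values, i, Array.sub (values, j));
               Array.update (values, j, t)
            end
        (* swap with the more important (smaller) child, repeatedly *)
        fun down i =
            let
              val l = 2 * i + 1
              val r = 2 * i + 2
              (* c = the smaller of the children that exist *)
              val c = if r < n andalso
                         compare (Array.sub (values, r),
                                  Array.sub (values, l)) = LESS
                      then r else l
            in
              if l < n andalso
                 compare (Array.sub (values, c),
                          Array.sub (values, i)) = LESS
              then (swap i c; down c)
              else ()
            end
      in
        Array.update (values, 0, Array.sub (values, n));
        next_avail := n;
        down 0;
        SOME min
      end
```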
original heap, to delete top element from (leaves two subheaps)
3
/ \
/ \
4 9 [3 4 9 5 6 10 15 12]
/ \ / \
5 6 10 15
/
12
copy last leaf to root
12
/ \
/ \
4 9 [12 4 9 5 6 10 15]
/ \ / \
5 6 10 15
"push down"
4
/ \
/ \
12 9 [4 12 9 5 6 10 15]
/ \ / \
5 6 10 15
4
/ \
/ \
5 9 [4 5 9 12 6 10 15]
/ \ / \
12 6 10 15
Again an O(log n) operation.
We can sort using this implementation of priority queues.
How expensive is the sorting function built from this?
n insertions, at O(log n) cost, for O(n log n) total
n deletions, at O(log n) cost, for O(n log n) total.
Thus, O(n log n) total cost.
It's called HEAPSORT, and it's a reliable standard algorithm.
If you have to sort by doing comparisons only, this is as fast as
possible (up to a constant factor).
* There are plenty of other O(n log n) algorithms with different
properties
- smaller constant factor
- very fast if the list is already sorted
Some special cases will let you sort in O(n) time, but they're
rare (can anyone tell me one?)
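One classic answer: when the keys are integers drawn from a small known
range [0, k), counting sort runs in O(n + k) -- it never compares
elements, so the comparison-sort lower bound doesn't apply.  A sketch
(the name countingSort is mine):

```sml
(* Counting sort for int keys in [0, k): tally occurrences,
   then emit each key the counted number of times, in order. *)
fun countingSort (k : int) (xs : int list) : int list =
    let
      val counts = Array.array (k, 0)
      val () = List.app (fn x =>
                   Array.update (counts, x, Array.sub (counts, x) + 1)) xs
      fun emit i =
          if i >= k then []
          else List.tabulate (Array.sub (counts, i), fn _ => i) @ emit (i + 1)
    in
      emit 0
    end
```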
------------------------------------------------------------------
One last comment -- you might be worried about the fixed size
of the array of values.  There are two possible ways around
this:
(1) we could make the values an array ref. When we insert
too many elements, we could allocate a new array (that's
larger), then copy the old array into the new array, and
use the new array. But how much should we grow the array?
A standard trick is to double the size of the array.
That way, you can amortize the cost of doing the copy
(an O(n) operation) across a larger number of inserts.
Also, if you are really dealing with a lot of data, then
the array will grow in size quickly. The drawback is that
of course, you might be wasting a lot of space.
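The doubling trick can be sketched like so (my sketch: it assumes the
heap keeps its array in a ref so the array can be replaced, and it
needs some dummy element to fill the fresh slots, since SML arrays
must be initialized):

```sml
(* Double the capacity of the backing array, preserving its contents.
   Amortized over inserts, the O(n) copy costs O(1) per insert. *)
fun grow (values : 'a array ref, dummy : 'a) : unit =
    let
      val old = !values
      val new = Array.array (2 * Array.length old, dummy)
    in
      (* copy every old slot into the front of the larger array *)
      Array.copy {src = old, dst = new, di = 0};
      values := new
    end
```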
(2) don't use arrays -- rather, use pointers to heap-allocated
objects. However, you'll have to be able to find your "parent"
somehow -- this means that children should have links to their
parents. Also, you'll need to be able to get to the last element
quickly -- so you'll need a pointer to the last element inserted.
A good homework problem is to try to figure out how to do a
heap without embedding it in an array...
------------------------------------------------------------------
(Full code: lec23.sml)