# Priority Queues and Heaps

For Dijkstra's shortest-path algorithm, we needed a **priority queue**: a queue
whose elements are removed in priority order. A priority queue is an abstraction
with several important uses. And it can be implemented efficiently, as we'll see.

We've already seen that priority queues are useful for implementing Dijkstra's algorithm
and A^{*} search. Priority queues are also very useful for event-driven simulation,
where the simulator needs to handle events in the order in which they occur, and handling
one event can result in adding new events in the future, which need to be pushed onto the queue.
Another use for priority queues is for the compression algorithm known as
Huffman coding, the
optimal way to compress individual symbols in a stream. Priority queues can also
be used for sorting, since elements to be sorted can be pushed into the priority queue
and then removed in sorted order.

## Priority queue interface

A priority queue can be described via the following interface:

PriorityQueue.java

This interface suffices to implement Dijkstra's shortest path algorithm, for example.

Implementing `increasePriority()`

requires that it be possible to
find the element in the priority queue. This can be accomplished by using a
hash table to look up element locations, or by augmenting the elements
themselves with an extra instance variable that holds their location. We largely
ignore that issue here.

## Implementation 1: Binary Search Tree

One simple implementation of a priority queue is as a binary search tree, using element priorities as keys. New elements are added by using the ordinary BST add operation. The minimum element in the tree can be found by simply walking leftward in the tree as far as possible and then pruning or splicing out the element found there. The priority of an element can be adjusted by first finding the element; then the element is removed from the tree and readded with its new priority. Assuming that we can find elements in the tree in logarithmic time, and that the tree is balanced, all of these operations can be done in logarithmic time, asymptotically.

## Implementation 2: Binary Heap

However, a balanced binary tree is overkill for implementing priority queues;
it is conventional to instead use a simpler data structure, the **binary
heap**. The term *heap* is a bit overloaded; binary heaps should not be
confused with *memory heaps*. A memory heap is a low-level data structure
used to keep track of the computer's memory so that the programming language
implementation knows where to place objects in memory.

A *binary* heap, on the other hand, is a binary tree satisfying the
**heap order invariant**:

**(Order)** For each non-root node n, the priority of n is no higher than the
priority of n's parent. Equivalently, a heap stores its minimum element at
the root, and the left and right subtrees are also both heaps.

Here is an example of a binary heap in which only the priorities of the elements are shown:

Notice that the root of each subtree contains the highest-priority element.

It is possible to manipulate binary heaps as tree structures. However, additional speedup can be achieved if the binary heap satisfies a second invariant:

**(Shape)** If there is a node at depth h, then every possible node of depth h–1 exists
along with every possible node to the left of depth h. Therefore, the leaves of
the tree are only at depths h and h–1. This shape invariant may be easier
to understand visually:

In fact, the example tree above also satisfies this shape invariant.

The reason the shape invariant helps is because it makes it possible to represent
the binary heap as a *resizable array*. The elements of the array are “read out”
into the heap structure row by row, so the heap structure above is represented by
the following array of length 9, with array indices shown on the bottom.

How is it possible to represent a tree structure without pointers? The shape invariant guarantees that the children of a node at index i are found at indices 2i+1 (left) and 2i+2 (right). Conversely, the parent of a node at index i is found at index (i–1)/2, rounded down. So we can walk up and down through the tree by using simple arithmetic.

### Binary heap operations

#### Add

Adding is done by adding the element at the end of the array to preserve the Shape invariant.
This violates the Order invariant in general, though. To restore the Order invariant, we
*bubble up* the element by swapping it with its parent until it reaches either the
root or a parent node of higher priority. This requires at most lg n swaps, so the algorithm
is O(lg n). For example, if we add an element with priority 2, it goes at the end of the array
and then bubbles up to the position where 3 was:

#### ExtractMin

The minimum element is always at the root, but it needs to be replaced with something.
The last element in the array needs to go somewhere anyway, so we put it at the root
of the tree. However, this breaks the order invariant in general. We fix the order
invariant by *bubbling the element down*. The element is compared against the
two children nodes and if either is higher-priority, it is swapped with the higher
priority child. The process repeats until either the element is higher-priority than
its children or a leaf is reached. Bubbling down ensures that the heap order
invariant is restored along the path from the root to the last heap element.
Here is what happens with our example heap:

#### IncreasePriority

Increasing the priority of an element is easy. After increasing the priority, we simply bubble it up to restore Order.

## HeapSort

The heapsort algorithm sorts an array by first *heapifying* it to make it satisfy Order. Then
`extractMin()`

is used repeatedly to read out the elements in increasing order.

Heapifying can be done by bubbling every element down, starting from the
*last* non-leaf node in the tree (at index n/2 - 1) and working backward
and up toward the root:

for (i = (n/2)-1; i >= 0; i--) { bubble_down(i); }

The total time required to do this is linear. At most half the elements need to be bubbled down one step, at most a quarter of the elements need to be bubbled down two steps, and so on. So the total work is at most n/2 + 2·n/4 + 3·n/8 + 4·n/16 + ..., which is O(n).

## Treaps

A **treap** is a binary search tree that is balanced with high probability. This is
achieved by ensuring the tree has exactly the same structure that it would have had if
the elements had been inserted in random order. Each node in a treap contains both a
key and a randomly chosen priority. The treap satisfies the BST invariant with respect
to all of its keys so elements can be found in the treap in logarithmic time. The
treap satisfies the heap invariant with respect to all its priorities. This ensures that
the tree structure is exactly what you'd get if the elements had been inserted in
priority order.

For example, the following is a treap where the keys are the letters and the priorities are the numbers.

Elements are added to the treap in the usual way for binary search trees. However,
this may break the Order invariant on priorities. To fix that invariant, the element
is bubbled up through the treap using **tree rotations** that swap a node with its
parent while preserving the BST invariant. If the node is x and it is the right child
of a parent p, the tree rotation is performed by changing pointers so the data structure
on the left turns into the data structure on the right.

Notice that A, B, and C here represent entire subtrees that are not affected by the
rotation except that their parent node may change.
Conversely, if the node is p and it's the *left* child of a parent x, the tree
rotation to swap p with x transforms the data structure on the right to the one on the
left.

For example, adding a node D with priority 2 to the treap above results in the following rotations being done to restore the Order invariant:

Notice that this is exactly the tree structure we'd get if we had inserted all the nodes into a simple BST in the order specified by the priorities of the nodes.