Implementing ordered sets using treaps

Let's look at another way to implement ordered sets. Here is an ordered
set signature that is designed to support implementation of both set and map
abstractions. We've added some operations to show the added power of ordered sets.
The `first` function gives the first element in the set, and `fold_forward`
iterates over the elements of the set in ascending order. We can similarly
implement `last` and `fold_backward` from the set signature.
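As a sketch, such a signature might look like the following SML. The operation names match those used by the `TreapSet` implementation later in this section, but the details here are illustrative rather than the exact signature:

```
(* Parameters needed to build an ordered set: the key and element
 * types, how to compare keys, and how to extract an element's key. *)
signature ORDERED_SET_PARAMS = sig
  type key
  type elem
  val compare : key * key -> order
  val keyOf : elem -> key
end

(* An illustrative ordered-set signature. *)
signature ORDERED_FUNCTIONAL_SET = sig
  type key
  type elem
  type set
  val empty : unit -> set
  (* add(s,e) is (s', dup), where s' also contains e and dup indicates
   * whether an element with the same key was already present. *)
  val add : set * elem -> set * bool
  val lookup : set * key -> elem option
  (* remove's result type is a guess; the section leaves it unimplemented. *)
  val remove : set * key -> set * bool
  val size : set -> int
  (* first(s) is the least element of s, if any; last(s) the greatest. *)
  val first : set -> elem option
  val last : set -> elem option
  type 'b folder = ((elem * 'b) -> 'b) -> 'b -> key -> set -> 'b
  (* fold_forward f b k s folds f over the elements with keys >= k,
   * in ascending key order; fold_backward goes in descending order. *)
  val fold_forward : 'b folder
  val fold_backward : 'b folder
end
```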

We have already seen red-black trees, which are one good way to implement ordered
sets. Red-black trees are nice because they guarantee O(lg *n*) insert, lookup, and
deletion time, with good constant factors. However, if we are willing to accept
probabilistic guarantees of performance, there are other, simpler options
for implementing ordered sets. Two well-known data structures for implementing ordered sets
use randomness to achieve good average-case performance: **skip lists**
and **treaps**. Treaps are simpler and probably faster.

The idea behind treaps is to use randomness to balance binary search trees. Binary search
trees are great as long as they are balanced. But if the elements of the tree
are inserted in an ordered way, the tree can turn into a linked list (or at
least become extremely unbalanced), leading to *O*(*n*) performance. On the other hand, if a set of elements is inserted in a *random
order*, the expected distance in the tree to a randomly chosen element is *O*(lg
*n*). To see why, imagine walking down the tree from the root to a leaf. At
any given point on the walk, there is a subtree of (say) *n* elements below the
current element. Suppose that we construct a sequence of all of the *n* elements
in this subtree in key order. Because the elements were inserted in random order,
the element at the current node lies at some random position *p* within the
ordered sequence, where *p* ranges from 1 to *n*. If we are looking for a
randomly chosen element, then there is a 1/*n* probability that the current
element is the one of interest. The left subtree contains *p*-1 elements, so
there is a (*p*-1)/*n* probability that the element of interest is in the left
subtree. Correspondingly, the right subtree contains *n*-*p* elements, and there
is an (*n*-*p*)/*n* probability that the element is there. The expected size of
the subtree visited after one step of the walk, assuming position *p*, is
therefore (*p*-1)·(*p*-1)/*n* + (*n*-*p*)·(*n*-*p*)/*n* + 1·(1/*n*). All
values of *p* from 1 to *n* are equally likely, so the expected size of the next
subtree is the sum of this expression over all *p* from 1 to *n*, divided
by *n*:

(1/*n*) · ∑_{*p*=1}^{*n*} ((*p*-1)² + (*n*-*p*)² + 1)/*n* ≈ (2/3)·*n*

As this shows, each branch taken
shrinks the size of the subtree below the current node by a factor of
approximately 2/3. Therefore we expect to take *O*(lg
*n*) steps to walk to a randomly chosen element: more precisely, about log_{3/2} *n* steps on average.
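To verify the 2/3 factor, evaluate the average using \(\sum_{k=0}^{n-1} k^2 = (n-1)n(2n-1)/6\):

\[
\frac{1}{n}\sum_{p=1}^{n}\frac{(p-1)^2+(n-p)^2+1}{n}
= \frac{2}{n^2}\sum_{k=0}^{n-1}k^2+\frac{1}{n}
= \frac{(n-1)(2n-1)}{3n}+\frac{1}{n}
\approx \frac{2n}{3}.
\]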

Treaps simulate the construction of a random binary search tree. Each node in a
treap contains not only a value and pointers to the left and right children, but
also a *priority*. The idea is that a treap always looks like the binary search
tree you would get if you had inserted the elements in priority order. In an
ordinary binary search tree, elements inserted later are always lower in the
tree; therefore, the nodes of a treap must satisfy the heap ordering invariant
with respect to the node priorities. If the priorities are generated randomly,
you have a **random treap** whose structure is the same as the corresponding
random binary search tree. A treap is thus both a *binary search tree* with
respect to the node elements and a *heap* with respect to the node priorities.
From this comes its name: "treap" = "tree heap".

Given a set of elements and associated priorities, it is not completely obvious that we can construct a treap that satisfies both invariants simultaneously. Clearly the root of the treap must be the node with the highest priority. To satisfy the BST invariant, all the nodes whose keys are less than the root's key must be in its left subtree, and the nodes whose keys are greater must be in its right subtree. Therefore, we can apply this construction recursively to the left and right subtrees, resulting in a treap.
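As a sketch, this recursive construction can be written down directly. The following illustration works over integer values and priorities; the name `build` is invented here, and this quadratic construction is for exposition only (treaps actually grow by incremental insertion, as shown below):

```
datatype tree = Empty
              | Node of {left: tree, right: tree, value: int, priority: int}

(* build xs is the unique treap containing the (value, priority) pairs
 * in xs, assuming distinct values and distinct priorities.  Following
 * the text: the pair with the highest priority (smallest number, in the
 * min-heap convention used later) becomes the root, and the
 * construction recurses on the smaller and larger values. *)
fun build ([] : (int * int) list) : tree = Empty
  | build (xs as first :: rest) =
      let
        (* The pair with the minimum priority number must be the root. *)
        val (v, p) =
          foldl (fn (x as (_, px), best as (_, pb)) =>
                   if px < pb then x else best)
                first rest
        (* By the BST invariant, smaller values go left, larger right. *)
        val smaller = List.filter (fn (v', _) => v' < v) xs
        val larger  = List.filter (fn (v', _) => v' > v) xs
      in
        Node {left = build smaller, right = build larger,
              value = v, priority = p}
      end
```

For example, `build [(2,30), (1,17), (3,47)]` puts (1,17) at the root with (2,30) and then (3,47) down its right spine, the same tree the insertion example below produces.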

Given an existing treap, how do we insert a new element? The algorithm
follows the same strategy as in red-black trees: it finds the unique leaf where
the element can be inserted while preserving the BST invariant. However, we also
assign this element a random priority. The final treap had better look like the
binary search tree that one would get if the newly inserted element had been
inserted according to its priority. This is achieved by performing a series of **tree
rotations** to enforce the heap ordering invariant.

A simple tree rotation is also useful to know about for other tree algorithms such as splay trees and AVL trees. Notice that the following two trees both satisfy the binary search tree invariant, and that all of the elements remain in the same order with respect to an in-order traversal, regardless of the structure of the subtrees A, B, C:

```
    x                  y
   / \                / \
  A   y              x   C
     / \            / \
    B   C          A   B
```

A tree rotation converts a part of the tree that looks like one of these into the other. The advantage is that the relative position of x and y is swapped by the rotation. Thus, if y is higher priority than x but it is below x, thus breaking the heap-ordering invariant (as in the left-hand picture), a tree rotation to the right-hand configuration will restore the heap-ordering invariant because it puts x below y.
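As a sketch in SML (using a plain BST datatype invented here, separate from the treap representation below), the two rotations are single pattern matches:

```
datatype tree = Leaf | Br of tree * int * tree

(* rotate_left turns the left-hand configuration into the right-hand one:
 * Br(A, x, Br(B, y, C)) becomes Br(Br(A, x, B), y, C).
 * The in-order sequence A, x, B, y, C is unchanged. *)
fun rotate_left (Br (a, x, Br (b, y, c))) = Br (Br (a, x, b), y, c)
  | rotate_left t = t

(* rotate_right is the inverse, restoring the left-hand configuration. *)
fun rotate_right (Br (Br (a, x, b), y, c)) = Br (a, x, Br (b, y, c))
  | rotate_right t = t
```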

Suppose we want to insert the elements 1, 2, 3, 4, 5 into a treap. With an ordinary BST this would result in a very unbalanced tree. Suppose, however, that the elements receive the priorities 17, 30, 47, 33, 11, where a smaller number means a higher priority. (In a realistic implementation, the priorities would range over all integers.) The tree evolves as follows:

```
(1,17)      (1,17)      (1,17)
               \           \
             (2,30)      (2,30)
                            \
                          (3,47)
```

So far no tree rotations have been necessary to enforce the heap ordering invariant. However, this will not be true on the next insertion because it has a higher priority:

```
(1,17)           (1,17)
   \                \
 (2,30)           (2,30)
    \        =>      \
   (3,47)          (4,33)
      \             /
     (4,33)     (3,47)
```

The final insertion will rotate the value all the way to the top because it has the highest priority:

```
(1,17)           (1,17)          (1,17)         (5,11)
   \                \               \            /
 (2,30)           (2,30)          (5,11)      (1,17)
    \        =>      \       =>    /      =>     \
   (4,33)          (5,11)       (2,30)         (2,30)
   /    \           /              \              \
(3,47) (5,11)    (4,33)          (4,33)         (4,33)
                  /                /             /
               (3,47)           (3,47)        (3,47)
```

Of course, this particular tree doesn't look very balanced, but that is just an artifact of the priorities we used in the example. Typically the tree will be more balanced.

Here is code that implements treaps:

```
functor Treap(structure Params: ORDERED_SET_PARAMS) = struct
  type key = Params.key
  type elem = Params.elem
  val compare = Params.compare
  val keyOf = Params.keyOf
  type prio = Rand.rand

  datatype tree = Empty
                | Node of {left: tree, right: tree, value: elem, priority: prio}
  type node = {left: tree, right: tree, value: elem, priority: prio}

  (* Rep Invariant:
   * For Node{value,priority,left,right}:
   * 0. Binary Search Tree: all of the values in the tree "left" have
   *    keys less than the key of "value", and all of the values in
   *    "right" have keys greater than the key of "value".
   * 1. Heap ordering: all of the priorities in the left and right
   *    subtrees are at least as large as "priority". *)

  fun lookup(t: tree, k: key): elem option =
    case t of
      Empty => NONE
    | Node {value, priority, left, right} =>
        (case compare(k, keyOf(value)) of
           EQUAL => SOME value
         | LESS => lookup(left, k)
         | GREATER => lookup(right, k))

  fun add(t: tree, e: elem, p: prio): tree * bool =
    let
      (* Given a < xv < b < yv < c, heap_rotate(xv,xp,yv,yp,a,b,c) is
       * a node for a tree that satisfies the rep invariant and contains
       * all of the elements in question. *)
      fun heap_rotate(xv, xp, yv, yp, a: tree, b: tree, c: tree): node =
        if xp < yp
        then {value = xv, priority = xp, left = a,
              right = Node{value = yv, priority = yp, left = b, right = c}}
        else {value = yv, priority = yp, right = c,
              left = Node{value = xv, priority = xp, left = a, right = b}}

      fun add_node(t: tree, e: elem, p: prio): node * bool =
        case t of
          Empty => ({value = e, priority = p, left = Empty, right = Empty}, false)
        | Node{value, priority, left, right} =>
            case compare(keyOf(e), keyOf(value)) of
              EQUAL =>
                ({value = e, priority = priority, left = left, right = right}, true)
            | LESS =>
                let val ({value = xv, priority = xp, left = a, right = b}, dup) =
                      add_node(left, e, p)
                in (heap_rotate(xv, xp, value, priority, a, b, right), dup) end
            | GREATER =>
                let val ({value = yv, priority = yp, left = b, right = c}, dup) =
                      add_node(right, e, p)
                in (heap_rotate(value, priority, yv, yp, left, b, c), dup) end

      val (n, dup) = add_node(t, e, p)
    in
      (Node(n), dup)
    end

  fun first(t: tree): elem option =
    case t of
      Empty => NONE
    | Node{value, priority, left, right} =>
        (case first(left) of
           NONE => SOME value
         | eo => eo)

  fun fold_forward (f: elem * 'b -> 'b) (b: 'b) (k: key) (t: tree) =
    case t of
      Empty => b
    | Node {value, priority, left, right} =>
        (case compare(keyOf(value), k) of
           EQUAL => fold_forward f (f(value, b)) k right
         | LESS => fold_forward f b k right
         | GREATER =>
             let val lft = fold_forward f b k left
             in fold_forward f (f(value, lft)) k right end)
end
```

Here, `heap_rotate` is the function that figures out which of the two tree
configurations above is appropriate, given two elements `x` and `y` and their
associated priorities. This code doesn't actually build the tree nodes for the
result until it has to, resulting in some performance improvement. The function
`add_node` walks to the bottom of the tree, then uses `heap_rotate` as it
reconstructs the tree on the way back up so that the heap ordering invariant is
always maintained. Note that `first` and `fold_forward` work exactly the same
way for all binary trees.

This code assumes that a priority is provided when elements are added to the
data structure. We want this priority to be randomly chosen from a large space
so that the tree is likely to be approximately balanced. SML provides some library functions for
generating pseudo-random numbers. For this use, it doesn't matter too much how
good the pseudo-random number generator is. Here is how we can use a random
number generator to produce **random treaps**, a good set
implementation. We haven't implemented `remove` here, but it can also be done
with rotations: rotate the node to be removed downward, always lifting its
higher-priority child, until the node is a leaf, and then delete it.

```
functor TreapSet(structure Params: ORDERED_SET_PARAMS)
  :> ORDERED_FUNCTIONAL_SET where type key = Params.key
                              and type elem = Params.elem
= struct
  type key = Params.key
  type elem = Params.elem
  val compare = Params.compare
  val keyOf = Params.keyOf

  structure T = Treap(structure Params = Params)

  type set = {tree: T.tree, seed: Rand.rand, size: int}

  fun empty() = {tree = T.Empty, seed = 0wx5a5a5, size = 0}

  fun lookup({tree, seed, size}, k) = T.lookup(tree, k)

  fun add({tree, seed, size}, e: elem) =
    let
      val p = Rand.random(seed)
      val (t', dup) = T.add(tree, e, p)
      val size' = if dup then size else size + 1
    in
      ({tree = t', seed = p, size = size'}, dup)
    end

  fun size({tree, seed, size}) = size
  fun first({tree, seed, size}) = T.first(tree)
  fun remove(t, k) = raise Fail "Not implemented: treap remove"
  fun last(t) = raise Fail "Not implemented: last"

  type 'b folder = ((elem * 'b) -> 'b) -> 'b -> key -> set -> 'b

  fun fold_forward (f: elem * 'b -> 'b) (b: 'b) (k: key) {tree, seed, size} =
    T.fold_forward f b k tree
  fun fold_backward f b k tr = raise Fail "Not implemented: fold backward"
end
```
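As a usage sketch, the functor can be instantiated with integer keys (the structure names `IntParams` and `IntSet` are invented here):

```
structure IntParams = struct
  type key = int
  type elem = int
  val compare = Int.compare
  fun keyOf (e : elem) : key = e
end

structure IntSet = TreapSet(structure Params = IntParams)

val s0 = IntSet.empty ()
val (s1, _) = IntSet.add (s0, 3)
val (s2, _) = IntSet.add (s1, 1)
val (s3, _) = IntSet.add (s2, 2)

(* Whatever random priorities are chosen, the BST invariant determines
 * the least element: IntSet.first s3 is SOME 1, and IntSet.size s3 is 3. *)
```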

The win of treaps is that the code is considerably simpler than that of red-black trees. Red-black trees are known for being fast, but this treap implementation is competitive in speed while being much shorter and simpler.