AVL Trees

A binary search tree is one in which every node *n* satisfies the
**binary search tree invariant**: its left child and all the nodes below it
have values (or keys) less than that of *n*.
Similarly, the right child node and all nodes below it have values greater
than that of *n*.

The code for a binary search tree looks like the following. First, to check for an element or to add a new element, we simply walk down the tree.

```
(* contains(t,x) is whether x is in the tree *)
fun contains(t: tree, x: value): bool =
  case t of
    Empty => false
  | Node {value, left, right} =>
      (case compare(x, value) of
         EQUAL => true
       | LESS => contains(left, x)
       | GREATER => contains(right, x))

(* add(t,x) is a BST with the same values as t, plus x *)
fun add(t: tree, x: value): tree =
  let
    fun balance(t: tree): tree = t (* what to write here? *)
  in
    case t of
      Empty => Node {value=x, left=Empty, right=Empty}
    | Node {value, left, right} =>
        (case compare(x, value) of
           EQUAL => Node {value=x, left=left, right=right}
         | LESS => Node {value=value, left=add(left, x), right=right}
         | GREATER => Node {value=value, left=left, right=add(right, x)})
  end
```

When a tree satisfies the BST invariant, an in-order traversal of the tree nodes will visit the nodes in ascending order of their contained values. So it's easy to fold over all tree nodes in order.
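For example, an in-order fold might be written as follows (a sketch, not part of the original code: the name `fold_inorder` is my own, and the `tree` and `value` types from above are assumed):

```sml
(* fold_inorder(f, acc, t) folds f over the values of t in
 * ascending order, threading the accumulator acc through. *)
fun fold_inorder(f: 'a * value -> 'a, acc: 'a, t: tree): 'a =
  case t of
    Empty => acc
  | Node {value, left, right} =>
      let
        val accL = fold_inorder(f, acc, left)   (* smaller values first *)
        val accM = f(accL, value)               (* then this node *)
      in
        fold_inorder(f, accM, right)            (* then larger values *)
      end

(* e.g., collecting all values into an ascending list: *)
fun to_list(t: tree): value list =
  rev(fold_inorder(fn (acc, v) => v :: acc, [], t))
```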

Removing elements is a little trickier. If a node is a leaf, it can simply be removed. If it has one child, it can be replaced by that child. If it has two children, it can be replaced by its immediate successor (or predecessor), which is found by searching down the tree.

```
(* Returns: a tree just like t except that the node containing x
 *   is removed.
 * Checks: x is in the tree. *)
fun remove(t: tree, x: value): tree =
  let
    (* Returns: a tree in which the successor of the root is removed,
     *   along with the value of that successor.
     * Checks: the root has a successor. *)
    fun removeSuccessor(t: tree): tree * value =
      case t of
        Empty => raise Fail "impossible"
      | Node {value, left=Empty, right} => (right, value)
      | Node {value, left, right} =>
          let val (l, v) = removeSuccessor(left)
          in (Node {value=value, left=l, right=right}, v)
          end
  in
    case t of
      Empty => raise Fail "value not in the tree"
    | Node {value, left, right} =>
        case compare(x, value) of
          LESS => Node {value=value, left=remove(left, x), right=right}
        | GREATER => Node {value=value, left=left, right=remove(right, x)}
        | EQUAL =>
            case (left, right) of
              (_, Empty) => left
            | (Empty, _) => right
            | _ => let val (r, v) = removeSuccessor(right)
                   in Node {value=v, left=left, right=r}
                   end
  end
```

The time required to find a node in a BST, or to remove a node from a BST, is
*O*(*h*), where *h* is
the **height** of the tree: the length of the longest path from the root
node to any leaf. If a tree is perfectly balanced, so that all leaf nodes are
at the same depth, then *h* is *O*(log
*n*). This makes binary search trees an attractive data structure,
especially for implementing ordered sets and maps.

The problem with BSTs is that they are not necessarily balanced. In fact, if
nodes are added to a BST in increasing order, the resulting BST is
essentially a linked list.
The solution to this problem is to make sure that the BST stays
balanced. Making the BST perfectly balanced at every step
is too expensive, but if we are interested in asymptotic complexity,
we merely need the height *h* to be *O*(log *n*). We will say that the
BST is **balanced** in this case.
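To see the degenerate case concretely, here is a sketch (assuming the unbalanced `add` from above and that `value` is `int`) of what ascending insertion does:

```sml
(* Each inserted value is greater than everything already in the
 * tree, so every insertion walks down the entire right spine.
 * The result is a chain of right children: height n for n values. *)
val degenerate =
  List.foldl (fn (x, t) => add(t, x)) Empty [1, 2, 3, 4, 5]
(* degenerate is Node 1 -> Node 2 -> ... -> Node 5, effectively
 * a linked list; contains now takes O(n) time. *)
```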

There are many ways to keep binary search trees balanced. Some of the more popular methods are red-black trees, AVL trees, B-trees, and splay trees. But there are many more, including 2-3 trees, 2-3-4 trees, AA trees, and treaps. Each kind of binary search tree works by strengthening the representation invariant so that the tree must be approximately balanced.

AVL trees were invented by Adelson-Velskii and Landis in 1962. An
AVL tree is a balanced binary search tree where every node in the tree satisfies
the following invariant: the height difference between its left and right
children is at most 1. Hence, all subtrees of an AVL tree are themselves AVL trees.
The height difference between the children is referred to as the **balance factor**
of the node.

Let's see why the AVL invariant means that the tree is balanced. Suppose
we want to make it as unbalanced as possible. Then for a given height *h*, we
want to find the AVL tree with as few nodes as possible. Let
*N*(*h*) be the
minimum number of nodes in a tree of height *h*. If we think about it, we
can see that *N*(0)=1, *N*(1)=2, *N*(2)=4, and in general,

N(h) = 1 + N(h−1) + N(h−2)

because a tree with height *h* must have at least one child with height
*h*−1,
and to make the tree as small as possible, we make the other child have height
*h*−2.
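As a quick sanity check, the recurrence can be computed directly (a small helper, not part of the original notes):

```sml
(* minNodes h computes N(h), the minimum number of nodes in an
 * AVL tree of height h, straight from the recurrence above. *)
fun minNodes 0 = 1
  | minNodes 1 = 2
  | minNodes h = 1 + minNodes(h - 1) + minNodes(h - 2)

(* minNodes 2 = 4, minNodes 3 = 7, minNodes 4 = 12, ... *)
```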

Now we can show that an AVL tree of height *h* has a minimum size that grows
exponentially with *h*. We do this by showing that *N*(*h*) has an exponential
lower bound: it is Ω(*k*^{h}) for a suitable constant *k* > 1.

We use the substitution method, replacing N(h) with
*ck*^{h} on both sides of the recurrence.
We need to find *c*, *h*_{0}
such that for all *h* greater than *h*_{0},

ck^{h} ≤ 1 + ck^{h−1} + ck^{h−2}

Dividing through by *ck*^{h−2}, we see this is true if

k^{2} ≤ k^{2−h}/c + k + 1

Because the term *k*^{2−h}/*c*
becomes small for large *h*,
this inequality will hold as long as *k* is less than the solution
to the equation:

k^{2} = k + 1

which is the **golden ratio**, φ = (1+√5)/2 ≈ 1.618. Therefore, *n* is
Ω(φ^{h}), and
conversely *h* is
*O*(log_{φ} *n*) =
*O*(lg *n*). Therefore an AVL tree is balanced.

You may know that the golden ratio is connected to the Fibonacci series.
If you look more closely at the function *N*(*h*), you'll notice that in fact
*N*(*h*) = *F*(*h*+2) − 1, where
*F*(*n*) is the *n*th Fibonacci number (under the convention
*F*(0) = *F*(1) = 1).
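This identity is easy to spot-check with a small helper (not in the original notes; note the indexing convention *F*(0) = *F*(1) = 1):

```sml
(* fib n is the nth Fibonacci number under the convention
 * F(0) = F(1) = 1, the indexing for which N(h) = F(h+2) - 1. *)
fun fib 0 = 1
  | fib 1 = 1
  | fib n = fib(n - 1) + fib(n - 2)

(* e.g., N(3) = 7 = fib 5 - 1 and N(4) = 12 = fib 6 - 1 *)
```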

Here is the code for AVL trees. The key piece of technology is the
`balance` function, which rebalances an AVL tree. All the operations
such as `add` and `remove` can then use `balance` to restore the AVL
invariant.

```
type height = int

datatype avltree = Empty | Node of height * value * avltree * avltree

(* Rep Invariant:
 * For each node Node(h, v, l, r):
 *   (1) BST invariant: v is greater than all values in l,
 *       and less than all values in r.
 *   (2) h is the height of the node.
 *   (3) Each node is balanced, i.e., abs(height l - height r) <= 1
 *)

fun height(Empty) = 0
  | height(Node(h,_,_,_)) = h

fun bal_factor(Empty) = 0
  | bal_factor(Node(_,_,l,r)) = (height l) - (height r)

fun node(v: value, l: avltree, r: avltree): avltree =
  Node(1 + Int.max(height l, height r), v, l, r)

fun rotate_left(t: avltree): avltree =
  case t of
    Node(_, x, a, Node(_, y, b, c)) => node(y, node(x, a, b), c)
  | _ => t

fun rotate_right(t: avltree): avltree =
  case t of
    Node(_, x, Node(_, y, a, b), c) => node(y, a, node(x, b, c))
  | _ => t

(* Returns: an AVL tree containing the same values as n.
 * Requires: The children of n satisfy the AVL invariant, and
 *   their heights differ by at most 2. *)
fun balance(n as Node(h, v, l, r): avltree): avltree =
  case (bal_factor n, bal_factor l, bal_factor r) of
    ( 2, ~1, _) => rotate_right(node(v, rotate_left l, r))
  | ( 2,  _, _) => rotate_right(n)
  | (~2,  _, 1) => rotate_left (node(v, l, rotate_right r))
  | (~2,  _, _) => rotate_left (n)
  | _ => n

fun add(t: avltree, n: int): avltree =
  case t of
    Empty => node(n, Empty, Empty)
  | Node(h, v, l, r) =>
      case Int.compare(n, v) of
        EQUAL => t
      | LESS => balance(node(v, add(l, n), r))
      | GREATER => balance(node(v, l, add(r, n)))

fun remove(t: avltree, n: int): avltree =
  let
    fun removeSuccessor(t: avltree): avltree * int =
      case t of
        Empty => raise Fail "impossible"
      | Node(_, v, Empty, r) => (r, v)
      | Node(_, v, l, r) =>
          let val (l', v') = removeSuccessor(l)
          in (balance(node(v, l', r)), v')
          end
  in
    case t of
      Empty => raise Fail "value not in the tree"
    | Node(_, v, l, r) =>
        case Int.compare(n, v) of
          LESS => balance(node(v, remove(l, n), r))
        | GREATER => balance(node(v, l, remove(r, n)))
        | EQUAL =>
            case (l, r) of
              (_, Empty) => l
            | (Empty, _) => r
            | _ => let val (r', v') = removeSuccessor(r)
                   in balance(node(v', l, r'))
                   end
  end
```
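As a quick sanity check (not part of the original notes), repeating the ascending-insertion experiment with this `add` now yields a tree of logarithmic height instead of a linked list:

```sml
(* Insert 0..99 in ascending order. balance keeps the height
 * logarithmic (roughly 1.44 lg n at worst for AVL trees),
 * rather than letting it grow to 100. *)
val t = List.foldl (fn (x, acc) => add(acc, x)) Empty
                   (List.tabulate(100, fn i => i))
val h = height t  (* much smaller than 100 *)
```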

The `balance` function works by doing **tree rotations**.
A tree rotation reorganizes the tree, changing the parent-child relationships
between nodes in a local way, usually to restore a global invariant.
There are two basic tree rotations, left rotations and right rotations,
which are symmetrical. A left rotation works as follows, moving the root node
to the left:

```
    x                      y
   / \                    / \
  +   y                  x   +
 /a\ / \     ===>       / \ /c\
 --- +   +             +   + ---
    /b\ /c\           /a\ /b\
    --- ---           --- ---
```

A right rotation is just the inverse transformation.
The important property is that
tree rotations preserve the BST invariant, because the left-to-right ordering
of all nodes remains unchanged:
```
x < a < b < y < c
```

Therefore tree rotations can be used to reestablish other invariants such
as the AVL invariant.

The `balance` function is invoked on a node `t` that is possibly unbalanced.
We assume that whatever operation has been performed on the tree below this
node has changed the height of nodes by at most one, and therefore the child
subtrees of `t` have a height difference of at most 2. We also assume the
subtrees satisfy the AVL invariant themselves. If the height difference
(balance factor) is −1, 0, or 1, then `balance` doesn't need to do anything.
Suppose the balance factor is 2 (the case where it is −2 is symmetrical).
Then the tree `t` looks something like this:

```
        y
       / \
      +   +
     / \  /h\
    /   \ ---
   / h+2 \
   -------
```

How we fix this problem depends on what the left subtree looks like. There are two cases to consider:

```
     Case 1                    Case 2

        y                         z
       / \                       / \
      x   +                     x   +
     / \  /h\                  / \  /h\
    +   + ---                 +   y ---
   / \ / \  c                /h\ / \  c
 /h+1\ /h+1\                 --- +   +
 ----- -----                  a /h\ /h\
   a     b                      --- ---
(b may have height h)           b'  b''
```

In Case 1, we can do a right rotation to pull the subtrees `a` and `b` up:

```
Case 1:
        y                          x
       / \                        / \
      x   +                      +   y
     / \  /h\      ====>        / \  / \
    +   + ---                 /h+1\ +   +
   / \ / \  c                 ----- / \ /h\
 /h+1\ /h+1\                    a /h+1\ ---
 ----- -----                      -----  c
   a     b                          b
(b may have height h)
```

This clearly wouldn't work if the height of subtree `a` were *h*,
because in that case `b`'s leaves would be two levels lower than `a`'s.
That's the job of Case 2, which requires a double rotation:

```
Case 2:
        z                            y
       / \                         /   \
      x   +                       x     z
     / \  /h\      ====>         / \   / \
    +   y ---                   +   + +   +
   /h\ / \  c                  /h\ /h\ /h\ /h\
   --- +   +                   --- --- --- ---
    a /h\ /h\                   a   b'  b''  c
      --- ---
      b'  b''
```

(Note that one of `b'` or `b''` can actually have height *h*−1 here,
but that doesn't break the AVL invariant.) The double rotation preserves
the BST ordering because it is equivalent to two rotations, so the
ordering remains unchanged:
```
a < x < b' < y < b'' < z < c
```

When writing tree algorithms, it's helpful to be able to print out trees
on the display. Calling a visualization function such as `print(tree)`
produces output like the following:

```
        5
       / \
      /   \
     3     7
    / \   / \
   2   4 6   8
  /           \
 1             9
```
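The visualization code that produced the two-dimensional layout above isn't included in this excerpt. A minimal sketch (the name `printTree` and the sideways layout are my own, assuming the `avltree` type above) prints the tree rotated 90°, which is much easier to implement:

```sml
(* printTree t prints t sideways: the root is at the left margin,
 * the right subtree appears above it, the left subtree below,
 * each level indented two more spaces. *)
fun printTree(t: avltree): unit =
  let
    fun go(Empty, _) = ()
      | go(Node(_, v, l, r), depth) =
          (go(r, depth + 1);
           print(implode(List.tabulate(2 * depth, fn _ => #" "))
                 ^ Int.toString v ^ "\n");
           go(l, depth + 1))
  in
    go(t, 0)
  end
```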