Recall that a binary search tree is a binary tree with the following representation invariant: for any node n, every node in n.left has a value less than that of n, and every node in n.right has a value greater than that of n. The left and right subtrees must themselves satisfy the same invariant. The datatype that defines binary search trees is:
    type value = int
    datatype btree = Empty
                   | Node of {value: value, left: btree, right: btree}
Given such a tree, how do you perform a lookup operation? Start from the root, and at every node, if the value of the node is what you are looking for, you are done; otherwise, recursively look up in the left or right subtree depending on the value stored at the node. In code:
    fun contains (n: int, t: btree): bool =
      (case t of
         Empty => false
       | Node {value, left, right} =>
           (case Int.compare (value, n) of
              EQUAL => true
            | GREATER => contains (n, left)
            | LESS => contains (n, right)))
Addition is similar: you perform a lookup until you find the empty node that should contain the value. In code:
    fun add (n: int, t: btree): btree =
      (case t of
         Empty => Node {value=n, left=Empty, right=Empty}
       | Node {value, left, right} =>
           (case Int.compare (value, n) of
              EQUAL => t
            | GREATER => Node {value=value, left=add (n, left), right=right}
            | LESS => Node {value=value, left=left, right=add (n, right)}))
What is the running time of those operations? Since add is just a lookup with an extra node creation, we focus on the lookup operation. An analysis of the code shows that contains runs in O(height of the tree) time, because each recursive call descends one level. What is the worst-case height of a tree? A tree of n nodes can have height n, with all of its nodes in a single long branch (imagine adding the numbers 1,2,3,4,5,6,7 in order into a binary search tree). So the worst-case running time of lookup is O(n) (for n the number of nodes in the tree).
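The degenerate case is easy to reproduce. The following is a minimal sketch rather than part of the original notes: it repeats the btree datatype and add from above so that it stands alone, and it introduces two hypothetical helpers, height and fromList, to compare two insertion orders for the same values.

```sml
(* Same datatype and add as in the notes, repeated so the sketch is self-contained. *)
datatype btree = Empty
               | Node of {value: int, left: btree, right: btree}

fun add (n: int, t: btree): btree =
  case t of
    Empty => Node {value = n, left = Empty, right = Empty}
  | Node {value, left, right} =>
      (case Int.compare (value, n) of
         EQUAL => t
       | GREATER => Node {value = value, left = add (n, left), right = right}
       | LESS => Node {value = value, left = left, right = add (n, right)})

(* height is a helper assumed for this example: the number of nodes on the
   longest path from the root down to an empty node. *)
fun height (t: btree): int =
  case t of
    Empty => 0
  | Node {left, right, ...} => 1 + Int.max (height left, height right)

(* fromList is also an assumed helper: add the elements from left to right. *)
fun fromList (xs: int list): btree =
  List.foldl add Empty xs

val chain = fromList [1, 2, 3, 4, 5, 6, 7]   (* sorted order: one long right branch *)
val bushy = fromList [4, 2, 6, 1, 3, 5, 7]   (* same values, friendlier order *)

val chainHeight = height chain   (* 7 *)
val bushyHeight = height bushy   (* 3 *)
```

With these definitions, the sorted insertion order produces a tree of height 7, while the second order produces a tree of height 3 holding the same values.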
What is a good shape for a tree that would allow for fast lookup? A balanced, "bushy" tree; for example:
       ^                   50
       |                /      \
       |            25            75
    height=3       /  \          /  \
       |         10    30      60    90
       |        /  \  /  \    /  \  /  \
       V       4  12 27  40  55  65 80  99
If a tree with n nodes is kept balanced, its height is O(lg n), which leads to a lookup operation running in time O(lg n).
In a previous lecture we saw one example of balanced binary search trees: AVL trees. In this lecture, we'll discuss an alternative balanced binary search tree data structure called a red-black tree.
Red-black trees achieve the tree balancing properties by coloring each node of the tree either red or black, and imposing certain representation invariants on these colors. A possible datatype for a red-black tree is as follows:
    datatype color = Red | Black
    datatype rbtree = Empty
                    | Node of {color: color, value: int, left: rbtree, right: rbtree}
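Here are the new conditions we add to the binary search tree rep invariant:
1. No red node has a red parent.
2. Every path from the root to an empty node contains the same number of black nodes (the black height of the tree).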
Note that empty nodes are considered always to be black. If a tree satisfies these two conditions, it must also be the case that every subtree of the tree also satisfies the conditions. If a subtree violated either of the conditions, the whole tree would also.
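The two color conditions, as the rest of these notes use them (no red node has a red parent, and every path from the root to an empty node has the same number of black nodes), can be checked mechanically. The following is a sketch under stated assumptions, not code from the original notes: the names isRed, noRedRed, blackHeight, colorsOk, and leaf are all introduced here for illustration, and the checker looks only at the colors, not at the binary search tree ordering.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color: color, value: int, left: rbtree, right: rbtree}

(* isRed: empty nodes count as black. *)
fun isRed (t: rbtree): bool =
  case t of
    Node {color = Red, ...} => true
  | _ => false

(* noRedRed t is true when no red node in t has a red child. *)
fun noRedRed (t: rbtree): bool =
  case t of
    Empty => true
  | Node {color, left, right, ...} =>
      not (color = Red andalso (isRed left orelse isRed right))
      andalso noRedRed left andalso noRedRed right

(* blackHeight t is SOME h when every path from the root of t to an empty node
   contains exactly h black (non-empty) nodes, and NONE otherwise.
   Convention assumed here: empty nodes are not counted. *)
fun blackHeight (t: rbtree): int option =
  case t of
    Empty => SOME 0
  | Node {color, left, right, ...} =>
      (case (blackHeight left, blackHeight right) of
         (SOME l, SOME r) =>
           if l = r then SOME (l + (if color = Black then 1 else 0)) else NONE
       | _ => NONE)

(* Both color conditions together. *)
fun colorsOk (t: rbtree): bool =
  noRedRed t andalso Option.isSome (blackHeight t)

(* leaf is a shorthand for a node with two empty children. *)
fun leaf (c: color, v: int): rbtree =
  Node {color = c, value = v, left = Empty, right = Empty}

(* Satisfies both conditions. *)
val good = Node {color = Black, value = 2, left = leaf (Red, 1), right = leaf (Red, 3)}

(* Red node 2 has red child 1. *)
val redRed = Node {color = Black, value = 3,
                   left = Node {color = Red, value = 2, left = leaf (Red, 1), right = Empty},
                   right = Empty}

(* The left path has two black nodes, the right path only one. *)
val uneven = Node {color = Black, value = 2, left = leaf (Black, 1), right = Empty}

val results = (colorsOk good, colorsOk redRed, colorsOk uneven)
```

Here colorsOk accepts good and rejects both redRed and uneven. Because both checks recurse into every subtree, a violation anywhere in the tree makes the whole tree fail, which matches the observation above about subtrees.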
With these invariants, the longest possible path from the root to an empty node alternates red and black nodes; therefore it is at most twice as long as the shortest possible path, which contains only black nodes. If n is the number of nodes in the tree, the shortest path has length at most lg(n+1), because a tree in which every such path contains at least k nodes is complete down to depth k and so has at least 2^k - 1 nodes. The longest path therefore has length at most 2 lg(n+1), which is O(lg n). Therefore, the tree has height O(lg n) and the operations are all as asymptotically efficient as we could expect.
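To make the factor of two concrete, here is a small sketch, not taken from the original notes, that measures the shortest and longest root-to-empty paths of a hand-built red-black tree. The helpers shortest, longest, and mk are names assumed for this example only.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color: color, value: int, left: rbtree, right: rbtree}

(* Number of nodes on the shortest path from the root to an empty node. *)
fun shortest (t: rbtree): int =
  case t of
    Empty => 0
  | Node {left, right, ...} => 1 + Int.min (shortest left, shortest right)

(* Number of nodes on the longest path from the root to an empty node. *)
fun longest (t: rbtree): int =
  case t of
    Empty => 0
  | Node {left, right, ...} => 1 + Int.max (longest left, longest right)

(* mk is a hypothetical shorthand constructor used only in this example. *)
fun mk (c: color, l: rbtree, v: int, r: rbtree): rbtree =
  Node {color = c, value = v, left = l, right = r}

(*        B2
         /  \
       B1    R4
            /  \
          B3    B5
                  \
                   R6
   Every root-to-empty path has two black nodes, and no red node has a red parent. *)
val example =
  mk (Black,
      mk (Black, Empty, 1, Empty),
      2,
      mk (Red,
          mk (Black, Empty, 3, Empty),
          4,
          mk (Black, Empty, 5, mk (Red, Empty, 6, Empty))))

val shortLen = shortest example            (* 2: the path 2, 1 *)
val longLen = longest example              (* 4: the path 2, 4, 5, 6 *)
val withinBound = longLen <= 2 * shortLen  (* true *)
```

In this tree the longest path alternates black and red nodes and is exactly twice as long as the shortest one, so the bound is tight.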
How do we check for membership in red-black trees? Exactly the same way as for general binary search trees, ignoring the colors:
    fun contains (n: int, t: rbtree): bool =
      (case t of
         Empty => false
       | Node {value, left, right, ...} =>
           (case Int.compare (value, n) of
              EQUAL => true
            | GREATER => contains (n, left)
            | LESS => contains (n, right)))
More interesting is the add operation. We add by replacing the same empty node that a standard add into a binary search tree would replace. We color the new node red to ensure that invariant #2 is preserved. However, we may destroy invariant #1 in doing so, by producing two red nodes, one the parent of the other. In order to restore this invariant we will need to consider not only the two red nodes, but also the parent of the upper one, which must be black. Otherwise, the red-red conflict cannot be fixed while preserving black height. The next figure shows all the possible cases that may arise:
            1             2             3             4

            Bz            Bz            Bx            Bx
           /  \          /  \          /  \          /  \
          Ry   d        Rx   d        a    Rz       a    Ry
         /  \          /  \               /  \          /  \
        Rx   c        a    Ry            Ry   d        b    Rz
       /  \               /  \          /  \               /  \
      a    b             b    c        b    c             c    d
Notice that in each of these trees, the values of the nodes in a,b,c,d must have the same relative ordering with respect to x, y, and z: a<x<b<y<c<z<d. Therefore, we can perform a local tree rotation to restore the invariant locally, while possibly breaking invariant #1 one level up in the tree, since the root of the rotated subtree is now red:
             Ry
            /  \
          Bx    Bz
         /  \  /  \
        a   b  c   d
By performing a rebalance of the tree at that level, and all the levels above, we can locally (and incrementally) enforce invariant #1. In the end, we may end up with two red nodes, one of them the root and the other the child of the root; this we can easily correct by coloring the root black. The SML code (which really shows the power of pattern matching!) is as follows:
    fun add (n: int, t: rbtree): rbtree =
      let
        (* Definition: a tree t satisfies the "reconstruction invariant" if it is
         * black and satisfies the rep invariant, or if it is red and its children
         * satisfy the rep invariant and have the same black height. *)

        (* makeBlack(t) is a tree that satisfies the rep invariant.
         * Requires: t satisfies the reconstruction invariant
         * Algorithm: Make a tree identical to t but with a black root. *)
        fun makeBlack (t: rbtree): rbtree =
          case t of
            Empty => Empty
          | Node {color, value, left, right} =>
              Node {color=Black, value=value, left=left, right=right}

        (* Construct the result of a red-black tree rotation. *)
        fun rotate (x: value, y: value, z: value,
                    a: rbtree, b: rbtree, c: rbtree, d: rbtree): rbtree =
          Node {color=Red, value=y,
                left=Node {color=Black, value=x, left=a, right=b},
                right=Node {color=Black, value=z, left=c, right=d}}

        (* balance(t) is a tree that satisfies the reconstruction invariant and
         * contains all the same values as t.
         * Requires: one of the children of t satisfies the rep invariant and
         * the other satisfies the reconstruction invariant. Both children
         * have the same black height. *)
        fun balance (t: rbtree): rbtree =
          case t of
            (*1*) Node {color=Black, value=z,
                        left=Node {color=Red, value=y,
                                   left=Node {color=Red, value=x, left=a, right=b},
                                   right=c},
                        right=d} => rotate (x,y,z,a,b,c,d)
          | (*2*) Node {color=Black, value=z,
                        left=Node {color=Red, value=x,
                                   left=a,
                                   right=Node {color=Red, value=y, left=b, right=c}},
                        right=d} => rotate (x,y,z,a,b,c,d)
          | (*3*) Node {color=Black, value=x,
                        left=a,
                        right=Node {color=Red, value=z,
                                    left=Node {color=Red, value=y, left=b, right=c},
                                    right=d}} => rotate (x,y,z,a,b,c,d)
          | (*4*) Node {color=Black, value=x,
                        left=a,
                        right=Node {color=Red, value=y,
                                    left=b,
                                    right=Node {color=Red, value=z, left=c, right=d}}}
                    => rotate (x,y,z,a,b,c,d)
          | _ => t

        (* Add n into t, returning a tree that satisfies the reconstruction
         * invariant. *)
        fun walk (t: rbtree): rbtree =
          case t of
            Empty => Node {color=Red, value=n, left=Empty, right=Empty}
          | Node {color, value, left, right} =>
              (case Int.compare (value, n) of
                 EQUAL => t
               | GREATER => balance (Node {color=color, value=value,
                                           left=walk left, right=right})
               | LESS => balance (Node {color=color, value=value,
                                        left=left, right=walk right}))
      in
        makeBlack (walk (t))
      end
This code walks back up the tree from the point of insertion, fixing the invariants at every level. At red nodes we don't try to fix the invariant; we let the recursive walk go back until a black node is found. When the walk reaches the top, the color of the root node is restored to black, which is needed if balance rotates the root (a rotation always produces a red node) or if the new red node is itself the root.
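As a check on this behavior, the sketch below inserts the sorted sequence 1,2,3,4,5,6,7, the order that degenerates into a single branch in a plain binary search tree, into a red-black tree. It is an illustration under stated assumptions rather than the notes' own code: add is a condensed restatement of the function above (makeBlack is inlined and the type annotations on rotate are dropped), and height and toList are helpers assumed for this example.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color: color, value: int, left: rbtree, right: rbtree}

(* Condensed restatement of add from the notes. *)
fun add (n: int, t: rbtree): rbtree =
  let
    fun rotate (x, y, z, a, b, c, d) =
      Node {color = Red, value = y,
            left = Node {color = Black, value = x, left = a, right = b},
            right = Node {color = Black, value = z, left = c, right = d}}
    (* The four red-red cases under a black node, in the order of the figure. *)
    fun balance (t: rbtree): rbtree =
      case t of
        Node {color = Black, value = z,
              left = Node {color = Red, value = y,
                           left = Node {color = Red, value = x, left = a, right = b},
                           right = c},
              right = d} => rotate (x, y, z, a, b, c, d)
      | Node {color = Black, value = z,
              left = Node {color = Red, value = x, left = a,
                           right = Node {color = Red, value = y, left = b, right = c}},
              right = d} => rotate (x, y, z, a, b, c, d)
      | Node {color = Black, value = x, left = a,
              right = Node {color = Red, value = z,
                            left = Node {color = Red, value = y, left = b, right = c},
                            right = d}} => rotate (x, y, z, a, b, c, d)
      | Node {color = Black, value = x, left = a,
              right = Node {color = Red, value = y, left = b,
                            right = Node {color = Red, value = z, left = c, right = d}}}
          => rotate (x, y, z, a, b, c, d)
      | _ => t
    fun walk (t: rbtree): rbtree =
      case t of
        Empty => Node {color = Red, value = n, left = Empty, right = Empty}
      | Node {color, value, left, right} =>
          (case Int.compare (value, n) of
             EQUAL => t
           | GREATER => balance (Node {color = color, value = value,
                                       left = walk left, right = right})
           | LESS => balance (Node {color = color, value = value,
                                    left = left, right = walk right}))
  in
    (* Recolor the root black, as makeBlack does in the notes. *)
    case walk t of
      Empty => Empty
    | Node {value, left, right, ...} =>
        Node {color = Black, value = value, left = left, right = right}
  end

(* Assumed helper: number of nodes on the longest root-to-empty path. *)
fun height (t: rbtree): int =
  case t of
    Empty => 0
  | Node {left, right, ...} => 1 + Int.max (height left, height right)

(* Assumed helper: in-order traversal, which lists the values in sorted order. *)
fun toList (t: rbtree): int list =
  case t of
    Empty => []
  | Node {value, left, right, ...} => toList left @ (value :: toList right)

val t7 = List.foldl add Empty [1, 2, 3, 4, 5, 6, 7]
val h7 = height t7       (* 3 *)
val contents = toList t7 (* [1, 2, 3, 4, 5, 6, 7] *)
```

Tracing the seven insertions by hand, the last one triggers a rotation at the root, and recoloring the root black leaves a tree of height 3 with 4 at the root, compared with height 7 for the unbalanced binary search tree built from the same input.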
Deletion of elements from a red-black tree is also possible, but requires the consideration of more cases. Deleting a black element from the tree creates the possibility that some path in the tree has too few black nodes; the solution is to consider that path to contain a "doubly-black" node. A series of tree rotations can then eliminate the doubly-black node, by propagating the blackness up until a red node can be converted to a black node.