CS312 Lecture 9: Red-Black Trees

Red-Black Trees

Last time we argued that a binary search tree can degenerate into a simple list when the input is ordered. Here is the example we discussed:

 1
  \
   2
    \
     3     A degenerate tree (essentially a list) results when we insert an ordered sequence.
      \    (inserted sequence: 1, 2, 3, 4, 5)
       4
        \
         5
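To see the degeneration concretely, here is a minimal sketch of an ordinary, unbalanced binary search tree in SML. The `bst` type and the names `bstInsert`, `height` and `fromList` are hypothetical helpers, separate from the `rbtree` type introduced later. Height is counted in edges, matching the formula for full trees used later in these notes.

```sml
(* Hypothetical plain, unbalanced BST (not the lecture's rbtree type). *)
datatype bst = Leaf | Br of bst * int * bst

(* Standard BST insertion: walk down and put n at the empty spot. *)
fun bstInsert (n : int, Leaf) = Br (Leaf, n, Leaf)
  | bstInsert (n, t as Br (l, v, r)) =
      (case Int.compare (n, v) of
         LESS => Br (bstInsert (n, l), v, r)
       | GREATER => Br (l, v, bstInsert (n, r))
       | EQUAL => t)                          (* duplicates are ignored *)

(* Height in edges: a single node has height 0, the empty tree ~1. *)
fun height Leaf = ~1
  | height (Br (l, _, r)) = 1 + Int.max (height l, height r)

(* Insert the values left to right, in the order they arrive. *)
fun fromList (xs : int list) : bst = foldl bstInsert Leaf xs
```

Inserting 1, 2, 3, 4, 5 in order gives height 4 (a chain of five nodes), while the order 3, 1, 2, 4, 5 gives height 2.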

As it happens, long runs of ordered values are not uncommon. Can we protect against them? If we know the input is sorted, and all of the input is available at once (or we can buffer it before inserting it into the tree), there are several things we can do:

The first possibility is to apply the following algorithm recursively: find the middle of the input sequence and insert the corresponding value into the tree, then apply the procedure recursively to the left half of the sequence (from the first element up to the element immediately preceding the middle element we just inserted), and finally apply it recursively to the right half. There are a few details to work out (e.g. what is the “middle” of a sequence containing an even number of values?), but overall this is a very simple algorithm. Here is the tree that results when we insert the sequence 1, 2, 3, 4, 5:

    3
   / \
  1   4
   \   \
    2   5

Note: if the “middle” index falls between two integers, we always round down, so 1, 2, 3, 4, 5 is inserted in the order 3, 1, 2, 4, 5.

This looks much better - and would look even better for a longer input sequence. Try it on an example!
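A minimal sketch of the middle-first strategy, assuming the input is already a sorted list. `middleFirst` is a hypothetical name; it only computes the order in which the values should be inserted, rounding the middle index down.

```sml
(* middleFirst xs: insertion order for the sorted list xs such that a plain
 * BST built by inserting in this order is balanced. The middle index is
 * (length xs - 1) div 2, i.e. "round down". Hypothetical helper. *)
fun middleFirst ([] : int list) : int list = []
  | middleFirst xs =
      let
        val m = (length xs - 1) div 2     (* index of the middle element *)
        val left = List.take (xs, m)      (* elements before the middle *)
        val mid = List.nth (xs, m)        (* the middle element itself *)
        val right = List.drop (xs, m + 1) (* elements after the middle *)
      in
        mid :: middleFirst left @ middleFirst right
      end
```

For 1, 2, 3, 4, 5 this yields 3, 1, 2, 4, 5; for 1 through 7 it yields 4, 2, 1, 3, 6, 5, 7, which builds a perfectly balanced tree.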

Another possible approach is to randomly permute the input sequence, then insert the values in the final (permuted) order. This makes a very disadvantageous order much less likely. In particular, if an “adversary” feeds our program a perfectly ordered sequence, a uniformly random permutation of n distinct values leaves it in sorted order with probability only 1/n!, so the harmful ordering almost never survives to be inserted into the tree.

     4
    / \
   1   5
    \
     2
      \
       3

Input sequence 1, 2, 3, 4, 5 was randomly permuted to 4, 1, 2, 5, 3, then inserted into the tree.
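A sketch of the permuting step using the Fisher-Yates shuffle, which yields every permutation with equal probability when its random choices are uniform. The SML Basis Library has no random-number structure, so `nextSeed` below is a hypothetical toy linear congruential generator with arbitrarily chosen small constants (small enough to avoid integer overflow). It is deterministic and would not defeat a real adversary; a real program would substitute a proper random source.

```sml
(* Hypothetical toy pseudo-random step (constants chosen for illustration
 * only). Products stay below 2^23, so no Overflow on any SML int size. *)
fun nextSeed (s : int) : int = (75 * s + 74) mod 65537

(* Fisher-Yates shuffle of xs, driven by nextSeed starting from seed. *)
fun shuffle (seed : int, xs : 'a list) : 'a list =
  let
    val arr = Array.fromList xs
    fun swap (i, j) =
      let val tmp = Array.sub (arr, i)
      in Array.update (arr, i, Array.sub (arr, j));
         Array.update (arr, j, tmp)
      end
    (* For i = n-1 down to 1, swap arr[i] with arr[j] for some 0 <= j <= i. *)
    fun loop (i, s) =
      if i <= 0 then ()
      else
        let val s' = nextSeed s
        in swap (i, s' mod (i + 1)); loop (i - 1, s') end
  in
    loop (Array.length arr - 1, seed);
    Array.foldr (op ::) [] arr   (* back to a list, in array order *)
  end
```

The shuffled list is then inserted into the tree as usual. The same seed always produces the same permutation, which is handy for testing but is exactly what a real random source avoids.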

Often, however, it is impractical or impossible to wait and buffer the entire input. So what can we do?

What is a good shape for a tree that would allow for fast lookup? A balanced, "bushy" tree; for example:

          ^                   50
          |               /        \
          |           25              75
 height=3 |         /    \          /    \
          |       10     30        60     90
          |      /  \   /  \      /  \   /  \
          V     4   12 27  40    55  65 80  99

A full binary tree of height $h$ has $2^{h+1}-1$ nodes. Thus if a tree has height $h$ and $n$ nodes, we must have $n \le 2^{h+1}-1$, that is, $h \ge \log_2(n+1)-1$. In other words, for a given number of nodes, a binary search tree has a minimal height (approximately) proportional to the base-2 logarithm of its number of nodes. At the other extreme, the height is at most $n-1$, which is reached by the degenerate, list-shaped tree we discussed before.
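The lower bound can be checked numerically with a small hypothetical helper, `minHeight`, which finds the smallest height $h$ whose full tree ($2^{h+1}-1$ nodes) can hold $n$ nodes. It doubles the capacity at each step instead of computing a logarithm, so it stays in exact integer arithmetic.

```sml
(* minHeight n: smallest height h (in edges) such that a full binary tree
 * of height h, with 2^(h+1) - 1 nodes, holds at least n nodes.
 * Hypothetical helper; assumes n >= 1. *)
fun minHeight (n : int) : int =
  let
    (* Invariant: capacity = 2^(h+1) - 1. *)
    fun go (h, capacity) =
      if capacity >= n then h else go (h + 1, 2 * capacity + 1)
  in
    go (0, 1)
  end
```

For example, 15 nodes need height at least 3, while 16 nodes need height 4, in agreement with $h \ge \log_2(n+1)-1$.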

Ideally, we would like to insert elements into a binary search tree in whatever order they arrive, while keeping the tree balanced. How can we keep a tree balanced? Many techniques insert an element just as in an ordinary binary search tree and then perform some kind of tree surgery to rebalance it. Red-black trees are one such technique.

The idea is to strengthen the invariants of the binary search tree so that trees are always approximately balanced. To help enforce the invariants, we color each node of the tree either red or black:

datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color: color, value: int,
                           left:rbtree, right:rbtree}

Here are the new conditions we add to the binary search tree rep invariant:

  1. No red node has a red parent
  2. Every path from the root to an empty node has the same number of black nodes

Note that empty nodes are always considered to be black. If a tree satisfies these two conditions, every subtree of the tree must also satisfy them: if a subtree violated either condition, the whole tree would too.

For mostly technical reasons, we also need the following condition:

  3. The root of a red-black tree is always black.

With these invariants, the longest possible path from the root to an empty node would alternately contain red and black nodes; therefore it is at most twice as long as the shortest possible path, which only contains black nodes.  If we can maintain the invariants, the tree will not get too much out of balance.
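To make the invariants concrete, here is a sketch of a checker for conditions (1) to (3) over the `rbtree` type above (repeated so that the block stands alone). `isRed`, `blackHeight` and `isRedBlack` are hypothetical names. The checker does not verify the binary-search-tree ordering itself.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color: color, value: int,
                           left: rbtree, right: rbtree}

fun isRed (Node {color = Red, ...}) = true
  | isRed _ = false

(* SOME k if every path from t down to an Empty has exactly k black nodes
 * (the Empty itself not counted) and no red node has a red child;
 * NONE if condition (1) or condition (2) fails anywhere in t. *)
fun blackHeight Empty = SOME 0
  | blackHeight (Node {color, left, right, ...}) =
      (case (blackHeight left, blackHeight right) of
         (SOME l, SOME r) =>
           if l = r andalso
              not (color = Red andalso (isRed left orelse isRed right))
           then SOME (if color = Black then l + 1 else l)
           else NONE
       | _ => NONE)

(* Conditions (1)-(3): a well-defined black height and a black root. *)
fun isRedBlack Empty = true
  | isRedBlack (t as Node {color, ...}) =
      color = Black andalso isSome (blackHeight t)
```

All four "red child of a red node" trees in the case figure of the insertion discussion below are rejected by `isRedBlack`, since each contains a red node with a red child.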

How do we check for membership in red-black trees? The same way as for general binary trees:

fun contains (n: int, t:rbtree): bool = 
  (case t
     of Empty => false
      | Node {color,value,left,right} => 
          (case Int.compare (value, n)
             of EQUAL => true
              | GREATER => contains (n,left)
              | LESS => contains (n,right)))
 

More interesting is the insert operation. We proceed as we said we would: we insert at the empty node that a standard insertion into a binary search tree indicates. We also color the inserted node red to ensure that invariant #2 is preserved. However, we may destroy invariant #1 in doing so, by producing two red nodes, one the parent of the other. The next figure shows all the possible cases that may arise:

       1             2            3             4
       Bz            Bz           Bx            Bx
      /  \          / \          /  \          /  \
     Ry  d         Rx  d        a    Rz       a    Ry
    /  \          / \               /  \          /  \
  Rx   c         a   Ry            Ry   d        b    Rz
 /  \               /  \          / \                /  \
a    b             b    c        b   c              c    d
 

Notice that in each of these trees, the values of the nodes in a, b, c, d must have the same relative ordering with respect to x, y, and z: a<x<b<y<c<z<d. Therefore, we can perform a local "tree rotation" to restore the invariant locally, while possibly breaking invariant 1 one level up in the tree:

     Ry
    /  \
  Bx    Bz
 / \   / \
a   b c   d

Note that the final diagram has only one version purely because of our ingenious labeling. Assuming that the (“big”) subtree represented in case (1) above satisfies condition (2), we can prove that the number of black nodes on every path from the root of a, b, c, or d down to an empty node in that subtree is the same for all four subtrees, and that the transformation preserves both this equality and the actual count of black nodes on each path. The trees represented in cases (2), (3) and (4) have analogous properties.
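These claims can be checked on a concrete instance of case (1). The sketch below builds the case (1) tree and its rotated form, taking each of a, b, c, d to be a single black node (a = 1, x = 2, b = 3, y = 4, c = 5, z = 6, d = 7). It then compares their in-order sequences and the black-node count on every path. The helpers `mk`, `inorder` and `blackCounts` are hypothetical; `rotate` mirrors the lecture's rotation.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color: color, value: int,
                           left: rbtree, right: rbtree}

fun mk (c, v, l, r) = Node {color = c, value = v, left = l, right = r}

(* Values in ascending order when the tree is a valid BST. *)
fun inorder Empty = []
  | inorder (Node {value, left, right, ...}) =
      inorder left @ [value] @ inorder right

(* Number of black nodes on each root-to-Empty path, left to right. *)
fun blackCounts Empty = [0]
  | blackCounts (Node {color, left, right, ...}) =
      let val here = if color = Black then 1 else 0
      in map (fn k => k + here) (blackCounts left @ blackCounts right) end

(* The rotation target: red y over black x and black z. *)
fun rotate (x, y, z, a, b, c, d) =
  mk (Red, y, mk (Black, x, a, b), mk (Black, z, c, d))

fun blackLeaf v = mk (Black, v, Empty, Empty)
val (a, b, c, d) = (blackLeaf 1, blackLeaf 3, blackLeaf 5, blackLeaf 7)

(* Case (1): Bz over Ry over Rx. *)
val case1 = mk (Black, 6, mk (Red, 4, mk (Red, 2, a, b), c), d)
val rotated = rotate (2, 4, 6, a, b, c, d)
```

Both trees list 1 through 7 in order, and every path in either tree contains exactly two black nodes.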

Inserting a red node anywhere in the tree does not change the number of black nodes on any path. At any time, at most one pair of adjacent red nodes violates condition (1).

As a result of the transformations we apply, the red nodes “percolate” up the tree, and a red node may reach the root of the entire tree. To satisfy condition (3), this node is then colored black. Because the root lies on every path from the root to an empty node, the number of black nodes on all these paths increases simultaneously by 1; this is the only way the number of black nodes on a path can increase.

The SML code (which really shows the power of pattern matching!) is as follows:

fun insert (n:int, t:rbtree): rbtree = let
  (* Definition: a tree t satisfies the "reconstruction invariant" if it is
   * black and satisfies the rep invariant, or if it is red and its children
   * satisfy the rep invariant. *)
 
  (* makeBlack(t) is a tree that satisfies the rep invariant.
     Requires: t satisfies the reconstruction invariant
     Algorithm: Make a tree identical to t but with a black root. *)
  fun makeBlack (t:rbtree): rbtree = 
    case t
       of Empty => Empty
        | Node {color,value,left,right} =>
 
          Node {color=Black, value=value,
                left=left, right=right}
  (* Construct the result of a red-black tree rotation. *)
  fun rotate(x: int, y: int, z: int,
             a: rbtree, b: rbtree, c:rbtree, d: rbtree): rbtree =
    Node {color=Red, value=y,
          left= Node {color=Black, value=x, left=a, right=b},
          right=Node {color=Black, value=z, left=c, right=d}}
  (* balance(t) is a tree that satisfies the reconstruction invariant and
   * contains all the same values as t.
   * Requires: the children of t satisfy the reconstruction invariant. *)
  fun balance (t:rbtree): rbtree = 
    case t of
      (*1*) Node {color=Black, value=z,
                  left= Node {color=Red, value=y,
                              left=Node {color=Red, value=x,
                                         left=a, right=b},
                              right=c},
                  right=d} => rotate(x,y,z,a,b,c,d)
    | (*2*) Node {color=Black, value=z,
                  left=Node {color=Red, value=x,
                             left=a,
                             right=Node {color=Red, value=y,
                                         left=b, right=c}},
                  right=d} => rotate(x,y,z,a,b,c,d)            
    | (*3*) Node {color=Black, value=x,
                 left=a,
                 right=Node {color=Red, value=z,
                             left=Node {color=Red, value=y,
                                        left=b, right=c},
                             right=d}} => rotate(x,y,z,a,b,c,d)
    | (*4*) Node {color=Black, value=x,
                  left=a,
                  right=Node {color=Red, value=y,
                              left=b,
                              right=Node {color=Red, value=z,
                                          left=c, right=d}}} =>  
            rotate(x,y,z,a,b,c,d)
    | _ => t
 
  (* walk(t) inserts n into t, returning a tree that satisfies the
     reconstruction invariant. *)
 
  fun walk (t:rbtree):rbtree = 
    case t
       of Empty => Node {color=Red, value=n, left=Empty, right=Empty}
        | Node {color,value,left,right} => 
           (case Int.compare (value,n) 
              of EQUAL => t
               | GREATER => balance (Node {color=color,
                                           value=value,
                                           left=walk (left),
                                           right=right})
                | LESS => balance (Node {color=color,
                                          value=value,
                                          left=left,
                                          right=walk (right)}))
 
in
  makeBlack (walk (t))
end       

This code walks back up the tree from the point of insertion, fixing the invariants at every level. At red nodes we don't try to fix the invariant; we let the recursive walk go back up until a black node is found. When the walk reaches the top, the color of the root node is restored to black, which is needed if balance rotates the root (or if the tree was empty, so that the new red node is itself the root).

Deletion of elements from a red-black tree is also possible, but requires the consideration of many more cases.

An important property of any balanced search tree (red-black trees included) is that it can easily be used to implement an ordered set, that is, a set that keeps its elements in sorted order. Ordered sets generally provide operations for finding the minimum and maximum elements of the set, for iterating over all the elements between two given elements, and for extracting ordered subsets of the elements in a given range.
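A sketch of some of these ordered-set operations on the `rbtree` type (repeated so the block stands alone). The names `minElt`, `maxElt` and `range` are hypothetical. They use only the binary-search-tree ordering, so they work on any BST. In a red-black tree, the minimum and maximum follow a single path whose length is bounded by the tree's height.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color: color, value: int,
                           left: rbtree, right: rbtree}

(* Smallest element: follow left children until the left child is Empty. *)
fun minElt Empty = NONE
  | minElt (Node {value, left = Empty, ...}) = SOME value
  | minElt (Node {left, ...}) = minElt left

(* Largest element: symmetric, following right children. *)
fun maxElt Empty = NONE
  | maxElt (Node {value, right = Empty, ...}) = SOME value
  | maxElt (Node {right, ...}) = maxElt right

(* All elements v with lo <= v <= hi, in ascending order. A subtree is
 * visited only if the BST ordering says it may hold such elements. *)
fun range (lo : int, hi : int, Empty) : int list = []
  | range (lo, hi, Node {value, left, right, ...}) =
      (if lo < value then range (lo, hi, left) else [])
      @ (if lo <= value andalso value <= hi then [value] else [])
      @ (if value < hi then range (lo, hi, right) else [])
```

On a tree holding 1 through 7, `range (3, 5, t)` returns `[3, 4, 5]` and skips the subtrees that lie entirely outside the range.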

It can be proven that a red-black tree with $n$ nodes has height at most $2\log_2(n+1)$. Since red-black trees are binary search trees, we can compare this upper bound with the lower bound $h \ge \log_2(n+1)-1$ established before. Thus, compared with a perfectly balanced tree, a red-black tree has at most (roughly) twice the depth. Given that red-black trees achieve this regardless of the input sequence, this is quite a remarkable fact.
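One standard way to prove this bound is sketched below under this lecture's conventions. Here $bh(t)$ is our notation for the number of black nodes on any path from the root of $t$ to an empty node (empty nodes not counted), which is well defined by condition (2). The symbol $k$ denotes the number of nodes on a longest root-to-empty path, so the height satisfies $h \le k$.

```latex
\begin{align*}
  &\text{(i)}\;\; |t| \ge 2^{bh(t)} - 1
    && \text{induction: each child of } t \text{ has black height} \ge bh(t) - 1 \\
  &\text{(ii)}\; bh(t) \ge k/2
    && \text{black root and no red node with a red parent (conditions 1, 3)} \\
  &\Rightarrow\; n \ge 2^{k/2} - 1
    \;\Rightarrow\; k \le 2\log_2(n+1)
    \;\Rightarrow\; h \le k \le 2\log_2(n+1)
\end{align*}
```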