Lecture 10: Implementing Sets Efficiently


Now that we're good at determining asymptotic running time, let's revisit some claims we made earlier in the course.

Recall sets; in this lecture, we focus on sets of integers for simplicity. We can easily generalize by passing around a comparison function.

Our most efficient implementation of sets was as a sorted list of integers with no repetition.

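What is the asymptotic running time of the operations on a list of length n? Lookup and insertion both walk down the list, so each takes O(n) time in the worst case.

So it's not *that* efficient. Keeping the list unsorted gives the same asymptotic running time. What, then, is the difference? If the list is unsorted, *all* failed lookups take O(n) time; if the list is sorted, a failed lookup can stop as soon as it reaches an element larger than the one sought, so some failed lookups are faster.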
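
To make the difference between the two list representations concrete, here is a minimal sketch. The names lookupSorted and lookupUnsorted are made up for this illustration and are not the implementation from earlier in the course; the sorted version assumes its argument is sorted in increasing order with no repetition.

```sml
(* Lookup in a sorted list without repetition: stop as soon as we
   pass the place where n would have to be. *)
fun lookupSorted (n:int, xs:int list): bool =
  (case xs
     of [] => false
      | x::rest =>
          (case Int.compare (x,n)
             of EQUAL => true
              | GREATER => false   (* every later element is larger still *)
              | LESS => lookupSorted (n,rest)))

(* Lookup in an unsorted list: a failed lookup always scans the whole list. *)
fun lookupUnsorted (n:int, xs:int list): bool =
  List.exists (fn x => x = n) xs
```

Looking up 2 in [1,3,5,7,9] with lookupSorted stops at 3, while lookupUnsorted has to inspect all five elements. Both are still O(n) in the worst case (look up 10, for instance).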

Is there a better way to implement a set than a sorted list? (Better in the sense of having asymptotically faster operations.)

Binary Search Trees

datatype btree = Empty 
               | Node of {value:int,left:btree,right:btree}

A binary search tree is a binary tree (see definition above) with the following invariant (property): for any node n, every node in left(n) has a value less than that of n, and every node in right(n) has a value greater than that of n.
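
Note that the invariant talks about *every* node in a subtree, not just the two children. One way to check it, sketched below, is to carry down the bounds that every value in a subtree must respect. The helper names inRange and isBST are made up for this illustration.

```sml
datatype btree = Empty
               | Node of {value:int,left:btree,right:btree}

(* inRange (t,lo,hi): every value v in t satisfies lo < v < hi,
   where NONE means "no bound on that side". *)
fun inRange (t:btree, lo:int option, hi:int option): bool =
  (case t
     of Empty => true
      | Node {value,left,right} =>
          (case lo of NONE => true | SOME l => l < value)
          andalso (case hi of NONE => true | SOME h => value < h)
          andalso inRange (left,lo,SOME value)
          andalso inRange (right,SOME value,hi))

(* A tree is a binary search tree if it respects the invariant
   starting with no bounds at all. *)
fun isBST (t:btree): bool = inRange (t,NONE,NONE)
```

A tree with root 5 whose left child 3 has right child 7 fails this check: 7 is larger than its parent 3, as required, but it sits in the left subtree of 5.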

Given such a tree, how do you perform a lookup operation? Start from the root, and at every node, if the value of the node is what you are looking for, you are done; otherwise, recursively look in the left subtree if the value you are looking for is smaller than the value stored at the node, and in the right subtree if it is larger. In code:

fun lookup (n:int, t:btree): bool = 
  (case t
     of Empty => false
      | Node {value,left,right} => 
          (case Int.compare (value,n)
             of EQUAL => true
              | GREATER => lookup (n,left)
              | LESS => lookup (n,right)))

Insertion is similar: you perform a lookup until you find the empty node that should contain the value. In code:

fun insert (n:int, t:btree):btree = 
  (case t 
     of Empty => Node {value=n,left=Empty,right=Empty}
      | Node {value,left,right} => 
          (case Int.compare (value,n)
             of EQUAL => t
              | GREATER => Node {value=value,
                                 left=insert (n,left),
                                 right=right}
              | LESS => Node {value=value,
                              left=left,
                              right=insert (n,right)}))

What is the running time of those operations? Since insert is just a lookup with an extra node creation, we focus on the lookup operation. Each recursive call descends one level in the tree, so lookup takes time O(h), where h is the height of the tree. What is the worst-case height of a tree with n nodes? It is n, reached when all the nodes lie on a single long branch (imagine inserting the numbers 1,2,3,4,5,6,7 in order into a binary search tree). So the worst-case running time of lookup is still O(n) (for n the number of nodes in the tree).

What is a good shape for a tree that would allow for fast lookup? A bushy tree (also known as balanced), such as:

                             50
                         /        \
                     25              75
                   /    \          /    \
                 10     30        60     90
                /  \   /  \      /  \   /  \
               4   12 27  40    55  65 80  99

If a tree with n nodes is kept balanced, its height is O(log n), which leads to a lookup operation running in time O(log n).
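
To see how much the shape depends on the order of insertion, here is a small experiment with the unbalanced insert from above. The helpers height and fromList, and the names chain and bushy, are made up for this illustration.

```sml
datatype btree = Empty
               | Node of {value:int,left:btree,right:btree}

fun insert (n:int, t:btree): btree =
  (case t
     of Empty => Node {value=n,left=Empty,right=Empty}
      | Node {value,left,right} =>
          (case Int.compare (value,n)
             of EQUAL => t
              | GREATER => Node {value=value,left=insert (n,left),right=right}
              | LESS => Node {value=value,left=left,right=insert (n,right)}))

(* Number of nodes on the longest path from the root to an empty node. *)
fun height (t:btree): int =
  (case t
     of Empty => 0
      | Node {left,right,...} => 1 + Int.max (height left, height right))

(* Insert the elements of a list, from left to right, into an empty tree. *)
fun fromList (xs:int list): btree =
  List.foldl insert Empty xs

val chain = fromList [1,2,3,4,5,6,7]   (* a single right-leaning branch *)
val bushy = fromList [4,2,6,1,3,5,7]   (* same values, balanced shape *)
```

Here height chain is 7 while height bushy is 3, even though both trees hold the same seven values.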

How can we keep a tree balanced? Many techniques involve inserting an element just like in a normal binary search tree, followed by some kind of tree surgery to rebalance the tree. For example:

Red-Black Trees

The idea is to add new invariants to the kind of trees we consider (recall, we still have the binary search tree invariant).

To help enforce the invariants, we color the nodes of the tree, as red or black nodes:

datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color:color, value:int,left:rbtree,right:rbtree}

Here are the invariants we want to add to the binary search tree invariant:

  1. No red node has a red parent
  2. Every path from the root to an empty node has the same number of black nodes

Note that for the purpose of the invariants, empty nodes are considered black.
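
The two invariants can be written down as checks on a tree. This is only a sketch: the names isRed, noRedRed, blackHeight and isRedBlack are made up for this illustration, and the binary search tree invariant is not checked here.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color:color,value:int,left:rbtree,right:rbtree}

(* Empty nodes count as black. *)
fun isRed (t:rbtree): bool =
  (case t
     of Node {color=Red,...} => true
      | _ => false)

(* Invariant 1: no red node has a red child. *)
fun noRedRed (t:rbtree): bool =
  (case t
     of Empty => true
      | Node {color,left,right,...} =>
          not (color = Red andalso (isRed left orelse isRed right))
          andalso noRedRed left andalso noRedRed right)

(* Invariant 2: SOME k if every path from t down to an empty node
   crosses exactly k black (non-empty) nodes, NONE if two paths disagree.
   Whether the empty node itself is counted does not matter, as long as
   it is done the same way on every path. *)
fun blackHeight (t:rbtree): int option =
  (case t
     of Empty => SOME 0
      | Node {color,left,right,...} =>
          (case (blackHeight left, blackHeight right)
             of (SOME l, SOME r) =>
                  if l = r
                  then SOME (l + (if color = Black then 1 else 0))
                  else NONE
              | _ => NONE))

fun isRedBlack (t:rbtree): bool =
  noRedRed t andalso isSome (blackHeight t)
```

For instance, a black root with a single black child and an empty other side fails invariant 2: one path crosses two black nodes, the other only one.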

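Notice that with these invariants, the longest possible path from the root to an empty node alternates red and black nodes, while the shortest possible path contains only black nodes. Since both contain the same number of black nodes, the longest path is at most twice as long as the shortest. This leads to a tree of height O(log n) for n the number of nodes in the tree: if every path contains b black nodes, then the top b levels of the tree are full, so n >= 2^b - 1, and the height is at most 2b + 1 <= 2 log2(n+1) + 1.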

How do we perform a lookup on red-black trees? Same as for ordinary binary search trees; the colors are simply ignored:

fun lookup (n:int, t:rbtree): bool = 
  (case t
     of Empty => false
      | Node {color,value,left,right} => 
          (case Int.compare (value,n)
             of EQUAL => true
              | GREATER => lookup (n,left)
              | LESS => lookup (n,right)))

More interesting is the insert operation. We proceed as we said we would: we insert at the empty node that a standard insertion into a binary search tree indicates. We also color the inserted node red. This ensures that invariant 2 is preserved, since a red node adds no black node to any path. However, we may break invariant 1; in other words, we may have two red nodes, one the parent of the other. The next figure shows all the possible cases that may arise:

      B[z]          B[z]          B[x]            B[x]
      /  \          /  \          /  \          /    \
    R[y]  d       R[x]  d        a  R[z]       a    R[y]
    /  \          /  \              /  \            /  \
 R[x]   c        a   R[y]        R[y]   d          b   R[z]
 /  \                /  \        /  \                  /  \
a    b              b    c      b    c                c    d

We perform a local "rotation" to restore invariant 1 locally, at the possible cost of breaking invariant 1 one level up in the tree:

     R[y]
    /    \
 B[x]    B[z]
 /  \    /  \
a    b  c    d

By rebalancing the tree at that level, and at every level above it on the way back up to the root, we use tree surgery to enforce invariant 1 locally each time. In the end, we may end up with two red nodes, one of them the root, which we can easily correct by coloring the root black; this adds one black node to every path, so invariant 2 still holds. In code:

fun insert (n:int,t:rbtree):rbtree = let
  fun makeBlack (t:rbtree):rbtree = 
    (case t
       of Empty => Empty
        | Node {color,value,left,right} => Node {color=Black,value=value,
                                                 left=left,right=right})
  fun balance (t:rbtree):rbtree = 
    (case t 
       of Node {color=Black,value=z,
                left= Node {color=Red,value=y,
                            left=Node {color=Red,value=x,
                                       left=a,right=b},
                            right=c},
                right=d} => Node {color=Red, value=y,
                                  left=Node {color=Black, value=x,
                                             left=a, right=b},
                                  right=Node {color=Black, value=z,
                                              left=c, right=d}}
    | Node {color=Black,value=z,
            left=Node {color=Red,value=x,
                       left=a,
                       right=Node {color=Red, value=y,
                                   left=b,right=c}},
            right=d} => Node {color=Red, value=y,
                              left=Node {color=Black, value=x,
                                         left=a, right=b},
                              right=Node {color=Black, value=z,
                                          left=c, right=d}}
    | Node {color=Black, value=x,
            left=a,
            right=Node {color=Red, value=z,
                        left=Node {color=Red,value=y,
                                   left=b,right=c},
                        right=d}} => Node {color=Red, value=y,
                                           left=Node {color=Black, value=x,
                                                      left=a, right=b},
                                           right=Node {color=Black, value=z,
                                                       left=c, right=d}}
    | Node {color=Black,value=x,
            left=a,
            right=Node {color=Red,value=y,
                        left=b,
                        right=Node {color=Red,value=z,
                                    left=c,right=d}}} =>  
               Node {color=Red, value=y,
                     left=Node {color=Black, value=x,
                                left=a, right=b},
                     right=Node {color=Black, value=z,
                                 left=c, right=d}}
    | _ => t)
  fun ins (t:rbtree):rbtree = 
    (case t
       of Empty => Node {color=Red,value=n,left=Empty,right=Empty}
        | Node {color,value,left,right} => 
           (case Int.compare (value,n) 
              of EQUAL => t
               | GREATER => balance (Node {color=color,
                                           value=value,
                                           left=ins (left),
                                           right=right})
               | LESS => balance (Node {color=color,
                                        value=value,
                                        left=left,
                                        right=ins (right)})))
in
  makeBlack (ins (t))
end
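
Since the four cases of balance all produce the same tree, one possible variation is to factor the construction of that tree into a helper. The sketch below does this and then repeats the earlier experiment of inserting 1,2,3,4,5,6,7 in order. The helpers red, black, build and height, and the name t, are made up for this illustration; the algorithm itself is the one above.

```sml
datatype color = Red | Black
datatype rbtree = Empty
                | Node of {color:color,value:int,left:rbtree,right:rbtree}

(* Shorthand constructors. *)
fun red (l,v,r) = Node {color=Red,value=v,left=l,right=r}
fun black (l,v,r) = Node {color=Black,value=v,left=l,right=r}

(* The common result of all four rotations. *)
fun build (a,x,b,y,c,z,d) = red (black (a,x,b), y, black (c,z,d))

fun balance (t:rbtree): rbtree =
  (case t
     of Node {color=Black,value=z,
              left=Node {color=Red,value=y,
                         left=Node {color=Red,value=x,left=a,right=b},
                         right=c},
              right=d} => build (a,x,b,y,c,z,d)
      | Node {color=Black,value=z,
              left=Node {color=Red,value=x,
                         left=a,
                         right=Node {color=Red,value=y,left=b,right=c}},
              right=d} => build (a,x,b,y,c,z,d)
      | Node {color=Black,value=x,
              left=a,
              right=Node {color=Red,value=z,
                          left=Node {color=Red,value=y,left=b,right=c},
                          right=d}} => build (a,x,b,y,c,z,d)
      | Node {color=Black,value=x,
              left=a,
              right=Node {color=Red,value=y,
                          left=b,
                          right=Node {color=Red,value=z,left=c,right=d}}} =>
          build (a,x,b,y,c,z,d)
      | _ => t)

fun insert (n:int, t:rbtree): rbtree = let
  fun ins (t:rbtree): rbtree =
    (case t
       of Empty => red (Empty,n,Empty)
        | Node {color,value,left,right} =>
            (case Int.compare (value,n)
               of EQUAL => t
                | GREATER => balance (Node {color=color,value=value,
                                            left=ins left,right=right})
                | LESS => balance (Node {color=color,value=value,
                                         left=left,right=ins right})))
in
  (* Color the root black. *)
  (case ins t
     of Node {value,left,right,...} => black (left,value,right)
      | Empty => Empty)
end

(* Number of nodes on the longest path from the root to an empty node. *)
fun height (t:rbtree): int =
  (case t
     of Empty => 0
      | Node {left,right,...} => 1 + Int.max (height left, height right))

val t = List.foldl insert Empty [1,2,3,4,5,6,7]
```

The same insertion order that produced a single branch of height 7 in a plain binary search tree now yields a tree of height 3, with 4 at the root.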