CS 312 Recitation 12
Binary Search Trees

Binary search trees

type value = int
datatype bstree = Nil | Node of value * bstree * bstree

A binary search tree t is a binary tree with the following representation invariant: for any node n=Node(v,left,right), all of the nodes in left have values less than v, all of the nodes in right have values greater than v. Furthermore, all of the nodes in left and right must also satisfy this invariant. and  The empty tree Nil is by default a binary search tree. 

Given such a tree, how do you perform a lookup operation? Start from the root, and at every node, if the value of the node is what you are looking for, you are done; otherwise, recursively look up in the left or right subtree depending on the value stored at the node. In code:

fun contains (t:bstree, v: value): bool = 
  case t
    of Empty => false
     | Node {v,l,r} => case Int.compare (v,n)
                         of EQUAL   => true
                          | GREATER => contains (l, n)
                          | LESS    => contains (r, n)

Addition is similar: you perform a lookup until you find the empty node that should contain the value. In code:

fun insert(t: bstree, n: int): bstree =
  case t 
    of Nil => Node(n, Nil, Nil)
     | Node(v, l, r) => case Int.compare(n, v)
                          of LESS    => Node(v, insert(l, n), r)
                           | GREATER => Node(v, l, insert(r, n))
                           | EQUAL   => t

What is the running time of those operations? An analysis of insert shows that this is O(n) where n is the height of the tree. The recurrence relation for the running time of insert is:

T(0) = c1
T(n) = T(n-1) + c2 for n > 0

The second relation comes from the fact that we're referring to the worst-case running time. In the worst-case, each recursive call to insert will be such that we follow the tallest sub-tree (i.e., insert follows the longest path through the tree). Hence, the T(n-1) term in the above formula. Clearly, this recurrence shows that the running time of insert is O(n).

One can also show that insert is O(n) with respect to n = number of nodes in the the tree. We get a similar recurrence relation for the worst-case running time. The worst case occurs when the tree degenerates into a single long branch (imagine adding the numbers 1,2,3,4,5,6,7 in order into a binary search tree).

How about removing an element? The code is as follows:

(* delete(t,x) = a BST with the same values as t, but without x. 
 * Requires: t is a binary search tree. *)
fun delete(t: bstree, n: int): bstree = 
 let
  (* deleteSuccessor(t) is (t', v) where v is the smallest value in t
   * and t' is a BST with all the elements of t except v. 
   * Requires: t is a non-empty binary search tree. *)
  fun deleteSuccessor(t: bstree): bstree*int =
    case t
      of Nil => raise Fail "impossible"
    | Node(v, Nil, r) => (r, v)
    | Node(v, l, r) => let val (l', v') = deleteSuccessor l
                       in (Node(v, l', r), v')
                       end
  in
    case t
      of Nil => raise NotFound
    | Node (v, l, r) =>
        case Int.compare(n, v)
          of LESS => Node(v, delete(l, n), r)
        | GREATER => Node(v, l, delete(r, n))
        | EQUAL => case (l, r) 
                     of (_, Nil) => l
                   | (Nil, _) => r
                   | _ => let val (r', v') = deleteSuccessor r
                          in Node(v', l, r') 
                          end
end

We can show that deleteSuccessor(t) works correctly, by induction on the height of the tree t. The property to prove is P(n) = if t is a binary search tree of height n, and (t',v) = deleteSuccessor(t) then t' is a binary search tree and elements(t) = elements(t') U {v}.

We can then show that delete(t,v) works correctly, again by induction on the height of the tree t. The property to prove is P(n) = if t is a binary search tree and t' =delete(t,v) then t' is a binary search tree and elements(t) = elements(t') U {v}. This proof will rely on the correctness of deleteSuccessor. Both proofs are left as exercise.

The run-time complexity of delete is again O(n) with respect to the size of the tree (i.e., number of nodes in the tree).

A way to improve the run-time complexity of tree operations is to keep the tree balanced. How can that be achieved? How are insertion and removal performed in order to keep the trees balanced? This will be the topic of the following lectures.