type value = int datatype bstree = Nil | Node of value * bstree * bstree
A binary search tree t is a binary tree with the following
representation invariant: for
any node n=Node(v,left,right), all of the nodes in left
have values less than v, all of the nodes in right have values
greater than v. Furthermore, all of the nodes in left and right
must also satisfy this invariant. and The empty tree Nil is
by default a binary search tree.
Given such a tree, how do you perform a lookup operation? Start from the root, and at every node, if the value of the node is what you are looking for, you are done; otherwise, recursively look up in the left or right subtree depending on the value stored at the node. In code:
fun contains (t:bstree, v: value): bool = case t of Empty => false | Node {v,l,r} => case Int.compare (v,n) of EQUAL => true | GREATER => contains (l, n) | LESS => contains (r, n)
Addition is similar: you perform a lookup until you find the empty node that should contain the value. In code:
fun insert(t: bstree, n: int): bstree = case t of Nil => Node(n, Nil, Nil) | Node(v, l, r) => case Int.compare(n, v) of LESS => Node(v, insert(l, n), r) | GREATER => Node(v, l, insert(r, n)) | EQUAL => t
What is the running time of those operations? An analysis of insert
shows that this is O(n) where n is the height of the
tree. The recurrence relation for the running time of insert
is:
T(0) = c1
T(n) = T(n-1)
+ c2 for n > 0
The second relation comes from the fact that we're referring to the worst-case running time. In the worst-case, each recursive call to insert will be such that we follow the tallest sub-tree (i.e., insert follows the longest path through the tree). Hence, the T(n-1) term in the above formula. Clearly, this recurrence shows that the running time of insert is O(n).
One can also show that insert is O(n) with respect to n = number of nodes in the the tree. We get a similar recurrence relation for the worst-case running time. The worst case occurs when the tree degenerates into a single long branch (imagine adding the numbers 1,2,3,4,5,6,7 in order into a binary search tree).
How about removing an element? The code is as follows:
(* delete(t,x) = a BST with the same values as t, but without x. * Requires: t is a binary search tree. *) fun delete(t: bstree, n: int): bstree = let (* deleteSuccessor(t) is (t', v) where v is the smallest value in t * and t' is a BST with all the elements of t except v. * Requires: t is a non-empty binary search tree. *) fun deleteSuccessor(t: bstree): bstree*int = case t of Nil => raise Fail "impossible" | Node(v, Nil, r) => (r, v) | Node(v, l, r) => let val (l', v') = deleteSuccessor l in (Node(v, l', r), v') end in case t of Nil => raise NotFound | Node (v, l, r) => case Int.compare(n, v) of LESS => Node(v, delete(l, n), r) | GREATER => Node(v, l, delete(r, n)) | EQUAL => case (l, r) of (_, Nil) => l | (Nil, _) => r | _ => let val (r', v') = deleteSuccessor r in Node(v', l, r') end end
We can show that deleteSuccessor(t) works correctly, by
induction on the height of the tree t. The property to prove is P(n)
= if t is a binary search tree of height n, and (t',v) =
deleteSuccessor(t) then t' is a binary search
tree and elements(t) = elements(t') U {v}.
We can then show that delete(t,v) works correctly, again by
induction on the height of the tree t. The property to prove is P(n)
= if t is a binary search tree and t' =delete(t,v) then t' is a binary search
tree and elements(t) = elements(t') U {v}. This proof will rely on the
correctness of deleteSuccessor. Both proofs are left as exercise.
The run-time complexity of delete is again O(n) with respect to the size of the tree (i.e., number of nodes in the tree).
A way to improve the run-time complexity of tree operations is to keep the tree balanced. How can that be achieved? How are insertion and removal performed in order to keep the trees balanced? This will be the topic of the following lectures.