Lab: Balanced Trees

# Balanced Trees In this lab, we will further explore binary search trees and red-black trees. We'll also take a brief look at tree traversals. As an optional but recommended exercise at the end, you can implement dictionaries using red-black trees. ## Binary search trees ##### Exercise: draw BST [✭✭] Draw binary search trees of height 3, 4, 5, 6, and 7 for the set {1, 4, 5, 10, 16, 17, 21}. The height of a tree is the maximum number of nodes on any path from its root to a leaf. &square; ##### Exercise: search sequence [✭✭] Suppose we have the numbers 1 through 1000 in a binary search tree and want to search for the number 363. Which of the following could *not* be the sequence of nodes examined? A. 2, 252, 401, 398, 330, 344, 397, 363 B. 924, 220, 911, 244, 898, 258, 362, 363 C. 925, 202, 911, 240, 912, 245, 363 D. 2, 399, 387, 219, 266, 382, 381, 278, 363 E. 935, 278, 347, 621, 399, 392, 358, 363 &square; ##### Exercise: functorized BST [✭✭✭] Our implementation of BSTs in lecture assumed that it was okay to compare values using the built-in comparison operators `<`, `=`, and `>`. But what if the client of the `Set` abstraction wanted to use their own comparison operators? (e.g., to ignore case in strings, or to have sets of records where only a single field of the record was used for ordering.) Reimplement the `BstSet` abstraction as a functor parameterized on a structure that enables client-provided comparison operator(s), much like the [standard library `Set`][stdlib-set]. [stdlib-set]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Set.html &square; ## Tree traversals Suppose you wanted to convert a tree to a list. You'd have to put the values stored in the tree in some order. Here are three ways of doing that: * *preorder*: each node's value appears in the list before the values of its left then right subtrees. * *inorder*: the values of the left subtree appear, then the value at the node, then the values of the right subtree. * *postorder*: the values of a node's left then right subtrees appear, followed by the value at the node. Here is code that implements those *traversals*, along with some example applications: ``` type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree let rec preorder = function | Leaf -> [] | Node (l,v,r) -> [v] @ preorder l @ preorder r let rec inorder = function | Leaf -> [] | Node (l,v,r) -> inorder l @ [v] @ inorder r let rec postorder = function | Leaf -> [] | Node (l,v,r) -> postorder l @ postorder r @ [v] let t = Node(Node(Node(Leaf, 1, Leaf), 2, Node(Leaf, 3, Leaf)), 4, Node(Node(Leaf, 5, Leaf), 6, Node(Leaf, 7, Leaf))) (* t is 4 / \ 2 6 / \ / \ 1 3 5 7 *) let () = assert (preorder t = [4;2;1;3;6;5;7]) let () = assert (inorder t = [1;2;3;4;5;6;7]) let () = assert (postorder t = [1;3;2;5;7;6;4]) ``` ##### Exercise: efficient traversal [✭✭✭] On unbalanced trees, the traversal functions above require quadratic worst-case time (in the number of nodes), because of the `@` operator. Re-implement the functions without `@`, and instead using `::`, such that they perform exactly one cons per `Node` in the tree. Thus the worst-case execution time will be linear. You will need to add an additional accumulator argument to each function, much like with tail recursion. (But your implementations won't actually be tail recursive.) &square; ## Red-black trees ##### Exercise: RB draw complete [✭✭] Draw the perfect binary tree on the values 1, 2, ..., 15. Color the nodes in three different ways such that (i) each way is a red-black tree (i.e., satisfies the red-black invariants), and (ii) the three ways create trees with black heights of 2, 3, and 4, respectively. The *black height* of a tree is the maximum number of black nodes along any path from its root to a leaf. &square; ##### Exercise: RB draw insert [✭✭] Draw the red-black tree that results from inserting the characters D A T A S T R U C T U R E into an empty tree. Carry out the insertion algorithm yourself by hand, then check your work with the implementation provided in lecture. &square; ##### Exercise: RB rotate [✭✭✭] Re-implement `insert` for red-black trees yourself, without any notes except for [this picture of rotations](rbrot.jpg) [source: *Purely Functional Data Structures*, Chris Okasaki, 1998]. Especially, challenge yourself to implement the pattern-matching involved in `balance` based solely on that picture. &square; ##### Exercise: time performance [✭] Using the lecture code, re-run the experiments shown in lecture. That is, run `test 50_000` to find out how long the implementations take on your own machine. You might first also try smaller workloads, such as 1,000 or 10,000 elements. &square; ##### Exercise: space performance [✭✭✭✭] In lecture we saw a comparison of three `Set` implementations in terms of their time performance on two workloads. But what about their space performance—that is, how much memory they use? Modify the lecture code to conduct such an experiment. Use the [standard library `Gc` module][stdlib-gc] to get statistics on memory usage. `Gc.minor_words` would be a good place to start. [stdlib-gc]: http://caml.inria.fr/pub/docs/manual-ocaml/libref/Gc.html &square; ## Additional exercises ##### Exercise: standard library set [✭✭, optional] Read the [source code][stdlib-set-ml] of the standard library `Set` module. Find the representation invariant for the balanced trees that it uses. Which kind of tree does it most resemble: 2-3, AVL, or red-black? [stdlib-set-ml]: https://github.com/ocaml/ocaml/blob/trunk/stdlib/set.ml &square; ##### Exercise: dictionary [✭✭✭, optional but recommended] We've so far used balanced trees to implement sets. But they can also efficiently implement maps aka dictionaries. Just add an additional "key" component to each `Node`: instead of one value, now you have one key and one value at each node. For example, ``` type ('k,'v) tree = | Leaf | Node of ('k,'v) tree * 'k * 'v * ('k,'v) tree ``` Modify the red-black tree implementation of sets to produce an implementation of dictionaries. Your implementation should match this interface: ``` module type Dictionary = sig type ('k,'v) t val empty : ('k,'v) t val mem : 'k -> ('k,'v) t -> bool val insert : 'k -> 'v -> ('k,'v) t -> ('k,'v) t val find : 'k -> ('k,'v) t -> 'v option val bindings : ('k,'v) t -> ('k*'v) list val of_list : ('k*'v) list -> ('k,'v) t end ``` The names of the functions above are meant to suggest their specifications, which you should write yourself. &square; * * * **Acknowledgement:** this lab uses some problems from *Introduction to Algorithms*, first edition, by Cormen, Leiserson, and Rivest, 1990.