Dictionaries
Balanced trees
Red/black trees

A dictionary is an ADT that allows searching of ordered data. Dictionaries admit the following operations:

  -- insert, delete, lookup
  -- sometimes max, min, predecessor, successor

You can implement dictionaries with arrays, linked lists, or doubly linked lists, but each of these takes time Omega(n) for one or another of these operations. However, all these operations can be made to run in O(h) on a binary search tree of height h (height = length of longest path).

The data is stored at the nodes in inorder; that is, an inorder traversal of the tree visits the data in sorted order. To search for an element, start at the root and compare to the data element stored there. If equal, we have found the element. If less, go down to the left subtree and continue searching in the same way. If greater, go to the right subtree.

If the tree is *balanced*, that is, if the length of the longest path is O(log n), where n is the number of nodes, then all the operations above can be made to run in time O(log n) in the worst case.

If we had all the elements in advance, we could make an ideal tree by sorting, then taking recursive medians. This gives a balanced tree initially. The problem is that subsequent inserts and deletes can cause the tree to become unbalanced. So we will spend O(log n) extra time after every insert/delete to rebalance the tree if necessary.

Rotation--a basic rebalancing operation

        y                 x
       / \               / \
      x   C    <==>     A   y
     / \                   / \
    A   B                 B   C

(If this picture looks strange, change to a fixed-width font such as Courier.)

  -- can be done in either direction
  -- O(1) time
  -- preserves inorder ordering of data: A < x < B < y < C

Red/black trees

For the purpose of this lecture, a *path* in a binary tree is a sequence of nodes and edges starting at the root and going down to an incomplete node, where an incomplete node is a node with fewer than 2 children. The length of a path is the number of nodes on it.

Example:

          o
         / \
        o   o
       /     \
      o       o
     / \       \
    o   o       o

has 2 paths of length 2, 1 path of length 3, and 3 paths of length 4.
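As a rough illustration, the search procedure, the rotation, and the definition of a path given above might be sketched in Python as follows. The names Node, search, rotate_right, rotate_left, inorder, and path_lengths are assumptions made for this sketch and are not part of the lecture; path_lengths measures length as the number of nodes on the path.

```python
from collections import Counter


class Node:
    """A binary search tree node (field names are illustrative)."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def search(node, key):
    """Walk down from the root; O(h) steps on a tree of height h."""
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node  # None if the key is absent


def rotate_right(y):
    """Left picture to right picture: x = y.left becomes the subtree root."""
    x = y.left
    y.left, x.right = x.right, y  # subtree B moves from x to y
    return x


def rotate_left(x):
    """Right picture to left picture: y = x.right becomes the subtree root."""
    y = x.right
    x.right, y.left = y.left, x  # subtree B moves from y to x
    return y


def inorder(node):
    """Keys in inorder; unchanged by either rotation."""
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)


def path_lengths(node, depth=1):
    """Count the paths from the root to each incomplete node, by length."""
    counts = Counter()
    if node is None:
        return counts
    if node.left is None or node.right is None:  # incomplete node ends a path
        counts[depth] += 1
    counts += path_lengths(node.left, depth + 1)
    counts += path_lengths(node.right, depth + 1)
    return counts
```

Since a rotation returns the new root of the rotated subtree, the caller is expected to store that result back wherever the old subtree root was attached, for example t = rotate_right(t).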

A red/black tree is a binary tree such that every node is colored either red or black but not both, and such that

(1) the number of black nodes on every path is the same; and
(2) there are no 2 adjacent red nodes along any path.

Without loss of generality, we can also assume

(3) the root is black.

This is without loss of generality because if the root is red, it can be recolored black without violating (1) or (2). Note that this is a little different from the CLR definition.

We will denote red nodes by o and black nodes by *.

Facts about red/black trees:

(i) Any incomplete node occurs at height at most 1 in the tree; that is, any incomplete node is either a leaf or has exactly one child which is a leaf.
(ii) No path is more than twice the length of any other path.
(iii) The tree is balanced; i.e. its height is O(log n).

Proof of (i): if a node is incomplete, then it is the terminus of a path. Every path that continues through that node to a descendant must have the same number of black nodes, so by property (1), all its descendants must be red. By property (2), no red descendant can itself have a child, so there can be no path descending from that node of length 2 or more.

Proof of (ii): the shortest path with k black nodes is ***...* (k nodes, all black). The longest is *o*o...*o (2k nodes, alternating black and red).

Proof of (iii): the smallest possible binary tree of height k satisfying (ii) consists of a complete binary tree of height k/2 with one path extended to length k. This tree has at least 2^(k/2) - 1 nodes, so k <= 2 log(n+1) = O(log n).

Red/black insert

Search down to find where to insert the node. It will be a new child of some incomplete node. Color it red. The tree still satisfies (1). If the parent is black, then the tree satisfies (2) and we are done. Otherwise, we have a temporary violation of (2). We will move the violation up the tree, rebalancing with rotations as necessary to restore (2).

So say we have a single violation of (2) in the tree; i.e. a red node a such that parent(a) is red.

  -- parent(a) cannot be the root by (3)
  -- grandparent(a) exists and must be black by (2)

Case 1: uncle(a) exists and is red.
Recolor parent(a) and uncle(a) black, grandparent(a) red.

          *                    o
         / \                  / \
        o   o      ==>       *   *
       / \ / \              / \ / \
    a o                  a o
     / \                  / \

We have moved the violation higher in the tree, or gotten rid of it altogether if the great-grandparent is black. If grandparent(a) is the root, recolor it black as in (3) and we are done. Otherwise continue up the tree recursively, with grandparent(a) in the role of a.

Case 2: uncle(a) does not exist. By (i), neither does sibling(a), nor does a have any children. In other words, a must be the new node that was just added, and parent(a) was previously the leaf that a was added to.

Case 2a: a is a left child of a left child. Rotate once and recolor as shown.

          *                    *
         /                    / \
        o          ==>     a o   o
       /
    a o

Case 2b: a is a right child of a left child. Rotate twice and recolor as shown.

        *                      * a
       /                      / \
      o            ==>       o   o
       \
        o a

Case 3: uncle(a) exists and is black. Then sibling(a) must exist by (1) and must be black by (2). Also, a must have two children and they must both be black for the same reasons.

          *
         / \
        o   *
       / \
    a o   *
     / \
    *   *

Case 3a: a is a left child of a left child. Rotate once and recolor as shown.

          *
         / \                    *
        o   *                  / \
       / \         ==>        /   \
    a o   *                a o     o
     / \                    / \   / \
    *   *                  *   * *   *

Case 3b: a is a right child of a left child. Rotate twice and recolor as shown.

          *
         / \                    * a
        o   *                  / \
       / \         ==>        /   \
      *   o a                o     o
         / \                / \   / \
        *   *              *   * *   *

The other cases are symmetric.

Every case takes time O(1), but case 1 requires recursion with the grandparent. By (iii), this can happen at most O(log n) times, so the total time is O(log n).

Next time: delete.
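
For reference, the insert procedure described above might be sketched in Python as follows. This is only a sketch under stated assumptions: the names RedBlackTree, Node, _rotate_up, black_counts, no_red_pair, height, and inorder are invented here, nodes carry parent pointers (which the lecture does not specify), and cases 2 and 3 share one branch because they use the same rotations and recoloring. The helper functions check properties (1) and (2) as defined in this lecture, with paths ending at incomplete nodes.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, parent=None):
        self.key, self.parent = key, parent
        self.left = self.right = None
        self.color = RED  # a newly inserted node is always colored red


class RedBlackTree:
    def __init__(self):
        self.root = None

    def _rotate_up(self, x):
        """Rotate x above its parent p; inorder ordering is preserved."""
        p, g = x.parent, x.parent.parent
        if x is p.left:
            p.left, x.right = x.right, p
            if p.left is not None:
                p.left.parent = p
        else:
            p.right, x.left = x.left, p
            if p.right is not None:
                p.right.parent = p
        p.parent, x.parent = x, g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def insert(self, key):
        parent, node = None, self.root
        while node is not None:  # ordinary search for the insertion point
            parent, node = node, (node.left if key < node.key else node.right)
        a = Node(key, parent)
        if parent is None:
            self.root = a
        elif key < parent.key:
            parent.left = a
        else:
            parent.right = a
        while a.parent is not None and a.parent.color == RED:  # violation of (2)
            p = a.parent
            g = p.parent  # exists: a red parent cannot be the root, by (3)
            uncle = g.right if p is g.left else g.left
            if uncle is not None and uncle.color == RED:
                p.color = uncle.color = BLACK  # case 1: recolor, move up
                g.color = RED
                a = g
            else:  # cases 2 and 3: uncle missing or black
                if (a is p.left) != (p is g.left):  # cases 2b/3b: rotate twice
                    self._rotate_up(a)
                    p = a  # a now occupies the old position of its parent
                self._rotate_up(p)  # cases 2a/3a: rotate once
                p.color, g.color = BLACK, RED
                break
        self.root.color = BLACK  # restore (3)


def black_counts(node, blacks=0):
    """Set of black-node counts over all paths; a singleton iff (1) holds."""
    if node is None:
        return set()
    blacks += node.color == BLACK
    here = {blacks} if node.left is None or node.right is None else set()
    return here | black_counts(node.left, blacks) | black_counts(node.right, blacks)


def no_red_pair(node):
    """True iff no red node has a red child, i.e. (2) holds."""
    if node is None:
        return True
    for child in (node.left, node.right):
        if child is not None and node.color == RED and child.color == RED:
            return False
    return no_red_pair(node.left) and no_red_pair(node.right)


def height(node):
    """Length of the longest path, counted in nodes."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)
```

One way to exercise the sketch is to insert keys in sorted order, which would degenerate an unbalanced binary search tree into a list, and then confirm that black_counts(tree.root) contains a single value and that no_red_pair(tree.root) holds after every insert.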