CS410, Summer 1998              Lecture 7 Outline               Dan Grossman

Goals:
 * Finish up deletion in red-black trees
 * Augmenting data structures, with red-black trees as an example
 * How to implement splay trees (leaving the analysis for another time)

Reading:
 * Red-black trees are CLR 14
 * Augmenting data structures is CLR 15
 * Splay trees are not in CLR: check out some of the books on reserve
   or ask me about other resources.

* Deletion in red-black trees

[Be sure to read the end of the previous lecture.  These notes just
pick up where those left off.]

We have a token which counts as an "extra black" on a node in our
tree.  Placement of this token is described in the last lecture.  We
need to remove this token without violating the black-height property
of the tree.

We have 9 (yes, 9) cases.  (As with insert, we assume the cases are
checked in order, so case i is only done if cases 1, ..., (i-1) do not
apply.)  In the pictures, * is a black node, O is a red node, ? is a
node whose color we don't know, and an extra * in front (as in ** or
*?) marks the node holding the token.

1. If the token is on the root, remove the token.  (This reduces the
   black-height of _every_ path by one, so it's okay.)

2. If the token is on a red node, color the node black and remove the
   token.  (Every path that includes the node has the same
   black-height it did before, so it's okay.)

3. Sibling is black, nephews are black: make the sibling red, move the
   token to the parent, and recur:

             ?                        *?
            / \                       / \
          **   *      =====>         *   O
          / \  / \                  / \  / \
         A   B *  *                A   B *  *
              /\  /\                    /\  /\
             C D  E F                  C D  E F

   (Notice this only introduces a red-red violation if we will do case
   2 next.  And case 2 will eliminate the violation.)

4. Sibling is black, right nephew is red: rotate the parent left and
   recolor as shown:

           y?                          z?       z gets y's color
          /  \                        /  \      w keeps w's color
        x**   z*       =====>       y*    u*
        / \   / \                   / \   / \
       A   B w?  uO               x*   w? E   F
             /\  /\               /\   /\
            C D  E F             A B   C D

   We are now done!

5. Sibling is black, left nephew is red: rotate the sibling right and
   recolor as shown; now we're in case 4:

           y?                          y?
          /  \                        /  \
        x**   z*       =====>       x**   w*
        / \   / \                   / \   / \
       A   B wO  u*                A   B C   zO
             /\  /\                          / \
            C D  E F                        D   u*
                                                /\
                                               E F

6.
Sibling is red: rotate the parent left; now the sibling is black, so
we are in case 3, 4, or 5.  (Notice that even in case 3, the parent is
now red, so it will be only one more step before case 2 applies.)

           y*                          z*
          /  \                        /  \
        x**   zO       =====>       yO    D
        / \   / \                   / \
       A   B C   D                x**  C
                                  / \
                                 A   B

7, 8, 9 are the right-child versions of 4, 5, 6.

Efficiency -- why O(log n):
  * In case 1, 2, or 4, we're done right away.
  * In case 3, we move the token up, so there are at most log n steps.
  * In case 5, we go to case 4 and we're done.
  * In case 6:
      - if we go to case 3, we then go to case 2 (because the parent
        is red);
      - if we go to case 4, we're done;
      - if we go to case 5, we go to case 4, and then we're done.
So there are at most 3 rotations plus log n moves up.  (It's a pain to
code -- and we didn't even deal with the symmetric cases.)

* Augmenting Data Structures

We have done the hard work of maintaining a balanced tree under any
sequence of insert and delete operations, and we know how to do min,
max, lookup, predecessor, successor, etc.  Now the question is: can we
get more, without losing the O(log n) bounds we already have, by
adding and maintaining additional information?

We can often do this by "piggy-backing" on the operations we already
have: if we can always recompute the additional information from the
nodes around us, then we can recompute it as we rebalance.

Example: Return the i-th smallest element.

  Information: the size (number of nodes) of each node's subtree,
  stored at that node.

  How used (taking a nil child's size to be 0):
      x = root;
      while (true)
        if x.left.size == (i-1)  return x
        if x.left.size >  (i-1)  x = x.left
        if x.left.size <  (i-1)  i = i - (x.left.size + 1); x = x.right

  How maintained:
      on insert: increment the size of each node you pass on the way down
      on rotate: recompute the sizes of the rotated nodes x, y, z from
                 their (unchanged) children
      on delete: when deleting a node with 2 children, do nothing (only
                 a key is copied; the shape is unchanged); then, when
                 splicing out the node with <= 1 child, walk back to
                 the root, decrementing the size of each ancestor.

Example: min, max, predecessor, successor in O(1).

The trick is to make insert and delete a constant factor more
expensive.  min and max are trivial -- just keep extra variables and
update them as necessary.
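The i-th smallest walk described above can be sketched directly, assuming each node caches its subtree size.  The class and function names below are my own, not the lecture's; sizes are computed at construction rather than maintained under rebalancing, since the point here is only the search:

```python
class Node:
    """BST node augmented with the size of its subtree (names are mine)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(x, i):
    """Return the i-th smallest key (1-indexed) in the subtree rooted at x."""
    while x is not None:
        left_size = x.left.size if x.left else 0   # a nil child has size 0
        if left_size == i - 1:
            return x.key                           # x is exactly the i-th smallest
        elif left_size > i - 1:
            x = x.left                             # answer is in the left subtree
        else:
            i -= left_size + 1                     # skip left subtree and x itself
            x = x.right
    raise IndexError("i out of range")
```

The walk touches one node per level, so on a red-black tree it runs in O(log n).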
pred, succ: keep extra links at each node, pointing to its predecessor
and successor.

  On insert:
    if I am a left child:
      make my succ be my parent
      make my pred be my parent's (old) pred
      make my parent's pred's succ be me
      make my parent's pred be me
    the right-child case is symmetric

  On delete:
    make my pred's succ be my succ
    make my succ's pred be my pred

  Look -- this is exactly what we do in a doubly-linked list!  In
  fact, these new links _are_ a sorted doubly-linked list of all the
  nodes in the tree.  You can think of it as adding a linked list to a
  red-black tree, or a red-black tree on top of a linked list.  What
  we now have is a data structure that is really both.

  On rotate: no change -- rotations do not change the items in the
  tree, so they certainly don't change predecessor/successor
  information.

Example: the median element.

 * Building on the first example, we could find the median in O(log n)
   by implementing:

      median () = find_ith_smallest (root.size / 2)

 * Building on the second example, we could find the median in O(1) by
   keeping two extra variables:

     - median: the current median
     - diff: an integer which is the number of nodes bigger than the
       median minus the number of nodes smaller than the median.  This
       number should always be 0 if the tree has an odd number of
       elements; if it has an even number, it can be +1 or -1.

   On insert:
      if new key > median then diff++ else diff--
      if diff == 2  then median = median.succ; diff = 0
      if diff == -2 then median = median.pred; diff = 0

   On delete: similarly update diff and adjust the median as necessary.

   On rotate: no change!  Rotation doesn't change anything about the
   median.

This all sounds great, and we're staying within O(log n), but we are
increasing constant factors.  So we shouldn't go adding things unless
they're useful to us.

* Splay Trees

What if we could have balanced binary search trees without the
colors?  Well, we can't quite, but we can come close with a little
creative accounting.

Old rule: guaranteed O(log n) per operation.
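As a rough sketch of the diff trick on insert, here the threaded pred/succ links are simulated with a sorted Python list (which gives the same neighbors the O(1) links would); the class and method names are mine, and duplicate keys are not handled:

```python
import bisect

class MedianTracker:
    """Track the median under inserts, per the diff-counter scheme (a sketch)."""
    def __init__(self):
        self.keys = []     # stand-in for the tree + its pred/succ links
        self.median = None
        self.diff = 0      # (# keys > median) - (# keys < median)

    def _succ(self):       # median's successor in sorted order
        return self.keys[self.keys.index(self.median) + 1]

    def _pred(self):       # median's predecessor in sorted order
        return self.keys[self.keys.index(self.median) - 1]

    def insert(self, key):
        bisect.insort(self.keys, key)
        if self.median is None:        # first element is its own median
            self.median = key
            return
        if key > self.median:
            self.diff += 1
        else:
            self.diff -= 1
        if self.diff == 2:             # too many on the big side: shift right
            self.median, self.diff = self._succ(), 0
        elif self.diff == -2:          # too many on the small side: shift left
            self.median, self.diff = self._pred(), 0
```

With real pred/succ links in the tree, each insert does O(1) extra work beyond the tree insert itself.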
Relaxed rule: ANY m operations, starting from an empty tree, take
total time O(m log n).

Notice: This is different.  Notice: This means that over the whole
sequence we never fall behind, though a single operation may be slow.
Remember when this is and isn't appropriate.

Splay trees are just BSTs (no extra information), but after:
 * insert, we "splay" the new item to the root
 * lookup, we "splay" the item to the root
 * delete, we "splay" the item to the root, then delete it and replace
   it with its successor

Splays are mostly "double rotations":

0. If the item is one below the root, do a normal rotation.

1. If it is the left child of a left child, right rotate the
   grandparent, then right rotate the parent:

            x                        z
           / \                      / \
          y   D      =====>        A   y
         / \                          / \
        z   C                        B   x
       / \                              / \
      A   B                            C   D

2. If it is the right child of a left child, left rotate the parent,
   then right rotate the grandparent:

            x                        z
           / \                     /   \
          y   D      =====>       y     x
         / \                     / \   / \
        A   z                   A   B C   D
           / \
          B   C

3, 4 are symmetric to 1, 2.

Notice there are still cases, but there are fewer, and we "share" them
among all the operations.

Things seem to magically balance themselves -- we'll do a quick
example next time.  We haven't done the analysis.  We may do it later
in the course, but in any case, some really smart people did it for
us.  The analysis is messy, but that has no effect on the
implementation.  We should like data structures that are like that.
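The cases above can be sketched as follows, on a plain BST with parent pointers.  All class and function names here are mine (the lecture gives only the pictures), and this shows only the splay itself, not splaying after insert/lookup/delete:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_insert(root, key):
    """Plain BST insert (no balancing); returns the root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        side = 'left' if key < cur.key else 'right'
        child = getattr(cur, side)
        if child is None:
            child = Node(key)
            child.parent = cur
            setattr(cur, side, child)
            return root
        cur = child

def _rotate(x):
    """Rotate x up one level, over its parent."""
    p, g = x.parent, x.parent.parent
    if x is p.left:                    # right rotation at p
        p.left, x.right = x.right, p
        if p.left: p.left.parent = p
    else:                              # left rotation at p
        p.right, x.left = x.left, p
        if p.right: p.right.parent = p
    p.parent, x.parent = x, g
    if g:
        if g.left is p: g.left = x
        else:           g.right = x

def splay(x):
    """Move x to the root using the cases above."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            _rotate(x)                 # case 0: one below the root
        elif (x is p.left) == (p is g.left):
            _rotate(p); _rotate(x)     # cases 1/3: rotate grandparent, then parent
        else:
            _rotate(x); _rotate(x)     # cases 2/4: two rotations, opposite directions
    return x

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []
```

Note the asymmetry: the zig-zig cases rotate the top edge first, which is what makes long access paths fold in half and is the key to the amortized bound.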