CS410, Summer 1998              Lecture 7 Outline               Dan Grossman

Goals:
 * Finish up deletion in red-black trees
 * Augmenting data structures, with red-black trees as an example
 * How to implement splay trees (leaving the analysis for another time)

Reading:
 * Red-black trees are CLR 14
 * Augmenting data structures is CLR 15
 * Splay trees are not in CLR: check out some of the books on reserve
   or ask me about other resources.

* Deletion in red-black trees

[Be sure to read the end of the previous lecture.  These notes just
pick up where those left off.]

We have a token which counts as an "extra black" on a node in our
tree.  Placement of this token is described in the last lecture.  We
need to remove this token without violating the black-height property
of the tree.

We have 9 (yes, 9) cases.  (As with insert, we assume the cases are
checked in order, so case i is only done if cases 1, ..., (i-1) do not
apply.)  In the pictures, * is a black node, O is a red node, ? is a
node whose color we don't know, and an extra * in front (as in ** or
*?) marks the node holding the token.

1. If the token is on the root, remove the token.  (This reduces the
   black-height of _every_ path by one, so it's okay.)

2. If the token is on a red node, color the node black and remove the
   token.  (Every path that includes the node has the same
   black-height it did before, so it's okay.)

3. Sibling is black, nephews are black: make the sibling red, move the
   token to the parent, and recur:

             ?                        *?
            / \                       / \
          **   *      =====>         *   O
          / \  / \                  / \  / \
         A   B *  *                A   B *  *
              /\  /\                    /\  /\
             C D  E F                  C D  E F

   (Notice this only introduces a red-red violation if we will do case
   2 next.  And case 2 will eliminate the violation.)

4. Sibling is black, right nephew is red: rotate the parent left and
   recolor as shown:

           y?                          z?       z gets y's color
          /  \                        /  \      w keeps w's color
        x**   z*       =====>       y*    u*
        / \   / \                   / \   / \
       A   B w?  uO               x*   w? E   F
             /\  /\               /\   /\
            C D  E F             A B   C D

   We are now done!

5. Sibling is black, left nephew is red: rotate the sibling right and
   recolor as shown; now we're in case 4:

           y?                          y?
          /  \                        /  \
        x**   z*       =====>       x**   w*
        / \   / \                   / \   / \
       A   B wO  u*                A   B C   zO
             /\  /\                          / \
            C D  E F                        D   u*
                                                /\
                                               E F

6.
Sibling is red: rotate the parent left; now the sibling is black, so
we are in case 3, 4, or 5.  (Notice that even in case 3, the parent is
now red, so it will be only one more step before case 2 applies.)

           y*                          z*
          /  \                        /  \
        x**   zO       =====>       yO    D
        / \   / \                   / \
       A   B C   D                x**  C
                                  / \
                                 A   B

7, 8, 9 are the right-child versions of 4, 5, 6.

Efficiency -- why O(log n):
  * In case 1, 2, or 4, we're done right away.
  * In case 3, we move the token up, so there are at most log n steps.
  * In case 5, we go to case 4 and we're done.
  * In case 6:
      - if we go to case 3, we then go to case 2 (because the parent
        is red);
      - if we go to case 4, we're done;
      - if we go to case 5, we go to case 4, and then we're done.
So there are at most 3 rotations plus log n moves up.  (It's a pain to
code -- and we didn't even deal with the symmetric cases.)

* Augmenting Data Structures

We have done the hard work of maintaining a balanced tree under any
sequence of insert and delete operations, and we know how to do min,
max, lookup, predecessor, successor, etc.  Now the question is: can we
get more, without losing the O(log n) bounds we already have, by
adding and maintaining additional information?

We can often do this by "piggy-backing" on the operations we already
have: if we can always recompute the additional information from the
nodes around us, then we can recompute it as we rebalance.

Example: Return the i-th smallest element.

  Information: the size (number of nodes) of each node's subtree,
  stored at that node.

  How used (taking a nil child's size to be 0):
      x = root;
      while (true)
        if x.left.size == (i-1)  return x
        if x.left.size >  (i-1)  x = x.left
        if x.left.size <  (i-1)  i = i - (x.left.size + 1); x = x.right

  How maintained:
      on insert: increment the size of each node you pass on the way down
      on rotate: recompute the sizes of the rotated nodes x, y, z from
                 their (unchanged) children
      on delete: when deleting a node with 2 children, do nothing (only
                 a key is copied; the shape is unchanged); then, when
                 splicing out the node with <= 1 child, walk back to
                 the root, decrementing the size of each ancestor.

Example: min, max, predecessor, successor in O(1).

The trick is to make insert and delete a constant factor more
expensive.  min and max are trivial -- just keep extra variables and
update them as necessary.
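The i-th smallest walk described above can be sketched directly, assuming each node caches its subtree size.  The class and function names below are my own, not the lecture's; sizes are computed at construction rather than maintained under rebalancing, since the point here is only the search:

```python
class Node:
    """BST node augmented with the size of its subtree (names are mine)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def select(x, i):
    """Return the i-th smallest key (1-indexed) in the subtree rooted at x."""
    while x is not None:
        left_size = x.left.size if x.left else 0   # a nil child has size 0
        if left_size == i - 1:
            return x.key                           # x is exactly the i-th smallest
        elif left_size > i - 1:
            x = x.left                             # answer is in the left subtree
        else:
            i -= left_size + 1                     # skip left subtree and x itself
            x = x.right
    raise IndexError("i out of range")
```

The walk touches one node per level, so on a red-black tree it runs in O(log n).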
pred, succ: keep extra links at each node, pointing to its predecessor
and successor.

  On insert:
    if I am a left child:
      make my succ be my parent
      make my pred be my parent's (old) pred
      make my parent's pred's succ be me
      make my parent's pred be me
    the right-child case is symmetric

  On delete:
    make my pred's succ be my succ
    make my succ's pred be my pred

  Look -- this is exactly what we do in a doubly-linked list!  In
  fact, these new links _are_ a sorted doubly-linked list of all the
  nodes in the tree.  You can think of it as adding a linked list to a
  red-black tree, or a red-black tree on top of a linked list.  What
  we now have is a data structure that is really both.

  On rotate: no change -- rotations do not change the items in the
  tree, so they certainly don't change predecessor/successor
  information.

Example: the median element.

 * Building on the first example, we could find the median in O(log n)
   by implementing:

      median () = find_ith_smallest (root.size / 2)

 * Building on the second example, we could find the median in O(1) by
   keeping two extra variables:

     - median: the current median
     - diff: an integer which is the number of nodes bigger than the
       median minus the number of nodes smaller than the median.  This
       number should always be 0 if the tree has an odd number of
       elements; if it has an even number, it can be +1 or -1.

   On insert:
      if new key > median then diff++ else diff--
      if diff == 2  then median = median.succ; diff = 0
      if diff == -2 then median = median.pred; diff = 0

   On delete: similarly update diff and adjust the median as necessary.

   On rotate: no change!  Rotation doesn't change anything about the
   median.

This all sounds great, and we're staying within O(log n), but we are
increasing constant factors.  So we shouldn't go adding things unless
they're useful to us.

* Splay Trees

What if we could have balanced binary search trees without the
colors?  Well, we can't quite, but we can come close with a little
creative accounting.

Old rule: guaranteed O(log n) per operation.
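As a rough sketch of the diff trick on insert, here the threaded pred/succ links are simulated with a sorted Python list (which gives the same neighbors the O(1) links would); the class and method names are mine, and duplicate keys are not handled:

```python
import bisect

class MedianTracker:
    """Track the median under inserts, per the diff-counter scheme (a sketch)."""
    def __init__(self):
        self.keys = []     # stand-in for the tree + its pred/succ links
        self.median = None
        self.diff = 0      # (# keys > median) - (# keys < median)

    def _succ(self):       # median's successor in sorted order
        return self.keys[self.keys.index(self.median) + 1]

    def _pred(self):       # median's predecessor in sorted order
        return self.keys[self.keys.index(self.median) - 1]

    def insert(self, key):
        bisect.insort(self.keys, key)
        if self.median is None:        # first element is its own median
            self.median = key
            return
        if key > self.median:
            self.diff += 1
        else:
            self.diff -= 1
        if self.diff == 2:             # too many on the big side: shift right
            self.median, self.diff = self._succ(), 0
        elif self.diff == -2:          # too many on the small side: shift left
            self.median, self.diff = self._pred(), 0
```

With real pred/succ links in the tree, each insert does O(1) extra work beyond the tree insert itself.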
Relaxed rule: ANY m operations, starting from an empty tree, take
total time O(m log n).

Notice: This is different.  Notice: This means that over the whole
sequence we never fall behind, though a single operation may be slow.
Remember when this is and isn't appropriate.

Splay trees are just BSTs (no extra information), but after:
 * insert, we "splay" the new item to the root
 * lookup, we "splay" the item to the root
 * delete, we "splay" the item to the root, then delete it and replace
   it with its successor

Splays are mostly "double rotations":

0. If the item is one below the root, do a normal rotation.

1. If it is the left child of a left child, right rotate the
   grandparent, then right rotate the parent:

            x                        z
           / \                      / \
          y   D      =====>        A   y
         / \                          / \
        z   C                        B   x
       / \                              / \
      A   B                            C   D

2. If it is the right child of a left child, left rotate the parent,
   then right rotate the grandparent:

            x                        z
           / \                     /   \
          y   D      =====>       y     x
         / \                     / \   / \
        A   z                   A   B C   D
           / \
          B   C

3, 4 are symmetric to 1, 2.

Notice there are still cases, but there are fewer, and we "share" them
among all the operations.

Things seem to magically balance themselves -- we'll do a quick
example next time.  We haven't done the analysis.  We may do it later
in the course, but in any case, some really smart people did it for
us.  The analysis is messy, but that has no effect on the
implementation.  We should like data structures that are like that.
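The cases above can be sketched as follows, on a plain BST with parent pointers.  All class and function names here are mine (the lecture gives only the pictures), and this shows only the splay itself, not splaying after insert/lookup/delete:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def bst_insert(root, key):
    """Plain BST insert (no balancing); returns the root."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        side = 'left' if key < cur.key else 'right'
        child = getattr(cur, side)
        if child is None:
            child = Node(key)
            child.parent = cur
            setattr(cur, side, child)
            return root
        cur = child

def _rotate(x):
    """Rotate x up one level, over its parent."""
    p, g = x.parent, x.parent.parent
    if x is p.left:                    # right rotation at p
        p.left, x.right = x.right, p
        if p.left: p.left.parent = p
    else:                              # left rotation at p
        p.right, x.left = x.left, p
        if p.right: p.right.parent = p
    p.parent, x.parent = x, g
    if g:
        if g.left is p: g.left = x
        else:           g.right = x

def splay(x):
    """Move x to the root using the cases above."""
    while x.parent is not None:
        p, g = x.parent, x.parent.parent
        if g is None:
            _rotate(x)                 # case 0: one below the root
        elif (x is p.left) == (p is g.left):
            _rotate(p); _rotate(x)     # cases 1/3: rotate grandparent, then parent
        else:
            _rotate(x); _rotate(x)     # cases 2/4: two rotations, opposite directions
    return x

def inorder(n):
    return inorder(n.left) + [n.key] + inorder(n.right) if n else []
```

Note the asymmetry: the zig-zig cases rotate the top edge first, which is what makes long access paths fold in half and is the key to the amortized bound.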