2-3 trees, B-trees
Notes by Dan Grossman

2-3 Trees:

These are a completely different way to achieve the all-important
O(log n) guarantee. (A 2-3 Tree is NOT a binary search tree.)

In a 2-3 tree,
 * All the data objects (keys and values) are at the leaves.
 * All leaves are at the same depth.
 * The leaves are in order left to right (but don't actually have
   pointers between them).
 * Every internal node has either 2 or 3 children, and stores the
   value of the smallest leaf in its subtree.

Claim: The depth of the tree is O(log n).

Proof: The tree is deepest when every internal node has 2 children.
In that case there are n leaves, n/2 nodes at the level above, n/4 at
the next level, then n/8, n/16, ..., 1. The sum is less than 2n (the
standard series 1 + 1/2 + 1/4 + 1/8 + ... converges to 2, and we have
n times part of that series). The tree is shallowest when every
internal node has 3 children. So the height is always between
log_3 n and log_2 n.

Search: Go to the child with the largest value that is less than or
equal to what we're looking for.

Min: go left (the key is at the root, but the value is at the
leftmost leaf!). Max: go right.

Predecessor: Go up until you have a closest left sibling, then find
the max of the tree at that sibling.
Successor: Go up until you have a closest right sibling, then find
the min of the tree at that sibling.
(Alternately, we could keep a doubly-linked list of the leaves. If
pred/succ are really the common operations, then keep only a
doubly-linked list. We will not need pred/succ in any other
operations like we did for red-black trees.)

Insert: Put the new leaf in the right place.
Problem: the parent might now have 4 children.

  x = parent;
  while (x has 4 children) {
    let x's children be A, B, C, D in that order
    make x_1 with 2 children A, B and x_2 with 2 children C, D
    remove x from its parent and add x_1 and x_2
      (if x was the root, make a new higher root)
    x = x.parent
  }

Note we might even change the root (increase the height). This
rebalancing is called splitting.

Delete: Take it out.
Problem: the parent might now have 1 child.

  x = parent;
  while (x has 1 child) {
    if x is the root, make the child the root
    if x has 3 closest nephews, move the closest one here; done
    if x has 2 closest nephews, move x's child there,
      delete x, and set x to x.parent
  }

This all works, and is thanks to your Dean of Engineering in 1970.

* Example: First notice that 2-3 trees are not unique for a set of
  keys. These are both legal 2-3 trees for the keys 8, 6, 7, 5, 3, 0, 9:

            0                         0
         /  |  \                   /  |  \
        0   5   7                 0   6   8
       / \ / \ /|\               /|\  /\  /\
      0  3 5 6 7 8 9            0 3 5 6 7 8 9

  Notice the leaves will always be in the same order though. The
  height may be different, although it isn't above.

  Insert 1 into the tree on the right:

              0                            0
           /  |  \        insert        /  |  \
          0   6   8       ======>      0    6   8
         /|\  /\  /\                 / | | \ /\  /\
        0 3 5 6 7 8 9               0 1 3 5 6 7 8 9

                        0                               0
        split        / | | \      split /            /     \
        =====>      0  3 6  8     new root          0       6
                   /\ /\ /\ /\    ========>        / \     / \
                  0 1 3 5 6 7 8 9                 0   3   6   8
                                                 / \ / \ / \ / \
                                                0  1 3 5 6 7 8 9

  (Note: It is NOT the case that we always have a binary tree after
  making a new root.)

  Insert 2 into result:

                           0
        insert          /     \
        ======>        0       6
                      / \     / \
                     0   3   6   8
                    /|\  /\  /\  /\
                   0 1 2 3 5 6 7 8 9

  Delete 9 from result:

                    0                             0
       delete    /     \         merge         /     \
       ======>  0       6        =====>       0       6
               / \     / \                   / \      |
              0   3   6   8                 0   3     6
             /|\  /\  /\  |                /|\  /\   /|\
            0 1 2 3 5 6 7 8               0 1 2 3 5 6 7 8

       merge /              0
       delete root       /  |  \
       ==========>      0   3   6
                       /|\  /\  /|\
                      0 1 2 3 5 6 7 8

  Delete 2 from result:

                       0
       delete       /  |  \
       ======>     0    3   6
                  / \   /\  /|\
                 0   1  3 5 6 7 8

* In fact 2-3-4 trees work just fine too. There's even a strange
  resemblance between red-black trees and 2-3-4 trees if you squint
  enough. This is a mathematical artifact of being balanced enough --
  do NOT confuse the two kinds of trees!

* What's the real trade-off?
  * No: "In a red-black tree, some nodes are near the top." (This is
    a pretty bogus argument, because only exponentially few of the
    nodes are near the top. It's an okay argument only if those are
    the ones accessed.)
  * Yes: 2-3 trees may be easier to code, but the constant factors
    (in terms of space and perhaps time) are worse.
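To make the "easier to code" claim concrete, here is a minimal,
illustrative Python sketch (not from the notes) of the leaf-based
structure, the search rule, and the split-on-insert loop described
above. All names are my own; delete/merge is omitted for brevity.

```python
# A minimal 2-3 tree sketch: keys live at the leaves, every node stores
# the minimum key of its subtree, and insert splits any node that ends
# up with 4 children (possibly growing a new root).

class Node:
    def __init__(self, children=None, key=None):
        self.children = children or []          # empty list => leaf
        self.key = key if key is not None else min(c.key for c in self.children)

    def is_leaf(self):
        return not self.children

def search(node, key):
    """Descend to the child with the largest min-key <= key."""
    while not node.is_leaf():
        nxt = node.children[0]                  # default: leftmost child
        for c in node.children[1:]:
            if c.key <= key:
                nxt = c
        node = nxt
    return node.key == key

def insert(root, key):
    """Insert key; returns the (possibly new) root."""
    if root is None:
        return Node(key=key)
    path, node = [], root
    while not node.is_leaf():                   # same descent as search
        path.append(node)
        nxt = node.children[0]
        for c in node.children[1:]:
            if c.key <= key:
                nxt = c
        node = nxt
    if node.key == key:
        return root                             # key already present
    new_leaf = Node(key=key)
    if not path:                                # root was a single leaf
        return Node(children=sorted([root, new_leaf], key=lambda n: n.key))
    parent = path[-1]
    parent.children.append(new_leaf)
    parent.children.sort(key=lambda n: n.key)
    # Walk back up, fixing min-keys and splitting 4-child nodes.
    for i in range(len(path) - 1, -1, -1):
        x = path[i]
        x.key = x.children[0].key
        if len(x.children) == 4:
            a, b, c, d = x.children
            x1, x2 = Node(children=[a, b]), Node(children=[c, d])
            if i == 0:                          # x was the root: grow tree
                return Node(children=[x1, x2])
            grand = path[i - 1]
            j = grand.children.index(x)
            grand.children[j:j + 1] = [x1, x2]  # replace x by x1, x2
    return root
```

Inserting 8, 6, 7, 5, 3, 0, 9 in that order yields exactly the
left-hand example tree (root children with minimums 0, 5, 7).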
  (Just remember we have as many as 2n nodes in our tree.)

B-trees

* There's a generalization of 2-3 trees called B-trees. It turns out
  everything still works just fine by making 2 => k and 3 => 2k (or
  2k-1 if you prefer). Now the height is between log_2k n and
  log_k n.

* Other minor changes:
  * Use an array of child pointers (for convenience, or maybe for
    binary search).
  * Move minimum values to the parent (to avoid following child
    pointers -- we could have done this for 2-3 trees too. In B-tree
    applications it's more important. We'll see why in a minute.)

Now search is log_2 k * log_k n, and insert and delete are basically
k log_k n.

What is k? It's chosen so one node fits exactly on one page of disk.
(This is also why we moved the minimum values to the parents.)
Computation is so much cheaper than bringing in pages from disk that
we should minimize the number of pages touched. (So things like
binary search on the array are completely irrelevant.) This is great
for B-trees -- we have the space, we just don't want to touch it.
Red-black trees don't generalize to k, and would touch more nearby
nodes when rebalancing. As a result, B-trees are very common in
databases.
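To see how "one node per disk page" determines k, here is a
back-of-the-envelope sketch in Python. The concrete numbers (4 KB
pages, 8-byte keys and child pointers) are illustrative assumptions,
not values from the notes.

```python
# Choosing k so a B-tree node fills one disk page, then seeing how
# shallow the tree gets. All sizes below are assumed for illustration.
import math

PAGE_BYTES = 4096      # one disk page (assumption)
KEY_BYTES  = 8         # size of one key stored in a node (assumption)
PTR_BYTES  = 8         # size of one child pointer (assumption)

# A node with up to 2k children stores about 2k (key, pointer) pairs,
# so pick the largest 2k that fits in a page.
max_children = PAGE_BYTES // (KEY_BYTES + PTR_BYTES)   # 2k = 256
k = max_children // 2                                  # k  = 128

def worst_case_height(n, k):
    """Height when every node is as empty as allowed (k children each),
    i.e. the log_k n end of the log_2k n .. log_k n range."""
    return math.ceil(math.log(n, k))

# A billion keys need only about 5 levels: a handful of page reads.
print(k, worst_case_height(10**9, k))
```

Compare with a balanced binary tree, where a billion keys would mean
about 30 levels -- and potentially 30 page reads.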