Radix Searching (Digital Search Trees and Tries)
Notes by Dan Grossman

Note: This is not covered in CLR.  See Sedgewick, ch. 15 (on the course reserve list).

In the near future, when we study sorting, one of the methods we will study is radix sorting.  Most comparison-based sorting methods require O(n log n) comparisons, but radix sorting takes time O(n log k), where n is the number of elements and k is the number of possible keys.  Radix sorting is not a purely comparison-based method; instead it breaks the keys into pieces.  We should think of log k as the number of "digits" in a key, where the base of the logarithm is the number of possible values for a digit.  For example, with Java's ints and each digit being a bit, log k is 32.  So radix sorting is a viable alternative to the O(n log n) comparison sorts when k is small relative to n.

Here we introduce the idea in the context of search trees.  Most of the comparison-based methods we have been studying give O(log n) search times.  Today we will introduce a strategy that gets O(log k) search times.  We will examine tree structures that branch on parts (i.e. "digits") of the keys rather than doing full-key comparisons.  This is less general than comparison-based methods because it assumes some structure on the keys; before, we assumed only that keys were drawn from an ordered set.

In the discussion below we number the bits of a key from 0 to (log k)-1, with bit 0 being the most significant bit.

Digital Search Trees

The simplest such strategy is a digital search tree: a binary tree where each node holds one element.  All nodes in the left subtree of a node at level i (root at level 0, its children at level 1, etc.) have an ith bit of 0.  Similarly, all nodes in the right subtree of a node at level i have an ith bit of 1.  We assume no duplicate keys.  Thus our tree can have height at most log k, the number of bits in our keys.

The search method looks like this:

  search(Tree t, Key k)
    return searchSub(t.root, k, 0);

  searchSub(Node n, Key k, int b)
    if (n == null) return NOT_THERE;
    if (k == n.key) return n.val;
    if (digit(k, b) == 0)
      return searchSub(n.left, k, b+1);
    else
      return searchSub(n.right, k, b+1);

insert follows the same descent strategy and puts the new node where the first null is reached.  delete can find the element, remove it, and recursively move children up into their parent's position until every node again has an element.  Notice this works here and not in BSTs because digital search trees are not sorted, so we do not have to replace the deleted element with its successor.  In fact, despite efficient insert, lookup, and delete, digital search trees do not have efficient predecessor and successor.

Example insertion sequence for a digital search tree:

  1001  ===>  1001  ===>  1001  ===>     1001
                  \           \          /    \
                 1100        1100     0111    1100
                             /                /
                          1011             1011
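To make the descent concrete, here is a minimal Java sketch of the search and insert just described.  It is an illustration, not code from these notes: the names (DST, Node, bit) are invented here, and it assumes 32-bit int keys with int values, numbering bits from the most significant as above.

  public class DST {
    private static class Node {
      int key, val;
      Node left, right;
      Node(int key, int val) { this.key = key; this.val = val; }
    }

    private Node root;

    // Bit b of key k, where bit 0 is the most significant of the 32 bits.
    private static int bit(int k, int b) { return (k >>> (31 - b)) & 1; }

    // Returns the value stored under k, or null if k is not present.
    public Integer search(int k) { return searchSub(root, k, 0); }

    private Integer searchSub(Node n, int k, int b) {
      if (n == null) return null;        // not there
      if (k == n.key) return n.val;      // full-key comparison at every node
      if (bit(k, b) == 0) return searchSub(n.left, k, b + 1);
      else                return searchSub(n.right, k, b + 1);
    }

    // Same descent as search; the new node lands at the first null link.
    public void insert(int k, int v) { root = insertSub(root, k, v, 0); }

    private Node insertSub(Node n, int k, int v, int b) {
      if (n == null) return new Node(k, v);
      if (k == n.key) { n.val = v; return n; }   // no duplicates: overwrite
      if (bit(k, b) == 0) n.left = insertSub(n.left, k, v, b + 1);
      else                n.right = insertSub(n.right, k, v, b + 1);
      return n;
    }
  }

Note that searchSub compares the full key at every node, so an element may come to rest anywhere along the root-to-leaf path its bits spell out -- a point the next section returns to.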
Tries

The reason digital search trees can't find predecessor and successor efficiently is that a particular element could be anywhere along the root-to-leaf path corresponding to its bits.  We can gain O(log k) predecessor and successor, while retaining O(log k) insert, delete, and lookup, by using a trie.  The main difference in a trie is that all elements are at the leaves.  The internal nodes need no information except pointers to their children and parent.  The leaves are then in left-to-right sorted order, so finding predecessor and successor is not unlike doing so in 2-3 trees.

Example insertion sequence for a trie:

  1001  ===>   *   ===>    *       ===>       *
                \           \                / \
                 *           *           0111   *
                / \         / \                / \
            1001   1100    *   1100           *   1100
                          / \                / \
                      1001   1011        1001   1011

Notice the maximum height is still log k.  However, we could be doing a lot of branching for no good reason.  A worst-case example is something like the trie for 0000000001 and 0000000000:

                      *
                     /
                    *
                   /
                  *
                 /
                *
               /
              *
             /
            *
           /
          *
         /
        *
       /
      *
     /
    *
   / \
  0000000000   0000000001

One interesting note: unlike just about every other structure we have studied, for any set of elements there is exactly one correct trie.  So no sort of "re-balancing" can fix the shortcoming shown above -- no rebalancing is legal.

What is happening is that by always branching on the ith bit at the ith level we are being too restrictive.  We can avoid these bad examples by being more clever.  We could eliminate all the "one-way" branching by finding, for each internal node, a "meaningful" bit to branch on and storing at the node which bit that is.  To do this, on insert, when we reach a leaf we find a bit distinguishing the leaf from the new element and branch on it.  Here is our example sequence -- numbers in parentheses are the bits branched on:

  1001  ===>    (1)      ===>     (1)       ===>        (1)
               /   \             /   \                 /    \
           1001     1100       (2)    1100           (2)      (0)
                              /   \                 /  \      /  \
                          1001     1011         1001  1011 0111  1100

But this destroys our ordered-leaves property, and with it efficient predecessor and successor.  We can regain them by allowing branching on different bits at different levels, but requiring that the branching bits along any root-to-leaf path only increase.  So we would want the last insert above to actually produce something like:

         (0)
        /   \
    0111     (1)
            /   \
          (2)    1100
         /   \
     1001     1011

It is not immediately obvious how to write an insert that can do this.  We did not discuss the algorithm in class, but it can be done.  If interested, see the Sedgewick text's section on "patricia tries", which achieve efficient predecessor/successor without doing unnecessary branching.

Multi-way Tries

If our digits have b possibilities instead of just two, then our tries extend naturally by having b children at each node.  How to store these b children is a classic time/space trade-off.  If we expect most of the children to be null, then using an array of size b wastes much space, especially since this is at _each_ node.  But with a linked list of children it takes longer to find the right one, and we must store which child each is (rather than implicitly using an array index).

In particular applications, ad hoc hybrid methods may show real improvement.  For example, if our trie were using English words as keys, we could use an array for the more commonly used letters and a linked list for the rest.  A different strategy might use a linked list at some nodes and an array at others: while a node has fewer than some threshold of children it could use a list, then switch to an array.  By using an abstract node class, other nodes in the trie would not even have to know which was being used.

One clever use of tries is answering the question "does a particular word exist in this language?"  In the trie we store at each node a bit that is true if and only if the word spelled out by following the branches from the root down to that node is in the language.  This has the advantage that words with common prefixes (such as do, dog, dogged, and doghouse) actually share some space.
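Here is a minimal sketch of such a word trie, assuming lowercase a-z words and the array representation of children discussed above.  The names (WordTrie, isWord, and so on) are invented for illustration.

  public class WordTrie {
    private static class Node {
      Node[] children = new Node[26];  // array variant of the trade-off above
      boolean isWord;  // true iff the root-to-here path spells a word
    }

    private final Node root = new Node();

    public void insert(String w) {
      Node n = root;
      for (int i = 0; i < w.length(); i++) {
        int c = w.charAt(i) - 'a';
        if (n.children[c] == null) n.children[c] = new Node();
        n = n.children[c];
      }
      n.isWord = true;
    }

    public boolean contains(String w) {
      Node n = root;
      for (int i = 0; i < w.length(); i++) {
        int c = w.charAt(i) - 'a';
        if (n.children[c] == null) return false;
        n = n.children[c];
      }
      return n.isWord;
    }
  }

With this representation, do, dog, dogged, and doghouse share the nodes for their common prefixes; only their distinct suffixes cost additional space.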
Radix Summary

Breaking keys into parts rather than using comparisons can be a simple way to get provably good behavior, especially when k is not significantly greater than n.  However, sometimes getting at part of a key can be slow (extracting and comparing individual bits is slower on modern machines than comparing whole ints).  Also, it is a less generic solution.  It makes sense to make new types of data implement a lessThan method if they need to be sorted; whether they should also provide methods for dividing themselves into digits is less clear.
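To make that last contrast concrete, here is an illustrative sketch (the interface names are invented here, not from any library) of what a key type must supply in each world:

  // Enough for BSTs, mergesort, and the other comparison-based methods.
  interface Sortable {
    boolean lessThan(Sortable other);
  }

  // What a radix structure additionally demands of its keys.
  interface Radixable {
    int numDigits();    // log k: how many digits a key has
    int digit(int i);   // the ith digit, with digit 0 the most significant
  }

A comparison-based structure needs only the first; the radix structures in these notes need something like the second.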