Radix Searching (Digital Search Trees and Tries)
Notes by Dan Grossman

Note: This is not covered in CLR.  See Sedgewick, ch. 15 (on the course reserve list).

In the near future, when we study sorting, one of the methods we will study is radix sorting.  Most comparison-based sorting methods require O(n log n) comparisons, but radix sorting takes time O(n log k), where n is the number of elements and k is the number of possible keys.  Radix sorting is not a purely comparison-based method; instead it breaks the keys into pieces.  We should think of log k as the number of "digits" in a key, where the base of the logarithm is the number of possible values for a digit.  For example, with Java's ints and each digit being a bit, log k is 32.  So radix sorting is a viable alternative to the O(n log n) comparison sorts when k is small relative to n.

Here we introduce the idea in the context of search trees.  Most of the comparison-based methods we have been studying give O(log n) search times.  Today we will introduce a strategy that gets O(log k) search times.  We will examine tree structures that branch on parts (i.e. "digits") of the keys rather than doing full-key comparisons.  This is less general than comparison-based methods because it assumes some structure on the keys; before, we assumed only that keys were drawn from an ordered set.

In the discussion below we number the bits of a key from 0 to (log k)-1, with bit 0 being the most significant bit.

Digital Search Trees

The simplest such strategy is a digital search tree: a binary tree where each node holds one element.  All nodes in the left subtree of a node at level i (root at level 0, its children at level 1, etc.) have an ith bit of 0.  Similarly, all nodes in the right subtree of a node at level i have an ith bit of 1.  We assume no duplicate keys.  Thus our tree can have height at most log k, the number of bits in our keys.

The search method looks like this:

  search(Tree t, Key k)
    return searchSub(t.root, k, 0);

  searchSub(Node n, Key k, int b)
    if (n == null) return NOT_THERE;
    if (k == n.key) return n.val;
    if (digit(k, b) == 0)
      return searchSub(n.left, k, b+1);
    else
      return searchSub(n.right, k, b+1);

insert follows the same descent strategy and puts the new node where the first null is reached.  delete can find the element, remove it, and recursively move children up into their parent's position until every node again has an element.  Notice this works here and not in BSTs because digital search trees are not sorted, so we do not have to replace the deleted element with its successor.  In fact, despite efficient insert, lookup, and delete, digital search trees do not have efficient predecessor and successor.

Example insertion sequence for a digital search tree:

  1001  ===>  1001  ===>  1001  ===>     1001
                  \           \          /    \
                 1100        1100     0111    1100
                             /                /
                          1011             1011
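To make the descent concrete, here is a minimal Java sketch of the search and insert just described.  It is an illustration, not code from these notes: the names (DST, Node, bit) are invented here, and it assumes 32-bit int keys with int values, numbering bits from the most significant as above.

  public class DST {
    private static class Node {
      int key, val;
      Node left, right;
      Node(int key, int val) { this.key = key; this.val = val; }
    }

    private Node root;

    // Bit b of key k, where bit 0 is the most significant of the 32 bits.
    private static int bit(int k, int b) { return (k >>> (31 - b)) & 1; }

    // Returns the value stored under k, or null if k is not present.
    public Integer search(int k) { return searchSub(root, k, 0); }

    private Integer searchSub(Node n, int k, int b) {
      if (n == null) return null;        // not there
      if (k == n.key) return n.val;      // full-key comparison at every node
      if (bit(k, b) == 0) return searchSub(n.left, k, b + 1);
      else                return searchSub(n.right, k, b + 1);
    }

    // Same descent as search; the new node lands at the first null link.
    public void insert(int k, int v) { root = insertSub(root, k, v, 0); }

    private Node insertSub(Node n, int k, int v, int b) {
      if (n == null) return new Node(k, v);
      if (k == n.key) { n.val = v; return n; }   // no duplicates: overwrite
      if (bit(k, b) == 0) n.left = insertSub(n.left, k, v, b + 1);
      else                n.right = insertSub(n.right, k, v, b + 1);
      return n;
    }
  }

Note that searchSub compares the full key at every node, so an element may come to rest anywhere along the root-to-leaf path its bits spell out -- a point the next section returns to.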
Tries

The reason digital search trees can't find predecessor and successor efficiently is that a particular element could be anywhere along the root-to-leaf path corresponding to its bits.  We can gain O(log k) predecessor and successor, while retaining O(log k) insert, delete, and lookup, by using a trie.  The main difference in a trie is that all elements are at the leaves.  The internal nodes need no information except pointers to their children and parent.  The leaves are then in left-to-right sorted order, so finding predecessor and successor is not unlike doing so in 2-3 trees.

Example insertion sequence for a trie:

  1001  ===>   *   ===>    *       ===>       *
                \           \                / \
                 *           *           0111   *
                / \         / \                / \
            1001   1100    *   1100           *   1100
                          / \                / \
                      1001   1011        1001   1011

Notice the maximum height is still log k.  However, we could be doing a lot of branching for no good reason.  A worst-case example is something like the trie for 0000000001 and 0000000000:

                      *
                     /
                    *
                   /
                  *
                 /
                *
               /
              *
             /
            *
           /
          *
         /
        *
       /
      *
     /
    *
   / \
  0000000000   0000000001

One interesting note: unlike just about every other structure we have studied, for any set of elements there is exactly one correct trie.  So no sort of "re-balancing" can fix the shortcoming shown above -- no rebalancing is legal.

What is happening is that by always branching on the ith bit at the ith level we are being too restrictive.  We can avoid these bad examples by being more clever.  We could eliminate all the "one-way" branching by finding, for each internal node, a "meaningful" bit to branch on and storing at the node which bit that is.  To do this, on insert, when we reach a leaf we find a bit distinguishing the leaf from the new element and branch on it.  Here is our example sequence -- numbers in parentheses are the bits branched on:

  1001  ===>    (1)      ===>     (1)       ===>        (1)
               /   \             /   \                 /    \
           1001     1100       (2)    1100           (2)      (0)
                              /   \                 /  \      /  \
                          1001     1011         1001  1011 0111  1100

But this destroys our ordered-leaves property, and with it efficient predecessor and successor.  We can regain them by allowing branching on different bits at different levels, but requiring that the branching bits along any root-to-leaf path only increase.  So we would want the last insert above to actually produce something like:

         (0)
        /   \
    0111     (1)
            /   \
          (2)    1100
         /   \
     1001     1011

It is not immediately obvious how to write an insert that can do this.  We did not discuss the algorithm in class, but it can be done.  If interested, see the Sedgewick text's section on "patricia tries", which achieve efficient predecessor/successor without doing unnecessary branching.

Multi-way Tries

If our digits have b possibilities instead of just two, then our tries extend naturally by having b children at each node.  How to store these b children is a classic time/space trade-off.  If we expect most of the children to be null, then using an array of size b wastes much space, especially since this is at _each_ node.  But with a linked list of children it takes longer to find the right one, and we must store which child each is (rather than implicitly using an array index).

In particular applications, ad hoc hybrid methods may show real improvement.  For example, if our trie were using English words as keys, we could use an array for the more commonly used letters and a linked list for the rest.  A different strategy might use a linked list at some nodes and an array at others: while a node has fewer than some threshold of children it could use a list, then switch to an array.  By using an abstract node class, other nodes in the trie would not even have to know which was being used.

One clever use of tries is answering the question "does a particular word exist in this language?"  In the trie we store at each node a bit that is true if and only if the word spelled out by following the branches from the root down to that node is in the language.  This has the advantage that words with common prefixes (such as do, dog, dogged, and doghouse) actually share some space.
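Here is a minimal sketch of such a word trie, assuming lowercase a-z words and the array representation of children discussed above.  The names (WordTrie, isWord, and so on) are invented for illustration.

  public class WordTrie {
    private static class Node {
      Node[] children = new Node[26];  // array variant of the trade-off above
      boolean isWord;  // true iff the root-to-here path spells a word
    }

    private final Node root = new Node();

    public void insert(String w) {
      Node n = root;
      for (int i = 0; i < w.length(); i++) {
        int c = w.charAt(i) - 'a';
        if (n.children[c] == null) n.children[c] = new Node();
        n = n.children[c];
      }
      n.isWord = true;
    }

    public boolean contains(String w) {
      Node n = root;
      for (int i = 0; i < w.length(); i++) {
        int c = w.charAt(i) - 'a';
        if (n.children[c] == null) return false;
        n = n.children[c];
      }
      return n.isWord;
    }
  }

With this representation, do, dog, dogged, and doghouse share the nodes for their common prefixes; only their distinct suffixes cost additional space.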
Radix Summary

Breaking keys into parts rather than using comparisons can be a simple way to get provably good behavior, especially when k is not significantly greater than n.  However, sometimes getting at part of a key can be slow (extracting and comparing individual bits is slower on modern machines than comparing whole ints).  Also, it is a less generic solution.  It makes sense to make new types of data implement a lessThan method if they need to be sorted; whether they should also provide methods for dividing themselves into digits is less clear.
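To make that last contrast concrete, here is an illustrative sketch (the interface names are invented here, not from any library) of what a key type must supply in each world:

  // Enough for BSTs, mergesort, and the other comparison-based methods.
  interface Sortable {
    boolean lessThan(Sortable other);
  }

  // What a radix structure additionally demands of its keys.
  interface Radixable {
    int numDigits();    // log k: how many digits a key has
    int digit(int i);   // the ith digit, with digit 0 the most significant
  }

A comparison-based structure needs only the first; the radix structures in these notes need something like the second.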