Discussion 9: Huffman Coding
Solutions
First, fill out the following table to determine the frequencies of each character in this String.
| Character | Frequency | 
|---|---|
 
        ' ' | 
       
         4 
       | 
    
 
        'a' | 
       
         4 
       | 
    
 
        'c' | 
       
         3 
       | 
    
 
        'd' | 
       
         2 
       | 
    
 
        'e' | 
       
         3 
       | 
    
 
        'r' | 
       
         5 
       | 
    
 
        't' | 
       
         1 
       | 
    
Next, use the procedure described in the previous section to draw the prefix tree.
Note: Our above description did not specify what should happen in the case of ties (when there are multiple subtrees in the priority queue with the same priority). This means that there can be multiple valid prefix trees (which, interestingly, all produce a final encoding using the same number of bits).
To make sure you end up with the same prefix tree as us, assume that the priority queue breaks ties by returning the tree with highest priority that contains the alphabetically earliest leftmost leaf (space is alphabetically before all letters). In addition, always draw the highest-priority subtree from each iteration to the right of the second-highest-priority subtree. These rules were used to break ties in the example from the previous section.
Now, use your prefix tree to fill out the following table of binary encodings for each character.
| Character | Binary Encoding | 
|---|---|
 
        ' ' | 
       
        000 | 
    
 
        'a' | 
       
        11 | 
    
 
        'c' | 
       
        011 | 
    
 
        'd' | 
       
        0100 | 
    
 
        'e' | 
       
        001 | 
    
 
        'r' | 
       
        10 | 
    
 
        't' | 
       
        0101 | 
    
buildPrefixTree()
Complete the definition of the buildPrefixTree() method, which carries out the tree construction procedure similar to part (b) of the previous exercise. Pay careful attention to the data types involved in this method.
- The priority queue is not the 
MaxHeapPriorityQueuethat we developed in lecture. Rather, it is thePriorityQueuefrom Java’s standard library (offering you the opportunity to work with more of the Java library code). The main difference is that Java’s priority queue uses the “natural ordering” of the objects themselves to determine priority, rather than allowing the user to specify the priority inadd(). - The elements in the priority queue are of type 
SymbolCount. This is a record class defined at the top of the file that contains a subtree (specifically, anImmutableBinaryTreefrom the lecture notes) and an integer frequency. The nodes of this subtree are labeled withStrings, which should contain all the characters of that subtree’s leaves, in order from left to right. The frequency is the sum of the frequencies of all these characters. 
Complete the method definition to iteratively select the two highest-priority subtrees from the priority queue and combine them to build a larger subtree. Once the priority queue includes a single entry, return the tree in that entry.
 | 
 | 
 | 
 | 
getBinaryEncoding()
Complete the definition of the getBinaryEncoding() method, which returns the binary String encoding the given character, according to the given prefix tree (i.e., this method allows us to build a table similar to that from part (c) of the previous exercise).
In your definition, you’ll want to traverse down the prefix tree, building up the binary String that you will return as you go. Use the labels of each node to determine which child to traverse down to.
 | 
 | 
 | 
 | 
decode()
The compressed files after Huffman coding appear nothing like the original (take a look!), so we’ll need a way to decode them and restore the original text. We’ve provided most of the definition of a decode() method that does just this. To complete this method, you must fill in the portion to scan over the binary String and restore the original characters.
To do this, write a loop over the bits in the BitSet object bits. You can read the ith bit by calling bits.get(i) which will return true if this bit is a 1 and false if this bit is a 0. The total number of bits is bits.length(). You’ll want to use the decodingMap to convert binary Strings back to characters. Read through the Map documentation (or look ahead at the Lecture 19 notes) to see which map operations may be useful.
 | 
 | 
 | 
 |