Discussion 9: Huffman Coding

Solutions

Exercise 1: Huffman Coding by Hand

In this exercise, you'll use Huffman coding to encode the String,

"carter raced a red car"

(a)

First, fill out the following table to determine the frequencies of each character in this String.

Character	Frequency
`' '`	4
`'a'`	4
`'c'`	3
`'d'`	2
`'e'`	3
`'r'`	5
`'t'`	1

(b)

Next, use the procedure described in the previous section to draw the prefix tree.

Note: Our above description did not specify what should happen in the case of ties (when there are multiple subtrees in the priority queue with the same priority). This means that there can be multiple valid prefix trees (which, interestingly, all produce a final encoding using the same number of bits).

To make sure you end up with the same prefix tree as us, assume that the priority queue breaks ties by returning the tree with highest priority that contains the alphabetically earliest leftmost leaf (space is alphabetically before all letters). In addition, always draw the highest-priority subtree from each iteration to the right of the second-highest-priority subtree. These rules were used to break ties in the example from the previous section.

(c)

Now, use your prefix tree to fill out the following table of binary encodings for each character.

Character	Binary Encoding
`' '`	`000`
`'a'`	`11`
`'c'`	`011`
`'d'`	`0100`
`'e'`	`001`
`'r'`	`10`
`'t'`	`0101`

(d)

Finally, use your table of character encodings to write the final binary String encoding our original String, “carter raced a red car”.

011111001010011000010110110010100000110001000101000000111110

Exercise 2: Programming the Huffman Coding Procedure

(a)

buildPrefixTree()

Complete the definition of the buildPrefixTree() method, which carries out the tree construction procedure similar to part (b) of the previous exercise. Pay careful attention to the data types involved in this method.

The priority queue is not the MaxHeapPriorityQueue that we developed in lecture. Rather, it is the PriorityQueue from Java’s standard library (offering you the opportunity to work with more of the Java library code). The main difference is that Java’s priority queue uses the “natural ordering” of the objects themselves to determine priority, rather than allowing the user to specify the priority in add().
The elements in the priority queue are of type SymbolCount. This is a record class defined at the top of the file that contains a subtree (specifically, an ImmutableBinaryTree from the lecture notes) and an integer frequency. The nodes of this subtree are labeled with Strings, which should contain all the characters of that subtree’s leaves, in order from left to right. The frequency is the sum of the frequencies of all these characters.

Complete the method definition to iteratively select the two highest-priority subtrees from the priority queue and combine them to build a larger subtree. Once the priority queue includes a single entry, return the tree in that entry.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


/**
 * Carries out the Huffman encoding procedure to build and return the prefix 
 * tree associated with this text. Each node in the tree is labeled with the 
 * characters found in all leaves of its subtree, in order from left to right.
 */
static ImmutableBinaryTree<String> buildPrefixTree(PriorityQueue<SymbolCount> pQueue) {
  while (true) {
    SymbolCount s1 = pQueue.remove();
    if (pQueue.isEmpty()) { // s1 holds the complete prefix tree
      return s1.symbolTree;
    }
    SymbolCount s2 = pQueue.remove();
    pQueue.add(new SymbolCount(
      new ImmutableBinaryTree<>(s2.symbolTree.root + s1.symbolTree.root, s2.symbolTree, s1.symbolTree),
      s1.count + s2.count));
  }
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


/**
 * Carries out the Huffman encoding procedure to build and return the prefix 
 * tree associated with this text. Each node in the tree is labeled with the 
 * characters found in all leaves of its subtree, in order from left to right.
 */
static ImmutableBinaryTree<String> buildPrefixTree(PriorityQueue<SymbolCount> pQueue) {
  while (true) {
    SymbolCount s1 = pQueue.remove();
    if (pQueue.isEmpty()) { // s1 holds the complete prefix tree
      return s1.symbolTree;
    }
    SymbolCount s2 = pQueue.remove();
    pQueue.add(new SymbolCount(
      new ImmutableBinaryTree<>(s2.symbolTree.root + s1.symbolTree.root, s2.symbolTree, s1.symbolTree),
      s1.count + s2.count));
  }
}

(b)

getBinaryEncoding()

Complete the definition of the getBinaryEncoding() method, which returns the binary String encoding the given character, according to the given prefix tree (i.e., this method allows us to build a table similar to that from part (c) of the previous exercise).

In your definition, you’ll want to traverse down the prefix tree, building up the binary String that you will return as you go. Use the labels of each node to determine which child to traverse down to.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


/**
 * Returns the binary string encoding of the given character `c` from the given `prefixTree`. 
 * Requires that `c` is present in one of the leaves of the `prefixTree`.
 */
static String getBinaryEncoding(char c, ImmutableBinaryTree<String> prefixTree) {
  assert prefixTree.root.indexOf(c) >= 0 : c + " cannot be encoded using this prefix tree";
  BinaryTree<String> current = prefixTree; // our current position in the tree traversal
  StringBuilder sb = new StringBuilder(); // will hold the binary encoding of this character
  while(!current.root.equals(Character.toString(c))) {
    if (current.left() != null && current.left().root.indexOf(c) >= 0) { // c in left subtree
      current = current.left();
      sb.append(0);
    } else { // c in right subtree
      current = current.right();
      sb.append(1);
    }
  }
  return sb.toString();
}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19


/**
 * Returns the binary string encoding of the given character `c` from the given `prefixTree`. 
 * Requires that `c` is present in one of the leaves of the `prefixTree`.
 */
static String getBinaryEncoding(char c, ImmutableBinaryTree<String> prefixTree) {
  assert prefixTree.root.indexOf(c) >= 0 : c + " cannot be encoded using this prefix tree";
  BinaryTree<String> current = prefixTree; // our current position in the tree traversal
  StringBuilder sb = new StringBuilder(); // will hold the binary encoding of this character
  while(!current.root.equals(Character.toString(c))) {
    if (current.left() != null && current.left().root.indexOf(c) >= 0) { // c in left subtree
      current = current.left();
      sb.append(0);
    } else { // c in right subtree
      current = current.right();
      sb.append(1);
    }
  }
  return sb.toString();
}

(c)

decode()

The compressed files after Huffman coding appear nothing like the original (take a look!), so we’ll need a way to decode them and restore the original text. We’ve provided most of the definition of a decode() method that does just this. To complete this method, you must fill in the portion to scan over the binary String and restore the original characters.

To do this, write a loop over the bits in the BitSet object bits. You can read the ith bit by calling bits.get(i) which will return true if this bit is a 1 and false if this bit is a 0. The total number of bits is bits.length(). You’ll want to use the decodingMap to convert binary Strings back to characters. Read through the Map documentation (or look ahead at the Lecture 19 notes) to see which map operations may be useful.

1
2
3
4
5
6
7
8


String s = ""; // holds bit string of the character we are currently decoding
for(int i = 0; i < bits.length(); i++) {
  s += bits.get(i) ? "1" : "0";
  if (decodingMap.containsKey(s)) {
    sb.append(decodingMap.get(s));
    s = "";
  }
}

1
2
3
4
5
6
7
8


String s = ""; // holds bit string of the character we are currently decoding
for(int i = 0; i < bits.length(); i++) {
  s += bits.get(i) ? "1" : "0";
  if (decodingMap.containsKey(s)) {
    sb.append(decodingMap.get(s));
    s = "";
  }
}