CS631 Paper Response Form

Paper Title: Notes on Lossless Compression


Author(s): Brian Smith (Editor)


Main Point(s):

Huffman compression: A binary tree is built bottom-up by repeatedly combining the two symbols with the lowest probabilities. The branching decisions along the path from the root to a symbol then form its code, ensuring that less probable symbols receive longer codes (the code length is roughly proportional to the negative logarithm of the symbol's probability).
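The bottom-up construction can be sketched as follows; this is a minimal Python sketch (not from the paper), where each heap entry carries the partial codes of its subtree and the index is only a tie-breaker:

```python
import heapq

def huffman_codes(probs):
    """Build Huffman codes from a {symbol: probability} map.

    Repeatedly merge the two lowest-probability nodes; the branching
    decisions from the root to each leaf form that symbol's code.
    """
    # Heap entries: (probability, tiebreaker, {symbol: partial code}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        # Prefix '0' to one subtree's codes and '1' to the other's.
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

codes = huffman_codes({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

For these dyadic probabilities the code lengths come out exactly as -log2(p): 1, 2, 3, and 3 bits.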
Arithmetic coding: In principle, we write down all sequences of the same length as the one we want to encode in lexicographic order and determine the cumulative probability function c(i) = sum_{j<=i} p(j). The code for sequence i is then the shortest binary fraction of a number in the interval ]c(i-1), c(i)].
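In practice the interval for a sequence is computed by narrowing it one symbol at a time, which is equivalent to the cumulative-function view above. A minimal sketch of both steps (my own illustration, using the half-open convention [low, high) rather than the note's ]c(i-1), c(i)], and exact Fractions to sidestep rounding):

```python
import math
from fractions import Fraction

def arith_interval(seq, probs):
    """Narrow [low, low + width) once per symbol of the sequence."""
    # Cumulative probability mass strictly below each symbol.
    cum, start = {}, Fraction(0)
    for s in sorted(probs):
        cum[s] = start
        start += probs[s]
    low, width = Fraction(0), Fraction(1)
    for s in seq:
        low += width * cum[s]
        width *= probs[s]
    return low, low + width

def shortest_binary_fraction(low, high):
    """Smallest k (and numerator n) with low <= n / 2**k < high."""
    k = 0
    while True:
        n = math.ceil(low * 2 ** k)      # smallest n with n/2**k >= low
        if Fraction(n, 2 ** k) < high:
            return n, k                  # the code is n written in k bits
        k += 1

probs = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
low, high = arith_interval("ab", probs)
n, k = shortest_binary_fraction(low, high)
```

Here "ab" maps to the interval [1/4, 1/2), and the shortest binary fraction inside it is 0.01, i.e. two bits.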
Lempel-Ziv compression: The basic idea is to build a lexicon that maps whole strings to codewords, and to build this lexicon on the fly at both compression and decompression time, so that it is well adapted to the text at hand yet does not have to be transmitted with the encoded text.
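As a concrete instance of this idea, here is a minimal sketch of the LZW variant (my own illustration, not taken from the notes); encoder and decoder grow identical lexicons in lockstep, so no table is ever transmitted:

```python
def lzw_encode(text):
    """Dictionary maps strings to codes and grows while encoding."""
    table = {chr(i): i for i in range(256)}   # seed with single bytes
    w, out = "", []
    for ch in text:
        if w + ch in table:
            w += ch                           # extend the current match
        else:
            out.append(table[w])
            table[w + ch] = len(table)        # new entry: match + next char
            w = ch
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes):
    """Rebuild the same dictionary on the fly from the code stream."""
    table = {i: chr(i) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:                 # code was created by the encoder one step ago
            entry = w + w[0]
        out.append(entry)
        table[len(table)] = w + entry[0]
        w = entry
    return "".join(out)
```

Round-tripping any string through encode and decode returns the original, which demonstrates that both sides derive the same lexicon purely from the data.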

Possible Use(s):

There is no single universally best compression technique, and knowing the assumptions on which a technique is based can help in choosing the appropriate one for a specific task.

Extensions:

A straightforward improvement of the fast Huffman decoding as presented in class: when building the 10-bit lookup tables, it is a waste to output just one symbol when the 10-bit string contains more. I propose filling each 10-bit table entry with the full parsing result of the respective 10-bit string (together with the number of bits that could be parsed). For example, if we assume that the most common character is, say, a blank and it is represented by the code 0, our lookup table would tell us that 0000000000 translates into a string of 10 blanks, all in one swoop.
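The table construction described above might be sketched like this (my own sketch of the proposal, not the version from class; the example code assigning 0 to the blank is an assumption for illustration):

```python
def build_decode_table(codes, width=10):
    """Map every `width`-bit pattern to (symbols decoded, bits consumed).

    `codes` is a {symbol: bitstring} prefix code (e.g. from Huffman).
    Each entry stores ALL complete codewords found in the pattern, so a
    single table lookup can emit several symbols at once.
    """
    inverse = {c: s for s, c in codes.items()}
    max_len = max(len(c) for c in codes.values())
    table = {}
    for value in range(2 ** width):
        bits = format(value, f"0{width}b")
        pos, symbols = 0, []
        while True:
            for length in range(1, max_len + 1):
                chunk = bits[pos:pos + length]
                if len(chunk) == length and chunk in inverse:
                    symbols.append(inverse[chunk])
                    pos += length
                    break
            else:
                break   # no complete codeword fits in the remaining bits
        table[value] = (symbols, pos)
    return table

# Hypothetical code with blank = "0", as in the example above.
table = build_decode_table({" ": "0", "a": "10", "b": "110", "c": "111"})
```

With this table, the all-zero pattern indeed decodes to 10 blanks in a single lookup; the bits-consumed count tells the decoder how far to advance before the next lookup.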
