Image Compression

An uncompressed full-color image obtained from digital photography, scanning, or computer generation typically has 24 bits/pixel of color information. This is usually too much data to ship around the net (a screen-size 24-bit color image is about 4MB), so compression is needed. The two most commonly used image compression techniques are GIF and JPEG.

GIF: Does well with line drawings, cartoons, clipart, hard edges, a small number of colors, black/white images, large monochromatic areas, overlaid text, and icons. Typical compression is 5:1. Achieves compression by color reduction (quantization) followed by lossless LZ compression.

JPEG: Does poorly with the types of images mentioned above. Does well with high-color photographs and naturalistic scenes. Exploits known limitations of the human eye in distinguishing small variations in color. Lossy--when you decompress, you don't get the same image back. Achieves compression by truncating DCT (discrete cosine transform) coefficients. Typical compression is 20:1, or up to 100:1 at low-quality settings. Parameters can be adjusted to trade off image quality for better compression.

Color models

A 32-bit word is divided into four 8-bit "channels", three of which are used to represent the color. The remaining channel is sometimes used for transparency information. In the RGB (red/green/blue) color model, a color is represented by a triple (r,g,b), where each of r, g, b is a color level 0-255. We can view these triples as points in a cube. The corners are

    red     = (255,0,0)        green   = (0,255,0)        blue    = (0,0,255)
    cyan    = (0,255,255)      magenta = (255,0,255)      yellow  = (255,255,0)
    white   = (255,255,255)    black   = (0,0,0)

Different shades of gray are found on the diagonal (x,x,x) between white and black.

Another model is the HSV model for hue, saturation, value (also called luminance). Looking along the gray diagonal of the RGB cube from white to black, we would see a hexagon. Hue is the angle around that diagonal, saturation is the distance outward from the diagonal toward the edge, and value is the position along the diagonal from black toward white. There is a simple conversion formula between RGB and HSV values, although the correspondence is not one-to-one.
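To make the conversion concrete, here is a minimal Python sketch of the standard hexcone RGB-to-HSV mapping. The particular conventions (hue in degrees with red at 0, saturation and value in [0,1]) are one common choice, not something the notes fix; Python's standard-library colorsys.rgb_to_hsv computes the same mapping with hue scaled to [0,1].

    def rgb_to_hsv(r, g, b):
        """Convert an RGB triple (each 0-255) to (hue, saturation, value)."""
        r, g, b = r / 255.0, g / 255.0, b / 255.0
        v = max(r, g, b)                   # value: position along the gray diagonal
        delta = v - min(r, g, b)
        s = 0.0 if v == 0 else delta / v   # saturation: 0 on the diagonal (gray)
        if delta == 0:                     # gray: hue is undefined; use 0
            h = 0.0                        # (one reason the map isn't one-to-one)
        elif v == r:
            h = 60 * (((g - b) / delta) % 6)
        elif v == g:
            h = 60 * (((b - r) / delta) + 2)
        else:                              # v == b
            h = 60 * (((r - g) / delta) + 4)
        return h, s, v

    print(rgb_to_hsv(255, 0, 0))      # red  -> (0.0, 1.0, 1.0)
    print(rgb_to_hsv(128, 128, 128))  # gray -> (0.0, 0.0, ~0.5)

Every gray triple maps to hue 0 here, which illustrates why the RGB/HSV correspondence is not one-to-one.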
GIF

GIF images are only 8 bits/pixel, and thus can represent only 256 colors. The first step in reducing a 24-bit image to GIF is color quantization: the most representative colors in the image are chosen and entered in a color table or palette. Colors not in the palette can be faked by "dithering"--using dots of the represented colors in various sizes to fool the eye into interpolating between them, in the same way that newspaper photographs represent grayscale images. Color quantization gives 3:1 compression. LZ compression is then used to compress the image further, for a typical overall compression of 5:1. Color information is lost during quantization, but after that the LZ compression is lossless.

The GIF standard includes a file format in which files can contain several images for animation. One of the 256 color values can be reserved for transparency.

One disadvantage of GIF compared to JPEG is that color quantization is done by the image producer before shipping over the net. The image may then have to be quantized again on the client end to match the client's display hardware. Also, GIF images with different color tables displayed on the same web page have to be quantized to use the same palette, further degrading the images. JPEG images, on the other hand, carry full color information, which can be quantized on the client end to match the client's color display hardware.

JPEG

JPEG stands for "Joint Photographic Experts Group". Here is an outline of how the compression algorithm works:

1. Transform RGB to HSV or some other suitable color space with a luminance channel. The other channels carry color information; luminance is brightness. More compression can be done on the color channels, since the eye is less sensitive to the loss of high frequencies there than in the luminance channel.

2. Perform color quantization by averaging together groups of pixels. This is done in the color channels only; the luminance channel is left intact. This gives 2:1 compression.

3. Break the image into 8x8 pixel blocks and transform each block using the discrete cosine transform (DCT), a relative of the Fourier transform. This maps the spatial information into frequency space, and is done separately on each channel. The result is a set of 64 coefficients representing the amplitudes of different frequency components. The DCT is defined as follows. For a sequence x_0,...,x_{N-1} of length N = 2^p, define the transformed sequence f_0,...,f_{N-1} by

              2e(n)  N-1
      f_n  =  -----  SUM  x_k cos((2k+1)n pi / 2N),    n = 0,1,...,N-1,
                N    k=0

   where e(n) = 1/sqrt(2) if n = 0 and e(n) = 1 otherwise. The inverse transform is

              N-1
      x_k  =  SUM  e(n) f_n cos((2k+1)n pi / 2N),      k = 0,1,...,N-1.
              n=0

   Both the DCT and the DFT pretend that the function is periodic, repeating every 8 samples, and compute a representation of it as a linear combination of periodic basis functions of higher and higher frequencies. The compressed result is only an approximation, since every component above a certain frequency is thrown away, as are low-order bits of the amplitudes of the high frequencies. The cosine transform is used instead of the Fourier transform because it represents the periodic duplication of the 8x8 block as

      |    __|__    |    __|__    |    __|__    |    __|__    |
      | ___/ | \___ | ___/ | \___ | ___/ | \___ | ___/ | \___ |
      |__/   |   \__|__/   |   \__|__/   |   \__|__/   |   \__|

   instead of

      |    __|    __|    __|    __|    __|    __|    __|    __|
      | ___/ | ___/ | ___/ | ___/ | ___/ | ___/ | ___/ | ___/ |
      |__/   |__/   |__/   |__/   |__/   |__/   |__/   |__/   |

   as with the Fourier transform; thus it is smoother at the block boundaries.

4. Truncate the amplitudes, using a separate "quantization coefficient" for each frequency to throw away bits of accuracy. This is where most of the compression comes from. Higher frequencies are truncated more than lower ones, and color data is truncated more than luminance. The quantization coefficients are parameters that can be adjusted according to the image quality or compression desired. It is the high-frequency loss that causes edges to look fuzzy in JPEG. (A code sketch of steps 3 and 4 appears after this outline.)

5. Code the reduced coefficients using Huffman coding.

This describes "baseline" JPEG. There are also "progressive" and "hierarchical" versions. In the progressive version, the DCT coefficients are sent in order from lower to higher frequencies, allowing decoding and immediate display of a low-quality image that improves as more data is received. In the hierarchical version, the image is coded at multiple resolutions, where each higher resolution is coded as differences from the previous one.
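To make steps 3 and 4 concrete, here is a minimal Python sketch of the DCT and quantization of a single 8x8 block, using exactly the formulas above. The quantization table is an illustrative placeholder of my own (real codecs use standard tables plus a quality scaling), and the direct O(N^2) transform is written for clarity; real codecs use fast factored DCTs.

    import math

    def dct_1d(x):
        """1-D DCT as in the notes: f_n = (2 e(n)/N) SUM_k x_k cos((2k+1)n pi/2N)."""
        N = len(x)
        e = lambda n: 1 / math.sqrt(2) if n == 0 else 1.0
        return [2 * e(n) / N *
                sum(x[k] * math.cos((2*k + 1) * n * math.pi / (2*N))
                    for k in range(N))
                for n in range(N)]

    def idct_1d(f):
        """Inverse: x_k = SUM_n e(n) f_n cos((2k+1)n pi/2N)."""
        N = len(f)
        e = lambda n: 1 / math.sqrt(2) if n == 0 else 1.0
        return [sum(e(n) * f[n] * math.cos((2*k + 1) * n * math.pi / (2*N))
                    for n in range(N))
                for k in range(N)]

    def dct_2d(block):
        """Separable 2-D DCT: 1-D DCT of each row, then of each column."""
        rows = [dct_1d(row) for row in block]
        cols = [dct_1d(col) for col in zip(*rows)]
        return [list(r) for r in zip(*cols)]

    def quantize(coeffs, quality=4):
        """Step 4: divide each coefficient by a frequency-dependent step and
        round.  NOTE: this table (quality * (1 + u + v)) is an illustrative
        placeholder, not the standard JPEG table; it just truncates higher
        frequencies more than lower ones."""
        return [[round(coeffs[u][v] / (quality * (1 + u + v)))
                 for v in range(8)] for u in range(8)]

    block = [[8 * r + c for c in range(8)] for r in range(8)]   # a smooth ramp
    q = quantize(dct_2d(block))
    nonzero = sum(1 for row in q for v in row if v)
    print(nonzero, "of 64 coefficients survive")   # only a few low frequencies

For a smooth block like this ramp, almost all 64 quantized coefficients are zero; long runs of zeros are exactly what the Huffman coding of step 5 exploits.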
MPEG

MPEG stands for "Moving Pictures Experts Group" and is a standard for video compression. Despite the similar name, it is a separate group and standard from JPEG.

1. Start with a low-resolution video sequence, 30 frames/sec.

2. Convert the images to HSV. Reduce the resolution further by quantization in the color channels; leave the luminance channel at full resolution.

3. Divide each frame into 16x16 pixel blocks (macroblocks), and divide each macroblock into 8x8 subblocks. Look for a match of the 16x16 block, in the luminance channel, in a previous or future frame. Do JPEG compression either on the 8x8 subblocks directly or on the difference with the matching block in the other frame.

   There are three types of frames: I (intracoded), P (predicted), and B (bidirectional). A typical sequence is

       IBBPBBBPBBPBBIBBPBBBPBBBPBBIBBBPBBBPBBPBB...

   The I frames are independently coded as in JPEG. The P frames can depend on the previous I or P frame. The B frames can depend on the previous or next I or P frame. Typical sizes for I, P, and B frames are 12K, 8K, and 2K, respectively. For each macroblock, we do motion prediction in the luminance channel: a motion vector is saved with the block, telling which block in the other frame it is similar to (see the sketch at the end of these notes).

4. The DCT coefficients, motion vectors, and quantization tables are Huffman coded.

Other compression methods and standards

JBIG     -- for black/white images, such as faxes
MHEG     -- for multimedia
Wavelets -- an alternative to the DCT, catching on big
Fractals -- currently out of favor
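Returning to MPEG's motion-prediction step (step 3 above), here is a minimal Python sketch of exhaustive block matching in the luminance channel: search a window in a reference frame for the best match under sum-of-absolute-differences, and record the motion vector and residual. The helper names and the window size are my own illustrative choices, not anything from the MPEG spec.

    def sad(cur, ref, cx, cy, rx, ry, size=16):
        """Sum of absolute differences between the size x size block at
        (cx, cy) in cur and the block at (rx, ry) in ref.  Frames are
        luminance planes stored as 2-D lists indexed [row][col]."""
        return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
                   for j in range(size) for i in range(size))

    def motion_vector(cur, ref, cx, cy, search=8, size=16):
        """Exhaustive search: try every offset (dx, dy) in [-search, search]
        that stays inside the frame, and keep the one minimizing SAD."""
        h, w = len(ref), len(ref[0])
        best, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                rx, ry = cx + dx, cy + dy
                if 0 <= rx <= w - size and 0 <= ry <= h - size:
                    cost = sad(cur, ref, cx, cy, rx, ry, size)
                    if best is None or cost < best:
                        best, best_mv = cost, (dx, dy)
        return best_mv

    def residual(cur, ref, cx, cy, dx, dy, size=16):
        """Block minus its match: this difference is what gets DCT-coded
        for a P or B macroblock; the vector itself is Huffman coded."""
        return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
                 for i in range(size)] for j in range(size)]

Exhaustive search is simple but slow; real encoders prune the search, but the output they produce per macroblock--a motion vector plus a residual to be DCT-coded--is the same.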