Image Compression

An uncompressed full-color image obtained from digital photography, scanning, or computer generation typically has 24 bits/pixel of color information. This is usually too much data to ship around the net (a screen-size 24-bit color image is about 4MB), so compression is needed. The two most commonly used image compression techniques are GIF and JPEG.

GIF: Does well with line drawings, cartoons, clipart, hard edges, a small number of colors, black/white images, large monochromatic areas, overlaid text, and icons. Typical compression is 5:1. Achieves compression by color reduction (quantization) followed by lossless LZ compression.

JPEG: Does poorly with the types of images mentioned above. Does well with high-color photographs and naturalistic scenes. Exploits known limitations of the human eye in distinguishing small variations in color. Lossy--when you decompress, you don't get the same image back. Achieves compression by truncating DCT (discrete cosine transform) coefficients. Typical compression is 20:1, or up to 100:1 at low-quality settings. Parameters can be adjusted to trade off image quality for better compression.

Color models

A 32-bit word is divided into four 8-bit "channels", three of which are used to represent the color. The remaining channel is sometimes used for transparency information. In the RGB (red/green/blue) color model, a color is represented by a triple (r,g,b), where each of r, g, b is a color level 0-255. We can view these triples as points in a cube. The corners are

    red     = (255,0,0)        green   = (0,255,0)        blue    = (0,0,255)
    cyan    = (0,255,255)      magenta = (255,0,255)      yellow  = (255,255,0)
    white   = (255,255,255)    black   = (0,0,0)

Different shades of gray are found on the diagonal (x,x,x) between white and black.

Another model is the HSV model for hue, saturation, value (also called luminance). Looking along the gray diagonal of the RGB cube from white to black, we would see a hexagon. Hue is the angle around that diagonal, saturation is the distance outward from the diagonal toward the edge, and value is the position along the diagonal from black toward white. There is a simple conversion formula between RGB and HSV values, although the correspondence is not one-to-one.
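To make the conversion concrete, here is a minimal Python sketch of the standard hexcone RGB-to-HSV mapping. The particular conventions (hue in degrees with red at 0, saturation and value in [0,1]) are one common choice, not something the notes fix; Python's standard-library colorsys.rgb_to_hsv computes the same mapping with hue scaled to [0,1].

    def rgb_to_hsv(r, g, b):
        """Convert an RGB triple (each 0-255) to (hue, saturation, value)."""
        r, g, b = r / 255.0, g / 255.0, b / 255.0
        v = max(r, g, b)                   # value: position along the gray diagonal
        delta = v - min(r, g, b)
        s = 0.0 if v == 0 else delta / v   # saturation: 0 on the diagonal (gray)
        if delta == 0:                     # gray: hue is undefined; use 0
            h = 0.0                        # (one reason the map isn't one-to-one)
        elif v == r:
            h = 60 * (((g - b) / delta) % 6)
        elif v == g:
            h = 60 * (((b - r) / delta) + 2)
        else:                              # v == b
            h = 60 * (((r - g) / delta) + 4)
        return h, s, v

    print(rgb_to_hsv(255, 0, 0))      # red  -> (0.0, 1.0, 1.0)
    print(rgb_to_hsv(128, 128, 128))  # gray -> (0.0, 0.0, ~0.5)

Every gray triple maps to hue 0 here, which illustrates why the RGB/HSV correspondence is not one-to-one.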
GIF

GIF images are only 8 bits/pixel, and thus can represent only 256 colors. The first step in reducing a 24-bit image to GIF is color quantization: the most representative colors in the image are chosen and entered in a color table or palette. Colors not in the palette can be faked by "dithering"--using dots of the represented colors in various sizes to fool the eye into interpolating between them, in the same way that newspaper photographs represent grayscale images. Color quantization gives 3:1 compression. LZ compression is then used to compress the image further, for a typical overall compression of 5:1. Color information is lost during quantization, but after that the LZ compression is lossless.

The GIF standard includes a file format in which files can contain several images for animation. One of the 256 color values can be reserved for transparency.

One disadvantage of GIF compared to JPEG is that color quantization is done by the image producer before shipping over the net. The image may then have to be quantized again on the client end to match the client's display hardware. Also, GIF images with different color tables displayed on the same web page have to be quantized to use the same palette, further degrading the images. JPEG images, on the other hand, carry full color information, which can be quantized on the client end to match the client's color display hardware.

JPEG

JPEG stands for "Joint Photographic Experts Group". Here is an outline of how the compression algorithm works:

1. Transform RGB to HSV or some other suitable color space with a luminance channel. The other channels carry color information; luminance is brightness. More compression can be done on the color channels, since the eye is less sensitive to the loss of high frequencies there than in the luminance channel.

2. Perform color quantization by averaging together groups of pixels. This is done in the color channels only; the luminance channel is left intact. This gives 2:1 compression.

3. Break the image into 8x8 pixel blocks and transform each block using the discrete cosine transform (DCT), a relative of the Fourier transform. This maps the spatial information into frequency space, and is done separately on each channel. The result is a set of 64 coefficients representing the amplitudes of different frequency components. The DCT is defined as follows. For a sequence x_0,...,x_{N-1} of length N = 2^p, define the transformed sequence f_0,...,f_{N-1} by

              2e(n)  N-1
      f_n  =  -----  SUM  x_k cos((2k+1)n pi / 2N),    n = 0,1,...,N-1,
                N    k=0

   where e(n) = 1/sqrt(2) if n = 0 and e(n) = 1 otherwise. The inverse transform is

              N-1
      x_k  =  SUM  e(n) f_n cos((2k+1)n pi / 2N),      k = 0,1,...,N-1.
              n=0

   Both the DCT and the DFT pretend that the function is periodic, repeating every 8 samples, and compute a representation of it as a linear combination of periodic basis functions of higher and higher frequencies. The compressed result is only an approximation, since every component above a certain frequency is thrown away, as are low-order bits of the amplitudes of the high frequencies. The cosine transform is used instead of the Fourier transform because it represents the periodic duplication of the 8x8 block as

      |    __|__    |    __|__    |    __|__    |    __|__    |
      | ___/ | \___ | ___/ | \___ | ___/ | \___ | ___/ | \___ |
      |__/   |   \__|__/   |   \__|__/   |   \__|__/   |   \__|

   instead of

      |    __|    __|    __|    __|    __|    __|    __|    __|
      | ___/ | ___/ | ___/ | ___/ | ___/ | ___/ | ___/ | ___/ |
      |__/   |__/   |__/   |__/   |__/   |__/   |__/   |__/   |

   as with the Fourier transform; thus it is smoother at the block boundaries.

4. Truncate the amplitudes, using a separate "quantization coefficient" for each frequency to throw away bits of accuracy. This is where most of the compression comes from. Higher frequencies are truncated more than lower ones, and color data is truncated more than luminance. The quantization coefficients are parameters that can be adjusted according to the image quality or compression desired. It is the high-frequency loss that causes edges to look fuzzy in JPEG. (A code sketch of steps 3 and 4 appears after this outline.)

5. Code the reduced coefficients using Huffman coding.

This describes "baseline" JPEG. There are also "progressive" and "hierarchical" versions. In the progressive version, the DCT coefficients are sent in order from lower to higher frequencies, allowing decoding and immediate display of a low-quality image that improves as more data is received. In the hierarchical version, the image is coded at multiple resolutions, where each higher resolution is coded as differences from the previous one.
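To make steps 3 and 4 concrete, here is a minimal Python sketch of the DCT and quantization of a single 8x8 block, using exactly the formulas above. The quantization table is an illustrative placeholder of my own (real codecs use standard tables plus a quality scaling), and the direct O(N^2) transform is written for clarity; real codecs use fast factored DCTs.

    import math

    def dct_1d(x):
        """1-D DCT as in the notes: f_n = (2 e(n)/N) SUM_k x_k cos((2k+1)n pi/2N)."""
        N = len(x)
        e = lambda n: 1 / math.sqrt(2) if n == 0 else 1.0
        return [2 * e(n) / N *
                sum(x[k] * math.cos((2*k + 1) * n * math.pi / (2*N))
                    for k in range(N))
                for n in range(N)]

    def idct_1d(f):
        """Inverse: x_k = SUM_n e(n) f_n cos((2k+1)n pi/2N)."""
        N = len(f)
        e = lambda n: 1 / math.sqrt(2) if n == 0 else 1.0
        return [sum(e(n) * f[n] * math.cos((2*k + 1) * n * math.pi / (2*N))
                    for n in range(N))
                for k in range(N)]

    def dct_2d(block):
        """Separable 2-D DCT: 1-D DCT of each row, then of each column."""
        rows = [dct_1d(row) for row in block]
        cols = [dct_1d(col) for col in zip(*rows)]
        return [list(r) for r in zip(*cols)]

    def quantize(coeffs, quality=4):
        """Step 4: divide each coefficient by a frequency-dependent step and
        round.  NOTE: this table (quality * (1 + u + v)) is an illustrative
        placeholder, not the standard JPEG table; it just truncates higher
        frequencies more than lower ones."""
        return [[round(coeffs[u][v] / (quality * (1 + u + v)))
                 for v in range(8)] for u in range(8)]

    block = [[8 * r + c for c in range(8)] for r in range(8)]   # a smooth ramp
    q = quantize(dct_2d(block))
    nonzero = sum(1 for row in q for v in row if v)
    print(nonzero, "of 64 coefficients survive")   # only a few low frequencies

For a smooth block like this ramp, almost all 64 quantized coefficients are zero; long runs of zeros are exactly what the Huffman coding of step 5 exploits.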
MPEG

MPEG stands for "Moving Pictures Experts Group" and is a standard for video compression. Despite the similar name, it is a separate group and standard from JPEG.

1. Start with a low-resolution video sequence, 30 frames/sec.

2. Convert the images to HSV. Reduce the resolution further by quantization in the color channels; leave the luminance channel at full resolution.

3. Divide each frame into 16x16 pixel blocks (macroblocks), and divide each macroblock into 8x8 subblocks. Look for a match of the 16x16 block, in the luminance channel, in a previous or future frame. Do JPEG compression either on the 8x8 subblocks directly or on the difference with the matching block in the other frame.

   There are three types of frames: I (intracoded), P (predicted), and B (bidirectional). A typical sequence is

       IBBPBBBPBBPBBIBBPBBBPBBBPBBIBBBPBBBPBBPBB...

   The I frames are independently coded as in JPEG. The P frames can depend on the previous I or P frame. The B frames can depend on the previous or next I or P frame. Typical sizes for I, P, and B frames are 12K, 8K, and 2K, respectively. For each macroblock, we do motion prediction in the luminance channel: a motion vector is saved with the block, telling which block in the other frame it is similar to (see the sketch at the end of these notes).

4. The DCT coefficients, motion vectors, and quantization tables are Huffman coded.

Other compression methods and standards

JBIG     -- for black/white images, such as faxes
MHEG     -- for multimedia
Wavelets -- an alternative to the DCT, catching on big
Fractals -- currently out of favor
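Returning to MPEG's motion-prediction step (step 3 above), here is a minimal Python sketch of exhaustive block matching in the luminance channel: search a window in a reference frame for the best match under sum-of-absolute-differences, and record the motion vector and residual. The helper names and the window size are my own illustrative choices, not anything from the MPEG spec.

    def sad(cur, ref, cx, cy, rx, ry, size=16):
        """Sum of absolute differences between the size x size block at
        (cx, cy) in cur and the block at (rx, ry) in ref.  Frames are
        luminance planes stored as 2-D lists indexed [row][col]."""
        return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
                   for j in range(size) for i in range(size))

    def motion_vector(cur, ref, cx, cy, search=8, size=16):
        """Exhaustive search: try every offset (dx, dy) in [-search, search]
        that stays inside the frame, and keep the one minimizing SAD."""
        h, w = len(ref), len(ref[0])
        best, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                rx, ry = cx + dx, cy + dy
                if 0 <= rx <= w - size and 0 <= ry <= h - size:
                    cost = sad(cur, ref, cx, cy, rx, ry, size)
                    if best is None or cost < best:
                        best, best_mv = cost, (dx, dy)
        return best_mv

    def residual(cur, ref, cx, cy, dx, dy, size=16):
        """Block minus its match: this difference is what gets DCT-coded
        for a P or B macroblock; the vector itself is Huffman coded."""
        return [[cur[cy + j][cx + i] - ref[cy + dy + j][cx + dx + i]
                 for i in range(size)] for j in range(size)]

Exhaustive search is simple but slow; real encoders prune the search, but the output they produce per macroblock--a motion vector plus a residual to be DCT-coded--is the same.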