Combining Color and Spatial Information for Content-based Image Retrieval

Jing Huang

Ramin Zabih

Computer Science Department

Cornell University

Ithaca, NY 14853



Much of the information stored in digital libraries will consist of images or video, which are difficult to search or browse.  Automatic methods for searching image collections make wide use of color histograms, because they are robust to large changes in viewpoint and are trivial to compute. However, color histograms fail to incorporate spatial information, and therefore tend to give poor results. We have developed several methods for combining color information with spatial layout, while retaining the advantages of histograms. One technique computes the distribution of a given color as a function of the distance between two pixels. The resulting feature, which we call a color correlogram, has proven to be quite effective even with very coarsely quantized color information. Another method computes joint histograms of local properties, thus dividing pixels into classes based on both color and spatial properties. Experiments with a database of over 200,000 images demonstrate that these measures perform significantly better than color histograms, especially when the number of images is large.


One of the primary challenges in digital libraries is the problem of providing intelligent search mechanisms for multimedia collections. While there are good tools for searching text collections, images are much more difficult to search. If the images are annotated by hand, a textual search can be used; however, this approach is too labor-intensive to scale to large digital libraries. Automated methods for searching large image collections are therefore necessary.  This in turn requires simple and effective image features for comparing images based on their overall appearance. Color histograms are widely used, for example by [QBIC], [Chabot] and [Photobook]. The histogram is easy to compute and is insensitive to small changes in viewing position. A histogram is a coarse characterization of an image, however, and images with very different appearances can have similar histograms. For example, the images shown in figure 1 have similar color histograms. When image databases are large, this problem is especially acute.



Figure 1: Two images with similar color histograms

Since histograms do not include any spatial information, several recent approaches have attempted to incorporate spatial information with color [Hsu, Stricker, Smith]. These methods, however, lose many of the advantages of color histograms. In this paper we describe methods for combining color information with spatial layout while retaining the advantages of histograms. One method computes the spatial correlation of pairs of colors as a function of the distance between pixels. We call this feature a color correlogram (the term ``correlogram'' is adapted from spatial data analysis [Upton]). Another approach is based on computing joint histograms of several local properties. Joint histograms can be compared as vectors, just as color histograms can. However, in a color histogram any two pixels of the same color are effectively identical, whereas in a joint histogram pixels must share several properties beyond color. We call this approach histogram refinement. The methods we describe are easy to compute, and they produce concise summaries of the image.

We will next describe color correlograms and histogram refinement (for details see [Huang] and [Pass]). We have evaluated these methods using a large database of images, on tasks with a simple, intuitive notion of ground truth. The experimental results that we present show that our methods are significantly more effective than color histograms.


A color correlogram (henceforth correlogram) expresses how the spatial correlation of pairs of colors changes with distance. Informally, a correlogram for an image is a table indexed by color pairs, where the d-th entry for row (i,j) specifies the probability of finding a pixel of color j at a distance d from a pixel of color i in this image. Here d is chosen from a set of distance values D (see [Huang] for the formal definition). An autocorrelogram captures spatial correlation between identical colors only. This information is a subset of the correlogram and consists of the rows of the form (i,i) only. An example autocorrelogram is shown in figure 2.

Since local correlations between colors are more significant than global correlations in an image, a small value of d is sufficient to capture the spatial correlation. We have an efficient algorithm to compute the correlogram when d is small. This computation is linear in the image size (see [Huang]).
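As a concrete illustration, the following is a minimal sketch (not the authors' implementation) of computing an autocorrelogram for an image already quantized to a small number of colors. The function name, the default distance set, and the choice of horizontal and vertical offsets as a sample of the L-infinity neighborhood are all assumptions made for brevity.

```python
import numpy as np

def autocorrelogram(img, n_colors, distances=(1, 3, 5, 7)):
    """Autocorrelogram of a quantized image.

    img: 2-D array of color indices in 0..n_colors-1.
    Returns an (n_colors, len(distances)) array whose (c, k) entry
    estimates the probability that a pixel at distance distances[k]
    from a pixel of color c also has color c.
    """
    h, w = img.shape
    result = np.zeros((n_colors, len(distances)))
    for k, d in enumerate(distances):
        match = np.zeros(n_colors)
        total = np.zeros(n_colors)
        # Compare each pixel with neighbors at offset d along each axis
        # (a sample of the full L-infinity neighborhood, for brevity).
        for dy, dx in ((0, d), (d, 0)):
            a = img[: h - dy, : w - dx]
            b = img[dy:, dx:]
            for c in range(n_colors):
                mask = a == c
                total[c] += mask.sum()
                match[c] += (b[mask] == c).sum()
        nz = total > 0
        result[nz, k] = match[nz] / total[nz]
    return result
```

Because each pixel is compared with only a constant number of neighbors per distance, the computation is linear in the image size, in line with the efficiency claim above.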

[Image 1, Image 2, and their autocorrelograms]
Figure 2: Two images with their autocorrelograms.  Note that the change in spatial layout would be ignored by color histograms, but causes a significant difference in the autocorrelograms.

The highlights of the correlogram method are: (i) it includes the spatial correlation of colors, and (ii) if D is chosen to be local, it describes the global distribution of the local spatial correlation of colors (see our experimental data). An additional advantage lies in the ability of our methods to succeed with very coarse color information. As we show in [Huang], our data suggest that 8-color correlograms perform better than 64-color histograms.

Purely local properties, such as pixel position and gradient direction, and purely global properties, such as the color distribution, each have drawbacks: schemes based on purely local properties are sensitive to large appearance changes, while schemes based on purely global properties are susceptible to false positive matches. Correlograms take into account both the local spatial correlation of colors and the global distribution of this correlation, and are therefore more stable to appearance changes; they prove to be quite effective for content-based image retrieval from a large image database.


In histogram refinement the pixels of a given bucket are subdivided into classes based on local features. There are many possible features, including texture, orientation, distance from the nearest edge, and relative brightness. If we consider color as a random variable, then a color histogram approximates the variable's distribution. Histogram refinement approximates the joint distribution of a variety of local properties.

Histogram refinement prevents pixels in the same bucket from matching each other if they do not fall into the same class. Pixels in the same class can be compared using any standard method for comparing histogram buckets (such as the L1 distance). This allows fine distinctions that cannot be made with color histograms.
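The bucket-by-bucket comparison mentioned above can be sketched as follows; `l1_distance` is a hypothetical helper name, and the two histograms are assumed to be flattened into equal-length sequences of bucket counts.

```python
def l1_distance(h1, h2):
    """L1 distance between two histograms given as equal-length
    sequences of bucket counts; smaller values mean more similar
    images."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

Because refined histograms separate pixels of the same color into different buckets by class, two pixels contribute to the same term of this sum only when they agree on every property, not just color.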

For example, consider a joint histogram that combines color information with the intensity gradient. A given pixel in an image has a color (in the discretized range 0 . . . ncolors - 1) and an intensity gradient (in the discretized range 0 . . . ngradient - 1). The joint histogram for color and intensity gradient will contain (ncolors x ngradient) entries. Each entry corresponds to a particular color and a particular intensity gradient, and the value stored in this entry is the number of pixels in the image with that color and intensity gradient.

More precisely, given a set of k features, we can construct a joint histogram: a k-dimensional histogram in which each entry contains the number of pixels in an image that are described by a particular k-tuple of feature values. The size of the joint histogram is therefore the number of possible combinations of the values of the features. Just as a color histogram approximates the density of pixel color, a joint histogram approximates the joint density of several pixel features.
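A sketch of the k = 2 case described above (color combined with intensity gradient), assuming both features have already been discretized per pixel; the function name is hypothetical.

```python
import numpy as np

def joint_histogram(color, gradient, n_colors, n_gradients):
    """Joint histogram of per-pixel color and gradient classes.

    color, gradient: equal-shape integer arrays with values in
    0..n_colors-1 and 0..n_gradients-1 respectively.
    Returns an (n_colors, n_gradients) count array; flatten it to
    compare two images as vectors.
    """
    hist = np.zeros((n_colors, n_gradients), dtype=np.int64)
    # Count each pixel into the entry for its (color, gradient) pair.
    np.add.at(hist, (color.ravel(), gradient.ravel()), 1)
    return hist
```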

Joint histograms thus increase the dimensionality of the histogram space without changing the capacity of each feature's individual histogram space. This preserves the robustness of each feature, while increasing the capacity of the histogram space.


For our experiments, we have concentrated on "query by example", where the user specifies an image and the system attempts to retrieve the most similar images from the database.  We have used a large collection of over 200,000 images. Our collection contains the databases used by QBIC (1,440 images) and Chabot (11,667 images), as well as 200,000 frames from CNN taken one minute apart.  We have identified by hand 52 pairs of images for which there is a unique "right answer" in the database, and used these images as benchmarks.  More specifically, these are image pairs in which the same scene is shown from two rather different views.

On this database, our methods perform significantly better than color histograms.  Some specific examples are given in figures 3 and 4, using both color correlograms and joint histograms.


Color histogram rank: 411; Autocorrelogram rank: 1

Color histogram rank: 310; Autocorrelogram rank: 5


Color histogram rank: 367; Autocorrelogram rank: 1

Figure 3: Example query images and correct answers, and the rank of the correct answer using color histograms or autocorrelograms. Lower numbers indicate better performance.
Color histogram rank: 308; Joint histogram rank: 2
Color histogram rank: 1896; Joint histogram rank: 3
Color histogram rank: 649; Joint histogram rank: 2
Figure 4: Example query images and correct answers, and the rank of the correct answer using color histograms or joint histograms. Lower numbers indicate better performance.

We have also performed a statistical analysis of this data; to save space, we present these results only for joint histograms (the results for correlograms are quite similar). Most measures used to evaluate retrieval performance, such as precision [Salton], depend on the number of images in the database. We believe that a retrieval performance measure should be independent of the number of images. Typically a user is willing to browse a certain number of the retrieved results by hand, much as in text-based search on the web. This number is unlikely to change as the database fluctuates in size, since it is really a measure of human patience. We call this number the scope of the user. A good performance measure should judge the retrieval method within a particular scope.

For the 52 queries, we ask what percentage of the 52 correct answers was found within a particular scope. The percentage of correct answers is called the recall in the information retrieval literature [Salton]. These results are shown in figure 5 for scopes of 1 and 100. Note that joint histograms have a higher recall at a scope of 1 than color histograms have at a scope of 100. Thus a user who is only willing to look at the top image returned using joint histograms will do better than a user willing to look at the top 100 images returned using color histograms.
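The scope/recall measure can be sketched as follows, assuming each benchmark query yields the 1-based rank at which its correct answer was retrieved; the function name is hypothetical.

```python
def recall_at_scope(ranks, scope):
    """Fraction of queries whose correct answer appears within the
    top `scope` results. `ranks` is a list of 1-based ranks of the
    correct answer for each benchmark query."""
    hits = sum(1 for r in ranks if r <= scope)
    return hits / len(ranks)
```

Unlike precision, this number does not change as the database grows, as long as the correct answers keep their ranks: it measures only how far down the ranked list a user must look.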

Algorithm           Recall at scope 1    Recall at scope 100
Color histograms           2%                   40%
Joint histograms          60%                   94%

Figure 5: Scope versus recall results. Higher numbers indicate better performance.



[Chabot95] Virginia Ogle and Michael Stonebraker. Chabot: Retrieval from a relational database of images. IEEE Computer, 28(9):40--48, September 1995.

[Hsu95] Wynne Hsu, T. S. Chua, and H. K. Pung. An integrated color-spatial approach to content-based image retrieval. In ACM Multimedia Conference, pages 305--313, 1995.

[Huang97] Jing Huang, S. Ravi Kumar, Mandar Mitra, Wei-Jing Zhu, and Ramin Zabih. Image indexing using color correlograms.  In IEEE Conference on Computer Vision and Pattern Recognition, pages 762--768, 1997.

[Huang98] Jing Huang, S. Ravi Kumar, and Ramin Zabih. An automatic hierarchical image classification scheme. In ACM Multimedia Conference, pages 219--228, 1998.

[Pass98] Greg Pass and Ramin Zabih. Comparing images using joint histograms. Journal of Multimedia Systems, 1998 (to appear).

[Photobook96] Alex Pentland, Rosalind Picard, and Stan Sclaroff.  Photobook: Content-based manipulation of image databases. International Journal of Computer Vision, 18(3):233--254, June 1996.

[QBIC95] Myron Flickner, Harpreet Sawhney, Wayne Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee, Dragutin Petkovic, David Steele, and Peter Yanker. Query by image and video content: The QBIC system. IEEE Computer, 28(9):23--32, September 1995.

[Salton89] Gerard Salton. Automatic Text Processing. Addison-Wesley, 1989.

[Smith96] J.R. Smith and S.-F. Chang. VisualSEEk: A fully automated content-based image query system. In ACM Multimedia Conference, pages 87--98, November 1996.

[Stricker96] Markus Stricker and Alexander Dimai.  Color indexing with weak spatial constraints. SPIE proceedings, 2670:29--40, February 1996.

[Upton85] Graham J.G. Upton and Bernard Fingleton. Spatial Data Analysis by Example, volume I. John Wiley & Sons, 1985.