Jing Huang |
| Ramin Zabih |
Computer Science Department Cornell University Ithaca, NY 14853 |
ABSTRACT
Much of the information stored in digital libraries will contain either
images or video, which is difficult to search or browse. Automatic methods for
searching image collections make wide use of color histograms, because they are robust to
large changes in viewpoint, and can be computed trivially. However, color histograms fail
to incorporate spatial information, and therefore tend to give poor results. We have
developed several methods for combining color information with spatial layout, while
retaining the advantages of histograms. One technique computes the distribution of a given
color as a function of the distance between two pixels. The resulting method, which we
call a color correlogram, has proven to be quite effective even with very
coarsely quantized color information. Another method computes joint histograms of local
properties, thus dividing pixels into classes based on both color and spatial properties.
Experiments with a database of over 200,000 images demonstrate that these measures perform
significantly better than color histograms, especially when the number of images is large.
One of the primary challenges in digital libraries is the problem of providing intelligent search mechanisms for multimedia collections. While there are good tools for searching text collections, images are much more difficult. If the images are annotated by hand, a textual search can be used; however, this approach is too labor-intensive to scale up with large digital libraries. Automated methods for searching large image collections are therefore necessary. This in turn requires simple and effective image features for comparing images based on their overall appearance. Color histograms are widely used, for example by [QBIC], [Chabot] and [Photobook]. The histogram is easy to compute and is insensitive to small changes in viewing positions. A histogram is a coarse characterization of an image, however, and images with very different appearances can have similar histograms. For example, the images shown in figure 1 have similar color histograms. When image databases are large, this problem is especially acute.
![]() |
![]() |
Figure 1: Two images with similar color histograms |
|
Since histograms do not include any spatial information, recently several approaches have
attempted to incorporate spatial information with color [Hsu, Stricker, Smith]. These methods, however,
lose many of the advantages of color histograms. In this paper we describe methods for
combining color information with spatial layout while retaining the advantages of
histograms. One method computes the spatial correlation of pairs of colors as a function
of the distance between pixels. We call this feature a color correlogram
(the term ``correlogram'' is adapted from spatial data analysis [Upton])
Another approach is based on computing joint histograms of several local properties. Joint
histograms can be compared as vectors, just as color histograms can. However, in a color
histogram any two pixels of the same color are effectively identical. With joint
histograms, pixels must share several properties beyond color. We call this approach histogram
refinement. The methods we describe are easy to compute, and they produce concise
summaries of the image.
We will next describe color correlograms and histogram refinement (for details see [Huang] and [Pass]. We have evaluated these
methods using a large database of images, on tasks with a simple, intuitive notion of
ground truth. The experimental results that we present
show that our methods are significantly more efficient than color histograms.
A color correlogram (henceforth correlogram) expresses how the spatial correlation of pairs of colors changes with distance. Informally, a correlogram for an image is a table indexed by color pairs, where the d-th entry for row (i,j) specifies the probability of finding a pixel of color j at a distance d from a pixel of color i in this image. Here d is chosen from a set of distance values D (see [Huang] for the formal definition). An autocorrelogram captures spatial correlation between identical colors only. This information is a subset of the correlogram and consists of rows of the form (i,j) only. An example autocorrelogram is shown in figure 2.
Since local correlations between colors are more significant than global
correlations in an image, a small value of d is sufficient to capture the spatial
correlation. We have an efficient algorithm to compute the correlogram when d is
small. This computation is linear in the image size (see [Huang]).
|
![]() |
Image 1 |
Image 2 |
![]() |
|
| Figure 2: Two images with their autocorrelograms. Note that the change in spatial layout would be ignore by color histograms, but causes a significant difference in the autocorrelograms. | |
The highlights of the correlogram method are: (i) it includes the spatial
correlation of colors, and (ii) it can be used to describe the global distribution of
local spatial correlation of colors if D is chosen to be local (see our experimental data). An additional advantage lies in the
ability of our methods to succeed with very coarse color information. As we show in [Huang], our data suggests that 8-color correlograms perform better
than 64-color histograms.
Unlike purely local properties, such as pixel position, gradient direction, or purely
global properties, such as color distribution, correlograms take into account the local
color spatial correlation as well as the global distribution of this spatial correlation.
While any scheme that is based on purely local properties is likely to be sensitive to
large appearance changes, (auto)correlograms are more stable to these changes; while any
scheme that is based on purely global properties is susceptible to false positive matches,
(auto)correlograms prove to be quite effective for content-based image retrieval from a
large image database.
In histogram refinement the pixels of a given bucket are
subdivided into classes based on local features. There are many possible features,
including texture, orientation, distance from the nearest edge, relative brightness, etc.
If we consider color as a random variable, then a color histogram approximates the
variable's distribution. Histogram refinement approximates the joint distribution of a
variety of local properties.
Histogram refinement prevents pixels in the same bucket from matching each other if they
do not fall into the same class. Pixels in the same class can be compared using any
standard method for comparing histogram buckets (such as the L1 distance). This allows
fine distinctions that cannot be made with color histograms.
For example, consider a joint histogram that combines color information with the intensity
gradient. A given pixel in an image has a color (in the discretized range 0 . . .
ncolors - 1 and an intensity gradient (in the discretized range 0 . . . ngradient
- 1). The joint histogram for color and intensity gradient will contain (ncolors
x ngradient) entries. Each entry corresponds to a particular color and a
particular intensity gradient. The value stored in this entry is the number of pixels in
the image with that color and intensity gradient.
More precisely, given a set of k features, we can construct a joint
histogram. A joint histogram is a k-dimensional vector, such that each entry in
the joint histogram contains the number of pixels in an image that are described by a k-tuple
of feature values. The size of the joint histogram is therefore the number of
possible combinations of the values of each feature. Just as a color histogram
approximates the density of pixel color, a joint histogram approximates the joint density
of several pixel features.
Joint histograms thus increase the dimensionality of the histogram space without changing
the capacity of each feature's individual histogram space. This preserves the robustness
of each feature, while increasing the capacity of the histogram space.
For our experiments, we have concentrated on "query by example", where the user specifies an image, and the system attempts to retrieve the most similar images from the database. We have used a large image collection of almost 250,000 images. Our collection contains the databases used by QBIC (1,440 images) and Chabot (11,667), as well as 200,000 frames from CNN taken one minute apart. We have identified by hand 52 pairs of images where there is a unique "right answer" in the database, and used these images as benchmarks. More specifically, these are image pairs where the same scene is shown from two rather different views.
On this database, our methods perform significantly better than color histograms. Some specific examples are given in figures 3 and 4, using both color correlograms and joint histograms.
![]() |
![]() |
Color histogram rank: 411; Autocorrelogram rank: 1 |
|
![]() |
![]() |
Color histogram rank: 310; Autocorrelogram rank: 5 |
|
![]() |
![]() |
Color histogram rank: 367; Autocorrelogram rank: 1 |
|
| Figure 3: Example query images and correct answers, and the rank of the correct answer using color histograms or autocorrelograms. Lower numbers indicate better performance. | |
![]() |
![]() |
| Color histogram rank: 308; Joint histogram rank: 2 | |
![]() |
![]() |
| Color histogram rank: 1896; Joint histogram rank: 3 | |
![]() |
![]() |
| Color histogram rank: 649; Joint histogram rank: 2 | |
| Figure 4: Example query images and correct answers, and the rank of the correct answer using color histograms or joint histograms. Lower numbers indicate better performance. | |
We have also performed a statistical analysis of this data; to save space,
we will only present these results for joint histograms (the results for correlograms are
quite similar). Most measures used by authors to evaluate retrieval performance, such as
precision [Salton], are dependent on the number of images in the
database. We believe that a retrieval performance
measure should be independent of the number of images. Typically a user is willing to
browse a certain number of the retrieval results by hand, similar to text-based search on
the web. This number is unlikely to change as the database fluctates in size, as it is
really a measure of human patience. We call this number the scope of the user. A
good performance measure should judge the retrieval method within a particular scope.
For the 52 queries, we ask what percent of the 52 answers were found within a particular scope. The percentage of correct answers is called the recall in the information retrieval literature [Salton]. These results are shown in figure 5 for scopes of 1 and 100. Note that joint histograms have a higher recall level at a scope of 1 than color histograms have for a scope of 100. Thus a user who was only willing to look at the top image returned using joint histograms would do better than a user willing to look at the top 100 images returned using color histograms.
| Algorithm | Recall at scope 1 | Recall at scope 100 |
| Color histograms | 2% | 40% |
| Joint histograms | 60% | 94% |
Figure 5: Scope versus recall results. Higher numbers indicate better performance. |
||
[Chabot95] Virginia Ogle and Michael Stonebraker. Chabot: Retrieval from a relational database of images. IEEE Computer, 28(9):40--48, September 1995.
[Hsu95] Wynne Hsu, T. S. Chua, and H. K. Pung. An integrated color-spatial approach to content-based image retrieval. In ACM Multimedia Conference, pages 305--313, 1995.
[Huang97] Jing Huang, S. Ravi Kumar, Mandar Mitra,
Wei-Jing Zhu, and Ramin Zabih. Image indexing using
color correlograms. In IEEE Conference on Computer Vision and Pattern
Recognition, pages 762--768, 1997.
[Huang97] Jing Huang, S. Ravi Kumar and Ramin Zabih. An Automatic Hierarchical Image Classification Scheme, ACM Multimedia Conference, pages 219--228,1998.
[Pass98] Greg Pass and Ramin Zabih. Comparing images using joint histograms. Journal of Multimedia Systems, 1998 (to appear).
[Photobook96] Alex Pentland, Rosalind Picard, and Stan Sclaroff. Photobook: Content-based manipulation of image databases. International Journal of Computer Vision, 18(3):233--254, June 1996.
[QBIC95] Myron Flickner, Harpreet Sawhney, Wayne
Niblack, Jonathan Ashley, Qian Huang, Byron Dom, Monika Gorkani, Jim Hafner, Denis Lee,
Dragutin Petkovic, David
Steele, and Pater Yanker. Query by image and video content: The QBIC system. IEEE
Computer, 28(9):23--32, September 1995.
[Salton89] Gerard Salton. Automatic Text Processing.
Addison-Wesley, 1989.
[Smith96] J.R. Smith and S.-F. Chang. VisualSEEK: A
fully automated content-based image query system. In ACM Multimedia Conference,
pages 87--98, November 1996.
[Stricker96] Markus Stricker and Alexander Dimai. Color
indexing with weak spatial constraints. SPIE proceedings, 2670:29--40, February
1996.
[Upton85] Graham J.G. Upton and Bernard Fingleton. Spatial Data
Analysis by Example, volume I. John Wiley & Sons, 1985.