This collection of datasets accompanies the paper "Learning to Match Images in Large-Scale Collections" in ECCV 2012, Workshop on Web-scale Vision and Social Media, by Song Cao and Noah Snavely. The project page is located here:

  http://www.cs.cornell.edu/projects/matchlearn/

Each dataset contains:

  list_url.txt  ---- list of Flickr URLs of the images in the dataset
  list_keys.txt ---- list of SIFT key files (in the same order as list_url.txt)
  gt.txt        ---- image IDs (0-based) of images that match each other (ground truth)
  vectors.txt   ---- Bag-of-Words vector of each image (L2-normalized), in the same order as list_url.txt and list_keys.txt
  images        ---- directory containing all .key.gz files (gzip-compressed versions of the .key files)

Image IDs are defined as line numbers (0-based) in list_url.txt or list_keys.txt (e.g. the first image has ID 0, the second has ID 1, etc.).

Note that the ground truth (gt.txt) is computed considering only the top 500 most similar images for each image, ranked by raw Bag-of-Words similarity, i.e. the dot product of BoW vectors. The minimum inlier number in the RANSAC procedure is set to 12, so each matching pair has at least 12 inliers.

The key files are generated using David Lowe's SIFT code (http://www.cs.ubc.ca/~lowe/keypoints/). For convenience, the following excerpt from his README describes the format of the key files:

"The file format starts with 2 integers giving the total number of keypoints and the length of the descriptor vector for each keypoint (128). Then the location of each keypoint in the image is specified by 4 floating point numbers giving subpixel row and column location, scale, and orientation (in radians from -PI to PI). Obviously, these numbers are not invariant to viewpoint, but can be used in later stages of processing to check for geometric consistency among matches. Finally, the invariant descriptor vector for the keypoint is given as a list of 128 integers in range [0,255]. Keypoints from a new image can be matched to those from previous images by simply looking for the descriptor vector with closest Euclidean distance among all vectors from previous images."

In the vector file (vectors.txt), the first line contains the number of images in the file and the maximum dimension of the vectors, i.e. the vocabulary size of the BoW model. Each subsequent line is the concatenation of the (dimension, value) pairs of the corresponding image's sparse BoW vector. The images in this file are in the same order as in list_url.txt and list_keys.txt.
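
A minimal parsing sketch for the key files quoted above (this code is not part of the dataset; the function name and file paths are illustrative). It reads one of the .key.gz files token by token, so it does not depend on how the descriptor integers are wrapped across lines:

  import gzip
  import numpy as np

  def read_key_file(path):
      """Read a Lowe-format .key (or .key.gz) file.

      Returns (locations, descriptors):
        locations   -- N x 4 float array: subpixel row, column, scale, orientation (radians)
        descriptors -- N x 128 uint8 array of SIFT descriptors
      """
      opener = gzip.open if path.endswith(".gz") else open
      with opener(path, "rt") as f:
          tokens = f.read().split()

      num_keys, dim = int(tokens[0]), int(tokens[1])
      assert dim == 128, "descriptor length should be 128"

      locations = np.zeros((num_keys, 4), dtype=np.float64)
      descriptors = np.zeros((num_keys, dim), dtype=np.uint8)

      pos = 2  # first token after the two-integer header
      for i in range(num_keys):
          # 4 floats: row, column, scale, orientation
          locations[i] = [float(t) for t in tokens[pos:pos + 4]]
          pos += 4
          # 128 integers in [0, 255]
          descriptors[i] = [int(t) for t in tokens[pos:pos + dim]]
          pos += dim

      return locations, descriptors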
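
A minimal sketch for loading vectors.txt and ranking images by raw BoW similarity (dot product), the measure used to form the top-500 candidate set described above (this code is not part of the dataset). It assumes each line stores the sparse vector as whitespace-separated (dimension index, value) pairs; please check a few lines of vectors.txt to confirm this layout before using it:

  import numpy as np
  from scipy.sparse import csr_matrix

  def load_bow_vectors(path):
      """Load the L2-normalized BoW vectors into a sparse num_images x vocab_size matrix."""
      with open(path) as f:
          num_images, vocab_size = (int(x) for x in f.readline().split())
          rows, cols, vals = [], [], []
          for image_id, line in enumerate(f):
              tokens = line.split()
              for j in range(0, len(tokens), 2):
                  rows.append(image_id)
                  cols.append(int(tokens[j]))
                  vals.append(float(tokens[j + 1]))
      return csr_matrix((vals, (rows, cols)), shape=(num_images, vocab_size))

  bow = load_bow_vectors("vectors.txt")

  # Dot products of image 0 against all images; the 500 highest-scoring
  # images (including image 0 itself) form its candidate set.
  scores = bow[0].dot(bow.T).toarray().ravel()
  top500 = np.argsort(-scores)[:500]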