This collection of datasets accompanies the paper "Learning to Match Images in Large-Scale Collections" in ECCV 2012, Workshop on Web-scale Vision and Social Media, by Song Cao and Noah Snavely. The project page is located here:

  http://www.cs.cornell.edu/projects/matchlearn/

Each dataset contains:

  list_url.txt  ---- list of Flickr URLs of the images in the dataset
  list_keys.txt ---- list of SIFT key files (in the same order as list_url.txt)
  gt.txt        ---- image IDs (0-based) of images that match each other (ground truth)
  vectors.txt   ---- Bag-of-Words vector of each image (L2-normalized), in the same order as list_url.txt and list_keys.txt
  images        ---- directory containing all .key.gz files (gzip-compressed versions of the .key files)

Image IDs are defined as line numbers (0-based) in list_url.txt or list_keys.txt (e.g. the first image has ID 0, the second has ID 1, etc.).

Note that the ground truth (gt.txt) is computed considering only the top 500 most similar images for each image, ranked by raw Bag-of-Words similarity, i.e. the dot product of BoW vectors. The minimum inlier number in the RANSAC procedure is set to 12, so each matching pair has at least 12 inliers.

The key files are generated using David Lowe's SIFT code (http://www.cs.ubc.ca/~lowe/keypoints/). For convenience, the following excerpt from his README describes the format of the key files:

"The file format starts with 2 integers giving the total number of keypoints and the length of the descriptor vector for each keypoint (128). Then the location of each keypoint in the image is specified by 4 floating point numbers giving subpixel row and column location, scale, and orientation (in radians from -PI to PI). Obviously, these numbers are not invariant to viewpoint, but can be used in later stages of processing to check for geometric consistency among matches. Finally, the invariant descriptor vector for the keypoint is given as a list of 128 integers in range [0,255]. Keypoints from a new image can be matched to those from previous images by simply looking for the descriptor vector with closest Euclidean distance among all vectors from previous images."

In the vector file (vectors.txt), the first line contains the number of images in the file and the maximum dimension of the vectors, i.e. the vocabulary size of the BoW model. Each subsequent line is the concatenation of the (dimension, value) pairs of the corresponding image's sparse BoW vector. The images in this file are in the same order as in list_url.txt and list_keys.txt.
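
A minimal parsing sketch for the key files quoted above (this code is not part of the dataset; the function name and file paths are illustrative). It reads one of the .key.gz files token by token, so it does not depend on how the descriptor integers are wrapped across lines:

  import gzip
  import numpy as np

  def read_key_file(path):
      """Read a Lowe-format .key (or .key.gz) file.

      Returns (locations, descriptors):
        locations   -- N x 4 float array: subpixel row, column, scale, orientation (radians)
        descriptors -- N x 128 uint8 array of SIFT descriptors
      """
      opener = gzip.open if path.endswith(".gz") else open
      with opener(path, "rt") as f:
          tokens = f.read().split()

      num_keys, dim = int(tokens[0]), int(tokens[1])
      assert dim == 128, "descriptor length should be 128"

      locations = np.zeros((num_keys, 4), dtype=np.float64)
      descriptors = np.zeros((num_keys, dim), dtype=np.uint8)

      pos = 2  # first token after the two-integer header
      for i in range(num_keys):
          # 4 floats: row, column, scale, orientation
          locations[i] = [float(t) for t in tokens[pos:pos + 4]]
          pos += 4
          # 128 integers in [0, 255]
          descriptors[i] = [int(t) for t in tokens[pos:pos + dim]]
          pos += dim

      return locations, descriptors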
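
A minimal sketch for loading vectors.txt and ranking images by raw BoW similarity (dot product), the measure used to form the top-500 candidate set described above (this code is not part of the dataset). It assumes each line stores the sparse vector as whitespace-separated (dimension index, value) pairs; please check a few lines of vectors.txt to confirm this layout before using it:

  import numpy as np
  from scipy.sparse import csr_matrix

  def load_bow_vectors(path):
      """Load the L2-normalized BoW vectors into a sparse num_images x vocab_size matrix."""
      with open(path) as f:
          num_images, vocab_size = (int(x) for x in f.readline().split())
          rows, cols, vals = [], [], []
          for image_id, line in enumerate(f):
              tokens = line.split()
              for j in range(0, len(tokens), 2):
                  rows.append(image_id)
                  cols.append(int(tokens[j]))
                  vals.append(float(tokens[j + 1]))
      return csr_matrix((vals, (rows, cols)), shape=(num_images, vocab_size))

  bow = load_bow_vectors("vectors.txt")

  # Dot products of image 0 against all images; the 500 highest-scoring
  # images (including image 0 itself) form its candidate set.
  scores = bow[0].dot(bow.T).toarray().ravel()
  top500 = np.argsort(-scores)[:500]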