BigSFM: Reconstructing the World from Internet Photos

Our group is working on building a 3D model of the world from online photo collections. Our research spans several areas, from image features, to large-scale image matching, to structure-from-motion optimization, to applications such as location recognition. This page summarizes our work, with links to the code and datasets we have made available and a description of each project.

Cornell Participants (and alumni): Noah Snavely (Cornell), David Crandall (Indiana University), Daniel Hauagge (Cornell),
Kyle Wilson (Cornell), Song Cao (Cornell), Yin Lou (Cornell), Yunpeng Li (EPFL), Andrew Owens (MIT),
Johannes Gehrke (Cornell), Dan Huttenlocher (Cornell)

Other Collaborators: Sameer Agarwal (University of Washington and Google), Brian Curless (UW),
Yasutaka Furukawa (Google), Steve Seitz (UW), Ian Simon (UW), Rick Szeliski (Microsoft Research)


We gratefully acknowledge the support of the following sponsors:

National Science Foundation, Amazon Web Services in Education, MIT Lincoln Labs, Microsoft, and the IU Data to Insight Center

Quick index: Videos, Code, Datasets, Projects

Sample Videos

Old City of Dubrovnik (57,845 photos)
Old City of Dubrovnik (time-lapse of reconstruction)
The Colosseum, Rome
Central Rome

Code

Code is available for a variety of problems:

Structure from Motion from Unordered Images: see the Bundler homepage, or visit the Bundler repository on GitHub.

Global Structure from Motion using 1DSfM: see the 1DSfM homepage, or visit the 1DSfM repository on GitHub.

Disambiguating SfM Models: see the sfm-disambig homepage, or visit the sfm-disambig repository on GitHub.

Structure from Motion using Discrete and Continuous Optimization: see the DISCO project page.

Local Symmetry Features: see the Local Symmetry Features project page.

Datasets

A variety of datasets are available for different problem domains:

Structure from Motion and Location Recognition / Pose Estimation, with features, tracks, and our 3D models:
Quad6K (via the DISCO webpage; with ground-truth camera positions)
Dubrovnik6K (via the location recognition project page)
Rome16K (with components for several dozen Roman landmarks; via the location recognition project page)
The older NotreDame dataset (1.7 GB, 715 images)
Many more Internet photo collection SfM datasets on the 1DSfM page


Structure from Motion Disambiguation: see the sfm-disambig project page for a range of SfM problems on datasets with symmetries and other repeated structures.

Bundle Adjustment: see the Bundle Adjustment in the Large project page for a large set of bundle adjustment problems.

Feature Detection and Matching: see the Local Symmetry Features project page for the SymBench dataset.

Large-Scale Image Matching: see the Learning to Match page for the Trafalgar, LondonEye, TateModern, SanMarco, and TimesSquare datasets.

Projects

Robust Global Translations with 1DSfM

Kyle Wilson and Noah Snavely
ECCV 2014

[ project page, pdf ]

Synopsis: We present a simple, effective method for solving structure from motion problems by averaging epipolar geometries. Based on recent successes in solving for global camera rotations using averaging schemes, we focus on the problem of solving for 3D camera translations given a network of noisy pairwise camera translation directions (or 3D point observations). To do this well, we have two main insights. First, we propose a method for removing outliers from problem instances by solving simpler low-dimensional subproblems, which we refer to as 1DSfM problems. Second, we present a simple, principled averaging scheme. We demonstrate this new method in the wild on Internet photo collections.
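
As a concrete (and much simplified) illustration of the 1D subproblem idea, the sketch below projects pairwise translation directions onto a single axis, orders the cameras with a greedy heuristic for minimum feedback arc set, and flags edges that point backward in the recovered order. The function names and toy data are hypothetical; the paper aggregates many such 1D projections and follows them with a robust averaging step.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def flag_outliers_1d(edges, dirs, axis):
    """Solve one 1DSfM-style subproblem: project pairwise translation
    directions onto `axis`, order the cameras greedily, and flag edges
    inconsistent with that order."""
    # Orient each camera pair along the axis: a positive projection means
    # camera i should precede camera j.
    arcs = [(i, j) if dot(axis, d) > 0 else (j, i)
            for (i, j), d in zip(edges, dirs)]
    remaining = {n for e in edges for n in e}
    order = []
    while remaining:
        # Greedy minimum-feedback-arc-set heuristic: repeatedly emit the
        # node with the largest out-degree minus in-degree.
        def score(n):
            out_d = sum(1 for a, b in arcs if a == n and b in remaining)
            in_d = sum(1 for a, b in arcs if b == n and a in remaining)
            return out_d - in_d
        best = max(sorted(remaining), key=score)
        order.append(best)
        remaining.remove(best)
    pos = {n: k for k, n in enumerate(order)}
    # Arcs pointing backward in the ordering disagree with the consensus
    # 1D embedding and are likely outlier epipolar geometries.
    return [e for e, (a, b) in zip(edges, arcs) if pos[a] > pos[b]]

# Toy scene: four cameras spaced along the x-axis; the (1, 3) measurement
# is corrupted (its direction is flipped).
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (0, 3), (1, 3)]
dirs = [(1, 0, 0)] * 5 + [(-1, 0, 0)]
print(flag_outliers_1d(edges, dirs, (1, 0, 0)))  # → [(1, 3)]
```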

Network Principles for SfM: Disambiguating Repeated Structures with Local Context

Kyle Wilson and Noah Snavely
ICCV 2013

[ project page, pdf ]

Synopsis: Repeated features are common in urban scenes. Many objects, such as clock towers with nearly identical sides, or domes with strong radial symmetries, pose challenges for structure from motion. When similar but distinct features are mistakenly equated, the resulting 3D reconstructions can have errors ranging from phantom walls and superimposed structures to a complete failure to reconstruct. We present a new approach that exploits the local visibility structure of such repeated features. Drawing on network theory, we present a new way of scoring features using a measure of local clustering. Our model leads to a simple, fast, and highly scalable technique for disambiguating repeated features based on an analysis of an underlying visibility graph, without relying on explicit geometric reasoning. We demonstrate our method on several very large datasets drawn from Internet photo collections, and compare it to a more traditional geometry-based disambiguation technique.
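
In that spirit, a toy version of a local-clustering score can be computed directly on the visibility graph: for a given track, measure how often the images observing it are also linked by other shared tracks. The function and data below are illustrative stand-ins, not the paper's exact measure.

```python
from itertools import combinations

def track_coherence(track_to_imgs, img_to_tracks, track):
    """Fraction of image pairs observing `track` that also share at least
    one other track. Conflated repeated features tend to score low: the
    images seeing them split into groups with little other overlap."""
    imgs = sorted(track_to_imgs[track])
    pairs = list(combinations(imgs, 2))
    if not pairs:
        return 1.0
    shared = sum(1 for a, b in pairs
                 if (img_to_tracks[a] & img_to_tracks[b]) - {track})
    return shared / len(pairs)

# Toy visibility graph: images 0-2 genuinely overlap (they share track "a"),
# while image 3 is tied to image 0 only through the suspect track "bad".
track_to_imgs = {"good": {0, 1, 2}, "a": {0, 1, 2}, "bad": {0, 3}}
img_to_tracks = {0: {"good", "a", "bad"}, 1: {"good", "a"},
                 2: {"good", "a"}, 3: {"bad"}}
print(track_coherence(track_to_imgs, img_to_tracks, "good"))  # → 1.0
print(track_coherence(track_to_imgs, img_to_tracks, "bad"))   # → 0.0
```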

World-Scale Pose Estimation using 3D Point Clouds

Yunpeng Li, Noah Snavely, Dan Huttenlocher, Pascal Fua
ECCV 2012

[ project page, pdf ]

Synopsis: We address the problem of determining where a photo was taken by estimating a full 6-DOF-plus-intrinsics camera pose with respect to a large geo-registered 3D point cloud, bringing together research on image localization, landmark recognition, and 3D pose estimation. Our method scales to datasets with hundreds of thousands of images and tens of millions of 3D points through the use of two new techniques: a co-occurrence prior for RANSAC and bidirectional matching of image features with 3D points. We evaluate our method on several large datasets, and show state-of-the-art results on landmark recognition as well as the ability to locate cameras to within meters, requiring only seconds per query.
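
The co-occurrence prior can be pictured with a small sampling sketch: once a seed 2D-to-3D match is drawn, the remaining hypothesis matches are preferentially drawn from matches whose 3D points were observed together with the seed's point in some database image. The function names and data structures here are hypothetical simplifications of the method in the paper.

```python
import random

def sample_hypothesis(matches, point_imgs, k=3, rng=random):
    """Co-occurrence-biased RANSAC sampling: after a random seed match,
    prefer matches whose 3D points co-occur (appear in a common database
    image) with the seed's point; such sets are far more likely to be
    geometrically consistent than uniform samples."""
    seed = rng.choice(matches)
    pool = [m for m in matches
            if m is not seed and point_imgs[m[1]] & point_imgs[seed[1]]]
    if len(pool) < k - 1:            # not enough co-occurring matches:
        pool = [m for m in matches if m is not seed]  # fall back to uniform
    return [seed] + rng.sample(pool, k - 1)

# Toy data: four 2D features matched to 3D points p0..p3, all of which
# were seen together in database image "A".
matches = [(i, f"p{i}") for i in range(4)]
point_imgs = {f"p{i}": {"A"} for i in range(4)}
hyp = sample_hypothesis(matches, point_imgs)
print(len(hyp))  # → 3 distinct matches, biased toward co-occurring points
```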

MatchMiner: Efficiently Finding Connected Components in Large Image Collections

Yin Lou, Noah Snavely, Johannes Gehrke
ECCV 2012

[ project page, pdf ]

Synopsis: Many computer vision applications use large-scale datasets of places derived from the many billions of photos on the Web. Such applications often require knowledge of the visual connectivity structure of these image collections (describing which images overlap or are otherwise related), and an important step in understanding this structure is to identify connected components of this underlying image graph. As the structure of this graph is often initially unknown, this problem can be posed as one of exploring the connectivity between images as quickly as possible, by intelligently selecting a subset of image pairs for feature matching and geometric verification. We propose a novel, scalable algorithm called MatchMiner that efficiently explores visual relations between images, incorporating ideas from relevance feedback to improve decision making over time, as well as a simple yet effective rank distance measure for detecting outlier images. Using these ideas, our algorithm automatically prioritizes image pairs that can potentially connect or contribute to large connected components, using an information-theoretic algorithm to decide which image pairs to test next.
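
One way to picture a rank distance is sketched below. This symmetric variant, the average of each image's position in the other's retrieval list, is an illustrative simplification and not necessarily the paper's exact definition.

```python
def rank_distance(ranked, i, j):
    """Symmetric rank distance between images i and j: the average of each
    image's position in the other's retrieval list (list length if absent).
    Outlier images never appear near the top of anyone's list, so their
    distance to every other image stays large."""
    def pos(a, b):
        lst = ranked[a]
        return lst.index(b) if b in lst else len(lst)
    return 0.5 * (pos(i, j) + pos(j, i))

# Toy retrieval lists: images 0-2 retrieve each other; image 3 is an
# outlier that retrieves others but is never retrieved itself.
ranked = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [0, 1]}
print(rank_distance(ranked, 0, 1))  # → 0.0
print(rank_distance(ranked, 0, 3))  # → 1.0
```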

Image Matching using Local Symmetry Features

Daniel Cabrini Hauagge, Noah Snavely
CVPR 2012

[ project page (with code and data), pdf ]

Synopsis: We present a new technique for extracting local features from images of architectural scenes, based on detecting and representing local symmetries. These features are motivated by the fact that local symmetries, at different scales, are a fundamental characteristic of many urban images, and are potentially more invariant to large appearance changes than lower-level features such as SIFT. Our features are based on simple measures of local bilateral and rotational symmetries, used both for feature detection and for computing descriptors. We demonstrate our method on a challenging new dataset containing image pairs exhibiting a range of dramatic variations in lighting, age, and rendering style, and show that our features can improve matching performance for this difficult task.
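
A toy version of a bilateral-symmetry measure makes the idea concrete: compare a patch against its left-right mirror and score how well they agree. The real features also handle rotational symmetry and multiple scales; this sketch assumes nonnegative intensity values and is not the paper's actual descriptor.

```python
def bilateral_symmetry(patch):
    """Score in [0, 1] for left-right symmetry of a patch of nonnegative
    intensities: 1.0 means the patch equals its mirror image."""
    h, w = len(patch), len(patch[0])
    mismatch = sum(abs(patch[r][c] - patch[r][w - 1 - c])
                   for r in range(h) for c in range(w))
    total = sum(v for row in patch for v in row) or 1
    return 1.0 - mismatch / (2 * total)

print(bilateral_symmetry([[1, 2, 1], [3, 4, 3]]))  # → 1.0 (symmetric)
print(bilateral_symmetry([[0, 255]]))              # → 0.0 (no symmetry)
```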

SfM with MRFs: Discrete-Continuous Optimization for Large-Scale Structure from Motion

David Crandall, Andrew Owens, Noah Snavely, Daniel P. Huttenlocher
CVPR 2011

[ project page (with code and data), pdf ]

Synopsis: We present a new approach to structure from motion (SfM) based on combining discrete and continuous optimization. SfM involves a complex objective function that is difficult to initialize. Our technique first uses powerful discrete optimization techniques (loopy belief propagation) to get a coarse initial solution, then refines this solution using bundle adjustment. We use this method to produce models that are similar to or better than those produced with standard techniques, but more robustly and in a fraction of the time.
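
The discrete-then-continuous recipe can be demonstrated on a one-dimensional stand-in: a coarse grid search (playing the role of belief propagation over discretized camera parameters) provides an initialization good enough for purely local continuous refinement (playing the role of bundle adjustment). The objective and the golden-section refinement below are illustrative, not the paper's.

```python
def discrete_continuous_minimize(f, grid, iters=60):
    """Minimize f: coarse discrete search over `grid`, then golden-section
    refinement inside the winning grid cell's neighborhood."""
    best = min(grid, key=f)                    # discrete stage
    step = grid[1] - grid[0]
    lo, hi = best - step, best + step          # continuous stage
    phi = (5 ** 0.5 - 1) / 2
    for _ in range(iters):
        a = hi - phi * (hi - lo)
        b = lo + phi * (hi - lo)
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)

# Toy objective with its minimum at x = 1.234; the grid alone only reaches
# x = 1.0, and the continuous stage closes the gap.
f = lambda x: (x - 1.234) ** 2
x = discrete_continuous_minimize(f, [i * 0.5 for i in range(-4, 5)])
print(round(x, 4))  # → 1.234
```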

Bundle Adjustment in the Large

Sameer Agarwal, Noah Snavely, Steven M. Seitz, Richard Szeliski
ECCV 2010

[ project page (with data), pdf ]

Synopsis: Recent work in structure from motion has demonstrated the possibility of reconstructing geometry from large-scale community photo collections. Bundle adjustment, the joint non-linear refinement of camera and point parameters, is a key component of most SfM systems, and one which can consume a significant amount of time for large problems. We consider the design and implementation of a new Inexact Newton type bundle adjustment algorithm, which uses substantially less time and memory than standard Schur complement based methods, without compromising on the quality of the solution. We explore the use of the Conjugate Gradients algorithm for calculating the Newton step and its performance as a function of some simple and computationally efficient preconditioners.
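
At the heart of such an inexact Newton solver is preconditioned conjugate gradients on the normal equations. The sketch below uses a plain Jacobi (diagonal) preconditioner on a tiny dense SPD system; the paper studies stronger block preconditioners and, unlike this toy, never needs to form or factor the full matrix.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def pcg(A, b, iters=100, tol=1e-12):
    """Jacobi-preconditioned conjugate gradients for an SPD system A x = b,
    the inner linear solver of an inexact Newton step."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                               # residual b - A x, with x = 0
    m = [1.0 / A[i][i] for i in range(n)]     # Jacobi preconditioner M^-1
    z = [mi * ri for mi, ri in zip(m, r)]
    p = list(z)
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(iters):
        Ap = matvec(A, p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if sum(ri * ri for ri in r) < tol:
            break
        z = [mi * ri for mi, ri in zip(m, r)]
        rz, rz_old = sum(ri * zi for ri, zi in zip(r, z)), rz
        p = [zi + (rz / rz_old) * pi for zi, pi in zip(z, p)]
    return x

# Tiny SPD stand-in for a reduced camera system J^T J x = J^T e.
x = pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
# exact solution is (1/11, 7/11)
```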

Building Rome in a Day

Participants: Sameer Agarwal, Noah Snavely, Ian Simon, Yasutaka Furukawa, Steven M. Seitz, Richard Szeliski

[ project page ]


Synopsis: Entering the search term Rome on Flickr returns more than two million photographs. This collection represents an increasingly complete photographic record of the city, capturing every popular site, facade, interior, fountain, sculpture, painting, cafe, and so forth. It also offers us an unprecedented opportunity to richly capture, explore and study the three dimensional shape of the city. We consider the problem of reconstructing entire cities from images harvested from the web. Our aim is to build a parallel distributed system that can match hundreds of thousands of images of a city to find common points, then use this information to compute the 3D structure of the city. All this to be done in a day.

Papers:

Building Rome in a Day [pdf]
Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz, and Richard Szeliski
ICCV 2009

Reconstructing Rome [pdf]
Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Brian Curless, Steven M. Seitz, and Richard Szeliski
IEEE Computer, pp. 40-47, June 2010 (cover feature)

Building Rome in a Day [CACM page]
Sameer Agarwal, Yasutaka Furukawa, Noah Snavely, Ian Simon, Brian Curless, Steven M. Seitz, and Richard Szeliski
Communications of the ACM, Vol. 54, No. 10, Pages 105-112, October 2011.
with a Technical Perspective by Prof. Carlo Tomasi