Minimal Scene Descriptions from Structure from Motion Models

Song Cao and Noah Snavely

Abstract

How much data do we need to describe a location? We explore this question in the context of 3D scene reconstructions created by running structure from motion on large Internet photo collections, where reconstructions can contain many millions of 3D points. We consider several methods for computing much more compact representations of such reconstructions for the task of location recognition, with the goal of maintaining good performance with very small models. In particular, we introduce a new method for computing compact models that takes into account both image-point relationships and feature distinctiveness, and we show that this method produces small models that yield better recognition performance than previous model reduction techniques.
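To give a flavor of the model reduction problem, here is a minimal sketch of greedy K-cover-style point selection, one way to exploit image-point relationships: choose a small subset of 3D points such that every image still observes at least K selected points. This is an illustrative simplification in Python, not the paper's algorithm (which additionally weights points by feature distinctiveness, and whose released implementation is in C++); all function and variable names here are hypothetical.

```python
def greedy_k_cover(visibility, num_images, k):
    """Greedily pick points until each image is covered by >= k selected points.

    visibility: dict mapping point_id -> set of image ids observing that point.
    Returns the list of selected point ids.
    """
    deficit = {img: k for img in range(num_images)}  # coverage still needed per image
    selected = []
    remaining = dict(visibility)
    while any(d > 0 for d in deficit.values()) and remaining:
        # Pick the point that covers the most still-undercovered images.
        best = max(remaining,
                   key=lambda p: sum(1 for img in remaining[p] if deficit[img] > 0))
        gain = sum(1 for img in remaining[best] if deficit[img] > 0)
        if gain == 0:
            break  # no remaining point helps any under-covered image
        for img in remaining[best]:
            if deficit[img] > 0:
                deficit[img] -= 1
        selected.append(best)
        del remaining[best]
    return selected

# Toy example: 5 points seen by 4 images; require each image covered twice.
vis = {0: {0, 1, 2}, 1: {0, 3}, 2: {1, 2, 3}, 3: {0, 1}, 4: {2, 3}}
compact = greedy_k_cover(vis, num_images=4, k=2)
```

On this toy input the greedy procedure keeps 3 of the 5 points while every image remains covered twice, which is the basic trade-off the paper studies at the scale of millions of points.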

Paper - CVPR 2014

Paper (PDF, 326 KB)
Poster (PDF, 1.3 MB)
(Also presented at the Scene Understanding Workshop at CVPR 2014, where it won the Best Poster Award.)
A C++ implementation is available on GitHub.

Acknowledgements

This work was funded in part by grants from the National Science Foundation (IIS-0964027, IIS-1149393, and IIS-1111534), and by support from the Intel Science and Technology Center for Visual Computing.