Special Topics in Computer Vision
CS7670, Fall 2011, Cornell University

Time: Tu/Th 2:55pm - 4:10pm
Place: Upson 315

Instructor: Noah Snavely (snavely@cs.cornell.edu)
Office: Upson 4157
Office Hours: TBA

In the past decade, computer vision has made incredible progress across the board: in geometry, recognition, image processing, and other areas. In this graduate seminar in computer vision, we will survey and discuss state-of-the-art research papers in this quickly moving field, with a focus on 3D geometry estimation, image matching and retrieval, use of the Internet to gather and annotate data, and scene understanding. The seminar will draw on papers from both computer vision and computer graphics venues.

Prerequisites
Students are expected to have a working knowledge of computer vision at the level of CS6670 (Computer Vision) or equivalent, and should be willing and able to understand and analyze recent conference papers in this area. Perusing a few papers on the syllabus is a good way to gauge what kind of background is necessary. If you are unsure whether this course is right for you, please send me email or come talk to me. This course is expected to be interactive, relevant to the latest research, and (most of all) fun.

Preliminary Schedule

Date	Topics	Papers and links	Presenters	Items due
Aug 26	Course intro	handout		Topic preferences due via CMS by Tuesday August 30
Aug 30	No class -- instructor out of town
Sep 1	No class -- instructor out of town
Sep 6	Object Detection and Exemplars	* Ensemble of Exemplar-SVMs for Object Detection and Beyond. Malisiewicz, Gupta, Efros, ICCV 2011. [pdf,code,www] Recognition by association via learning per-exemplar distances. Malisiewicz and Efros, CVPR 2008. [pdf,www] Beyond Categories: The Visual Memex Model for Reasoning About Object Relationships. Malisiewicz and Efros, NIPS 2009. [pdf, www] An exemplar model for learning object classes. Chum and Zisserman, CVPR 2007. [pdf]	Noah [ppt,pdf]
Sep 8	Saliency	* Learning to Predict Where Humans Look. T. Judd, K. Ehinger, F. Durand, A. Torralba. ICCV 2009. [pdf, www]	Noah [ppt,pdf]
I. 3D Geometry
Sep 13	Multi-view Stereo	* Reconstructing Building Interiors from Images. Furukawa, Curless, Seitz, Szeliski [pdf, www] * Piecewise Planar and Non-Planar Stereo for Urban Scene Reconstruction. Gallup, Frahm, Pollefeys. CVPR 2010. [pdf, www, wmv] Manhattan-World Stereo. Furukawa, Curless, Seitz, Szeliski, CVPR 2009. [pdf, www] Piecewise planar stereo for image-based rendering. Sinha, Steedly, Szeliski, ICCV 2009. [pdf, www]	Ivo [pdf]
Sep 15	User-Assisted 3D Reconstruction	* Interactive 3D Architectural Modeling from Unordered Photo Collections. Sinha, Steedly, Szeliski, Agrawala, Pollefeys, SIGGRAPH Asia 2008. [pdf, www] Active Learning for Piecewise Planar 3D Reconstruction. Kowdle, Chang, Gallagher, Chen, CVPR 2011. [pdf, www] 3D Modeling with Silhouettes. Rivers, Durand, Igarashi. SIGGRAPH 2010. [pdf, www].	Adarsh [pptx, pdf]
Sep 20	New 3D Sensors	* Real-Time Human Pose Recognition in Parts from Single Depth Images. Shotton, et al, CVPR 2011. [pdf] RGB-D Mapping: Using depth cameras for dense 3D modeling of indoor environments. Henry, Krainin, Herbst, Ren, Fox. ISER 2010. [pdf, www] Autonomous Generation of Complete 3D Object Models Using Next Best View Manipulation Planning. Krainin, Curless, Fox, ICRA 2011. [pdf] Kernel Descriptors for Visual Recognition, Bo et al., NIPS 2010. [pdf] A Large-Scale Hierarchical Multi-View RGB-D Object Dataset, Lai et al., ICRA 2011. [pdf,www]	Zhaoyin [pdf]	Project proposals due
Sep 22	Structure from motion	* Semantic Structure from Motion. Bao and Savarese, CVPR 2011. [pdf, www] Building Rome in a Day. Agarwal, Snavely, Simon, Seitz, Szeliski. ICCV 2009. [pdf, www, code] Building Rome on a Cloudless Day. Frahm, Georgel, Gallup, Johnson, Raguram, Wu, Jen, Dunn, Clipp, Lazebnik, Pollefeys [pdf, www, code] Disambiguating Visual Relations Using Loop Constraints. Zach, Klopschitz, Pollefeys, CVPR 2010. [pdf]	Ian [ppt, pdf]
II. Computational Photography
Sep 27	Computational Photography: Intrinsic Images and White Balance	* Light Mixture Estimation for Spatially Varying White Balance. Hsu, Mertens, Paris, Avidan, Durand. SIGGRAPH 2008. [pdf, www] * User Assisted Intrinsic Images. Bousseau, Paris, Durand, SIGGRAPH Asia 2009. [pdf, www]	Daniel, Ivo [pdf (Daniel), pdf (Ivo)]
Sep 29	Computational Photography: Fun with Light Transport	* Dual Photography. Sen, Chen, Garg, Marschner, Horowitz, Levoy, Lensch. SIGGRAPH 2005. [pdf, www] Optical Computing for Fast Light Transport Analysis. O'Toole and Kutulakos, SIGGRAPH Asia 2010. [pdf, www] Compressive Light Transport Sensing. Peers, Mahajan, Lamond, Ghosh, Matusik, Ramamoorthi, Debevec, TOG 2009. [pdf, www] Wavelet Environment Matting. Peers, Dutre. EGSR 2003. [pdf, www] Symmetric Photography: Exploiting Data-sparseness in Reflectance Fields. Garg, Talvala, Levoy, Lensch, EGSR 2006. [pdf, www]	Kevin [pdf]
Oct 4	Illumination	* Estimating Natural Illumination from a Single Outdoor Image. Lalonde, Efros, Narasimhan, ICCV 2009. [pdf, www] Detecting Ground Shadows in Outdoor Consumer Photographs. Lalonde, Efros, Narasimhan. ECCV 2010. [pdf, www] Single-Image Shadow Detection and Removal using Paired Regions. Guo, Dai, Hoiem, CVPR 2011. [pdf, www]	Chun-Po [ppt, pdf]
III. Image Matching and Retrieval
Oct 6	Large-Scale Image Collections	* Small codes and large databases for recognition. Torralba, Fergus, Weiss, CVPR 2008. [pdf, www] * ImageNet: A Large-Scale Hierarchical Image Database. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li and L. Fei-Fei, CVPR 2009. [pdf, www] Nonparametric scene parsing: Label transfer via dense scene alignment. C. Liu, J. Yuen and A. Torralba. CVPR, 2009. [pdf, www] 80 million tiny images: a large dataset for non-parametric object and scene recognition. Torralba, Fergus, Freeman. PAMI 2008. [pdf] Attribute Learning in Large-scale Datasets. O. Russakovsky and L. Fei-Fei, Proc. ECCV Workshop on Parts and Attributes, 2010. [pdf] What does classifying more than 10,000 image categories tell us? J. Deng, A. Berg, K. Li and L. Fei-Fei, ECCV 2010. [pdf]	Henry and Yimeng [pdf]
Oct 11	Fall break -- no classes	-	-	-
Oct 13	Image Representations	* What You Saw is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms. Kulis, Saenko, Darrell, CVPR 2011. [pdf] * Informative Feature Selection for Object Recognition via Sparse PCA. Naikal, Yang, Sastry, ICCV 2011. [pdf, www] Image Retrieval with Geometry Preserving Visual Phrases. Zhang, Jia, Chen, CVPR 2011. [pdf] Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories. Lazebnik, Schmid, Ponce, CVPR 2006. [pdf, code, slides] Video Google: A Text Retrieval Approach to Object Matching in Videos. Sivic and Zisserman, ICCV 2003. [pdf, demo] Scalable Recognition with a Vocabulary Tree. Nister and Stewenius, CVPR 2006. [pdf, slides]	Song [pptx]
Oct 18	Instructor out of town -- no class	-	-	-
Oct 20	Guest Lecture -- Andy Gallagher (Kodak), Dhruv Batra (TTI)
Oct 25	Image Representations (Sparse Coding)	* Linear Spatial Pyramid Matching Using Sparse Coding for Image Classification. Yang, Yu, Gong, Huang, CVPR 2009. [pdf, www] * Locality-constrained Linear Coding for Image Classification. Wang, Yang, Yu, Lv, Huang, Gong. CVPR 2010. [pdf, www]	Ruogu
Oct 27	Feature Detection and Matching	* Edge Foci Interest Points. Zitnick and Ramnath, ICCV 2011. [pdf, www] Boundary-Preserving Dense Local Regions. Kim and Grauman, CVPR 2011. [pdf, www] LDAHash: Improved Matching with Smaller Descriptors. Strecha, Bronstein, Bronstein, Fua, PAMI Submission. [pdf, code, www] Object Recognition from Local Scale-Invariant Features. Lowe, IJCV 2004. [pdf, code, other implementations of SIFT] Local Invariant Feature Detectors: A Survey. Tuytelaars and Mikolajczyk. Foundations and Trends in Computer Graphics and Vision, 2008. [pdf] [Oxford code] [Read pp. 178-188, 216-220, 254-255] SURF: Speeded Up Robust Features. Bay, Ess, Tuytelaars, and Van Gool, CVIU 2008. [pdf] [code] Robust Wide Baseline Stereo from Maximally Stable Extremal Regions. J. Matas, O. Chum, M. Urban, and T. Pajdla, BMVC 2002. [pdf] A Performance Evaluation of Local Descriptors. Mikolajczyk and Schmid, CVPR 2003. [pdf] Oxford group interest point software. Andrea Vedaldi's code, including SIFT, MSER, hierarchical k-means. INRIA LEAR team's software, including interest points, shape features.	Daniel	Project updates due Friday
Nov 1	Machine Learning for Image Matching	* Fast Keypoint Recognition using Random Ferns. Özuysal, Calonder, Lepetit, Fua, PAMI, March 2010. [pdf, www] * Decision Tree Fields. Nowozin, Rother, Bagon, Yao, Sharp, Kohli, ICCV 2011. [pdf] Descriptor Learning for Efficient Retrieval. Philbin, Isard, Sivic, Zisserman. ECCV 2010. [pdf] Learning a Fine Vocabulary. Mikulík, Perdoch, Chum, Matas. ECCV 2010. [pdf]	Ian, Song
IV. Object Recognition and Scene Understanding
Nov 3	Geometric Context	* Closing the Loop on Scene Interpretation. Hoiem, Efros, and Hebert, CVPR 2008. * Recovering Occlusion Boundaries from a Single Image. Hoiem, Stein, Efros, and Hebert. [pdf, www, code] Recovering Surface Layout from a Single Image. Hoiem, Efros, and Hebert. [pdf, code] Thinking Inside the Box: Using Appearance Models and Context Based on Room Geometry. Hedau, Hoiem, Forsyth, ECCV 2010. [pdf] Recovering the Spatial Layout of Cluttered Rooms. Hedau, Hoiem, Forsyth, ICCV 2009. [pdf, code, www] Segmenting Scenes by Matching Image Composites. Russell, Efros, Sivic, Freeman, Zisserman, NIPS 2009. [pdf, www] Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories. Su, Sun, Li, Savarese, ICCV 2009. [pdf]	Zhaoyin, Adarsh
Nov 8	Attributes	* Describing Objects by their Attributes. Farhadi, Endres, Hoiem, Forsyth, CVPR 2009. [pdf, www] * Relative Attributes. Parikh and Grauman, ICCV 2011. [pdf, www] Attribute-Centric Recognition for Cross-Category Generalization. Farhadi, Endres, Hoiem, CVPR 2010. [pdf]	Amir, Ruogu
Nov 10	Materials	* Inferring Reflectance under Real-world Illumination. Romeiro, Zickler, IJCV. [pdf] * Exploring features in a Bayesian framework for material recognition. Liu, Sharan, Adelson, Rosenholtz, CVPR 2010. [pdf, www] What An Image Reveals About Material Reflectance. Chandraker, Ramamoorthi, ICCV 2011. [pdf]	Kevin, Chun-Po
Nov 15	No class	-	-	-
Nov 17	No class	-	-	-
Nov 22	No class	-	-	-
Nov 24	Thanksgiving -- no classes	-	-	-
Nov 29	Event Recognition from Videos	* Learning realistic human actions from movies. I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. In CVPR 2008. [pdf,www] Activity recognition using the velocity histories of tracked keypoints. R. Messing, C. Pal, and H. A. Kautz. ICCV 2009. [pdf,www] Behavior recognition via sparse spatio-temporal features. P. Dollar, V. Rabaud, G. Cottrell, and S. J. Belongie. PETS Workshop, 2005. [pdf] A “string of feature graphs” model for recognition of complex activities in natural videos. U. Gaur, Y. Zhu, B. Song, and A. Roy-Chowdhury. ICCV 2011. [pdf]	Yimeng [pdf]
Dec 1	Image-to-text and Recognition in social context	* Seeing People in Social Context: Recognizing People and Social Relationships. Wang, Gallagher, Luo, and Forsyth. ECCV 2010. [pdf] * Baby Talk: Understanding and Generating Simple Image Descriptions. Kulkarni, Premraj, Dhar, Li, Choi, Berg, and Berg. CVPR 2011. [pdf] Autotagging Facebook: Social Network Context Improves Photo Annotation. Stone, Zickler, Darrell. Workshop on Internet Vision. [pdf] Understanding Images of Groups of People. Gallagher and Chen. CVPR 2009. [pdf] Estimating Age, Gender and Identity using First Name Priors. Gallagher and Chen, CVPR 2008. [www]	Amir and Henry	-

Dec 8				Final presentations

Course Resources
TBA.

Academic Integrity
This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit must be the student's own work. Violations of the rules will not be tolerated.