CS 664 Computer Vision – Fall 2003
Cornell University
 

Lectures TR 1:25-2:40, 245 Olin

Professor: Dan Huttenlocher
346 Sage (4114 Upson)
Office Hours, Wednesday 2-3pm
dph "at" cs.cornell.edu

TA: David Crandall
4121 Upson
Office Hours, Monday 10:15-11:30am
crandall "at" cs.cornell.edu

Course materials:

Brief overview:

This course is intended for graduate students and advanced undergraduates who are interested in processing image and video data to extract information about the scene being imaged. There is no textbook for the course. Handouts and papers will be made available online. A recommended text is Forsyth and Ponce’s book “Computer Vision: A Modern Approach”, though its coverage only partly overlaps with ours: it covers topics that we will not, and vice versa.

The course has a more algorithmic flavor than many introductory computer vision courses. We will focus on efficient algorithms, precise problem definitions and methods that work well in practice.

We draw on material from various areas of algorithms and mathematics, and the course requires programming assignments, but it does not teach algorithms, mathematics or programming. We therefore expect students to have good programming skills (in C or C++), a good mathematics background, and a knowledge of algorithms. Students will be expected to pick up new mathematical and algorithmic techniques during the semester, as covered in lecture, and to relate the concepts from lecture to the programming assignments.

Here is an outline of the topics to be covered, in the anticipated order (each topic takes 1-2 weeks):

· Image matching: fast detection algorithms, distance transforms, template matching, chamfer distance, Hausdorff distance, learning templates, subspace methods, template trees.

· Matching multi-part models: flexible templates, pictorial structures, global versus local methods, finding people, faces, hands.

· Uniform local image operations: smoothing (low pass filtering), edge detection, feature detection (e.g., corners), oriented filters, multi-scale representations.

· Local motion estimation: optical flow, parametric motion, robust statistical measures for layered motion estimation.

· Image segmentation: perceptual grouping, saliency, local and non-local algorithms, graph-based and spectral methods.

· 3D structure from 2D images: stereo, structure-from-motion (SFM), multi-baseline stereo, imaging geometry, fundamental matrix.

· Deformable models (snakes).

· Tracking objects over time: tracking as matching, deformable objects (hands, bodies).

· 3D and 2.5D object recognition and matching: pose estimation, geometric invariants, parameter hashing schemes.

Course Requirements:

There will be two assignments and a final project. Each of these will require programming, testing with image or video data, and a well-thought-out write-up explaining what was done, what was learned and why. The programming assignments are to be done individually, but the final project should be done in teams of 2 or 3 students. The scope of each final project will depend on the number of students working together.

The programming assignments and project require prior experience with C/C++ on a Unix or Windows platform. (This class will not cover how to use a C/C++ development environment to complete the assignments.)