CS 664 Computer Vision – Fall 2003
Lectures TR, 245 Olin
Professor: Dan Huttenlocher
346 Sage (4114 Upson)
Office Hours, Wednesday
dph "at" cs.cornell.edu
TA: David Crandall
Office Hours, Monday 10:15-11:30am
crandall "at" cs.cornell.edu
This course is intended for graduate students and advanced undergraduates who are interested in processing image and video data, in order to extract information about the scene that is being imaged. There is no textbook for the course. Handouts and papers will be made available online. A recommended text is Forsyth and
The course has a more algorithmic flavor than many introductory computer vision courses. We will focus on efficient algorithms, precise problem definitions and methods that work well in practice.
The course draws on material from various areas of algorithms and mathematics and requires programming assignments, but it does not teach algorithms, mathematics or programming. We therefore expect students to have good programming skills (in C or C++), a good mathematics background, and a knowledge of algorithms. Students will be expected to pick up new mathematical and algorithmic techniques during the semester, as covered in lecture, and to relate the concepts from lecture to the programming assignments.
Here is an outline of the topics to be covered, in their anticipated order (each topic takes 1-2 weeks):
· Image matching: fast detection algorithms, distance transforms, template matching, chamfer distance, Hausdorff distance, learning templates, subspace methods, template trees.
· Matching multi-part models: flexible templates, pictorial structures, global versus local methods, finding people, faces, hands.
· Uniform local image operations: smoothing (low pass filtering), edge detection, feature detection (e.g., corners), oriented filters, multi-scale representations.
· Local motion estimation: optical flow, parametric motion, robust statistical measures for layered motion estimation.
· Image segmentation: perceptual grouping, saliency, local and non-local algorithms, graph-based and spectral methods.
· 3D structure from 2D images: stereo, structure-from-motion (SFM), multi-baseline stereo, imaging geometry, fundamental matrix.
· Deformable models (snakes).
· Tracking objects over time: tracking as matching, deformable objects (hands, bodies).
· 3D and 2.5D object recognition and matching: pose estimation, geometric invariants, parameter hashing schemes.
There will be two assignments and a final project. Each of these will require programming, testing with image or video data, and a well thought-out write-up explaining what was done, what was learned and why. The programming assignments will be done individually, but the final project should be done in teams of 2 or 3 students. The scope of each final project will depend on the number of students working together.
The programming assignments and project require prior experience with C/C++ on a Unix or Windows platform. (This class will not cover how to use a C development environment to complete the assignments.)