CS 664 Computer Vision – Fall 2003
Lectures TR
Professor: Dan Huttenlocher
346 Sage (4114 Upson)
Office Hours, Wednesday
dph "at" cs.cornell.edu
TA: David Crandall
4121 Upson
Office Hours, Monday 10:15-11:30am
crandall "at" cs.cornell.edu
Course materials:
Brief overview:
This course is intended for
graduate students and advanced undergraduates who are interested in processing
image and video data to extract information about the scene being imaged.
There is no textbook for the course. Handouts and papers will be made
available online. A recommended
text is Forsyth and
The course has a more
algorithmic flavor than many introductory computer vision courses. We will focus on efficient algorithms,
precise problem definitions and methods that work well in practice.
The course draws on material from various
areas of algorithms and mathematics and requires programming
assignments, but it does not teach algorithms, mathematics or
programming. We therefore expect students to have good programming skills (in C
or C++), a good mathematics background, and a knowledge
of algorithms. Students will be expected to pick up new mathematical and
algorithmic techniques during the semester, as covered in lecture, and to
relate the concepts from lecture to the programming assignments.
Here is an outline of the
topics to be covered, in the anticipated order (each topic takes 1-2
weeks):
· Image matching: fast detection algorithms, distance transforms, template matching, chamfer distance, Hausdorff distance, learning templates, subspace methods, template trees.
· Matching multi-part models: flexible templates, pictorial structures, global versus local methods, finding people, faces, hands.
· Uniform local image operations: smoothing (low pass filtering), edge detection, feature detection (e.g., corners), oriented filters, multi-scale representations.
· Local motion estimation: optical flow, parametric motion, robust statistical measures for layered motion estimation.
· Image segmentation: perceptual grouping, saliency, local and non-local algorithms, graph-based and spectral methods.
· 3D structure from 2D images: stereo, structure-from-motion (SFM), multi-baseline stereo, imaging geometry, fundamental matrix.
· Deformable models (snakes).
· Tracking objects over time: tracking as matching, deformable objects (hands, bodies).
· 3D and 2.5D object recognition and matching: pose estimation, geometric invariants, parameter hashing schemes.
Course Requirements:
There will be two assignments
and a final project. Each of these will require programming, testing with image
or video data, and a well thought-out write-up explaining what was done, what
was learned and why. The programming assignments will be done individually, but
the final project should be done in teams of 2 or 3 students. The scope
of each final project will depend on the number of students working together.
The programming assignments
and project require prior experience with C/C++ on a Unix
or Windows platform. (This class will not cover how to use a C development
environment to complete the assignments.)