Computer Vision (CS6670), Fall 2009
Final Project
Assigned: Tuesday, October 20
Proposal Due: Wednesday, October 28 by 11:59pm (via CMS)
Status Report: Tuesday, November 24 by 11:59pm
Final Presentation Slides Due: Wednesday, December 16, 10am
Final Presentation: Wednesday, December 16, 2-4:45pm, Upson 315
Writeup/Code Due: Thursday, December 17, 11:59pm
Synopsis
There are two options for this project, which will take roughly four weeks.
A. Working alone, implement a state-of-the-art research paper from a
recent computer vision conference or journal.
B. Working in a group of two or three, complete a short research
project. You can devise your own project from scratch, or use one of
the ideas suggested below.
In either case, the purpose is (1) to learn more about a subfield of
computer vision, and (2) to get a feel for doing research in this field.
Guidelines
If you choose Option A, you should start by searching through recent
computer vision conference proceedings or journal articles, and choosing a
paper that interests you. The premier vision conferences are CVPR, ICCV,
and ECCV. The premier journals are IJCV and PAMI.
Click on the links to access the online articles (you need to be on a Cornell
machine/proxy to download the PDFs). We recommend starting with the most
recent years, i.e., CVPR 2009, ICCV 2009, and ECCV 2008. Most of the
papers and project web sites are linked online from this nice webpage.
Alternatively, you may find it more convenient to leaf through the proceedings
or transactions in hand at the Cornell Engineering Library.
You should select a paper that is appropriate for a five-week project.
That is, it should be more involved than one of the class projects (~ double the
effort). In most cases, the expectation is that you will implement the
method yourself rather than use any code that the authors make available.
If you do plan to use existing code, make sure to explain this clearly in your
proposal and justify its use.
If you choose Option B, you need to undertake a research project
with some novelty. What constitutes a "research project"? It must be
new: something that no one has published before. Naturally, we're not
expecting PhD-level research in this amount of time. Following are some
examples of what we have in mind, listed roughly in order of least to
most ambitious:
- An experimental evaluation.
Implement one or more existing algorithms and design an in-depth
experimental evaluation and comparison that goes beyond what was described
in the paper(s). Identify the relative strengths and
weaknesses and include this analysis in your report.
- An interesting extension
of prior work. In most cases, we'd recommend implementing the prior
method yourself, rather than downloading implementations available online,
as this gives you a better understanding of how the method works (and you
can avoid mucking around with someone else's code). But this is not
a hard-and-fast rule--if the extension is very significant, you may use
available code.
- A new application of
prior work. Apply a known technique to a new application domain, and
evaluate its performance.
- Develop a new solution (hopefully
better!) to an existing problem. If you choose this option,
you have to figure out the solution by the time you submit the proposal,
to convince us that your method will work.
- Pose a new technical
problem and solve it. Identify a new problem for which no
known solution exists, devise a solution, and implement/test it. If
you choose this option, you have to figure out both the problem and the
solution by the time you submit the proposal.
How ambitious/difficult should your project be? Each team member
should count on committing at least twice as much work as one of the
previous class projects.
Requirements
Proposal
Each team will turn in a one-page proposal describing their project.
It should specify:
- Your team members
- Project goals. Be
specific. Describe the input and output.
- Brief description of your
approach. If you are implementing or extending a previous method,
give the reference and web link to the paper.
- Will you be using helper code
(e.g., available online) or will you implement it all yourself?
- Evaluation method. How
will you test it? Which test cases will you use?
- Breakdown--what will each
team-member do? Ideally, everyone should do something imaging/vision
related (it's not good for one team member to focus purely on
user-interface, for instance).
- Special equipment that will
be needed. We may be able to help with cameras, tripods, etc.
Each team must submit a proposal, even if you choose one of the research
ideas described below.
Turn in the proposal
via CMS
by Wednesday October 28 (by 11:59pm).
Status Report
Each team will turn in a one-page status report for their project
on Friday, November 13 by 11:59pm. This report should
include a short description of the project and should present your
progress to date, as well as any problems that you have run into.
Final Presentation
Each group will give a short (5-10 minute) PowerPoint presentation on their
project to the class. Details will be announced closer to the time of the
presentation. Your final presentation should be uploaded to CMS.
Final Writeup
Turn
in a web writeup describing your problem and
approach. It should roughly follow the format of a CVPR conference paper,
including the following:
- title, team members
- short intro
- related work, with references
to papers, web pages
- technical description
including algorithm
- experimental results
- discussion of results,
strengths/weaknesses, what worked, what didn't
- future work and what you
would do if you had more time
Code
Turn
in your code.
Option B Project Ideas
Here are several ideas that would make appropriate final research
projects. Feel free to choose variations of these or to devise
your own research problems that are not on this list. We're
happy to meet with you to discuss any of these (or other) project
ideas in more detail--if you can't make office hours, just email the
instructor to set up a meeting.
- Take a stab at the state
of the art! The following web sites provide benchmark datasets
and evaluations for several computer vision problems. Devise a new
algorithm (or modify an existing technique) and see how well you do on the
benchmarks. Here are some of the more prominent benchmarks:
- Middlebury stereo
evaluation
- Middlebury multi-view
stereo evaluation
- Middlebury optical flow
evaluation
- FOM multi-view,
camera calibration, and pose estimation evaluation
- PASCAL
challenge (includes links to other object recognition databases)
- many more standard test datasets available here
- Matching
historical and modern photos. Feature matching doesn't
always work well between images taken at dramatically different
times, e.g., decades apart, for a variety of reasons. Develop a
new matching technique designed for such image pairs, or a new
semi-automatic interface for specifying correspondences. For
inspiration, check
out these images of
downtown Ithaca,
or this
site on rephotography.
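As a starting point for the matching side of this idea, the core nearest-neighbor matching step with Lowe's ratio test can be sketched in a few lines. This assumes descriptors (e.g., SIFT vectors) have already been extracted from both photos; the function name and the 0.8 threshold are illustrative choices, not prescribed by any particular paper:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match descriptors from image 1 to image 2 by nearest neighbor,
    keeping only matches that pass Lowe's ratio test.

    desc1: (N1, D) array; desc2: (N2, D) array of feature descriptors.
    Returns a list of (i, j) index pairs.
    """
    matches = []
    for i, d in enumerate(desc1):
        # Squared Euclidean distance to every descriptor in image 2
        dists = np.sum((desc2 - d) ** 2, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up
        if dists[best] < (ratio ** 2) * dists[second]:
            matches.append((i, best))
    return matches
```

For historical/modern pairs, the interesting research question is what descriptor or distance to plug into this loop so that it survives decades of appearance change.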
- Camera clock
calibration. Given a sequence of timestamped photos,
can we automatically calibrate the camera's clock and determine
the actual time each photo was taken, in both absolute (e.g.,
Greenwich mean time) and the local time at the location where the
photo was taken (e.g., Paris)? This would involve determining
where some of the photos were taken, as well as exploiting image
cues for lighting, shadows, and weather. This could be
formulated as an optimization problem for jointly solving for
location and time. Some places to start would be to look at work
on location recognition, such
as IM2GPS,
and on lighting estimation, such as Lalonde, et al., ICCV 2009.
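As a toy illustration of the optimization view: suppose the hard part (recovering true capture times for a few anchor photos from location and lighting cues) is done. A constant clock offset can then be estimated robustly from those anchors. The function name is hypothetical, and a real system would also model clock drift:

```python
import numpy as np

def estimate_clock_offset(camera_times, true_times):
    """Estimate a constant camera-clock offset from a few photos whose
    true capture times are known (e.g., recovered from location and
    lighting cues). Times are in seconds since a common epoch.

    Uses the median of per-photo offsets, which tolerates a few badly
    estimated true times better than a mean would.
    """
    camera_times = np.asarray(camera_times, dtype=float)
    true_times = np.asarray(true_times, dtype=float)
    return float(np.median(true_times - camera_times))
```

The corrected time of any photo is then its camera timestamp plus the estimated offset; local time follows from the recovered location's time zone.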
- What year was
this photo taken? What decade was a movie created in?
Different time periods have characteristic fashions, products,
movie idioms, and image qualities (one of the most obvious is
that photos taken in the first half of the Twentieth Century tend
to be black and white, with some exceptions). Can you guess what
decade this
photo was taken? How
about this
one? This is a hard problem, but there might be signatures
such as noise levels and color tone that are distinctive for
different decades. Detecting faces and comparing them to a
database organized by decade may also be useful here. Google's
LIFE archive
might be a good place to look for historical photos.
- Visual links
between web pages. Sometimes two pages on the web may
be about the same topic, but not obviously so (for instance, when they
are in different languages, such
as this
page
and this
one). Can we detect related articles and blog posts through
visual matching of images on the Web?
- Parallelization
of vision algorithms. There are several interesting
research problems in parallelizing computer vision
algorithms:
- How can we program basic graph algorithms on new platforms
such as MapReduce?
- How can we exploit new architectures such as GPUs for
image matching and reconstruction?
- Can we make structure from motion faster by breaking it up
into a hierarchy of smaller, overlapping problems, and solving
these problems in parallel?
- Vision on the iPhone. Implement feature detection and
matching, tracking, or other application on the iPhone (or other
mobile device). These algorithms would be useful as a basis
for
augmented reality.
- Face scanning on a laptop. A laptop is an interesting
platform for structured light scanning. The display can be used as
a (very diffuse) projector, and the webcam (if present) can be used
as the capture device. Design a system for 3D face scanning using
only a laptop.
- Finding good photos:
blur, smile, exposure, and orientation. Given a collection of photos
taken of the same person or location, determine which of the photos to
keep. Evaluate each photograph based on criteria such as the following
(see [Datta et al. ECCV 2006] and [Ke et al. CVPR 2006] for more potential ideas):
1. Blur: how sharp the images are. A simple measure is the average squared
gradient; more sophisticated techniques can be found in [Neel Joshi et
al., CVPR 2008].
2. Exposure: images should not have too many saturated or too many dark
pixels, but should span the full available gamut.
3. Orientation: the horizon line should be horizontal. Use analysis of
vanishing points to establish this.
4. Facial expression: find the face, then
determine if the mouth is smiling and the eyes are open.
5. Composition: the subject(s) of interest (person, face) should be placed
according to the "rule of thirds".
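The average-squared-gradient measure mentioned in item 1 is only a few lines of NumPy; this is a minimal sketch with a hypothetical helper name, assuming a grayscale image as a 2D float array:

```python
import numpy as np

def sharpness(gray):
    """Average squared gradient: a simple sharpness score for a
    grayscale image (2D array). Higher means sharper; blurring an
    image lowers the score.
    """
    gy, gx = np.gradient(gray.astype(float))
    return float(np.mean(gx ** 2 + gy ** 2))
```

A constant image scores zero, while high-frequency detail scores high; to rank photos of the same scene, compare scores only between images of comparable content.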
- Aligning color
spaces. Two images taken with different cameras, even
if they were taken from similar viewpoints and times, often have
different color characteristics because different cameras will
apply different color transformations before writing the image
out to memory. How can we take two such images and transform
them so that their color spaces align?
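One simple baseline, under the assumption that the difference between the cameras is roughly affine in RGB and that you have corresponding pixel colors (e.g., sampled at matched feature points), is to fit an affine color transform by least squares. The function names are illustrative, and real camera pipelines are nonlinear, so treat this as a starting point:

```python
import numpy as np

def fit_color_transform(src_colors, dst_colors):
    """Fit an affine RGB transform mapping one camera's colors to
    another's, given (N, 3) arrays of corresponding pixel colors.

    Returns M of shape (4, 3) such that dst ~= [src, 1] @ M.
    """
    n = src_colors.shape[0]
    A = np.hstack([src_colors, np.ones((n, 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(A, dst_colors, rcond=None)
    return M

def apply_color_transform(M, colors):
    """Apply a fitted (4, 3) affine transform to (N, 3) colors."""
    A = np.hstack([colors, np.ones((colors.shape[0], 1))])
    return A @ M
```

An evaluation could compare this affine baseline against per-channel histogram matching or a learned nonlinear mapping on image pairs from different cameras.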
- Augmented reality.
Create a system where you can walk around with a portable device equipped
with a camera and display (e.g., laptop or mobile phone) and have it
overlay information about the scene on the current photo. One approach
would be to use face detection/recognition to identify people.
Another would be to use feature correspondence to transfer tags from a set
of pre-annotated photos taken around the scene (see the Photo Tourism system for an
example of this idea).