Computer Vision (CS6670), Fall 2009
Final Project
Assigned: Tuesday, October 20
Proposal Due: Wednesday, October 28 by 11:59pm (via CMS)
Status Report: Tuesday, November 24 by 11:59pm
Final Presentation Slides Due: Wednesday, December 16, 10am
Final Presentation: Wednesday, December 16, 2-4:45pm, Upson 315
Writeup/Code Due: Thursday, December 17, 11:59pm
Synopsis
There are two options for this project, which will take roughly four weeks.
A. Working alone, implement a state-of-the-art research paper from a
recent computer vision conference or journal.
B. Working in a group of two or three, complete a short research
project. You can devise your own project from scratch, or use one of
the ideas suggested below.
In either case, the purpose is (1) to learn more about a subfield of
computer vision, and (2) to get a feel for doing research in this field.
Guidelines
If you choose Option A, you should start by searching through recent
computer vision conference proceedings or journal articles, and choosing a
paper that interests you. The premier vision conferences are CVPR, ICCV,
and ECCV. The premier journals are IJCV and PAMI.
Click on the links to access the online articles (you need to be on a Cornell
machine/proxy to download the PDFs). We recommend starting with the most
recent years, i.e., CVPR 2009, ICCV 2009, and ECCV 2008. Most of the
papers and project web sites are linked online from this nice webpage.
Alternatively, you may find it more convenient to leaf through the proceedings
or transactions in hand at the Cornell Engineering Library.
You should select a paper that is appropriate for a five-week project.
That is, it should be more involved than one of the class projects (~ double the
effort). In most cases, the expectation is that you will implement the
method yourself rather than use any code that the authors make available.
If you do plan to use existing code, make sure to explain this clearly in your
proposal and justify its use.
If you choose Option B, you need to undertake a research project
with some novelty. What constitutes a "research project"? It must be
new: something that no one has published before. Naturally, we're not
expecting PhD-level research in this amount of time. Following are some
examples of what we have in mind, listed roughly in order of least to
most ambitious:
- An experimental evaluation.
Implement one or more existing algorithms and design an in-depth
experimental evaluation and comparison that goes beyond what was described
in the paper(s). Identify the relative strengths and
weaknesses and include this analysis in your report.
- An interesting extension
of prior work. In most cases, we'd recommend implementing the prior
method yourself, rather than downloading implementations available online,
as this gives you a better understanding of how the method works (and you
can avoid mucking around with someone else's code). But this is not
a hard-and-fast rule--if the extension is very significant, you may use
available code.
- A new application of
prior work. Apply a known technique to a new application domain, and
evaluate its performance.
- Develop a new solution (hopefully
better!) to an existing problem. If you choose this option,
you have to figure out the solution by the time you submit the proposal,
to convince us that your method will work.
- Pose a new technical
problem and solve it. Identify a new problem for which no
known solution exists, devise a solution, and implement/test it. If
you choose this option, you have to figure out both the problem and the
solution by the time you submit the proposal.
How ambitious/difficult should your project be? Each team member
should count on committing at least twice as much work as one of the
previous class projects.
Requirements
Proposal
Each team will turn in a one-page proposal describing their project.
It should specify:
- Your team members
- Project goals. Be
specific. Describe the input and output.
- Brief description of your
approach. If you are implementing or extending a previous method,
give the reference and web link to the paper.
- Will you be using helper code
(e.g., available online) or will you implement it all yourself?
- Evaluation method. How
will you test it? Which test cases will you use?
- Breakdown--what will each
team-member do? Ideally, everyone should do something imaging/vision
related (it's not good for one team member to focus purely on
user-interface, for instance).
- Special equipment that will
be needed. We may be able to help with cameras, tripods, etc.
Each team must submit a proposal, even if you choose one of the research
ideas described below.
Turn in the proposal
via CMS
by Wednesday October 28 (by 11:59pm).
Status Report
Each team will turn in a one-page status report for their project
on Friday, November 13 by 11:59pm. This report should
include a short description of the project and should present your
progress to date, as well as any problems that you have run into.
Final Presentation
Each group will give a short (5-10 minute) PowerPoint presentation on their
project to the class. Details will be announced closer to the time of the
presentation. Your final presentation should be uploaded to CMS.
Final Writeup
Turn
in a web writeup describing your problem and
approach. It should roughly follow the format of a CVPR conference paper,
including the following:
- title, team members
- short intro
- related work, with references
to papers, web pages
- technical description
including algorithm
- experimental results
- discussion of results,
strengths/weaknesses, what worked, what didn't
- future work and what you
would do if you had more time
Code
Turn
in your code.
Option B Project Ideas
Here are several ideas that would make appropriate final research
projects. Feel free to choose variations of these or to devise
your own research problems that are not on this list. We're
happy to meet with you to discuss any of these (or other) project
ideas in more detail--if you can't make office hours, just email the
instructor to set up a meeting.
- Take a stab at the state
of the art! The following web sites provide benchmark datasets
and evaluations for several computer vision problems. Devise a new
algorithm (or modify an existing technique) and see how well you do on the
benchmarks. Here are some of the more prominent benchmarks:
- Middlebury stereo
evaluation
- Middlebury multi-view
stereo evaluation
- Middlebury optical flow
evaluation
- FOM multi-view,
camera calibration, and pose estimation evaluation
- PASCAL
challenge (includes links to other object recognition databases)
- many more standard test datasets available here
- Matching
historical and modern photos. Feature matching doesn't
always work well between images taken at dramatically different
times, e.g., decades apart, for a variety of reasons. Develop a
new matching technique designed for such image pairs, or a new
semi-automatic interface for specifying correspondences. For
inspiration, check
out these images of
downtown Ithaca,
or this
site on rephotography.
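As a starting point for the matching side of this idea, the core nearest-neighbor matching step with Lowe's ratio test can be sketched in a few lines. This assumes descriptors (e.g., SIFT vectors) have already been extracted from both photos; the function name and the 0.8 threshold are illustrative choices, not prescribed by any particular paper:

```python
import numpy as np

def ratio_test_matches(desc1, desc2, ratio=0.8):
    """Match descriptors from image 1 to image 2 by nearest neighbor,
    keeping only matches that pass Lowe's ratio test.

    desc1: (N1, D) array; desc2: (N2, D) array of feature descriptors.
    Returns a list of (i, j) index pairs.
    """
    matches = []
    for i, d in enumerate(desc1):
        # Squared Euclidean distance to every descriptor in image 2
        dists = np.sum((desc2 - d) ** 2, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Accept only if the best match is clearly better than the runner-up
        if dists[best] < (ratio ** 2) * dists[second]:
            matches.append((i, best))
    return matches
```

For historical/modern pairs, the interesting research question is what descriptor or distance to plug into this loop so that it survives decades of appearance change.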
- Camera clock
calibration. Given a sequence of timestamped photos,
can we automatically calibrate the camera's clock and determine
the actual time each photo was taken, in both absolute (e.g.,
Greenwich mean time) and the local time at the location where the
photo was taken (e.g., Paris)? This would involve determining
where some of the photos were taken, as well as exploiting image
cues for lighting, shadows, and weather. This could be
formulated as an optimization problem for jointly solving for
location and time. Some places to start would be to look at work
on location recognition, such
as IM2GPS,
and on lighting estimation, such as Lalonde, et al., ICCV 2009.
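As a toy illustration of the optimization view: suppose the hard part (recovering true capture times for a few anchor photos from location and lighting cues) is done. A constant clock offset can then be estimated robustly from those anchors. The function name is hypothetical, and a real system would also model clock drift:

```python
import numpy as np

def estimate_clock_offset(camera_times, true_times):
    """Estimate a constant camera-clock offset from a few photos whose
    true capture times are known (e.g., recovered from location and
    lighting cues). Times are in seconds since a common epoch.

    Uses the median of per-photo offsets, which tolerates a few badly
    estimated true times better than a mean would.
    """
    camera_times = np.asarray(camera_times, dtype=float)
    true_times = np.asarray(true_times, dtype=float)
    return float(np.median(true_times - camera_times))
```

The corrected time of any photo is then its camera timestamp plus the estimated offset; local time follows from the recovered location's time zone.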
- What year was
this photo taken? What decade was a movie created in?
Different time periods have characteristic fashions, products,
movie idioms, and image qualities (one of the most obvious is
that photos taken in the first half of the Twentieth Century tend
to be black and white, with some exceptions). Can you guess what
decade this
photo was taken? How
about this
one? This is a hard problem, but there might be signatures
such as noise levels and color tone that are distinctive for
different decades. Detecting faces and comparing them to a
database organized by decade may also be useful here. Google's
LIFE archive
might be a good place to look for historical photos.
- Visual links
between web pages. Sometimes two pages on the web may
be about the same topic, but not obviously so (for instance, when they
are in different languages, such
as this
page
and this
one). Can we detect related articles and blog posts through
visual matching of images on the Web?
- Parallelization
of vision algorithms. There are several interesting
research problems in parallelizing computer vision
algorithms:
- How can we program basic graph algorithms on new platforms
such as MapReduce?
- How can we exploit new architectures such as GPUs for
image matching and reconstruction?
- Can we make structure from motion faster by breaking it up
into a hierarchy of smaller, overlapping problems, and solving
these problems in parallel?
- Vision on the iPhone. Implement feature detection and
matching, tracking, or other application on the iPhone (or other
mobile device). These algorithms would be useful as a basis
for
augmented reality.
- Face scanning on a laptop. A laptop is an interesting
platform for structured light scanning. The display can be used as
a (very diffuse) projector, and the webcam (if present) can be used
as the capture device. Design a system for 3D face scanning using
only a laptop.
- Finding good photos:
blur, smile, exposure, and orientation. Given a collection of photos
taken of the same person or location, determine which of the photos to
keep. Evaluate each photograph based on criteria such as the following
(see [Datta et al. ECCV 2006] and [Ke et al. CVPR 2006] for more potential ideas):
1. Blur: how sharp the images are. A simple measure is the average squared
gradient; more sophisticated techniques can be found in [Neel Joshi et
al., CVPR 2008].
2. Exposure: images should not have too many saturated or too many dark
pixels, but should span the full available gamut.
3. Orientation: the horizon line should be horizontal. Use analysis of
vanishing points to establish this.
4. Facial expression: find the face, then
determine if the mouth is smiling and the eyes are open.
5. Composition: the subject(s) of interest (person, face) should be placed
according to the "rule of thirds".
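The average-squared-gradient measure mentioned in item 1 is only a few lines of NumPy; this is a minimal sketch with a hypothetical helper name, assuming a grayscale image as a 2D float array:

```python
import numpy as np

def sharpness(gray):
    """Average squared gradient: a simple sharpness score for a
    grayscale image (2D array). Higher means sharper; blurring an
    image lowers the score.
    """
    gy, gx = np.gradient(gray.astype(float))
    return float(np.mean(gx ** 2 + gy ** 2))
```

A constant image scores zero, while high-frequency detail scores high; to rank photos of the same scene, compare scores only between images of comparable content.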
- Aligning color
spaces. Two images taken with different cameras, even
if they were taken from similar viewpoints and times, often have
different color characteristics because different cameras will
apply different color transformations before writing the image
out to memory. How can we take two such images and transform
them so that their color spaces align?
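One simple baseline, under the assumption that the difference between the cameras is roughly affine in RGB and that you have corresponding pixel colors (e.g., sampled at matched feature points), is to fit an affine color transform by least squares. The function names are illustrative, and real camera pipelines are nonlinear, so treat this as a starting point:

```python
import numpy as np

def fit_color_transform(src_colors, dst_colors):
    """Fit an affine RGB transform mapping one camera's colors to
    another's, given (N, 3) arrays of corresponding pixel colors.

    Returns M of shape (4, 3) such that dst ~= [src, 1] @ M.
    """
    n = src_colors.shape[0]
    A = np.hstack([src_colors, np.ones((n, 1))])  # homogeneous coords
    M, *_ = np.linalg.lstsq(A, dst_colors, rcond=None)
    return M

def apply_color_transform(M, colors):
    """Apply a fitted (4, 3) affine transform to (N, 3) colors."""
    A = np.hstack([colors, np.ones((colors.shape[0], 1))])
    return A @ M
```

An evaluation could compare this affine baseline against per-channel histogram matching or a learned nonlinear mapping on image pairs from different cameras.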
- Augmented reality.
Create a system where you can walk around with a portable device equipped
with a camera and display (e.g., laptop or mobile phone) and have it
overlay information about the scene on the current photo. One approach
would be to use face detection/recognition to identify people.
Another would be to use feature correspondence to transfer tags from a set
of pre-annotated photos taken around the scene (see the Photo Tourism system for an
example of this idea).