Introduction to Computer Vision (CS4670), Fall 2010

Final Project



Assigned: Sunday, October 24
Proposal Due:  Friday, October 29 by 11:59pm (via CMS)
Status Report:  Friday, November 19, 11:59pm
Final Presentation Slides Due: Sunday, December 12, 7:59pm
Final Presentation: Monday, December 13, 9-11:30am. Location: Phillips 203
Writeup/Code Due: Tuesday, December 14, 11:59pm

Synopsis and Guidelines

The final project is an open-ended project in mobile computer vision. In teams of two or three, you will come up with a project idea (some ideas to get you started are described below), then implement it on a Nokia N900. Your project must involve some kind of demo that you can show in class, either a live demo or a video you have created of the system in action. The project can be a new research project or a reimplementation of an existing system, but it must involve a non-trivial implementation of a computer vision algorithm or system. Teams of three will be expected to devise and implement a more ambitious project than teams of two.

The goals of the project are (1) to learn more about a subfield of computer vision, and (2) to get more hands-on experience with computer vision on mobile devices.

How ambitious/difficult should your project be?  Each team member should plan on committing at least twice as much work as in Project 2ab.

Accordingly, you won't be able to implement something arbitrarily ambitious, but please feel free to use your imagination when coming up with projects, and to implement a prototype system that could be extended in interesting ways. As part of the project, you can use any capability of the phone that you can think of. For instance, it comes with wireless, a touch screen, an accelerometer, a GPU, GPS, and other bells and whistles. You can set up a remote server that listens for requests from the phone and runs some vision algorithm on the server. You can use Google Streetview or any other existing API on the Web (as long as you still implement something interesting yourselves).

Requirements

Proposal

Each team will turn in an approximately one-page proposal describing their project.  It should specify:

  1. Your team members
  2. Project goals.  Be specific.  Describe what the inputs to the system are, and what the outputs will be.
  3. Brief description of your approach.  If you are implementing or extending a previous method, give the reference and web link to the paper.
  4. Will you be using helper code (e.g., available online) or will you implement it all yourself?
  5. Breakdown: what will each team member do?  Ideally, everyone should do something imaging/vision related (it's not good for one team member to focus purely on the user interface, for instance).
  6. Special equipment that will be needed.  We may be able to help with servers, extra cameras, etc.

Turn in the proposal via CMS by Friday, October 29 (by 11:59pm).

Status Report

Each team will turn in a one-page status report for their project on Friday, November 19 by 11:59pm.  This report should present your progress to date, including preliminary results, as well as any problems that you are encountering.

Final Presentation

Each group will give a short (10-minute) PowerPoint presentation on their project to the class.  Details will be announced closer to the time of the presentation. Your final presentation slides should be uploaded to CMS. Your presentation is expected to include some kind of (canned or live) demo.

Final Writeup

Turn in a web writeup describing your problem and approach.  This writeup should include the following:

  • title, team members
  • short intro
  • related work, with references to papers and web pages
  • technical description including algorithm
  • experimental results
  • discussion of results, strengths/weaknesses, what worked, what didn't
  • future work and what you would do if you had more time

Code

In addition to the writeup, you will be turning in the code associated with the project.

Resources

Coming soon...

Final Project Ideas!

Here are several ideas that would make appropriate final projects.  Feel free to choose variations of these or to devise your own projects that are not on this list.  We're happy to meet with you to discuss any of these (or other) project ideas in more detail; if you can't make office hours, just email the instructor to set up a meeting.

  1. Nokia Goggles Lite. Write an app that can recognize some limited class of objects, such as book covers, DVDs, or artwork. You may need to write a scraper to download large sets of images from Amazon, for instance, in order to create the database. You'll probably also need to create a server for this project.
  2. Computational Photography App.  Computational photography (which we will talk about in class) uses computation combined with imaging to create better images (your panorama stitcher in Project 2 is an example of a computational photography application). A mobile device---combining a camera with computation---is an ideal platform for computational photography. Devise and implement a computational photography app on the N900. Here are some possibilities:
    • HDR (high dynamic range) imaging. Cameras are limited in the dynamic range (i.e., the range of intensities of light hitting the sensor) they can capture in a single photo; it is hard to capture very bright and very dark intensities in the same photo. However, if we take multiple photos with different exposures, we can combine them into a single HDR image. You can see many examples of HDR images on Flickr. Write an app for taking multiple photos with different exposures and combining them on the phone (see the HDR sketch after this list). See a related project from Li Zhang here. You can also get inspiration from the iPhone's version of this app, reviewed here on Ars Technica.
    • 360 panorama capture. Extend your panorama app from Project 2b to create a full 360-degree panorama on the phone (see the Monster Bells on the Project 2b page). Alternatively, make a real-time panorama capture app as described in this project, or as implemented in the N900 QuickPanorama app.
    • Flash/no-flash. Use the flash in the N900 to capture flash/no-flash pairs, then combine them into beautiful images, as described in this project.
    • Video stabilization. Use optical flow to build a real-time video stabilization algorithm (see the stabilization sketch after this list).
    • Image deblurring. Can you use the phone's accelerometer to help with image deblurring? See this project.
    • Something else cool. Take a look at the Frankencamera project for more ideas for computational photography apps.
  3. Location-based games. The instructor is working on PhotoCity, a game for photographing all of Cornell. Part of this game is an app that runs on a mobile device. Build an N900 version of this app. For more details, talk to the instructor.
  4. Location recognition and augmented reality.  Create a Cornell campus app that recognizes which building you are in front of by taking a photo and comparing it to a large database of Cornell images. You could use this in a campus tour app that displays additional information on top of the photo, such as the name of the building and a link to Wikipedia. If you are interested in this project, please talk to the instructor (who has tens of thousands of images of Cornell that could be used as a database).
  5. Misc. Augmented Reality or Recognition App. There are many things you could do here. Take a picture of an airplane flying overhead, and automatically highlight the plane and display its flight number by connecting to an online flight database (and estimating the rough location and orientation of the phone). You could write a barcode scanner that displays useful information on an image of a product. You could write a vision-based Sudoku-capture app for taking a photo of a Sudoku puzzle and converting it to a digital version. And so on.
  6. Digital object insertion. Build an app that tracks the pose of the camera and inserts a digital 3D object into the real scene (as viewed through the phone). This turns the phone into an interface for viewing a virtual 3D object by simply walking around it.
  7. Face recognition. Write an app that can take a photo of a person, recognize them, and display their name.
  8. Artistic image filtering.  Create an image filtering app that applies an interesting (non-linear) filter to the image stream in real time. For instance, you could apply a cartooning effect, as in this project on real-time video abstraction (see the cartooning sketch after this list).
  9. Vision-based user interface. Write an app that implements a phone UI based on computer vision (this is most useful for a phone with a front-facing camera, e.g., an iPhone 4, but we can at least prototype one with the N900). It might track features on your face, for instance, or recognize gestures, in order to activate certain UI commands (e.g., "raise left eyebrow" might push the "1" button on the dialpad; you can probably think of much more useful ideas). This could be used as an interface for impaired users, or as an interface to a new game (imagine using your face to control a game character).
  10. Stereo/structure from motion. Use the camera to capture two images, then run stereo matching on them to produce a depth map. You'll first need to estimate the fundamental matrix (F-matrix) between the two images (see the stereo sketch after this list).
  11. Autofocus. Create a camera app that implements autofocus by detecting faces in the image (see the face detection sketch after this list).
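
Code Sketches

The sketches below are illustrative starting points for a few of the ideas above, not complete solutions. For brevity they use OpenCV's Python bindings; on the N900 you would write the equivalent in C++ against the OpenCV API. All file names and parameter values here are placeholders, not required choices.

For the HDR imaging idea, one simple route is Mertens exposure fusion, which blends a bracketed stack of photos without recovering the camera's response curve. A minimal sketch, assuming three roughly aligned exposures:

```python
import cv2
import numpy as np

def fuse_exposures(paths):
    # Load the bracketed shots (file names are placeholders).
    images = [cv2.imread(p) for p in paths]
    # Compensate for small handheld motion between shots.
    cv2.createAlignMTB().process(images, images)
    # Mertens fusion weights each pixel by contrast, saturation,
    # and well-exposedness, then blends across the stack.
    fused = cv2.createMergeMertens().process(images)
    # The result is a float image in [0, 1]; convert back to 8-bit.
    return np.clip(fused * 255, 0, 255).astype(np.uint8)

cv2.imwrite("fused.jpg", fuse_exposures(["under.jpg", "normal.jpg", "over.jpg"]))
```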
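
For the video stabilization idea, a minimal online stabilizer tracks sparse features with pyramidal Lucas-Kanade optical flow, takes the median feature displacement as the global motion, and warps each frame to cancel the jittery part of the trajectory. The decay factor below is one simple way to let the view follow slow, intentional pans; treat the loop as a sketch, not tuned code:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("shaky.mp4")  # placeholder input
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
x = y = 0.0  # accumulated camera translation to cancel

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Track corners from the previous frame with pyramidal Lucas-Kanade.
    p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=10)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None)
    good = status.ravel() == 1
    # Median displacement is a robust estimate of global translation.
    dx, dy = np.median((p1[good] - p0[good]).reshape(-1, 2), axis=0)
    # Accumulate motion with decay: cancels jitter, follows slow pans.
    x, y = 0.9 * (x + dx), 0.9 * (y + dy)
    M = np.float32([[1, 0, -x], [0, 1, -y]])
    stable = cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0]))
    cv2.imshow("stabilized", stable)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
    prev_gray = gray
```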
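
For the cartooning idea, a standard recipe in the spirit of the video abstraction paper flattens colors with an edge-preserving bilateral filter and overlays dark outlines taken from thresholded luminance. A sketch (filter sizes are guesses to tune):

```python
import cv2

def cartoonize(frame):
    # Edge-preserving smoothing flattens color regions;
    # repeating the filter strengthens the effect.
    color = frame
    for _ in range(2):
        color = cv2.bilateralFilter(color, d=9, sigmaColor=75, sigmaSpace=75)
    # Dark outlines from adaptive thresholding of smoothed luminance.
    gray = cv2.medianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), 7)
    edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, blockSize=9, C=2)
    # Keep color only where the mask is white (non-edge pixels).
    return cv2.bitwise_and(color, color, mask=edges)
```

On the phone, hitting real-time rates will likely require downsampling each frame and/or a fast approximate bilateral filter.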
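
For the stereo idea, the usual uncalibrated pipeline is: match features between the two views, fit the fundamental matrix with RANSAC, rectify so epipolar lines become horizontal scanlines, then run a dense stereo matcher. A sketch (feature type and matcher parameters are arbitrary choices):

```python
import cv2
import numpy as np

left = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)   # placeholders
right = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# Match ORB features between the two views.
orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(left, None)
k2, d2 = orb.detectAndCompute(right, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
pts2 = np.float32([k2[m.trainIdx].pt for m in matches])

# RANSAC rejects outlier matches while fitting the fundamental matrix F.
F, inliers = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
good = inliers.ravel() == 1

# Rectify so epipolar lines are horizontal, then run dense matching.
h, w = left.shape
_, H1, H2 = cv2.stereoRectifyUncalibrated(pts1[good], pts2[good], F, (w, h))
left_r = cv2.warpPerspective(left, H1, (w, h))
right_r = cv2.warpPerspective(right, H2, (w, h))
# Note: SGBM returns fixed-point disparity scaled by 16.
disparity = cv2.StereoSGBM_create(numDisparities=64,
                                  blockSize=9).compute(left_r, right_r)
```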
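
For the autofocus idea (and as the detection front end for the face recognition idea), OpenCV ships pre-trained Haar cascade face detectors. A sketch that finds face rectangles in a frame; driving the camera's focus to the largest rectangle is left to the phone's camera API:

```python
import cv2

# The cascade path below is an assumption; OpenCV installs several
# pre-trained frontal-face cascades alongside the library.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.equalizeHist(gray)  # normalize lighting
    # Returns a list of (x, y, w, h) face rectangles.
    return cascade.detectMultiScale(gray, scaleFactor=1.2,
                                    minNeighbors=5, minSize=(40, 40))
```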