CS4670/5670: Computer Vision, Fall 2012
Project 5: Object Detection

Brief

Assigned: Sunday, November 25, 2012
Code Due: Friday, November 30, 2012 (by 11:59pm)
Artifact Due: Saturday, December 1, 2012 (by 11:59pm)
This assignment should be done in groups of 2 students.

Introduction

The goal of this project is to implement a simple, effective method for detecting pedestrians in an image. You will build on the technique of Dalal and Triggs (PDF) from 2005. This technique has three main components:

A feature descriptor. We first need a way to describe an image region with a high-dimensional descriptor. For this project, you will be implementing two descriptors: tiny images and histogram of oriented gradients (HOG) features.
A learning method. Next, we need a way to learn to classify an image region (described using one of the features above) as a pedestrian or not. For this, we will be using support vector machines (SVMs) and a large training dataset of image regions containing pedestrians (positive examples) or not containing pedestrians (negative examples).
A sliding window detector. Using our classifier, we can tell if an image region looks like a pedestrian or not. The final step is to run this classifier as a sliding window detector on an input image in order to detect all instances of pedestrians in that image.
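To make the first component concrete, here is a minimal sketch of a tiny-image descriptor, assuming "tiny image" means shrinking the region to a small fixed size and flattening the pixels into a vector. The GrayImage struct and the tinyImageFeature function are hypothetical names used only for illustration; they are not part of the skeleton code, whose image class and target size may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image: row-major pixel values.
struct GrayImage {
    int width;
    int height;
    std::vector<float> pixels;
    float at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Shrinks the image to outW x outH by averaging the source pixels that fall
// into each output cell, and returns the result as a flat feature vector.
std::vector<float> tinyImageFeature(const GrayImage& img, int outW, int outH) {
    assert(outW > 0 && outH > 0);
    assert(img.width >= outW && img.height >= outH);
    std::vector<float> feature(static_cast<std::size_t>(outW) * outH, 0.0f);
    for (int oy = 0; oy < outH; ++oy) {
        const int y0 = oy * img.height / outH;        // first source row of this cell
        const int y1 = (oy + 1) * img.height / outH;  // one past the last source row
        for (int ox = 0; ox < outW; ++ox) {
            const int x0 = ox * img.width / outW;
            const int x1 = (ox + 1) * img.width / outW;
            float sum = 0.0f;
            for (int y = y0; y < y1; ++y)
                for (int x = x0; x < x1; ++x)
                    sum += img.at(x, y);
            feature[static_cast<std::size_t>(oy) * outW + ox] =
                sum / static_cast<float>((y1 - y0) * (x1 - x0));
        }
    }
    return feature;
}
```

Box averaging is only one possible way to downsample; it is chosen here because it needs no external library.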

Using our skeleton code as a starting point, you'll be implementing parts of all three of these components, and evaluating your methods by creating precision-recall (PR) curves.

Downloads

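Skeleton code, available through git:

>> git clone http://www.cs.cornell.edu/courses/cs4670/2012fa/projects/p5/skeleton.git

This creates a directory named skeleton. To fetch later updates to the skeleton code, run the following from inside that directory:

>> git pull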

For those who are already using git to work in groups, you can still share code with your partner by configuring multiple remotes for your local repository (one being this original repository and the other a remote service such as GitHub where you host the code you are working on); here's a reference with more information.

Solution executables: Mac, Linux, Windows
Pedestrian dataset (18MB). You will use this dataset for training and testing your detector.
Full negatives set (87MB, only for extra credit)

Compiling

Dependencies

libjpeg
CMake

Generating project files with CMake

This project uses cmake to generate compilation files from a set of project description files named CMakeLists.txt. For those unfamiliar with cmake, you can find out more about it in this wiki. cmake searches for dependencies and can automatically generate compilation instructions in the form of Makefiles, Visual Studio project files, Xcode project files, etc. (run cmake -h to see a full list of project formats). The basic procedure for generating these files is to first create a directory where the compilation files will go:

>> cd path/with/source
>> mkdir build
>> cd build

and then run cmake inside the build directory. The simplest form is

>> cmake .. # Assuming here you are inside the previously created build directory

This command will search for dependencies and generate a Makefile. If there are no errors, you can then build the project with

>> make

If you are getting compilation errors related to linking or to headers that were not found, it might be useful to run

>> VERBOSE=1 make

This will print every command that the generated Makefile runs (normally it only prints which file it is currently working on). cmake can also generate build instructions in debug and release modes; you can get it to do this as follows:

>> cmake -DCMAKE_BUILD_TYPE=Debug ..
>> cmake -DCMAKE_BUILD_TYPE=Release ..

Windows

The following suggestions assume that you are using the cmake GUI (cmake-gui) to generate a Visual Studio project. In our experience, cmake will likely fail the first time you run it because it will not be able to find the include and lib directories for libjpeg. If you don't already have the libjpeg library, it can be obtained from ~~GnuWin32 (install the complete package).~~ Once you have installed libjpeg for Windows, you can tell cmake where the include and library files are by clicking on JPEG_INCLUDE_DIR and JPEG_LIBRARY and specifying the correct paths. If you used the GnuWin32 installer, they should be
C:\Program Files\GnuWin32\include
C:\Program Files\GnuWin32\lib\jpeg.lib
UPDATE: we now recommend using libjpeg-turbo under Windows, rather than the GnuWin32 implementation of libjpeg, as the latter seems to be out of date and buggy with new versions of Visual Studio. libjpeg-turbo is available from SourceForge here. It extracts to a custom path, so you will need to update your JPEG_INCLUDE_DIR and JPEG_LIBRARY cmake variables to point to the proper include directory and .lib file where you extract the code. In our case, we used jpeg-static.lib to avoid the hassle of dealing with an extra DLL.

Once these paths are corrected, click on configure and then generate to create the Visual Studio files. You might still get compilation errors related to libjpeg header files not being found. To fix this, select the subprojects jpegrw, objectdetect, and image, right click on them, and select "Properties". In Configuration Properties -> C/C++, set the search path in "Additional Include Directories" and click apply.

Using the software

This project has no GUI; all parts of the project are run from the command line by executing the objectdetect binary with one of several modes as the first argument (including FEATVIZ, TRAIN, PRED, PREDSL, and SVMVIZ). The first TODO item you will implement is feature extraction, either TinyImage or HOG. You can test your code by running the following command

>> objectdetect FEATVIZ hog test.jpg test_hog.jpg

This will extract a HOG feature (you can also try tinyimg) for the image test.jpg and generate a graphical representation that is saved to test_hog.jpg. If you do this with the solution executable you should get

Once your feature extraction code is running correctly, you will train a linear SVM to classify images as containing pedestrians or not. This is done with the following command

>> objectdetect TRAIN pedestrian_train.dataset hog hog.svm

This will load the set of images specified in the dataset file pedestrian_train.dataset, extract a HOG feature for each one of them, and then train the SVM classifier. The .dataset file contains a list of filenames and the class of each image. A +1 before the filename indicates a file that contains a pedestrian, while -1 indicates that there are no pedestrians. Finally, the program will save the trained model into the file hog.svm.
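As an illustration of the file layout described above, the following sketch reads label and filename pairs from a stream. It assumes one entry per line, with the label and the filename separated by whitespace and no spaces inside filenames; the exact layout of the provided .dataset files may differ. LabeledImage, readDataset, and countLabel are hypothetical names, not part of the skeleton code.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One entry of a dataset listing (hypothetical type).
struct LabeledImage {
    int label = 0;         // +1: contains a pedestrian, -1: does not
    std::string filename;
};

// Reads "<label> <filename>" lines, skipping blank or malformed ones.
std::vector<LabeledImage> readDataset(std::istream& in) {
    std::vector<LabeledImage> entries;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        LabeledImage entry;
        if (!(fields >> entry.label >> entry.filename)) continue;
        if (entry.label != 1 && entry.label != -1) continue;
        entries.push_back(entry);
    }
    return entries;
}

// Counts how many entries carry the given label (+1 or -1).
std::size_t countLabel(const std::vector<LabeledImage>& entries, int label) {
    assert(label == 1 || label == -1);
    std::size_t n = 0;
    for (const LabeledImage& e : entries)
        if (e.label == label) ++n;
    return n;
}
```

Counting the positives and negatives after loading is a cheap sanity check that the file was parsed as intended before starting a long training run.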

You can get an intuition for what the SVM model is doing by visualizing the set of weights it found. To do this you can run the command

>> objectdetect SVMVIZ hog.svm svmhog.jpg

This will generate the following image

Here the left side, in red, shows a visualization of negative weights; these are edge orientations that should not be present in an image region containing a pedestrian. For instance, observe the horizontal edges in the region of the legs. On the right, in green, are the positive weights showing edge orientations that should be present in images of pedestrians.
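The split into negative and positive weights can be sketched as follows, assuming the trained linear SVM is a weight vector w plus a bias b and that a feature x is scored as w·x + b. This only illustrates the idea; it is not necessarily how SVMVIZ is implemented, and WeightSplit, splitWeights, and linearScore are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Positive and negative parts of a weight vector, both stored as
// non-negative magnitudes so each can be rendered as its own image.
struct WeightSplit {
    std::vector<float> positive;
    std::vector<float> negative;
};

WeightSplit splitWeights(const std::vector<float>& w) {
    WeightSplit split;
    split.positive.reserve(w.size());
    split.negative.reserve(w.size());
    for (float wi : w) {
        split.positive.push_back(std::max(wi, 0.0f));
        split.negative.push_back(std::max(-wi, 0.0f));
    }
    return split;
}

// Linear decision value: a larger score means "more pedestrian-like".
float linearScore(const std::vector<float>& w, float bias,
                  const std::vector<float>& x) {
    assert(w.size() == x.size());
    float score = bias;
    for (std::size_t i = 0; i < w.size(); ++i) score += w[i] * x[i];
    return score;
}
```

A feature dimension with a large negative weight lowers the score whenever it is active, which is why such dimensions correspond to edge orientations that should be absent.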

Once you have trained an SVM classifier, you will test it to measure how well it performs. You will do this by classifying a set of images that were not present in the set of images used for training; this measures how well the model you trained generalizes to other images of the same class. We provide a second .dataset file with a separate set of images to use for testing. To test your classifier, run the command:

>> objectdetect PRED pedestrian_test.dataset hog.svm hog.pr hog.preds

This will print the average precision and generate two files: hog.pr contains the precision-recall curve, and hog.preds contains the classifier output for each image in the same format as the .dataset file. You can inspect this second file to find out which class each example was assigned to. To visualize the precision-recall curves we provide a MATLAB script plot_pr.m that can plot an arbitrary number of .pr files together (if you don't have MATLAB you can also try using this script with Octave, a freely available alternative that is mostly code compatible; you can also try using Gnuplot, the tool you used in Project 2). To generate the plots you can call the script like

MATLAB>> plot_pr('PR curve', 'tinyimg.pr', 'TinyImg', 'hog.pr', 'HOG', 'output', 'pr.eps')

The first argument is the plot title; this is followed by a list of pairs, each containing a .pr file together with the curve name (which will show up in the plot legend). Finally, you can optionally specify an output image with the 'output' option followed by the output filename. An example of the precision-recall curve for the solution code is shown below:
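For readers who want to check the numbers behind such a curve, here is a minimal sketch of computing precision-recall points and average precision from classifier scores and ground-truth labels. It assumes labels are +1 or -1 and uses one common definition of average precision (precision summed over the increases in recall); the definition used by the skeleton code may differ. PRPoint, precisionRecall, and averagePrecision are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

struct PRPoint {
    float precision;
    float recall;
};

// One PR point per example, obtained by sweeping the decision threshold
// down through the scores sorted from highest to lowest.
std::vector<PRPoint> precisionRecall(const std::vector<float>& scores,
                                     const std::vector<int>& labels) {
    assert(scores.size() == labels.size());
    std::vector<std::size_t> order(scores.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&scores](std::size_t a, std::size_t b) {
                         return scores[a] > scores[b];
                     });
    const float totalPositives =
        static_cast<float>(std::count(labels.begin(), labels.end(), 1));
    std::vector<PRPoint> curve;
    curve.reserve(order.size());
    std::size_t truePositives = 0;
    for (std::size_t rank = 0; rank < order.size(); ++rank) {
        if (labels[order[rank]] == 1) ++truePositives;
        const float tp = static_cast<float>(truePositives);
        const float precision = tp / static_cast<float>(rank + 1);
        const float recall = totalPositives > 0.0f ? tp / totalPositives : 0.0f;
        curve.push_back({precision, recall});
    }
    return curve;
}

// Area under the curve as a sum of precision times the recall increment.
float averagePrecision(const std::vector<PRPoint>& curve) {
    float ap = 0.0f;
    float previousRecall = 0.0f;
    for (const PRPoint& p : curve) {
        ap += p.precision * (p.recall - previousRecall);
        previousRecall = p.recall;
    }
    return ap;
}
```

With this definition only the ranks at which a positive example appears contribute, so a classifier that ranks every pedestrian above every non-pedestrian reaches an average precision of 1.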

Sliding window detector

So far we have trained and tested the classifier on cropped images. A more realistic use is to run the classifier on an uncropped image, evaluating at every possible location (and potentially scale) whether there is an instance of the object of interest. You can do this by running the command

>> objectdetect PREDSL test.jpg hog.svm test_score.jpg

This will run the classifier with the model in hog.svm at every position (but only at a single scale) of the input image test.jpg and save a heat map of the classifier output into test_score.jpg. Here's a pair of input and output from the solution code with the HOG feature:

You can see bright white spots on three of the people in the scene.
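A single-scale sliding window can be sketched as below, assuming a stride of one pixel and a caller-supplied function that scores the raw pixels of each window. In the project itself the score would come from extracting a feature for the window and applying the trained SVM; GrayImage and slidingWindowScores are hypothetical names, not the skeleton's API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical minimal grayscale image: row-major pixel values.
struct GrayImage {
    int width;
    int height;
    std::vector<float> pixels;
    float at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

// Scores every winW x winH window of the image. Returns a row-major map of
// size (width - winW + 1) x (height - winH + 1); entry (x, y) is the score
// of the window whose top-left corner is at (x, y).
std::vector<float> slidingWindowScores(
    const GrayImage& img, int winW, int winH,
    const std::function<float(const std::vector<float>&)>& scoreWindow) {
    assert(winW > 0 && winH > 0);
    assert(winW <= img.width && winH <= img.height);
    const int outW = img.width - winW + 1;
    const int outH = img.height - winH + 1;
    std::vector<float> scores;
    scores.reserve(static_cast<std::size_t>(outW) * outH);
    std::vector<float> window(static_cast<std::size_t>(winW) * winH);
    for (int y = 0; y < outH; ++y) {
        for (int x = 0; x < outW; ++x) {
            // Copy the window's pixels into a contiguous buffer.
            for (int wy = 0; wy < winH; ++wy)
                for (int wx = 0; wx < winW; ++wx)
                    window[static_cast<std::size_t>(wy) * winW + wx] =
                        img.at(x + wx, y + wy);
            scores.push_back(scoreWindow(window));
        }
    }
    return scores;
}
```

Rescaling the returned map to the 0 to 255 range would give a heat map in the spirit of test_score.jpg. Recomputing the feature from scratch for every window is simple but slow; computing the feature once for the whole image and sliding over that is a common optimization.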

Todo

Features.cpp
- TODO: TinyImageFeatureExtractor::operator()
- TODO: HOGFeatureExtractor::operator().
  The HOG descriptor, as described in class, divides an image region into a set of k x k cells, computes a histogram of gradient orientations for each cell, normalizes each histogram, and then concatenates the histogram for each cell into a single, high-dimensional descriptor vector. Please see the lecture notes and the Dalal and Triggs paper for more information.
SupportVectorMachine.cpp
- TODO: SVM Train
- TODO: SVM Sliding Window
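The HOG steps listed above (cells, per-cell orientation histograms, normalization, concatenation) can be sketched as follows. This is a simplified illustration under stated assumptions: central-difference gradients, unsigned orientations over 180 degrees, hard assignment of each pixel to a single bin, and per-cell L2 normalization. It omits the vote interpolation and overlapping block normalization described by Dalal and Triggs, and GrayImage and hogFeature are hypothetical names rather than the skeleton's interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical minimal grayscale image: row-major pixel values.
struct GrayImage {
    int width;
    int height;
    std::vector<float> pixels;
    float at(int x, int y) const {
        return pixels[static_cast<std::size_t>(y) * width + x];
    }
};

std::vector<float> hogFeature(const GrayImage& img, int cellSize, int numBins) {
    assert(cellSize > 0 && numBins > 0);
    const int cellsX = img.width / cellSize;
    const int cellsY = img.height / cellSize;
    const float pi = 3.14159265358979f;
    std::vector<float> feature(
        static_cast<std::size_t>(cellsX) * cellsY * numBins, 0.0f);

    // Accumulate magnitude-weighted orientation votes per cell,
    // skipping the one-pixel border where central differences are undefined.
    for (int y = 1; y + 1 < img.height; ++y) {
        for (int x = 1; x + 1 < img.width; ++x) {
            const int cx = x / cellSize;
            const int cy = y / cellSize;
            if (cx >= cellsX || cy >= cellsY) continue;  // leftover pixels
            const float dx = img.at(x + 1, y) - img.at(x - 1, y);
            const float dy = img.at(x, y + 1) - img.at(x, y - 1);
            const float magnitude = std::sqrt(dx * dx + dy * dy);
            float angle = std::atan2(dy, dx);  // in [-pi, pi]
            if (angle < 0.0f) angle += pi;     // unsigned orientation
            const int bin = static_cast<int>(angle / pi * numBins) % numBins;
            const std::size_t cell = static_cast<std::size_t>(cy) * cellsX + cx;
            feature[cell * numBins + bin] += magnitude;
        }
    }

    // L2-normalize each cell's histogram; the cells are already concatenated.
    const std::size_t numCells = static_cast<std::size_t>(cellsX) * cellsY;
    for (std::size_t cell = 0; cell < numCells; ++cell) {
        float sumSquares = 1e-6f;  // avoids division by zero in flat cells
        for (int b = 0; b < numBins; ++b) {
            const float v = feature[cell * numBins + b];
            sumSquares += v * v;
        }
        const float norm = std::sqrt(sumSquares);
        for (int b = 0; b < numBins; ++b) feature[cell * numBins + b] /= norm;
    }
    return feature;
}
```

For an image that is a pure horizontal intensity ramp, all votes land in the first orientation bin, which makes a convenient sanity check before moving on to real images.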

Turnin

In addition to the code, you'll need to turn in a zipfile with your trained detectors, along with a webpage, as the artifact. Your zipfile should contain the following items:

The .svm files generated for your TinyImg and HOG features, named tinyimg.svm and hog.svm.
A webpage containing
- The visualizations generated with FEATVIZ of a sample image for both features.
- The visualizations generated with SVMVIZ for both features.
- Precision recall curves computed with the test dataset containing results for TinyImg and HOG features. You can additionally show the PR curve for other variants of the feature descriptors you implement.
- An SVM score image generated with the PREDSL option for both TinyImg and HOG features. You should choose your own input image (one not provided by us) on which to run your sliding window detector.
- Please describe any extra credit items on your webpage.

Extra credit

Here are some ideas of things you can implement for extra credit (some of these are described in the Dalal and Triggs paper):

A better method for normalizing your HOG features
A way to mine for hard negatives and improve your classifier (see the original paper for an explanation)
A multi-scale detector
Non-maxima suppression
Invent your own crazy feature
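As a starting point for the non-maxima suppression idea, here is a minimal sketch of one common greedy variant, assuming detections are axis-aligned boxes with a score and that overlap is measured by intersection over union. Other formulations exist (for example, suppressing local non-maxima directly in the score map), and Detection, iou, and nonMaximaSuppression are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical detection: top-left corner, size, and classifier score.
struct Detection {
    float x, y, w, h;
    float score;
};

// Intersection over union of two axis-aligned boxes.
float iou(const Detection& a, const Detection& b) {
    const float ix = std::max(
        0.0f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float iy = std::max(
        0.0f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float intersection = ix * iy;
    const float unionArea = a.w * a.h + b.w * b.h - intersection;
    return unionArea > 0.0f ? intersection / unionArea : 0.0f;
}

// Keeps the highest-scoring detections, dropping any detection that overlaps
// an already kept one by more than maxOverlap.
std::vector<Detection> nonMaximaSuppression(std::vector<Detection> detections,
                                            float maxOverlap) {
    assert(maxOverlap >= 0.0f && maxOverlap <= 1.0f);
    std::sort(detections.begin(), detections.end(),
              [](const Detection& a, const Detection& b) {
                  return a.score > b.score;
              });
    std::vector<Detection> kept;
    for (const Detection& candidate : detections) {
        bool suppressed = false;
        for (const Detection& winner : kept) {
            if (iou(candidate, winner) > maxOverlap) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) kept.push_back(candidate);
    }
    return kept;
}
```

The overlap threshold trades duplicate detections against missed neighbors: a lower value removes more boxes, which can merge two people standing close together.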

Last modified on November 24, 2012
