CS4670/5670: Computer Vision, Fall 2013
Project 5:  Object Detection



The goal of this project is to implement a simple, effective method for detecting pedestrians in an image. You will be building on the technique of Dalal and Triggs (PDF) from 2005. This technique has four main components:
  1. A feature descriptor. We first need a way to describe an image region with a high-dimensional descriptor. For this project, you will be implementing two descriptors: tiny images and histogram of gradients (HOG) features.
  2. A learning method. Next, we need a way to learn to classify an image region (described using one of the features above) as a pedestrian or not. For this, we will be using support vector machines (SVMs) and a large training dataset of image regions containing pedestrians (positive examples) or not containing pedestrians (negative examples).
  3. A sliding window detector. Using our classifier, we can tell whether an image region looks like a pedestrian or not. The next step is to run this classifier as a sliding window detector over an input image in order to detect all instances of pedestrians in that image. To detect pedestrians of different sizes, we run the sliding window detector at multiple scales, forming a pyramid of detector responses.
  4. Non-maxima suppression. Given the pyramid generated by the sliding window detector, the final step is to find the best detections by selecting the strongest responses within a spatial neighborhood and across scales.
Using our skeleton code as a starting point, you'll be implementing parts of all four of these components, and evaluating your methods by creating precision-recall (PR) curves.
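To give a feel for the first component, here is a minimal sketch of one cell of a histogram-of-gradients descriptor, written in Python with NumPy for illustration. The function name and parameters are illustrative only and do not match the skeleton code's API; the real extractor also normalizes blocks of cells, which is omitted here.

```python
import numpy as np

def hog_cell_histogram(patch, n_bins=9):
    """Orientation histogram for one cell of a grayscale patch.

    Gradients are computed with simple finite differences; each pixel
    votes into an orientation bin, weighted by its gradient magnitude.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi)
    ori = np.arctan2(gy, gx) % np.pi
    bins = np.minimum((ori / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros(n_bins)
    for b, m in zip(bins.ravel(), mag.ravel()):
        hist[b] += m
    return hist
```

Concatenating such histograms over a grid of cells (with block normalization) yields the high-dimensional descriptor that the SVM is trained on.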




Generating project files with CMake

This project uses CMake to generate compilation files from a set of project description files (CMakeLists.txt). If you are unfamiliar with CMake, you can find out more about it in this wiki. CMake is readily available on Linux, and can be downloaded for other platforms (both command line and GUI versions are available here). CMake searches for dependencies and can automatically generate compilation instructions in the form of Makefiles, Visual Studio project files, Xcode project files, etc. (run cmake -h to see the full list of project formats). The basic procedure with the command line tool is to first create a directory where the compilation files will go
>> cd path/with/source
>> mkdir build
>> cd build
and then run cmake inside the build directory. The simplest form is
>> cmake .. # Assuming here you are inside the previously created build directory
This command will search for dependencies and generate a Makefile. If there are no errors, you can build the project with
>> make
If you get compilation errors related to linking or missing headers, it can be useful to run
>> VERBOSE=1 make
which prints every command the build runs (normally only the file currently being compiled is shown). CMake can also generate build instructions in debug and release modes with the following flags:
>> cmake -DCMAKE_BUILD_TYPE=Debug ..
>> cmake -DCMAKE_BUILD_TYPE=Release ..
To generate project files for other IDEs, you can use the -G flag:
>> cmake -G Xcode ..


CMake also has a GUI that is especially useful on Microsoft Windows. Tip: when generating project files for Visual Studio, make sure to tell CMake to generate 32-bit projects (by selecting Visual Studio 10 as the compiler when it asks, instead of Visual Studio 10 x64). Once you generate the Visual Studio project and open it, you might also want to manually set the startup project to objdet (instead of ALL_BUILD) to get debugging to work properly (to do this, right-click on the objdet project in Visual Studio and select Set as StartUp Project).

Using the software

This project has a GUI and a command line interface, which are complementary. The GUI serves as a way to visually inspect different aspects of the pipeline and to fine-tune parameters. The command line interface is used to train the classifier and test it on the datasets we will provide. You can also load a generated classifier into the GUI and run it on individual images.

To launch the GUI, simply run objdet without arguments or double-click on its icon. To get help with the command line interface, run objdet -h.

Training a classifier

To train a new SVM classifier, run the following command:
>> objdet TRAIN pedestrian_train.cdataset -f hog hog.svm
This will load all images listed in pedestrian_train.cdataset, extract HOG descriptors, train the classifier, and save the trained model to the file hog.svm. The .cdataset file contains a list of filenames and the class of each image: a +1 before the filename indicates an image containing a pedestrian, while -1 indicates one with no pedestrians.
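For concreteness, a .cdataset line pairs a signed label with a filename, and parsing one line might look like the following sketch (in Python for illustration; the example filename and any extra fields in the real format are assumptions):

```python
def parse_cdataset_line(line):
    """Split one '.cdataset' line into (label, filename).

    A label of +1 marks an image containing a pedestrian, -1 one
    without. Sketch only; any fields beyond label and filename in the
    real format are not handled here.
    """
    label_str, filename = line.split(None, 1)
    return int(label_str), filename.strip()
```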

The HOG feature extractor created this way uses the default parameters. If you want to try different settings, you can choose them in the GUI, save them to a file with "File/Save Parameters", and then run

>> objdet TRAIN pedestrian_train.cdataset -p hog.param hog.svm
Note that the -f flag used to choose the descriptor is no longer necessary, since this information is saved in the parameters file.

Once you have a trained classifier, you can visualize its weights in the GUI by loading the .svm file and clicking on the menu item "SVM/Show SVM Weights". For the HOG descriptor, the GUI will display an image similar to the following one

Here the left side, in red, shows a visualization of the negative weights: edge orientations that should not be present in an image region containing a pedestrian (for instance, note the horizontal edges in the region of the legs). On the right, in green, are the positive weights, showing edge orientations that should be present in images of pedestrians.

Testing the classifier

To test the classifier, run the command:
>> objdet PRED pedestrian_test.cdataset hog.svm hog.pr hog.cdataset
This will load the images in pedestrian_test.cdataset, extract descriptors, and classify them using the classifier stored in hog.svm. The terminal will show the average precision of the classifier on the given dataset. The command also generates a .pr file, which contains the precision-recall curve, and a .cdataset file, which contains the classifier output for each input image.
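As background on what the .pr file represents, a precision-recall curve is traced out by sweeping a decision threshold over the classifier scores. Here is a minimal sketch in Python (illustrative only; the project's exact curve and averaging scheme may differ):

```python
def precision_recall(scores, labels):
    """Trace a precision-recall curve from classifier scores.

    scores: classifier responses, higher = more pedestrian-like.
    labels: 1 for pedestrian, 0 for background.
    Returns parallel lists of precisions and recalls as the decision
    threshold sweeps from the highest score downward.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / total_pos)
    return precisions, recalls
```

Average precision summarizes such a curve with a single number, which is what the PRED command prints in the terminal.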

To visualize the PR curve, we provide the MATLAB script plot_pr.m, which can plot multiple curves at once (in case you want to compare results for different settings or descriptors). To plot the PR curves hog.pr and ti.pr, invoke the script in MATLAB:

MATLAB>> plot_pr('PR curve', 'ti.pr', 'TinyImg', 'hog.pr', 'HOG', 'output', 'pr.eps')
The first argument is the plot title; this is followed by a list of pairs containing a .pr file followed by the curve name (which will show up in the plot legend). Finally, you can optionally specify an output image with the 'output' option followed by the output filename. An example of the precision-recall curve for the solution code is shown below:

Sliding window detection

So far we have trained and tested the classifier on cropped images, each of which either contained a pedestrian or not. A more realistic use is to run the classifier on an uncropped image, evaluating at every possible location and scale whether there is an instance of the object of interest. The final parts of this project involve implementing the functionality that evaluates your trained classifier at all scales and locations of an image and selects the best detections. Once this is done, you will test your sliding window detector with the following command:
>> objdet PREDSL test_predsl.dataset hog.svm hog_preds.pr hog.dataset
The command above is very similar to the one we used to evaluate the detector on cropped images. Note, however, that here we are using a .dataset file instead of the .cdataset one we used before. This dataset file format specifies uncropped images together with the locations of possibly multiple pedestrians. Here again you can fine-tune the parameters for the image pyramid and non-maxima suppression in the GUI, save them, and pass them to the command line with the flag -p. Note that only the image pyramid and non-maxima suppression parameters in the file will be used; the feature extraction parameters are fixed and contained in the .svm file.

When implementing the sliding window detection, you might find it useful to inspect the result of applying the classifier to an image. You can visualize this in the GUI in the "SVM Response" tab. To fine-tune parameters and visualize the results of your implementation of non-maxima suppression, you can use the "Detections" tab in the GUI.
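The core idea of non-maxima suppression can be sketched greedily: repeatedly keep the strongest remaining detection and discard any others that overlap it too much. The overlap-based variant below (in Python, for illustration) is one common formulation; the project's neighborhood-based parameterization may differ, and the box format here is an assumption.

```python
def nms(detections, overlap_thresh=0.5):
    """Greedy non-maxima suppression over detection boxes.

    detections: list of (score, x1, y1, x2, y2) tuples, with boxes
    already mapped back to original image coordinates across the
    pyramid scales. Keeps the highest-scoring boxes, discarding any
    box whose intersection-over-union with an already-kept box
    exceeds overlap_thresh.
    """
    def iou(a, b):
        ax1, ay1, ax2, ay2 = a
        bx1, by1, bx2, by2 = b
        ix1, iy1 = max(ax1, bx1), max(ay1, by1)
        ix2, iy2 = min(ax2, bx2), min(ay2, by2)
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (ax2 - ax1) * (ay2 - ay1)
        area_b = (bx2 - bx1) * (by2 - by1)
        return inter / float(area_a + area_b - inter)

    kept = []
    for score, *box in sorted(detections, reverse=True):
        if all(iou(box, k[1:]) <= overlap_thresh for k in kept):
            kept.append((score, *box))
    return kept
```

Because boxes from every pyramid level are pooled into one list before suppression, a strong detection at one scale also suppresses weaker overlapping responses at neighboring scales.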

Exposing More Parameters

If you find it necessary to add extra parameters to any of the classes that are manipulated in the GUI (e.g., your feature extractor or your non-maxima suppression code), you can easily expose these fields by editing three methods. The three methods manipulate instances of the class ParametersMap, which is essentially a dictionary that associates strings (the parameter names) with parameter values. By editing these three methods you expose the fields in the GUI and ensure that they are properly read from and stored to file.


All TODOs are part of the library subproject od


In addition to the code, you will need to turn in a zipfile with .param files, along with a webpage, as the artifact. Your zipfile should contain the following items:

Further Reading

Extra credit

Here are some ideas of things you can implement for extra credit (some of these are described in the Dalal and Triggs paper):

Last modified on December 4, 2013