CS4670/5670: Computer Vision, Fall 2012
Project 5: Object Detection
Brief
- Assigned: Sunday, November 25, 2012
- Code Due: Friday, November 30, 2012 (by 11:59pm)
- Artifact Due: Saturday, December 1, 2012 (by 11:59pm)
- This assignment should be done in groups of 2 students.
Introduction
The goal of this project is to implement a simple, effective method
for detecting pedestrians in an image. You will be working off of the
technique of Dalal and Triggs (PDF) from 2005. This
technique has three main components:
- A feature descriptor. We first need a way to describe an
image region with a high-dimensional descriptor. For this project, you will be implementing two descriptors: tiny images and histogram of oriented gradients (HOG) features.
- A learning method. Next, we need a way to learn to
classify an image region (described using one of the features above)
as a pedestrian or not. For this, we will be using support vector
machines (SVMs) and a large training dataset of image regions
containing pedestrians (positive examples) or not containing
pedestrians (negative examples).
- A sliding window detector. Using our classifier, we can
tell if an image region looks like a pedestrian or not. The final
step is to run this classifier as a sliding window detector
on an input image in order to detect all instances of pedestrians in
that image.
Using our skeleton code as a starting point, you'll be implementing
parts of all three of these components, and evaluating your methods by
creating precision-recall (PR) curves.
Downloads
- Skeleton code
For this assignment we will
distribute the skeleton code
using git. (This should help
make distributing any updates easier.) Please
install git on your system; once it is installed, you can
download the code by typing (using the command-line interface
to git)
>> git clone http://www.cs.cornell.edu/courses/cs4670/2012fa/projects/p5/skeleton.git
This will create the directory skeleton. To get updates to the code
you can then simply run
>> git pull
This will fetch any updates and merge them into your local
copy. If we modify a file that you have already
modified, git will not overwrite your
changes. Instead, it will mark the file as having a conflict
and ask you to resolve how to integrate the changes from
the two sources. Here's
a quick guide on how to resolve these conflicts.
For those who are already using git to work in groups, you
can still share code with your partner by configuring multiple
remotes for your local repository (one being this original
repository and the other some remote service like GitHub
where you host the code you are working
on); here's
a reference with more information. A brief example appears
after this list.
- Solution executables: Mac, Linux, Windows
- Pedestrian dataset (18MB). You will use this dataset for training and testing your detector.
- Full negatives set (87MB, only for extra credit)
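As mentioned in the skeleton code item above, one way to share work with a partner is to add a second remote alongside the course repository. A minimal example, assuming a hypothetical GitHub repository URL:
>> git remote add github https://github.com/yourname/p5-skeleton.git
>> git push github master
>> git pull origin master
Here origin is the course repository you cloned from (so git pull origin master picks up our updates), while github is wherever you and your partner host your own work.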
Compiling
Dependencies
The main external dependency is libjpeg, which is used to read and
write JPEG images; see the Windows notes below for how to point CMake
at it if it is not found automatically.
Generating project files with CMake
This project
uses cmake to generate compilation files from a set of
project description
files named CMakeLists.txt. For those unfamiliar
with cmake, you can find out more about it in this wiki. cmake
searches for dependencies and can automatically generate compilation
instructions in the form of Makefiles, Visual Studio project files,
XCode project files, etc. (run cmake -h to see a full list of
project formats). The basic procedure for generating these files is
to first create a directory where the compilation files will go
>> cd path/with/source
>> mkdir build
>> cd build
and then running cmake inside the build directory. The
simplest form is
>> cmake .. # Assuming here you are inside the previously created build directory
This command will search for dependencies and generate a Makefile. If
you have no errors, you can build the project with
>> make
If you are getting compilation errors related to linking or to headers
that were not found, it might be useful to run
>> VERBOSE=1 make
This will print out every command being executed (normally make
only prints out which file it is currently working on).
cmake can also generate build instructions in debug and
release modes; you can get it to do this as follows:
>> cmake -DCMAKE_BUILD_TYPE=Debug ..
>> cmake -DCMAKE_BUILD_TYPE=Release ..
Windows
The following suggestions assume that you are using
the cmake GUI (cmake-gui) to generate a Visual Studio
project. In our experience, cmake will likely fail the first
time you try to run cmake because it will not be able to find
the include and lib directories for libjpeg. If you don't already have
the libjpeg library, it can be obtained from
GnuWin32
(install the complete package).
Once you have installed libjpeg
for Windows, you can tell CMake where the include and library files
are by clicking on JPEG_INCLUDE_DIR
and JPEG_LIBRARY and specifying the correct
paths.
If you used the GnuWin32 installer they should be
C:\Program Files\GnuWin32\include
C:\Program Files\GnuWin32\lib\jpeg.lib
UPDATE: we now recommend using libjpeg-turbo under
Windows, rather than the GnuWin32 implementation of libjpeg, as the
latter seems to be out-of-date and buggy with new versions of Visual
Studio. libjpeg-turbo is
available from
Sourceforge here. It extracts to a custom path, so you will need
to update your JPEG_INCLUDE_DIR and JPEG_LIBRARY
cmake variables to point to the proper include directory and .lib
file where you extracted the code. In our case, we used jpeg-static.lib
to avoid the hassle of dealing with an extra dll.
Once these paths are corrected, click on configure and then generate
to create the Visual Studio files. You might still get compilation errors
related to libjpeg header files not being found. To fix this, select
the subprojects jpegrw, objectdetect, and image, right-click on them,
and select "Properties". In Configuration Properties -> C/C++, set the
search path in "Additional Include Directories" and click apply.
Using the software
This project has no GUI; all parts of the project can be run on the
command line by executing the objectdetect binary with one of
several modes as the first argument (including FEATVIZ, TRAIN, PRED,
PREDSL, and SVMVIZ). The first TODO item you will implement is
feature extraction, either TinyImage or HOG. You can test your
code by running the following command
>> objectdetect FEATVIZ hog test.jpg test_hog.jpg
This will extract a HOG feature (you can also try tinyimg) for the
image test.jpg and generate a graphical representation that is saved
to test_hog.jpg. If you do this with the solution executable you
should get the following:
[HOG feature visualization of test.jpg]
Once your feature extraction code is running correctly you will train
a linear SVM to classify images as containing pedestrians or not. This
is done with the following command
>> objectdetect TRAIN pedestrian_train.dataset hog hog.svm
This will load the set of images specified in the dataset
file pedestrian_train.dataset, extract a HOG
feature for each one of them, and then train the SVM
classifier. The .dataset file contains a
list of filenames and the class of each image. A +1 before the
filename indicates a file that contains a pedestrian, while a -1
indicates that there are no pedestrians. Finally, the program will
save the trained model into the file hog.svm.
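The format is plain text, one image per line, with the class label before the filename. A short excerpt might look like this (the filenames here are invented for illustration; see the provided .dataset files for the real ones):
+1 positives/ped_00012.jpg
+1 positives/ped_00013.jpg
-1 negatives/street_00005.jpg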
You can get an intuition for what the SVM model is doing by visualizing
the set of weights it found. To do this, run the command
>> objectdetect SVMVIZ hog.svm svmhog.jpg
This will generate the following image:
[SVM weight visualization: svmhog.jpg]
Here the left side, in red, shows a visualization of the negative weights;
these are edge orientations that should not be present in an image
region containing a pedestrian. For instance, observe the horizontal
edges in the region of the legs. On the right, in green, are the
positive weights, showing edge orientations that should be
present in images of pedestrians.
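To see why the signs matter, recall that a linear SVM scores a descriptor x as f(x) = w . x + b. A minimal sketch of this scoring, assuming plain std::vector<float> containers rather than the skeleton's actual classes:

#include <cstddef>
#include <vector>

// Linear SVM decision value: f(x) = w . x + b. Histogram bins with
// positive weights (green, right half of the visualization) raise the
// score when their edge orientation is present; bins with negative
// weights (red, left half) lower it. Hypothetical signature, for
// illustration only.
float svmScore(const std::vector<float>& w, float b,
               const std::vector<float>& feat)
{
    float score = b;
    for (std::size_t i = 0; i < w.size(); i++)
        score += w[i] * feat[i];
    return score; // classify as pedestrian when the score is positive
}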
Once you have trained an SVM classifier, you will test it to measure how
well it performs. You will do this by classifying a set of images that
were not present in the set used for training; this
measures how well the model you trained generalizes to other images of
the same class. We provide a second .dataset
file with a separate set of images to use for testing. To test your
classifier, run the command:
>> objectdetect PRED pedestrian_test.dataset hog.svm hog.pr hog.preds
This will print out the average precision and generate two
files: hog.pr contains the precision-recall curve, and hog.preds
contains the classifier output for each image in the same format as
the .dataset file. You can inspect this
second file to find out which class each example was assigned to.
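For reference, here is a rough sketch of how a precision-recall curve like the one in hog.pr can be computed from per-example scores and ground-truth labels: sort the examples by classifier score, sweep the decision threshold from high to low, and record precision and recall at each cutoff. The data and names below are made up for illustration; this is not the skeleton's code.

#include <algorithm>
#include <cstdio>
#include <functional>
#include <utility>
#include <vector>

int main()
{
    // (score, label) pairs; label +1 = pedestrian, -1 = not. Toy data.
    std::vector<std::pair<float, int> > preds;
    preds.push_back(std::make_pair( 2.1f, +1));
    preds.push_back(std::make_pair( 0.7f, -1));
    preds.push_back(std::make_pair( 0.3f, +1));
    preds.push_back(std::make_pair(-0.5f, -1));

    // Sort by score, highest first, so walking down the list is the
    // same as lowering the decision threshold.
    std::sort(preds.begin(), preds.end(),
              std::greater<std::pair<float, int> >());

    int totalPos = 0;
    for (size_t i = 0; i < preds.size(); i++)
        if (preds[i].second > 0) totalPos++;

    // At each cutoff, everything above it counts as a positive prediction.
    int tp = 0;
    for (size_t i = 0; i < preds.size(); i++) {
        if (preds[i].second > 0) tp++;
        float recall    = float(tp) / float(totalPos);
        float precision = float(tp) / float(i + 1);
        printf("%f %f\n", recall, precision);
    }
    return 0;
}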
To visualize the precision-recall curves we provide a MATLAB
script, plot_pr.m, that can plot an arbitrary
number of .pr files together (if you don't
have MATLAB you can also try using this script
with Octave, a
freely available alternative that is mostly code compatible; you can
also try using
Gnuplot, the tool you used in Project 2). To
generate the plots you can call the script like this:
MATLAB>> plot_pr('PR curve', 'tinyimg.pr', 'TinyImg', 'hog.pr', 'HOG', 'output', 'pr.eps')
The first argument is the plot title; this is followed by a list of
pairs containing a .pr file together with
the curve name (which will show up in the plot legend); finally,
you can optionally specify an output image with the 'output' option
followed by the output filename. An example of the precision-recall
curve for the solution code is shown below:
[precision-recall curves for TinyImg and HOG]
Sliding window detector
So far we have trained and tested the classifier on cropped images. A
more realistic use is to run the classifier on an uncropped image,
evaluating at every possible location (and potentially every scale) whether
there is an instance of the object of interest or not. You can do
this by running the command
>> objectdetect PREDSL test.jpg hog.svm test_score.jpg
This will run the classifier with the model in hog.svm on every
position (but only at a single scale) of the input image test.jpg and
save a heat map of the classifier output into test_score.jpg.
Here's a pair of input and output images from the solution code with
the HOG feature; you can see bright white spots on three of the people
in the scene.
[input image and classifier score heat map]
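A rough sketch of the single-scale sliding window idea is below. To stay self-contained it scores the raw window pixels directly against the SVM weights (a stand-in for a real feature extractor); in the project you would extract a TinyImage or HOG feature for each window instead. All types and names here are hypothetical, not the skeleton's interfaces.

#include <vector>

struct GrayImage {
    int w, h;
    std::vector<float> px;                       // row-major pixels in [0,1]
    float at(int x, int y) const { return px[y * w + x]; }
};

// Score every window position with a linear SVM and return a heat map
// with one score per valid top-left window corner. The weight vector is
// assumed to have winW * winH entries, one per window pixel.
std::vector<float> slideWindow(const GrayImage& img,
                               const std::vector<float>& weights, float bias,
                               int winW, int winH, int step)
{
    int outW = img.w - winW + 1, outH = img.h - winH + 1;
    std::vector<float> heat(outW * outH, 0.f);
    for (int y = 0; y + winH <= img.h; y += step) {
        for (int x = 0; x + winW <= img.w; x += step) {
            // "Feature" = the flattened window pixels (a stand-in for HOG).
            float score = bias;
            for (int v = 0; v < winH; v++)
                for (int u = 0; u < winW; u++)
                    score += weights[v * winW + u] * img.at(x + u, y + v);
            heat[y * outW + x] = score;          // bright = pedestrian-like
        }
    }
    return heat;
}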
Todo
- Features.cpp
- TODO: TinyImageFeatureExtractor::operator()
- TODO: HOGFeatureExtractor::operator().
The HOG
descriptor, as described in class, divides an image region
into a set of k x k cells, computes a histogram of
gradient orientations for each cell, normalizes each
histogram, and then concatenates the histograms of all
cells into a single, high-dimensional descriptor vector.
Please see the lecture notes and the Dalal and Triggs
paper for more information. (A rough sketch of this
pipeline appears after this list.)
- SupportVectorMachine.cpp
- TODO: SVM Train
- TODO: SVM Sliding Window
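To make the feature TODO concrete, here is a hedged sketch of the HOG pipeline just described (the TinyImage descriptor is simpler: presumably downsample the region to a small fixed resolution and flatten the pixels into a vector). This is an illustration of the idea, not the skeleton's HOGFeatureExtractor, and it omits the overlapping block normalization from the Dalal and Triggs paper.

#include <algorithm>
#include <cmath>
#include <vector>

// img: row-major grayscale pixels of size w*h; k: cells per side;
// nBins: orientation bins per cell. Returns the concatenated descriptor.
std::vector<float> hogSketch(const std::vector<float>& img, int w, int h,
                             int k, int nBins)
{
    const float PI = 3.14159265f;
    std::vector<float> desc(k * k * nBins, 0.f);
    int cellW = w / k, cellH = h / k;
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            // Central-difference gradient at this pixel.
            float gx = img[y * w + (x + 1)] - img[y * w + (x - 1)];
            float gy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
            float mag = std::sqrt(gx * gx + gy * gy);
            // Unsigned orientation in [0, pi), then its histogram bin.
            float ang = std::atan2(gy, gx);
            if (ang < 0.f) ang += PI;
            int bin = std::min(int(ang / PI * nBins), nBins - 1);
            // Cell containing this pixel (clamped at the borders).
            int cx = std::min(x / cellW, k - 1);
            int cy = std::min(y / cellH, k - 1);
            desc[(cy * k + cx) * nBins + bin] += mag; // magnitude-weighted vote
        }
    }
    // L2-normalize each cell's histogram.
    for (int c = 0; c < k * k; c++) {
        float norm = 1e-6f;                       // guard against divide-by-zero
        for (int b = 0; b < nBins; b++)
            norm += desc[c * nBins + b] * desc[c * nBins + b];
        norm = std::sqrt(norm);
        for (int b = 0; b < nBins; b++)
            desc[c * nBins + b] /= norm;
    }
    return desc;
}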
Turnin
In addition to the code, you'll need to turn in a
zipfile with your trained detectors, along with a webpage, as the
artifact. Your zipfile should contain the following items:
- The .svm files generated for
your TinyImg and HOG features, named tinyimg.svm
and hog.svm.
- A webpage containing
- The visualizations generated with FEATVIZ
of a sample image for both features.
- The visualizations generated with SVMVIZ
for both features.
- Precision-recall curves computed
with the test dataset, containing results for the TinyImg and HOG
features. You can additionally show the PR curves for other
variants of the feature descriptors you implement.
- On your webpage you will also include an SVM score image generated with
the PREDSL option for both the TinyImg
and HOG features. You should choose your own input image (one not provided by us) on which to run your sliding window detector.
- Please describe any extra credit items on your webpage.
Extra credit
Here are some ideas of things you can implement
for extra credit (some of these are described in the Dalal and Triggs
paper):
- A better method for normalizing your HOG features
- A way to mine for hard negatives and improve your classifier (see the original
paper for an explanation)
- A multi-scale detector
- Non-maxima suppression (a rough sketch of one greedy approach appears after this list)
- Invent your own crazy feature
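For the non-maxima suppression item, one common greedy scheme keeps the highest-scoring detection, discards any detection that overlaps it too much, and repeats. A minimal sketch with a made-up Detection struct (overlap measured as intersection-over-union); this is one possible approach, not a prescribed implementation.

#include <algorithm>
#include <vector>

struct Detection { float x, y, w, h, score; };

// Intersection-over-union of two axis-aligned boxes.
static float overlap(const Detection& a, const Detection& b)
{
    float ix = std::max(0.f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    float iy = std::max(0.f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    float inter = ix * iy;
    return inter / (a.w * a.h + b.w * b.h - inter);
}

static bool byScore(const Detection& a, const Detection& b)
{
    return a.score > b.score;
}

// Greedily keep the best-scoring detections, dropping any detection that
// overlaps an already-kept one by more than thresh.
std::vector<Detection> nms(std::vector<Detection> dets, float thresh)
{
    std::sort(dets.begin(), dets.end(), byScore);
    std::vector<Detection> kept;
    for (size_t i = 0; i < dets.size(); i++) {
        bool suppressed = false;
        for (size_t j = 0; j < kept.size(); j++) {
            if (overlap(dets[i], kept[j]) > thresh) { suppressed = true; break; }
        }
        if (!suppressed) kept.push_back(dets[i]);
    }
    return kept;
}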
Last modified on November 24, 2012