CS6670 - Computer Vision

Picture credit: Magritte and some computer vision researchers

Quick info

Instructor: Bharath Hariharan
Lecture time: Tues. and Thurs. 1:25pm - 2:40pm
Lecture venue: Phillips Hall 219
TA: Davis Wertheimer

Office Hours:
Bharath: Wed and Fri 3 pm (at 311 Gates Hall)
Davis: Tues 11 am (G17 Gates Hall)

Objective:

This course will serve as an introduction to computer vision for anyone who wants to do research in this area. It will cover both the fundamentals that underlie classic techniques and the problems the community is currently working on, along with the modern techniques being used to solve them. A tentative list of topics appears in the lecture schedule below.

Prerequisites:

This course is intended for graduate students starting out in computer vision research. As such, it will assume that students are mathematically mature and comfortable with the necessary mathematical background. In addition, familiarity with basic machine learning will be useful but is not required.

Guidelines for project proposal:

The project proposal should be approximately one page in length. As a rule of thumb, it should spend about one or two paragraphs each on:

Lectures / Notes:

Reference (for the first part of the course): Rick Szeliski's book. It is not the course textbook: it covers a lot more material in a lot more detail than we will, but it can be used for additional reading. Below is the (tentative) list of classes, with possible additional readings. These may change as the semester progresses.
Each entry gives the date, the topic (with linked notes / slides), and any additional reading.

Aug 22: Introduction
Aug 24: Image Formation - Geometry (see the projection sketch below)
Aug 29: All about rotations | Image formation - color
Aug 31: Reconstruction - I
Sep 5: Reconstruction - II (Epipolar Geometry)
Sep 7: The correspondence problem
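
As a quick illustration (not part of the official course materials), here is a minimal numpy sketch of the pinhole projection model that the image formation and reconstruction lectures build on. The intrinsics K and the pose (R, t) below are hypothetical calibration values.

    import numpy as np

    def project(points_w, K, R, t):
        """Project Nx3 world points to pixel coordinates with a pinhole camera.
        K: 3x3 intrinsics; R, t: world-to-camera rotation and translation."""
        points_c = points_w @ R.T + t               # world -> camera coordinates
        pixels_h = points_c @ K.T                   # homogeneous pixel coordinates
        return pixels_h[:, :2] / pixels_h[:, 2:3]   # perspective division

    # Hypothetical camera looking down the world z-axis.
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.zeros(3)
    print(project(np.array([[0.0, 0.0, 2.0], [0.5, -0.25, 4.0]]), K, R, t))
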
Sep 12: Optical flow
    Additional reading: Szeliski 8.4
Sep 14: Grouping
    Additional reading: Contour detection; Graph-based segmentation (Szeliski 5.4, 5.5); Segmentation for object proposals (Selective search)
Sep 19: Introduction to machine learning | Example case: logistic regression | Empirical risk minimization
    Additional reading: Classical (pre-convnet) recognition: Bag-of-words, Spatial pyramids
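
As a quick illustration (not part of the official course materials), here is a minimal sketch of binary logistic regression trained by empirical risk minimization, i.e. gradient descent on the average cross-entropy loss. The toy data is synthetic and purely for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic_regression(X, y, lr=0.1, steps=1000):
        """Minimize the empirical risk (mean cross-entropy) by gradient descent.
        X: (N, D) features; y: (N,) labels in {0, 1}."""
        N, D = X.shape
        w, b = np.zeros(D), 0.0
        for _ in range(steps):
            p = sigmoid(X @ w + b)          # predicted P(y = 1 | x)
            w -= lr * (X.T @ (p - y) / N)   # gradient of the average loss w.r.t. w
            b -= lr * np.mean(p - y)        # ... and w.r.t. b
        return w, b

    # Synthetic, linearly separable toy data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    w, b = train_logistic_regression(X, y)
    print("train accuracy:", np.mean((sigmoid(X @ w + b) > 0.5) == y))
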
Sep 21: Non-linear classifiers and Neural networks | Convolutional networks
    Additional reading: Deformable part models; MNIST (Sections I, II and III. Also read the rest and contemplate the cyclical nature of research)
- Backpropagation and computation graphs | Image classification
    Additional reading: ImageNet; Transfer learning (many examples)
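
As a quick illustration (not part of the official course materials), here is a minimal sketch of backpropagation through a tiny two-layer network, with the chain rule written out by hand rather than using an autodiff library. The data and weight shapes are hypothetical.

    import numpy as np

    def forward_backward(X, y, W1, W2):
        """Two-layer net: h = relu(X @ W1), pred = h @ W2, loss = mean squared error."""
        # Forward pass, caching intermediates for the backward pass.
        a = X @ W1
        h = np.maximum(a, 0.0)              # ReLU
        pred = h @ W2
        loss = np.mean((pred - y) ** 2)

        # Backward pass: apply the chain rule node by node.
        d_pred = 2.0 * (pred - y) / y.size  # dL/dpred
        dW2 = h.T @ d_pred                  # dL/dW2
        d_h = d_pred @ W2.T                 # dL/dh
        d_a = d_h * (a > 0)                 # ReLU gate
        dW1 = X.T @ d_a                     # dL/dW1
        return loss, dW1, dW2

    # Hypothetical toy data and weights, trained with plain gradient descent.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(8, 4)), rng.normal(size=(8, 1))
    W1, W2 = rng.normal(size=(4, 16)) * 0.1, rng.normal(size=(16, 1)) * 0.1
    for _ in range(200):
        loss, dW1, dW2 = forward_backward(X, y, W1, W2)
        W1 -= 0.1 * dW1
        W2 -= 0.1 * dW2
    print("final loss:", loss)
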
- Convolutional network architectures
    Additional reading: VGG16, VGG19, 3x3 convolutions; Batch normalization; Highway networks; Residual networks
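
The 3x3 convolutions these architectures are built from reduce to a simple sliding-window operation. As a quick illustration (not part of the official course materials), a minimal single-channel version that ignores padding, stride, and batching:

    import numpy as np

    def conv2d(image, kernel):
        """'Valid' 2D convolution of a single-channel image with a small kernel
        (implemented as cross-correlation, the convention used in convnets)."""
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    # A 3x3 Sobel-style filter applied to a random (hypothetical) image.
    image = np.random.default_rng(0).normal(size=(32, 32))
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    print(conv2d(image, sobel_x).shape)   # (30, 30)
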
- Object detection
    Additional reading: Datasets and metrics; R-CNN; Fast R-CNN; Faster R-CNN; SSD
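
Detection metrics such as average precision are built on intersection-over-union (IoU) between predicted and ground-truth boxes. As a quick illustration (not part of the official course materials), assuming boxes in (x1, y1, x2, y2) format:

    def iou(box_a, box_b):
        """Intersection over union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
        # Intersection rectangle.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        # Union = sum of the two areas minus the intersection.
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143
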
- Semantic segmentation
    Additional reading: Datasets and metrics; FCN, skip connections; Dilated convolutions, CRFs
- Instance segmentation
    Additional reading: Dataset, metrics, segmentation as region classification; Hypercolumns / skip connections, segmentation as detection refinement; Instance segmentation using FCNs
- Pose Estimation
    Additional reading: Datasets and metrics; Heatmap representations, graphical model based refinement; Sequential prediction, autocontext and inference machines; Hourglass architectures
- Learning for 3D
    Additional reading: Datasets and metrics; Rigid body pose estimation; Deep stereo; Learning to correspond for stereo; Depth estimation from a single image; Normal estimation from a single image
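
For a rectified stereo pair, depth follows from disparity as Z = f * B / d. As a quick illustration (not part of the official course materials), with hypothetical calibration values:

    import numpy as np

    def depth_from_disparity(disparity, focal_px, baseline_m):
        """Depth (meters) from a disparity map (pixels) for a rectified stereo pair:
        Z = f * B / d. Zero disparity is mapped to infinite depth."""
        disparity = np.asarray(disparity, dtype=float)
        with np.errstate(divide="ignore"):
            return np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)

    # Hypothetical calibration: 700-pixel focal length, 0.54 m baseline.
    print(depth_from_disparity([35.0, 7.0, 0.0], focal_px=700.0, baseline_m=0.54))
    # -> [10.8  54.   inf]
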
- Learning correspondence
    Additional reading: Learning optical flow from simulated data; Learning from hallucinated data; Learning from constraints
- Video recognition
    Additional reading: Datasets and metrics; Video classification as frame+flow classification; CNN+LSTM; 3D convolution; I3D
- Vision and language
    Additional reading: Captioning; Visual question answering; Attention-based systems; Problems with VQA
- Reducing supervision
    Additional reading: One- and Few-shot learning; Classic unsupervised learning (See Chapter 2); Self-supervised learning; Learning from noisy labels
- Vision and action
    Additional reading: Active perception; Learning from ego-motion; Learning tasks in robotics