CS6670 - Computer Vision

Picture credit: Magritte and some computer vision researchers

Quick info

Instructor: Bharath Hariharan
Lecture time: Tues. and Thurs. 1:25pm - 2:40pm
Lecture venue: Phillips Hall 219
TA: Davis Wertheimer

Office Hours:
Bharath: Wed and Fri 3 pm (at 311 Gates Hall)
Davis: Tues 11 am (G17 Gates Hall)

Objective:

This course will serve as an introduction to computer vision for anyone who wants to do research in this area. It will cover both the fundamentals that underlie classic techniques and the problems the community is currently working on, along with the modern techniques being used to solve them. A tentative list of topics appears in the lecture schedule below.

Prerequisites:

This course is intended for graduate students starting out in computer vision research. As such, it will assume that students are mathematically mature and comfortable with the necessary mathematical background. In addition, familiarity with basic machine learning will be useful but is not required.

Guidelines for project proposal:

The project proposal should be approximately one page in length. As a rule of thumb, it should spend about one or two paragraphs each on:

Lectures / Notes:

Reference (for the first part of the course): Rick Szeliski's book. It is not the course textbook: it covers a lot more material in a lot more detail than we will, but it can be used for additional reading. Below is the (tentative) list of classes, with possible additional readings. These may change as the semester progresses.
Each entry gives the date, the topic (with linked notes / slides), and any additional reading.

Aug 22: Introduction
Aug 24: Image Formation - Geometry (see the projection sketch below)
Aug 29: All about rotations | Image formation - color
Aug 31: Reconstruction - I
Sep 5: Reconstruction - II (Epipolar Geometry)
Sep 7: The correspondence problem
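
As a quick illustration (not part of the official course materials), here is a minimal numpy sketch of the pinhole projection model that the image formation and reconstruction lectures build on. The intrinsics K and the pose (R, t) below are hypothetical calibration values.

    import numpy as np

    def project(points_w, K, R, t):
        """Project Nx3 world points to pixel coordinates with a pinhole camera.
        K: 3x3 intrinsics; R, t: world-to-camera rotation and translation."""
        points_c = points_w @ R.T + t               # world -> camera coordinates
        pixels_h = points_c @ K.T                   # homogeneous pixel coordinates
        return pixels_h[:, :2] / pixels_h[:, 2:3]   # perspective division

    # Hypothetical camera looking down the world z-axis.
    K = np.array([[500.0, 0.0, 320.0],
                  [0.0, 500.0, 240.0],
                  [0.0, 0.0, 1.0]])
    R, t = np.eye(3), np.zeros(3)
    print(project(np.array([[0.0, 0.0, 2.0], [0.5, -0.25, 4.0]]), K, R, t))
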
Sep 12: Optical flow
    Additional reading: Szeliski 8.4
Sep 14: Grouping
    Additional reading: Contour detection; Graph-based segmentation (Szeliski 5.4, 5.5); Segmentation for object proposals (Selective search)
Sep 19: Introduction to machine learning | Example case: logistic regression | Empirical risk minimization
    Additional reading: Classical (pre-convnet) recognition: Bag-of-words, Spatial pyramids
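
As a quick illustration (not part of the official course materials), here is a minimal sketch of binary logistic regression trained by empirical risk minimization, i.e. gradient descent on the average cross-entropy loss. The toy data is synthetic and purely for illustration.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train_logistic_regression(X, y, lr=0.1, steps=1000):
        """Minimize the empirical risk (mean cross-entropy) by gradient descent.
        X: (N, D) features; y: (N,) labels in {0, 1}."""
        N, D = X.shape
        w, b = np.zeros(D), 0.0
        for _ in range(steps):
            p = sigmoid(X @ w + b)          # predicted P(y = 1 | x)
            w -= lr * (X.T @ (p - y) / N)   # gradient of the average loss w.r.t. w
            b -= lr * np.mean(p - y)        # ... and w.r.t. b
        return w, b

    # Synthetic, linearly separable toy data.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(float)
    w, b = train_logistic_regression(X, y)
    print("train accuracy:", np.mean((sigmoid(X @ w + b) > 0.5) == y))
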
Sep 21: Non-linear classifiers and Neural networks | Convolutional networks
    Additional reading: Deformable part models; MNIST (Sections I, II and III. Also read the rest and contemplate the cyclical nature of research)
- Backpropagation and computation graphs | Image classification
    Additional reading: ImageNet; Transfer learning (many examples)
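
As a quick illustration (not part of the official course materials), here is a minimal sketch of backpropagation through a tiny two-layer network, with the chain rule written out by hand rather than using an autodiff library. The data and weight shapes are hypothetical.

    import numpy as np

    def forward_backward(X, y, W1, W2):
        """Two-layer net: h = relu(X @ W1), pred = h @ W2, loss = mean squared error."""
        # Forward pass, caching intermediates for the backward pass.
        a = X @ W1
        h = np.maximum(a, 0.0)              # ReLU
        pred = h @ W2
        loss = np.mean((pred - y) ** 2)

        # Backward pass: apply the chain rule node by node.
        d_pred = 2.0 * (pred - y) / y.size  # dL/dpred
        dW2 = h.T @ d_pred                  # dL/dW2
        d_h = d_pred @ W2.T                 # dL/dh
        d_a = d_h * (a > 0)                 # ReLU gate
        dW1 = X.T @ d_a                     # dL/dW1
        return loss, dW1, dW2

    # Hypothetical toy data and weights, trained with plain gradient descent.
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(8, 4)), rng.normal(size=(8, 1))
    W1, W2 = rng.normal(size=(4, 16)) * 0.1, rng.normal(size=(16, 1)) * 0.1
    for _ in range(200):
        loss, dW1, dW2 = forward_backward(X, y, W1, W2)
        W1 -= 0.1 * dW1
        W2 -= 0.1 * dW2
    print("final loss:", loss)
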
- Convolutional network architectures
    Additional reading: VGG16, VGG19, 3x3 convolutions; Batch normalization; Highway networks; Residual networks
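
The 3x3 convolutions these architectures are built from reduce to a simple sliding-window operation. As a quick illustration (not part of the official course materials), a minimal single-channel version that ignores padding, stride, and batching:

    import numpy as np

    def conv2d(image, kernel):
        """'Valid' 2D convolution of a single-channel image with a small kernel
        (implemented as cross-correlation, the convention used in convnets)."""
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out

    # A 3x3 Sobel-style filter applied to a random (hypothetical) image.
    image = np.random.default_rng(0).normal(size=(32, 32))
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    print(conv2d(image, sobel_x).shape)   # (30, 30)
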
- Object detection
    Additional reading: Datasets and metrics; R-CNN; Fast R-CNN; Faster R-CNN; SSD
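
Detection metrics such as average precision are built on intersection-over-union (IoU) between predicted and ground-truth boxes. As a quick illustration (not part of the official course materials), assuming boxes in (x1, y1, x2, y2) format:

    def iou(box_a, box_b):
        """Intersection over union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
        # Intersection rectangle.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        # Union = sum of the two areas minus the intersection.
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter)

    print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175, roughly 0.143
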
- Semantic segmentation
    Additional reading: Datasets and metrics; FCN, skip connections; Dilated convolutions, CRFs
- Instance segmentation
    Additional reading: Dataset, metrics, segmentation as region classification; Hypercolumns / skip connections, segmentation as detection refinement; Instance segmentation using FCNs
- Pose Estimation
    Additional reading: Datasets and metrics; Heatmap representations, graphical model based refinement; Sequential prediction, autocontext and inference machines; Hourglass architectures
- Learning for 3D
    Additional reading: Datasets and metrics; Rigid body pose estimation; Deep stereo; Learning to correspond for stereo; Depth estimation from a single image; Normal estimation from a single image
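
For a rectified stereo pair, depth follows from disparity as Z = f * B / d. As a quick illustration (not part of the official course materials), with hypothetical calibration values:

    import numpy as np

    def depth_from_disparity(disparity, focal_px, baseline_m):
        """Depth (meters) from a disparity map (pixels) for a rectified stereo pair:
        Z = f * B / d. Zero disparity is mapped to infinite depth."""
        disparity = np.asarray(disparity, dtype=float)
        with np.errstate(divide="ignore"):
            return np.where(disparity > 0, focal_px * baseline_m / disparity, np.inf)

    # Hypothetical calibration: 700-pixel focal length, 0.54 m baseline.
    print(depth_from_disparity([35.0, 7.0, 0.0], focal_px=700.0, baseline_m=0.54))
    # -> [10.8  54.   inf]
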
- Learning correspondence
    Additional reading: Learning optical flow from simulated data; Learning from hallucinated data; Learning from constraints
- Video recognition
    Additional reading: Datasets and metrics; Video classification as frame+flow classification; CNN+LSTM; 3D convolution; I3D
- Vision and language
    Additional reading: Captioning; Visual question answering; Attention-based systems; Problems with VQA
- Reducing supervision
    Additional reading: One- and Few-shot learning; Classic unsupervised learning (See Chapter 2); Self-supervised learning; Learning from noisy labels
- Vision and action
    Additional reading: Active perception; Learning from ego-motion; Learning tasks in robotics