CS 6756: Advanced Topics in Robotics: 3D Perception (2011)

See the Fall 2013 offering of CS 6756.

Instructor: Ashutosh Saxena.
Tue, Thu: 1:30-2:30pm.
Upson 315.

This course focuses on learning techniques for 3D perception with inexpensive RGBD cameras, such as the Kinect, that provide 3D data in addition to an image. We will study machine-learning algorithms (such as graphical models, sampling-based inference methods, and max-margin learning) for perceiving the environment, which includes object detection, semantic labeling of the environment, and human activity recognition. Motivating examples include scene understanding, personal robotics, and self-driving cars.

Class webpage: http://www.cs.cornell.edu/~asaxena/cs6756/
Discussion, announcements: http://www.piazza.com/class#fall2011/cs6756/.

Prerequisites

You should be a PhD student. Seniors and Master's students may take the course with prior instructor approval.

Knowledge of machine learning (e.g., CS 4758/6758, CS 4780 or CS 6780).
Knowledge of basic computer vision is recommended, but not necessary.
Knowledge of ROS (Robot Operating System) and C++/Python or Matlab is recommended.

Grading

This is a research class. Students are expected to do a project with the professor on topics of mutual interest. (The project could be combined with research credits or graduate research with instructor permission, but not with another class.)

Project: 45%. (Separate grading for midterm milestone and final project.)
Paper reading/discussion/presentation: 25%.
One 24-hour take-home prelim in November: 20%.
Class participation and others (TBD): 10%.
No homework!

Project proposal presentations (5-10 min long) on Sep 6 and 8. Project milestone presentations on Oct 13 and 18. Final project report due in mid December (no formal presentation requirement).

Syllabus

Each Tuesday lecture will consist of a 40-minute talk by the instructor, followed by discussion. Each Thursday lecture will consist of a 30-minute paper presentation by a student, followed by discussion.

List of papers/topics

Please join Piazza for the updated list and discussion.

Overview of Point Cloud Library (PCL).
What could be done with 3D data?
3D from an RGB Image.
- Saxena, A., Chung, S. and Ng, A.Y. (2005) Learning Depth from Single Monocular Images, NIPS.
- Hoiem, D., Efros, A.A. and Hebert, M. (2005) Geometric Context from a Single Image, ICCV.
Multi-view approaches in computer vision.
Inferring Semantic Information from RGBD images.
- X. Xiong and D. Huber, “Using context to create semantic 3d models of indoor environments,” in BMVC, 2010.
- Alvaro Collet, Siddhartha Srinivasa, Martial Hebert, Structure Discovery in Multi-modal Data: a Region-based Approach. IEEE International Conference on Robotics and Automation (ICRA'11), May, 2011.
Graphical Models in 3D Perception
- Graphical Models Review
- Inference: Particle-based/sampling methods.
Large-margin based learning methods.
- Max-Margin Markov Networks, B. Taskar, C. Guestrin and D. Koller. Neural Information Processing Systems Conference (NIPS03), Vancouver, Canada, December 2003.
- T. Finley, T. Joachims, Training Structural SVMs when Exact Inference is Intractable, Proceedings of the International Conference on Machine Learning (ICML), 2008.
Deep Learning Methods
3D Features.
- Edward Hsiao, Alvaro Collet and Martial Hebert. Making specific features less discriminative to improve point-based 3D object recognition. IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), June, 2010.
Human Pose from RGBD data.
- Real-Time Human Pose Recognition in Parts from a Single Depth Image, Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake, CVPR 2011.
3D Scenes and Humans.
- Abhinav Gupta, Scott Satkin, Alexei A. Efros and Martial Hebert, From 3D Scene Geometry to Human Workspace, Computer Vision and Pattern Recognition, 2011.

Useful links

PCL
OpenCV
ROS
RGB-D workshop at RSS
Main conferences: NIPS 2010 proceedings, CVPR 2011 proceedings, RSS 2011 proceedings.
Open Kinect, ROS openni_kinect.