CS 6756: Advanced Topics in Robotics: 3D Perception (2011)
See the Fall 2013 offering of CS 6756.
Instructor: Ashutosh Saxena.
Tue, Thu: 1:30-2:30pm.
This course focuses on learning techniques for 3D perception with
inexpensive RGBD cameras such as Kinect that give 3D data in addition to an
image. We will study machine-learning algorithms (such as graphical models,
sampling-based inference methods, max-margin learning) for perceiving the environment, which includes
performing object detection, semantic labeling of the environment and human
activity recognition. Particular motivating examples include
scene understanding, personal robotics, and self-driving cars.
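To make concrete what "3D data in addition to an image" means, here is a minimal sketch of back-projecting a Kinect-style depth map into a 3D point cloud with the pinhole camera model. The intrinsics (fx, fy, cx, cy) are illustrative placeholders, not calibrated values for any particular camera.

```python
# Sketch: converting an HxW depth image (meters) into an (H*W, 3) point cloud.
# Intrinsics below are placeholder values, not a real calibration.
import numpy as np

def depth_to_points(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Pinhole back-projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Example: a flat wall 2 m away filling a 640x480 frame.
pts = depth_to_points(np.full((480, 640), 2.0))
print(pts.shape)  # (307200, 3)
```

Every pixel thus carries both appearance (from the RGB image) and a 3D position, which is what the learning algorithms in this course exploit.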
Class webpage: http://www.cs.cornell.edu/~asaxena/cs6756/
Discussion, announcements: http://www.piazza.com/class#fall2011/cs6756/.
You should be a PhD student; seniors and Masters students may take the course with prior permission of the instructor. Prerequisites:
- Knowledge of machine learning (e.g., CS 4758/6758, CS 4780 or CS 6780).
- Knowledge of basic computer vision is recommended, but not necessary.
- Knowledge of ROS (Robot Operating System) and C++/Python or Matlab is recommended.
This is a research class. Students are expected to do a project with the professor
on topics of mutual interest. (The project could be combined with research credits
or graduate research with instructor permission, but not with another class.)
Project proposal presentations (5-10 min long) on Sep 6 and 8. Project milestone presentations
on Oct 13 and 18. Final project report due in mid December (no formal presentation requirement).
- Project: 45%. (Separate grading for midterm milestone and final project.)
- Paper reading/discussion/presentation: 25%.
- One 24-hour take home prelim in November: 20%.
- Class participation and others (TBD): 10%.
- No homeworks!
Every Tuesday lecture will consist of a 40-minute talk by the instructor, followed by discussion.
The Thursday lecture will consist of a 30-minute paper presentation by a student, followed by discussion.
List of papers/topics
Please join Piazza for the updated list and discussion.
- Overview of Point Cloud Library (PCL).
- What could be done with 3D data?
- 3D from an RGB Image.
- Saxena, A., Chung, S. and Ng, A.Y. (2005) Learning Depth from Single Monocular Images, NIPS.
- Hoiem, D., Efros, A.A. and Hebert, M. (2005) Geometric Context from a Single Image, ICCV.
- Multi-view approaches in computer vision.
- Inferring Semantic Information from RGBD images.
- X. Xiong and D. Huber, "Using context to create semantic 3D models of indoor environments," BMVC, 2010.
- Alvaro Collet, Siddhartha Srinivasa, Martial Hebert, Structure Discovery in Multi-modal Data: a Region-based Approach, IEEE International Conference on Robotics and Automation (ICRA), May 2011.
- Graphical Models in 3D Perception
- Graphical Models Review
- Inference: Particle-based/sampling methods.
- Large-margin based learning methods.
- Max-Margin Markov Networks, B. Taskar, C. Guestrin and D. Koller. Neural Information Processing Systems Conference (NIPS03), Vancouver, Canada, December 2003.
- T. Finley, T. Joachims, Training Structural SVMs when Exact Inference is Intractable, Proceedings of the International Conference on Machine Learning (ICML), 2008.
- Deep Learning Methods
- 3D Features.
- Edward Hsiao, Alvaro Collet and Martial Hebert, Making Specific Features Less Discriminative to Improve Point-based 3D Object Recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2010.
- Human Pose from RGBD data.
- Real-Time Human Pose Recognition in Parts from Single Depth Images, Jamie Shotton, Andrew Fitzgibbon, Mat Cook, Toby Sharp, Mark Finocchio, Richard Moore, Alex Kipman, and Andrew Blake, CVPR 2011.
- 3D Scenes and Humans.
- Abhinav Gupta, Scott Satkin, Alexei A. Efros and Martial Hebert, From 3D Scene Geometry to Human Workspace, Computer Vision and Pattern Recognition, 2011.
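As a taste of the "3D Features" topic above: a standard way to compute a surface normal at a point — the approach underlying PCL's normal estimation — is PCA over the point's local neighborhood, taking the eigenvector with the smallest eigenvalue of the neighborhood covariance. A minimal numpy sketch (brute-force neighbor search, illustrative k), not PCL's actual implementation:

```python
# Sketch: surface normal at one point of a cloud via PCA of its k nearest
# neighbors. The smallest-eigenvalue eigenvector of the neighborhood
# covariance approximates the local surface normal (up to sign).
import numpy as np

def estimate_normal(cloud, idx, k=10):
    """cloud: (N, 3) array of points; returns a unit normal at cloud[idx]."""
    d = np.linalg.norm(cloud - cloud[idx], axis=1)
    nbrs = cloud[np.argsort(d)[:k]]          # k nearest neighbors (brute force)
    cov = np.cov(nbrs.T)                     # 3x3 covariance of the neighborhood
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, 0]                     # direction of least variance

# Example: points sampled on the z = 0 plane give a normal near (0, 0, +/-1).
rng = np.random.default_rng(0)
plane = np.column_stack([rng.uniform(-1, 1, 50),
                         rng.uniform(-1, 1, 50),
                         np.zeros(50)])
n = estimate_normal(plane, 0)
print(np.abs(n))  # ~ [0, 0, 1]
```

In practice PCL replaces the brute-force search with a k-d tree, and such normals feed into the 3D feature descriptors discussed in the readings above.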