Advances in machine learning have fueled progress towards deploying real-world robots, from assembly lines to self-driving cars. Learning to make better decisions for robots presents a unique set of challenges: robots must be safe, learn online from interactions with the environment, and predict the intent of their human partners. This graduate-level course dives deep into the various paradigms for robot learning and decision making. We look at fundamentals, model predictive control, imitation learning, reinforcement learning, and open challenges, as laid out in the schedule below.
 
| Date | Lecture | Preread | Resources |
|---|---|---|---|
| 08/23/22 | Introduction to Robot Learning [slides, notes] | Trailer | |
| Fundamentals | | | |
| 08/25/22 | Interactive Online Learning [slides, notes] | Shai Shalev-Shwartz (Pg. 108-111) | Arora et al. "Multiplicative Weights", Generalized Weighted Majority video |
| 08/30/22 | Markov Decision Process I [slides, notes, python notebook] (Assignment 1 Released) | MACRL (Pg. 9-12) | Dan Klein's slides I |
| 09/01/22 | Markov Decision Process II [slides, notes, python notebook] | MACRL (Ch 5) | Dan Klein's slides II |
| Model Predictive Control | | | |
| 09/06/22 | Linear Quadratic Regulator: The Analytic MDP [slides, notes, python notebook] | MACRL (Ch 2, Pg. 23-27) | Underactuated Robotics (Ch. 8), History of Optimal Control |
| 09/08/22 | Iterative Linear Quadratic Regulator [slides, notes] | MACRL (Ch 2, Pg. 28-33) | iLQR paper, DDP for helicopter flight |
| 09/13/22 | Constraints and Games [slides, notes] | MACRL (Ch 4) | Gordon's notes on Lagrange, ALTRO: AuLa + iLQR |
| 09/15/22 | Practical Model Predictive Control [slides, notes] | Full scale helicopter flight paper | |
| Imitation Learning | | | |
| 09/20/22 | Imitation Learning: Feedback and Covariate Shift [slides, notes] (Assignment 2 Released) | MACRL (Ch 6, Pg. 53-57) | Three regimes of covariate shift |
| 09/22/22 | DAgger: A Reduction to No-Regret Learning [slides, notes] | MACRL (Ch 6, full) | DAgger, Agnostic SysId |
| 09/27/22 | Imitation Learning as Inferring Latent Expert Values [slides, notes] | PDL Proof | EIL, HG-DAgger, YouTube lecture |
| 09/29/22 | Inverse Reinforcement Learning: From Maximum Margin to Maximum Entropy [slides, notes] | MACRL (Ch 6, full) | LEARCH, MaxEntIRL, YouTube lecture |
| 10/04/22 | Distribution Matching, Maximum Entropy, GANs, and all that [slides, notes] | MACRL (Ch 7, full) | Guided cost learning, f-divergence IL, YouTube lecture |
| 10/06/22 | Imitation Learning: The Big Picture [slides, notes] | Of Moments and Matching, YouTube lecture | |
| Reinforcement Learning | | | |
| 10/13/22 | Reinforcement Learning: From Games to Robotics [slides, notes] | | |
| 10/18/22 | Temporal Difference Learning [slides, notes] (Assignment 3 Released) | MACRL (Ch 9, full) | Sutton & Barto (Ch. 5, 6), DQN, Rainbow DQN |
| 10/20/22 | Approximate Dynamic Programming [slides, notes] | MACRL (Ch 8, full) | |
| 10/25/22 | Black-box vs White-box Policy Optimization [slides, notes] | MACRL (Ch 10, full) | |
| 10/27/22 | Halloween Special: Nightmares of Policy Optimization [slides, notes] | MACRL (Ch 11, full) | |
| 11/01/22 | Actor-Critic Methods [slides, notes] | MACRL (Ch 11, full) | |
| 11/03/22 | Planning with Inaccurate Models [slides, notes] | | |
| 11/08/22 | Dealing with Uncertainty I [slides, notes] (Extended Abstracts Due) | | |
| 11/10/22 | Dealing with Uncertainty II [slides, notes] | | |
| 11/15/22 | Learning for Robot Decision Making Recap [slides, notes] | | |
| Open Challenges | | | |
| 11/17/22 | Causal Confounds in Sequential Decision Making (Guest Lecture by Gokul Swamy) [slides] | | |
| 11/22/22 | Interactive Forecasting (Sanjiban) [slides, notes] | | |
| 11/29/22 | Offline Reinforcement Learning (Dhruv) | | |
| 12/01/22 | No class | | |
| 12/06/22 | No class | | |
| 12/08/22 | No class | | |
| 12/13/22 | Project presentations | | |
| 12/15/22 | Project presentations | | |
There will be a total of 3 assignments, each involving a programming component and some theory. All assignments must be done individually. As the course progresses, we will release each assignment, with starter code on GitHub, at the links below.
There will also be a final project. This is your chance to get creative and apply what you have learned! For the project, you may work in groups of up to two people. There will be three deliverables: an extended abstract, a final report, and a final presentation. The abstract and report should follow the NeurIPS format. You may choose a topic relevant to your research, an open problem of interest, or a project from a list of potential projects that we will share. We will also have a best paper award as judged by your peers!
The course is based extensively on the following book:
Other helpful books and notes:
| Name | Role | Email | Office Hours |
|---|---|---|---|
| Sanjiban Choudhury | Instructor | sanjibanc@cornell.edu | Tue 11 am-12 pm, Thu 12:30-1:30 pm, Gates 413B |
| Dhruv Sreenivas | Teaching Assistant | ds844@cornell.edu | Mon/Wed 11 am-12 pm, Rhodes 400 |
Assignments, lectures, and ideas on this syllabus are partially adapted from Drew Bagnell's course at Carnegie Mellon University. We thank Drew for insightful discussions and suggestions on how to structure the course.
For graduate students: this course is open to both CS and non-CS PhD and MS students. Non-CS PhD and MS students should add themselves to the waitlist or send an email to cs-course-enroll@cornell.edu. For undergraduates: Machine Learning (CS 4780) is a prerequisite. Students should have some background in linear algebra and probability. Familiarity with Python and neural network libraries (PyTorch, TensorFlow) is required.
Here’s a breakdown of grades:
| Component | Details | % of Grade |
|---|---|---|
| Assignments | 3 assignments, 15% each | 45% |
| Final Project | Extended Abstract: 5%, Final Report: 20%, Final Presentation: 15% | 45% |
| Participation | In-class participation and Ed discussions | 10% |
| Total | | 100% |
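As a rough illustration of how the weights in the grade table combine, here is a hypothetical Python helper; it is not an official course tool. It assumes every component is scored on a 0-100 scale and uses only the table's top-level weights (15% per assignment, 45% for the final project as a whole, 10% for participation); how the project's sub-scores roll up into one project score is deliberately left out.

```python
# Hypothetical grade calculator; weights taken from the top-level rows of the grade table.
ASSIGNMENT_WEIGHT = 0.15   # each of the 3 assignments
PROJECT_WEIGHT = 0.45      # final project as a whole
PARTICIPATION_WEIGHT = 0.10


def course_grade(assignments: list[float], project: float, participation: float) -> float:
    """Return the weighted course grade (0-100), assuming every score is on a 0-100 scale."""
    if len(assignments) != 3:
        raise ValueError("expected exactly 3 assignment scores")
    return (
        ASSIGNMENT_WEIGHT * sum(assignments)
        + PROJECT_WEIGHT * project
        + PARTICIPATION_WEIGHT * participation
    )


# Example: three assignment scores, a project score of 80, and full participation credit.
print(round(course_grade([90, 85, 95], project=80, participation=100), 2))  # prints 86.5
```

Because the three assignments share one weight, averaging them and multiplying by 45% gives the same result as weighting each at 15%.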
Assignments must be done individually. For each assignment, students will submit a writeup and code on Gradescope. Students may discuss problems with each other, but may not share answers or code. On each homework, please indicate with whom you collaborated and which online resources you used.
The final project can be done in groups of up to 2. There are three deliverables: an extended abstract, a final report, and a final presentation. The abstract and report should follow the NeurIPS format. We will share the rubric for evaluating these in due time, but it will roughly follow the NeurIPS reviewer guidelines. Two-person groups should include, with the final report, a short paragraph explaining the role of each group member. We will also have a best paper award as judged by your peers!
Research has demonstrated that the best learning occurs when the learner is actively involved. We will have frequent opportunities for students to work together during lectures. We expect you to come to class prepared to focus, interact with classmates, and participate in the activities. We also expect you to participate in discussions on Ed and help create an engaging environment.
Assignments must be submitted by the posted due date. You are allowed up to 3 late days in total, across all deliverables, for the entire semester. Any assignment turned in late will incur a 33% reduction in score for each late day. The final presentation must be given on time; late days cannot be used for it. Regrade requests should be made only when the case is strong and a significant number of points are at stake, and must be submitted via a private post on Ed within one week of when the deliverable is returned to you. You must provide a justification for the regrade request.
If a legitimate situation or medical emergency arises during the semester that will hinder your ability to complete work on time, contact Prof. Choudhury as soon as possible. Extensions beyond the allotted late days will be granted only in exceptional circumstances, such as documented illness, and not for situations such as job interviews or heavy workloads in other courses.
This course adheres to all aspects of Cornell's Code of Academic Integrity. Any work presented as your own must be your own, with no exceptions tolerated. All violations of this policy will be penalized according to their severity. The penalty may be a failing grade on the relevant assignment or exam, or a failing grade in the class. The code can be found at: http://cuinfo.cornell.edu/aic.cfm
Students in this course come from a variety of backgrounds, abilities, and identities. To ensure an environment conducive to learning, all members of the course must treat one another and the course staff with respect. If you feel your needs are not being adequately accommodated by other students or the course staff, please contact Prof. Choudhury.
For students who become ill or need to quarantine during the semester, we will address your needs on a case-by-case basis. Please contact Prof. Choudhury if you have any concerns.
If you have a disability-related need for reasonable academic adjustments in this course, please reach out to Student Disability Services to guide us through next steps. If you are experiencing undue personal or academic stress during the semester, we encourage you to reach out to the instructor for support.
We encourage you to check out the comprehensive set of resources compiled by EARS, Reflect, Cornell Minds Matter, and Body Positive Cornell: Cornell Mental Health Resources Guide 2022-23