Learning for Robot Decision Making

Overview

Machine learning has made significant advances in many AI applications, from language (e.g. ChatGPT) to vision (e.g. diffusion models). However, it has fallen short when it comes to making decisions, especially for robots interacting with the physical world. Robot decision making presents a unique set of challenges: complexities of the real world, limited labeled data, hard physics constraints, safety aspects when interacting with humans, and more. This graduate-level course dives deep into these issues, beginning with the basics and working up to the frontiers of robot learning. We look at:

Planning in continuous state-action spaces over long horizons with hard physical constraints.
Imitation learning from various modes of interaction (demonstrations, interventions) as a unified, game-theoretic framework.
Practical reinforcement learning that leverages both model predictive control and model-free methods.
Frontiers such as offline reinforcement learning, LLMs, diffusion policies and causal confounds.

Prerequisites

For graduates: This course is open to both CS and non-CS PhD and MS students. Non-CS PhD and MS students, please add yourself to the waitlist or send an email to cs-course-enroll@cornell.edu. For undergraduates: A prerequisite is Machine Learning (CS 4780). Students should have a solid background in linear algebra and probability. This course involves implementing state-of-the-art algorithms on real-world datasets and simulators. Hence, strong familiarity with Python (CS 2110) and neural network libraries (PyTorch) is required.

Updates from Fall 2022

Check out the previous year's course here: Fall 2022. This year, we will introduce new lectures that cover brand new algorithms (in inverse RL, model-based RL), make new connections to other areas of AI (LLMs, diffusion models) and examine cutting-edge results (offline RL, representation learning)!

Schedule (Tentative)

Date	Lecture	Preread	Resources
08/22/23	Introduction: How should robots learn to make good decisions? [slides] (Assignment 0 released)
08/24/23	Interactive Online Learning [slides, notes]	Shai Shalev-Shwartz (Pg.108-111)	Arora et al. "Multiplicative Weights", Generalized Weighted Majority video
	Planning
08/29/23	Markov Decision Process (Assignment 1 Released) [slides, notes]	MACRL Ch. 1	Dan Klein's slides I
08/31/23	Linear Quadratic Regulator: The Analytic MDP [slides, notes]	MACRL (Ch 2, Pg. 23-27)	Underactuated robotics, Ch. 8, History of Optimal Control
09/07/23	Iterative Linear Quadratic Regulator [slides, notes]	MACRL (Ch 2, Pg. 28-33)	iLQR paper, DDP for helicopter flight
09/12/23	Solving Hard MDPs: Constraints, Long Horizons, and more! [slides]	MACRL (Ch 4)	Gordon's notes on Lagrange, ALTRO: AuLa + iLQR
	Imitation Learning
09/14/23	Imitation Learning: Feedback and Covariate Shift [slides, notes]	MACRL (Ch 6, Pg. 53-57)	Three regimes of covariate shift
09/19/23	DAgger: Interactive Experts and No-Regret Learning [slides, notes]	MACRL (Ch 6, full)	DAGGER, Agnostic SysId
09/21/23	Imitation Learning as Inferring Latent Expert Values [slides, notes]		EIL, Youtube lec
09/26/23	Learning from Interventions [slides, notes]		EIL, Youtube lec
09/28/23	Inverse Reinforcement Learning: From Maximum Margin to Maximum Entropy (Assignment 2 Released) [slides, notes1, notes2, notes3]	MACRL (Ch 7)	LEARCH, MaxEntIRL, Youtube lec
	Reinforcement Learning
10/03/23	Temporal Difference, Q-learning [slides]	MACRL (Ch 9, full)	Sutton&Barto (Ch. 5, 6), DQN, Rainbow DQN
10/05/23	Approximate Dynamic Programming: Fitted value / policy iteration [slides, notes]	MACRL (Ch 8, full)	Sutton&Barto (Ch. 5, 6), DQN, Rainbow DQN
10/12/23	Policy Search and Black-box Policy Optimization [slides]	MACRL (Ch 10, full)
10/17/23	Nightmares of Policy Optimization [slides, notes]	MACRL (Ch 11, full)
10/19/23	Actor Critic Methods [slides, notes]
10/24/23	Model-based Reinforcement Learning [slides] (Assignment 3 Released)		Virtue of Laziness in MBRL
10/26/23	Principle of Maximum Entropy I [slides, slides] (Extended Abstracts due)
10/31/23	Principle of Maximum Entropy II [slides]
11/02/23	Dealing with Uncertainty [slides, slides]
	Frontiers
11/07/23	Offline Reinforcement Learning: Pessimism [slides]		Offline RL Tutorial, CQL, ATAC
11/14/23	Decision Transformers [slides]		Decision Transformer, You can't count on luck, When does return conditioned supervised learning work?
11/16/23	Diffusion Models and Imitation Learning [slides]		Lilian Weng's blog, Yang Song's blog, Markup to diffusion
11/21/23	Multi-agent Forecasting and Imitation Learning [slides]		Drew's forecasting blog, MotionLM, Rethinking forecast
11/28/23	Large Language Models and Task Planning [slides]		SayCan, Code-as-Policies, Demo2Code
11/30/23	Visuomotor Skill Learning
12/05/23	Causal Representation Learning
12/12/23	Project presentations
12/14/23	Project presentations

Assignments and Final Project

There will be a total of 3 assignments, each involving a programming component and some theory. All assignments must be done individually. As the course progresses, we will release each assignment in the links below with starter code on GitHub.

Assignment 0: Intro Survey
Assignment 1: iLQR
Assignment 2: Robot Policy Learning
Assignment 3: Inverse Reinforcement Learning

There will also be a final project. This is your chance to get creative and apply what you have learned! For the project, you may work in groups of up to two people. There will be three deliverables: an extended abstract, a final report and a final presentation. The abstract and report should be in NeurIPS format. You are welcome to select a topic relevant to your research, an open problem of interest, or one from a list of potential projects that we will share.

Resources

Technology

Course Website: The ONE true hub for all information. Please check this frequently and surface any errors or sources of confusion.
Ed: The discussion forum where all announcements are sent and where all student-TA and student-student communications occur.
Gradescope: Where all assignments and projects are submitted.
Canvas: Limited to no use.

Code

Python Notebooks for CS6756: A series of notebooks used in the lectures that are useful for building intuition and learning to code.
Python + NumPy tutorial: An excellent, concise getting started guide for Python and NumPy from CS231n@Stanford.
PyTorch tutorial: A 60 Minute Blitz! (for assignment 2 onwards).

Relevant Textbooks

The course is based extensively on the following book:

Modern Adaptive Control and Reinforcement Learning, James A. Bagnell, Byron Boots, and Sanjiban Choudhury

This is a live book that is constantly being updated. Periodically check this link for newer versions. We would love your feedback; please email it to sanjibanc@cornell.edu.

Other helpful books and notes:

Reinforcement Learning: An Introduction, Richard S. Sutton and Andrew G. Barto
Probabilistic Robotics, Sebastian Thrun, Wolfram Burgard and Dieter Fox
Probability Theory: The Logic of Science, E.T. Jaynes

Courses / Lectures

Imitation Learning: A Series of Deep Dives, Sanjiban Choudhury
Interactive Online Learning: A Unified Algorithmic Framework, Sanjiban Choudhury

Staff

Sanjiban Choudhury

Instructor

sanjibanc@cornell.edu

Office Hours:
Tue 11.30 - 1.30 pm Gates 413B

Kushal Kedia

Teaching Assistant

kk837@cornell.edu

Office Hours:
Thurs 12.30 - 2.30 pm Rhodes 402

Assignments, lectures, and ideas on this syllabus are partially adapted from Drew Bagnell's course at Carnegie Mellon University. We thank Drew for insightful discussions and suggestions for how to structure the course.

Syllabus

Learning Outcomes

Formulate various robot decision making problems, e.g. robot manipulation, self-driving, assistive robots, as a Markov Decision Problem (MDP).
Solve different types of MDPs by applying appropriate techniques, e.g. model predictive control (iLQR), value / policy iteration, black-box policy search.
When an MDP is unknown, apply appropriate learning techniques, e.g. imitation learning, model-free / model-based reinforcement learning.
Analyze sample-complexity and performance bounds for various robot learning algorithms using techniques from no-regret online learning.
Develop, evaluate and deploy robot learning algorithms in various robotics applications.
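
As a small illustration of the "solve an MDP" outcome above, here is a minimal sketch of value iteration on a toy two-state MDP. The MDP itself (states, actions, rewards, discount) and the function name `value_iteration` are made up for this example and are not taken from the course materials or assignments.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration.

    P[a][s][t]: probability of moving from state s to state t under action a.
    R[s][a]:    immediate reward for taking action a in state s.
    Returns the (approximately) optimal values and a greedy policy.
    """
    num_actions, num_states = len(P), len(R)

    def q_value(V, s, a):
        # One-step lookahead: reward plus discounted expected next value.
        return R[s][a] + gamma * sum(P[a][s][t] * V[t] for t in range(num_states))

    V = [0.0] * num_states
    for _ in range(max_iters):
        V_new = [max(q_value(V, s, a) for a in range(num_actions))
                 for s in range(num_states)]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break

    policy = [max(range(num_actions), key=lambda a: q_value(V, s, a))
              for s in range(num_states)]
    return V, policy


# Hypothetical toy MDP: action 0 = stay, action 1 = switch (both deterministic).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # stay
    [[0.0, 1.0], [1.0, 0.0]],  # switch
]
# Reward 1 for acting in state 1, reward 0 for acting in state 0.
R = [[0.0, 0.0], [1.0, 1.0]]

V, policy = value_iteration(P, R)
print([round(v, 3) for v in V], policy)  # values close to [9.0, 10.0], policy [1, 0]
```

With a discount of 0.9, staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and state 0 is worth one discounted step less, so the greedy policy switches out of state 0 and stays in state 1.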

Grading Policy

Here’s a breakdown of grades:

Component	Details	%Grade
Assignments	3 assignments, each 15%	45%
Final Project	Extended Abstract: 5%, Final Report: 20%, Final Presentation: 15%	45%
Participation	In-class participation and Ed discussions	10%
Total		100%
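
To make the weighting concrete, the sketch below combines the three top-level components from the table into a final percentage. It assumes each component is already scored on a 0-100 scale; the function name `course_grade` and the example scores are hypothetical, and this is not an official grade calculator.

```python
# Top-level weights taken from the grading table above.
WEIGHTS = {"assignments": 0.45, "final_project": 0.45, "participation": 0.10}


def course_grade(scores):
    """Weighted course grade.

    scores: dict mapping each component name to a score in [0, 100]
            (assumed scale; hypothetical helper for illustration only).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 90 on assignments, 80 on the project, full participation.
example = {"assignments": 90, "final_project": 80, "participation": 100}
print(round(course_grade(example), 2))  # 0.45*90 + 0.45*80 + 0.10*100 = 86.5
```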

Assignments must be done individually. Each assignment will require students to turn in a writeup and code in Gradescope. It is acceptable for students to discuss problems with each other; it is not acceptable for students to share answers or code. Please indicate on each homework with whom you collaborated and what online resources you used.

The final project can be done in groups of up to 2. There are three deliverables: an extended abstract, a final report, and a final presentation. The abstract and report should be in NeurIPS format. We will share the rubric for how these will be evaluated in due time, but they will roughly be along the lines of NeurIPS reviewer guidelines. For groups of more than one, we will expect a short paragraph explaining the role of each group member along with the final report. We will also have a best paper award as judged by your peers!

Research has demonstrated that the best learning occurs when the learner is actively involved. We will have frequent opportunities for students to work together during lectures. We expect you to come to class prepared to focus, interact with classmates, and participate in the activities. We also expect you to participate in discussions on Ed and create an engaging environment.

Late Policy

Assignments must be submitted by the posted due date. You are allowed up to 3 total LATE DAYs for any deliverable throughout the entire semester. If you exceed the late days, assignments will incur a reduction in score by 33% for each extra day. The final presentation must be presented on time; no late days apply to it. Regrade requests, if the case is strong and a significant number of points are at stake, should be submitted via Gradescope within one week of when a deliverable is returned to the student. You must provide a justification for the regrade request.
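
The following sketch shows one possible reading of the late policy above. It assumes the 33% reduction is taken from the raw score for each day beyond the remaining late days (so three extra days leave roughly 1% of the score) and that late days are consumed before any penalty applies. The function `apply_late_policy` and its parameters are hypothetical; the course staff's actual accounting may differ.

```python
def apply_late_policy(raw_score, days_late, late_days_remaining, penalty_per_day=0.33):
    """Return (penalized_score, late_days_left) under the assumptions stated above.

    raw_score:           score before any late penalty
    days_late:           whole days past the posted due date
    late_days_remaining: unused late days (out of 3 for the semester)
    """
    # Use up free late days first.
    free_days_used = min(days_late, late_days_remaining)
    extra_days = days_late - free_days_used

    # Assumption: each extra day removes 33% of the raw score, floored at zero.
    multiplier = max(0.0, 1.0 - penalty_per_day * extra_days)
    return raw_score * multiplier, late_days_remaining - free_days_used


# Submitted 4 days late with all 3 late days still available:
# 3 days are free, 1 extra day costs 33% of the raw score.
score, remaining = apply_late_policy(90.0, days_late=4, late_days_remaining=3)
print(round(score, 1), remaining)  # 60.3 0
```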

If a legitimate situation or medical emergency arises during the semester that will hinder your ability to complete the work on time, contact Prof. Choudhury as soon as possible. Extensions (beyond the already assigned late days) will be granted only in exceptional circumstances, such as documented illness, not for situations such as job interviews or large workloads in other courses.

Academic Integrity

This course adheres to all aspects of Cornell's Code of Academic Integrity. Any work presented as your own must be your own, with no exceptions tolerated. All violations of this policy will result in a penalty depending on the severity. The penalty may be a failing grade on the relevant assignment or exam, or a failing grade in the class. The code can be found at: http://cuinfo.cornell.edu/aic.cfm

Generative AI

The work you do for CS 6756 consists of writing code and natural language descriptions. To some extent, the new crop of “generative AI” (GAI) tools can do both of these things for you. However, we require that the vast majority of the intellectual work must be originated by you, not by GAI. You may use GAI to look up helper functions, or to proofread your text, but clearly document how you used it.

In this class, for every assignment and final project, you can choose between two options:

Option 1: Avoid all GAI tools. Disable GitHub Copilot in your editor, do not ask chatbots any questions related to the assignment, etc. If you choose this option, you have nothing more to do.

Option 2: Use GAI tools with caution and include a one-paragraph description of everything you used them for along with your writeup. This paragraph must:

Link to exactly which tools you used and describe how you used each of them, for which parts of the work.
Give at least one concrete example (e.g., generated code or Q&A output) that you think is particularly illustrative of the “help” you got from the tool.
Describe any times when the tool was unhelpful, especially if it was wrong in a particularly hilarious way.
Conclude with your current opinion about the strengths and weaknesses of the tools you used for real-world robot learning implementation.

Remember that you can pick whether to use GAI tools for every assignment, so using them on one set of tasks doesn’t mean you have to keep using them forever.

Below we provide some guidelines for what is / is not ok when using GAI for this class:

Example of something that is allowed: You write the initial code / writeup. You then use GAI to debug the code / improve writing flow. You do not use the system's output to add extra content.
Example of something that is definitely not allowed: You essentially use GAI to generate most of the code / writeup, even if you later post-edit and correct the output.
Example of something that is OK but requires special treatment: You start with the procedure in the first example. But the GAI suggests good points that you hadn’t thought of before, or makes you realize that a point you had made isn’t quite right. You may include this new material, but follow the guidelines above to document the use.

Diversity, Equity and Inclusion

Students in this course come from a variety of backgrounds, abilities, and identities. In order to ensure an environment conducive to learning, all members of the course must treat one another and the course staff with respect. If you feel your needs are not being adequately accommodated by the other students or instruction staff, please contact Prof. Choudhury.

COVID-19 related issues

Zoom recordings of lectures are not available for absences, including absences due to illness.

If you are a close contact of someone who is diagnosed with COVID-19, even if you do not experience symptoms, you should test yourself, and you may want to request masking for five days after exposure regardless of the outcome.

If you have symptoms of COVID-19 and have not been tested:

Do not come to class.
Email Dr. Choudhury before class starts to let the instructor and TA know that you are not coming.
Get an antigen test.
If the test is negative, you may return to the next class. Please wear a mask until your symptoms are gone, even if you test negative.
If your antigen test is positive, you must immediately upload the result to Daily Check: dailycheck.cornell.edu. This action will trigger instructions and a letter of temporary accommodation. You must forward the temporary accommodations email to Dr. Choudhury to receive an accommodation. (The system will not send it for you.) Once Dr. Choudhury receives the letter, we will provide guidance on how you should keep up with material for the next 5 days.
You may return to class on day 6 provided you are asymptomatic. You must wear a mask through the end of day 10 from your first onset of symptoms.

Accommodations

If you have a disability-related need for reasonable academic adjustments in this course, please reach out to Student Disability Services to guide us through next steps. If you are experiencing undue personal or academic stress during the semester, we encourage you to reach out to the instructor for support.

We encourage you to check out the comprehensive set of resources compiled by EARS, Reflect, Cornell Minds Matter, and Body Positive Cornell: Cornell Mental Health Resources Guide 2022-23

CS 6756, Cornell University, Fall 2023