Cornell CS 6758

Description

Deep learning has become a pivotal force in recent robotics research, from estimating the state of the world to solving long-horizon tasks in unseen environments. The new paradigm shifts from traditional feature and model engineering to learning task-relevant representations from raw data. This shift is fueled by increasingly affordable hardware and diverse data sources from which algorithms can learn. This graduate-level course examines how deep learning approaches have been applied to robotics problems, including various topics in perception and decision making. We will also discuss the recent trend of large-scale representation learning and foundation models for robotics.

Format

This course interleaves lectures and guided discussions. We will first spend a few lectures at the beginning of the semester to review the fundamentals of robot learning. Then, after each lecture on Thursday, we will read two papers and discuss them in class on the next Tuesday. Each discussion will be led by an assigned group of student presenters. Before each discussion, everyone in the class is expected to submit a short review of the required readings as homework. Another significant portion of the class comes from a semester-long project, where you will work in a team of 1-3 people on a research project that is related to the course topics.

Prerequisites

Machine learning: CS 4780 or equivalent is a prerequisite. We assume knowledge of concepts including, but not limited to, stochastic gradient descent and logistic regression, as well as prerequisites such as probability theory, multivariable calculus, and linear algebra. Some familiarity with deep learning is recommended, as the course will build on deep learning concepts such as backpropagation, convolutional networks, and other deep learning techniques.

Robotics: While it is not a hard requirement, we recommend that you come with some familiarity with basic concepts of robotic control, computer vision, and reinforcement learning. CS 5750, CS 4756, CS 5670, or equivalent is preferred.

Staff

Kuan Fang

Instructor

kuanfang [at] cornell [dot] edu
Office Hours: Tue 4:00 pm - 5:00 pm at Gates 425

Yunhao Cao

Teaching Assistant

yc2579 [at] cornell [dot] edu
Office Hours: Thu 5:00 pm - 6:00 pm at Rhodes 400

Tentative Schedule

Date	Lecture	Suggested Reading
Week 1 Tue, 08/27	Introduction
Week 1 Thu, 08/29	Overview of Robot Perception and Control	[LP] Ch.2, Ch.3 [SB] Ch.3 Stanford CS231A Course Notes 1
Week 2 Tue, 09/03	Robot Learning Basics	[SB] Ch.13
Part I: Data Scaling
Week 2 Thu, 09/05	Exploration	Exploration Strategies in Deep RL
Week 3 Tue, 09/10	Autonomous Improvements [Paper 1] Self-Supervised Exploration via Disagreement (Pathak and Gandhi et al., 2019) [Paper 2] Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention (Gupta et al., 2021)
Week 3 Thu, 09/12	Offline Reinforcement Learning	Offline RL Survey
Week 4 Tue, 09/17	Robot Learning from Prior Data [Paper 1] Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills (Chebotar et al., 2021) [Paper 2] Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials (Kumar et al., 2022)
Week 4 Thu, 09/19	Physics Simulation and Sim-to-Real Transfer	Domain Randomization for Sim2Real Transfer Solving Rubik’s Cube with a Robot Hand
Week 5 Tue, 09/24	Robot Learning from Simulation [Paper 1] Dex-Net 2.0: Deep Learning to Plan Robust Grasps with Synthetic Point Clouds and Analytic Grasp Metrics (Mahler et al., 2017) [Paper 2] Learning Quadrupedal Locomotion over Challenging Terrain (Lee and Hwangbo et al., 2020)
Week 5 Thu, 09/26	Deep Generative Models	Tutorial on Variational Autoencoders Deep Generative Models in Robotics
Week 6 Tue, 10/01	Robot Learning from Imagination [Paper 1] Meta-Sim: Learning to Generate Synthetic Datasets (Kar et al., 2019) [Paper 2] Scaling Robot Learning with Semantically Imagined Experience (Yu et al., 2023)
Week 6 Thu, 10/03	Affordance Representations Project Proposal Signup Form Deadline: 11:59 pm	The Theory of Affordances
Week 7 Tue, 10/08	Robot Learning from Human Videos [Paper 1] MimicPlay: Long-Horizon Imitation Learning by Watching Human Play (Wang et al., 2023) [Paper 2] Affordances from Human Videos as a Versatile Representation for Robotics (Bahl and Mendonca et al., 2023)
Week 7 Thu, 10/10	Project Proposal Talk
Week 8 Tue, 10/15	<Fall Break>
Part II: Model Scaling
Week 8 Thu, 10/17	Sequence Modeling and Transformers Project Proposal Deadline: 11:59 pm
Week 9 Tue, 10/22	Transformer Policies [Paper 1] Decision Transformer: Reinforcement Learning via Sequence Modeling (Chen et al., 2021) [Paper 2] Gato: A Generalist Agent (Reed et al., 2022)
Week 9 Thu, 10/24	Vision-Language Models
Week 10 Tue, 10/29	Vision-Language-Action Models [Paper 1] RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (Brohan et al., 2023) [Paper 2] VIMA: General Robot Manipulation with Multimodal Prompts (Jiang et al., 2023)
Week 10 Thu, 10/31	Diffusion Models
Week 11 Tue, 11/05	Control with Diffusion [Paper 1] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (Chi et al., 2023) [Paper 2] Planning with Diffusion for Flexible Behavior Synthesis (Janner and Du et al., 2023)
Week 11 Thu, 11/07	Representation Learning
Week 12 Tue, 11/12	Visual Representations for Robotics [Paper 1] R3M: A Universal Visual Representation for Robot Manipulation (Nair et al., 2022) [Paper 2] Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation (Shen and Yang et al., 2023)
Week 12 Thu, 11/14	Open-Vocabulary Perception
Week 13 Tue, 11/19	Open-World Robotics [Paper 1] Code as Policies: Language Model Programs for Embodied Control (Liang et al., 2022) [Paper 2] MOKA: Open-World Robotic Manipulation through Mark-Based Visual Prompting (Fang and Liu et al., 2024)
Part III: Frontiers
Week 13 Thu, 11/21	Guest Lecture: Prof. Dhruv Shah, Princeton University / Google DeepMind Talk Title: The Foundation Model Path to Open-World Robots
Week 14 Tue, 11/26	Guest Lecture: Prof. Guanya Shi, Carnegie Mellon University Talk Title: Building Generalist Robots with Agility via Learning and Control: Humanoids and Beyond
Week 14 Thu, 11/28	<Thanksgiving Break>
Week 15 Tue, 12/03	Project Spotlight Talk
Week 15 Thu, 12/05	Project Spotlight Talk
Week 16 Fri, 12/13	<No Class> Project Report Deadline: 11:59 pm

Learning Outcomes

Summarize how deep learning is applied to robot perception and decision making.
Explain and compare research papers in robot learning.
Identify limitations and weaknesses of prior work to suggest future work.
Apply deep learning to solve real-world robot problems.

Grading Policy

This course interleaves lectures and guided discussions. The course has no midterm or final exams. You will be graded on the basis of homework, class participation, and a course project. Each week, we will read 2 papers related to the previous lecture and discuss them in class. Before each discussion, you are expected to submit a short review of the required readings as homework. Each discussion will also have a group of presenters who are in charge of leading it. Another significant portion of the grade comes from a semester-long project, where you can work in a team of 1-3 people on a research project that is related to the course topics. The final grade for the course will be tentatively based on the following weights:

Paper reviews (30%)

Write reviews for the papers selected for presentation (paper list is in the syllabus below). You are required to complete 10 paper reviews (chosen from among the 20 papers that we will discuss) throughout the semester. If you submit more than 10 paper reviews, your grade will be computed from the 10 reviews that receive the highest scores. Each review must be submitted the day before the presentation (Deadline: 11:59 pm). Please refer to this [guide] and [template] to learn how to write reviews for robot learning papers.

Paper presentation (20%)

An integral component of this course is to conduct a systematic literature review on robot learning research through student presentations and in-class discussions. You will be divided into presentation groups (each of 2-3 students) based on your paper preferences. Each group will present two papers during the semester. To ensure the quality and clarity of the presentations, we expect you to:

Read the assigned papers thoroughly and gain a good understanding before making the presentation slides ([template]).
Email the slides and a list of open-ended questions on the topic to the TA and the instructor 5 days prior to the presentation date (e.g., for a presentation on Tuesday, the deadline is the preceding Thursday) for feedback and revision (Deadline: 11:59 pm).

Failure to email the slides on time will incur a 20% deduction on the presentation score. The presentation for each paper should be 20 min (± 2 min). The presentations will be graded on the following aspects:

Clarity of presentation (problem formulation, key insights, proposed method, key results).
Presentation of the background material (basic concepts needed to understand the research improvement).
Review of prior work and the challenges addressed by this work.
Analysis of the strengths and weaknesses of the research.
Discussion of potential research extensions and applications.
Response to student questions.

After the presentation, we will hold a 10 min Q&A about the presentation, followed by a 20 min open-ended discussion. The slides of the presentations will be shared on the course webpage within one week of the presentation date.

Course project (40%)

The course project aims to help students gain in-depth, hands-on experience applying learning-based techniques to practical robot perception and decision making problems. It consists of these key milestones: a project proposal (5%), a proposal talk (5%), a final report (20%), and a spotlight talk (10%). The spotlight talk will be held in week 15. Here is a list of potential project ideas worth investigating, for your reference. You can also come up with any other ideas that you would like to pursue for the project.

In-class participation (10%)

You will be penalized if you miss more than 2 attendance-taking classes.

CS 6758: Deep Learning for Robotics (Fall 2024)

Cornell University

Lectures: Tue, Thu 8:40 am – 9:55 am, TBD