Cornell CS 6758

Description

Deep learning has become a pivotal force in recent robotics research, from estimating the state of the world to solving long-horizon tasks in unseen environments. The new paradigm shifts from traditional feature and model engineering to learning task-relevant representations from raw data. This shift is fueled by increasingly affordable hardware and diverse data sources from which algorithms can learn. This graduate-level course examines how deep learning approaches have been applied to robotics problems, including a range of topics in perception and decision making. We will also discuss the recent trend of large-scale representation learning and foundation models for robotics.

Format

This course interleaves lectures and guided discussions. We will spend the first few lectures of the semester reviewing the fundamentals of robot learning. Then, after each Thursday lecture, we will read two papers and discuss them in class the following Tuesday. Each discussion will be led by an assigned group of student presenters. Before each discussion, everyone in the class is expected to submit a short review of the required readings as homework. Another significant portion of the class is a semester-long project, in which you will work in a team of 1-3 people on a research project related to the course topics.

Prerequisites

Machine learning: CS 4780 or equivalent is a prerequisite. We assume knowledge of concepts including, but not limited to, stochastic gradient descent and logistic regression, as well as their prerequisites such as probability theory, multivariable calculus, and linear algebra. Some familiarity with deep learning is recommended, as the course builds on concepts such as backpropagation, convolutional networks, and other deep learning techniques.

Robotics: Although it is not a hard requirement, we recommend that you come with some familiarity with basic concepts of robotic control, computer vision, and reinforcement learning. CS 5750, CS 4756, CS 5670, or equivalent is preferred.

Staff

Kuan Fang

Instructor

kf382 [at] cornell [dot] edu
Office Hours: Thu 3:00 pm - 4:00 pm at CIS 463

Qianxu Wang

Teaching Assistant

qw325 [at] cornell [dot] edu
Office Hours: Wed 4:00 pm - 5:00 pm at Gates G15

Tentative Schedule

Date	Lecture	Suggested Reading
Week 1 Tue, 08/26	Introduction [slides]
Week 1 Thu, 08/28	Robot Learning Overview [slides]	[LP] Ch.2, Ch.3; [SB] Ch.3; Stanford CS231A Course Notes 1; [SB] Ch.13
Week 2 Tue, 09/02	Paper Reading and Project Planning [slides]
Part I: Data Scaling
Week 2 Thu, 09/04	Offline Reinforcement Learning [slides]	Offline RL Survey
Week 3 Tue, 09/09	Learning from Prior Data [Paper 1] Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills (Chebotar et al., 2021) [slides] [Paper 2] Pre-Training for Robots: Offline RL Enables Learning New Tasks from a Handful of Trials (Kumar et al., 2022) [slides]
Week 3 Thu, 09/11	Sim-to-Real Transfer [slides]
Week 4 Tue, 09/16	Learning from Simulation [Paper 1] Learning Agile Robotic Locomotion Skills by Imitating Animals (Peng et al., 2020) [slides] [Paper 2] Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation (Maddukuri et al., 2025) [slides]
Week 4 Thu, 09/18	Generative Models [slides]	Tutorial on Variational Autoencoders; Deep Generative Models in Robotics
Week 5 Tue, 09/23	Learning from Imagination [Paper 1] Meta-Sim: Learning to Generate Synthetic Datasets (Kar et al., 2019) [slides] [Paper 2] Scaling Robot Learning with Semantically Imagined Experience (Yu et al., 2023) [slides]
Week 5 Thu, 09/25	Affordance Representations [slides]	The Theory of Affordances
Week 6 Tue, 09/30	Learning from Human Videos [Paper 1] MimicPlay: Long-Horizon Imitation Learning by Watching Human Play (Wang et al., 2023) [slides] [Paper 2] Affordances from Human Videos as a Versatile Representation for Robotics (Bahl & Mendonca et al., 2023) [slides]
Week 6 Thu, 10/02	Exploration [slides] Project Proposal Signup Form Deadline: 11:59 pm	Exploration Strategies in Deep RL
Week 7 Tue, 10/07	Autonomous Improvements [Paper 1] Self-Supervised Exploration via Disagreement (Pathak & Gandhi et al., 2019) [slides] [Paper 2] Reset-Free Reinforcement Learning via Multi-Task Learning (Gupta et al., 2021) [slides]
Week 7 Thu, 10/09	Project Proposal Talk
Week 8 Tue, 10/14	<Fall Break>
Part II: Model Scaling
Week 8 Thu, 10/16	Representation Learning [slides]
Week 9 Tue, 10/21	Visual Representations for Robotics [Paper 1] LIV: Language-Image Representations and Rewards for Robotic Control (Ma et al., 2023) [Paper 2] Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation (Shen & Yang et al., 2023)
Week 9 Thu, 10/23	Diffusion Models [slides]
Week 10 Tue, 10/28	Action Diffusion [Paper 1] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (Chi et al., 2023) [Paper 2] Planning with Diffusion for Flexible Behavior Synthesis (Janner & Du et al., 2022)
Week 10 Thu, 10/30	Sequence Modeling and Transformers [slides]
Week 11 Tue, 11/04	Transformer Policies [Paper 1] Decision Transformer: Reinforcement Learning via Sequence Modeling (Chen et al., 2021) [Paper 2] Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware (Zhao et al., 2023)
Week 11 Thu, 11/06	Vision-Language Models [slides]
Week 12 Tue, 11/11	Vision-Language-Action Models [Paper 1] VIMA: General Robot Manipulation with Multimodal Prompts (Jiang et al., 2023) [Paper 2] OpenVLA: An Open-Source Vision-Language-Action Model (Kim et al., 2024)
Week 12 Thu, 11/13	Open-Vocabulary Perception [slides]
Part III: Frontiers
Week 13 Tue, 11/18	Guest Lecture: Prof. Antonio Loquercio University of Pennsylvania Title: Lessons Learned from SuperHuman Drone Racing
Week 13 Thu, 11/20	Guest Lecture: Prof. Andrea Bajcsy Carnegie Mellon University Title: What Does Safety Mean for Generalist Robots?
Week 14 Tue, 11/25	Open-World Control [Paper 1] Code as Policies: Language Model Programs for Embodied Control (Liang et al., 2022) [Paper 2] Robotic Control via Embodied Chain-of-Thought Reasoning (Zawalski & Chen et al., 2024)
Week 14 Thu, 11/27	<Thanksgiving Break>
Week 15 Tue, 12/02	Project Spotlight Talk
Week 15 Thu, 12/04	Project Spotlight Talk
Week 16 Fri, 12/12	<No Class> Project Report Deadline: 11:59 pm

Learning Outcomes

Summarize how deep learning is applied to robot perception and decision making.
Explain and compare research papers in robot learning.
Identify limitations and weaknesses of prior work to suggest future work.
Apply deep learning to solve real-world robot problems.

Grading Policy

The course has no midterm or final exams. You will be graded on the basis of homework, class participation, and a course project. Each week, we will read 2 papers related to the previous lecture and discuss them in class. Before each discussion, you are expected to submit a short review of the required readings as homework. Each discussion will also have a group of presenters who are in charge of leading it. Another significant portion of the grade comes from a semester-long project, in which you can work in a team of 1-3 people on a research project related to the course topics. The final grade for the course will be tentatively based on the following weights:

Paper reviews (30%)

Write reviews for the papers selected for presentation (the paper list is in the tentative schedule). You are required to complete 10 paper reviews throughout the semester, chosen from the 20 papers that we will discuss. If you submit more than 10 paper reviews, your grade will be computed from the 10 highest-scoring reviews. Each review must be submitted the day before the presentation (deadline: 11:59 pm). Please refer to this [guide] and [template] to learn how to write reviews for robot learning papers.

Paper presentation (20%)

An integral component of this course is a systematic literature review of robot learning research, conducted through student presentations and in-class discussions. You will be divided into presentation groups of 2-3 students each, based on your paper preferences. Each group will present two papers during the semester. To ensure the quality and clarity of the presentations, we expect you to:

Read the assigned papers thoroughly and gain a good understanding before making the presentation slides ([template]).
Email the slides and a list of open-ended questions on the topic to the TA and the instructor 5 days before the presentation date for feedback and revision (deadline: 11:59 pm). For example, for a presentation on Tuesday, the deadline is the Thursday of the preceding week.

Failure to email the slides on time will incur a 20% deduction on the presentation score. The presentation for each paper should be 20 min (± 2 min). The presentations will be graded on the following aspects:

Clarity of presentation (problem formulation, key insights, proposed method, key results).
Presentation of the background material (basic concepts needed to understand the research contribution).
Review of prior work and the challenges addressed by this work.
Analysis of the strengths and weaknesses of the research.
Discussion of potential research extensions and applications.
Response to student questions.

After the presentation, we will hold a 10 min Q&A about the presentation, followed by a 20 min open-ended discussion. The presentation slides will be shared on the course webpage within one week of the presentation date.

Course project (40%)

The course project aims to help students gain in-depth, hands-on experience applying learning-based techniques to practical robot perception and decision making problems. It consists of these key milestones: a project proposal (5%), a proposal talk (5%), a final report (20%), and a spotlight talk (10%). The spotlight talks will be held in Week 15. Here is a list of potential project ideas worth investigating, for your reference. You can also come up with any other ideas that you would like to pursue for the project.

In-class participation (10%)

You will be penalized if you miss more than 2 classes in which attendance is taken.
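
As an unofficial illustration of the tentative weights above, the sketch below combines the four components into one number. It assumes every component is scored on a 0-100 scale, that a missing review counts as 0 toward the required 10, and that the late-slides deduction is 20% of the raw presentation score; none of these details are stated in the policy, and the function names are hypothetical.

```python
from statistics import mean

# Tentative weights from the grading policy (fractions of the final grade).
WEIGHTS = {
    "reviews": 0.30,
    "presentation": 0.20,
    "project": 0.40,
    "participation": 0.10,
}
REQUIRED_REVIEWS = 10
LATE_SLIDES_DEDUCTION = 0.20


def review_score(scores):
    """Average the 10 highest review scores.

    Assumption: each missing review below the required 10 counts as 0.
    """
    best = sorted(scores, reverse=True)[:REQUIRED_REVIEWS]
    best += [0] * (REQUIRED_REVIEWS - len(best))
    return mean(best)


def presentation_score(raw, slides_late=False):
    """Apply the late-slides deduction.

    Assumption: the deduction removes 20% of the raw score.
    """
    return raw * (1 - LATE_SLIDES_DEDUCTION) if slides_late else raw


def final_grade(reviews, presentation, project, participation):
    """Weighted sum of the four components, each on a 0-100 scale."""
    components = {
        "reviews": reviews,
        "presentation": presentation,
        "project": project,
        "participation": participation,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


# Example: 12 reviews submitted, so only the best 10 are counted.
grade = final_grade(
    reviews=review_score([100] * 10 + [50, 60]),
    presentation=presentation_score(80, slides_late=True),
    project=90,
    participation=100,
)
print(round(grade, 1))
```

The actual grade computation is at the discretion of the course staff; the sketch only shows how the published percentages combine.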

CS 6758: Deep Learning for Robotics (Fall 2025)

Cornell University

Lectures: Tue, Thu 8:40 am – 9:55 am, Hollister Hall 320