Observing, Learning, and Executing Fine-Grained Manipulation Activities: A Systems Perspective
Abstract: In the domain of image and video analysis, much of the deep learning revolution has focused on narrow, high-level classification tasks that are defined through carefully curated, retrospective data sets. However, most real-world applications, particularly those involving complex, multi-step manipulation activities, occur “in the wild,” where there is a combinatorial long tail of unique situations that are never seen during training. Such applications demand a richer, fine-grained task representation that is informed by the application context and that supports quantitative analysis and compositional synthesis. As a result, the challenges inherent in both high-accuracy, fine-grained analysis and the performance of perception-based activities are manifold, spanning representation, recognition, and task and motion planning.
This talk will summarize our work addressing these challenges. I’ll first describe DASZL, our approach to interpretable, attribute-based activity detection. DASZL operates in both pre-trained and zero-shot settings, and it has been applied in domains ranging from surveillance to surgery. I will then describe our recent work on “Good Robot”, a method for end-to-end training of a robot manipulation system. Good Robot achieves state-of-the-art performance in complex, multi-step manipulation tasks, and we show it can be refactored to support both demonstration-driven and language-guided manipulation. I’ll close with a summary of some directions related to these technologies that we are currently exploring.
Bio: Greg Hager is the Mandell Bellmore Professor of Computer Science at Johns Hopkins University and the Founding Director of the Malone Center for Engineering in Healthcare. Professor Hager’s research interests include computer vision, vision-based and collaborative robotics, time-series analysis of image data, and applications of image analysis and robotics in medicine and in manufacturing. Professor Hager has served on the editorial boards of IEEE TRO, IEEE PAMI, IJCV, and ACM Transactions on Computing for Healthcare. He is a Fellow of the ACM and IEEE for his contributions to Vision-Based Robotics and a Fellow of AAAS, the MICCAI Society, and AIMBE for his contributions to imaging and his work on the analysis of surgical technical skill. Professor Hager is a co-founder of Clear Guide Medical and Ready Robotics. He is currently on leave from JHU as a Director of Applied Science for Amazon Physical Stores.