Observing, Learning, and Executing Fine-Grained Manipulation Activities: A Systems Perspective

Abstract: In the domain of image and video analysis, much of the deep learning revolution has focused on narrow, high-level classification tasks defined through carefully curated, retrospective data sets. However, most real-world applications – particularly those involving complex, multi-step manipulation activities – occur “in the wild,” where there is a combinatorial long tail of unique situations that are never seen during training. These systems demand a richer, fine-grained task representation that is informed by the application context and that supports quantitative analysis and compositional synthesis. As a result, the challenges inherent in both high-accuracy, fine-grained analysis and the perception-based execution of such activities are manifold, spanning representation, recognition, and task and motion planning.

This talk will summarize our work addressing these challenges. I’ll first describe DASZL, our approach to interpretable, attribute-based activity detection. DASZL operates in both pre-trained and zero-shot settings, and it has been applied to applications ranging from surveillance to surgery. I will then describe our recent work on “Good Robot”, a method for end-to-end training of a robot manipulation system. Good Robot achieves state-of-the-art performance on complex, multi-step manipulation tasks, and we show that it can be refactored to support both demonstration-driven and language-guided manipulation. I’ll close with a summary of the directions we are currently exploring that build on these technologies.

Bio: Greg Hager is the Mandell Bellmore Professor of Computer Science at Johns Hopkins University and the Founding Director of the Malone Center for Engineering in Healthcare. Professor Hager’s research interests include computer vision, vision-based and collaborative robotics, time-series analysis of image data, and applications of image analysis and robotics in medicine and manufacturing. He has served on the editorial boards of IEEE TRO, IEEE PAMI, IJCV, and ACM Transactions on Computing for Healthcare. He is a Fellow of the ACM and IEEE for his contributions to vision-based robotics, and a Fellow of AAAS, the MICCAI Society, and AIMBE for his contributions to imaging and his work on the analysis of surgical technical skill. Professor Hager is a co-founder of Clear Guide Medical and Ready Robotics. He is currently on leave from JHU, serving as a Director of Applied Science for Amazon Physical Stores.