Is Scale All We Need for Robotics Foundation Models?

Abstract: In recent years, the field of AI and machine learning has witnessed remarkable advances in developing generalist models that can be applied to a wide variety of tasks in open domains. Notable examples include large language models (LLMs) such as ChatGPT and GPT-4 from OpenAI. The creation of these generalist AI models, also known as foundation models, relies primarily on the same recipe: the trinity of powerful algorithms, big data, and massive compute. The compelling capabilities of these foundation models have posed a tantalizing question for robotics researchers: How close are we to achieving generalist robots capable of performing physical tasks in everyday environments? Is scaling up all we need for robotics to replicate the success recipe of LLMs? In this talk, I will discuss our recent work on developing principles and methods for generalist robot autonomy. Through these discussions, I will share my vision of a secret recipe for building robotics foundation models in the wild.

Bio: Yuke Zhu is an Assistant Professor in the Computer Science Department at UT-Austin, where he directs the Robot Perception and Learning (RPL) Lab. He is also a core faculty member of Texas Robotics and a senior research scientist at NVIDIA. He focuses on developing intelligent algorithms for generalist robots and embodied agents that reason about and interact with the real world. His research spans robotics, computer vision, and machine learning. He received his Master's and Ph.D. degrees from Stanford University. His work has won various awards and nominations, including the Best Conference Paper Award at ICRA 2019, the Outstanding Learning Paper Award at ICRA 2022, the Outstanding Paper Award at NeurIPS 2022, and Best Paper Award Finalist honors at IROS 2019 and 2021 and at RSS 2022 and 2023. He has received the NSF CAREER Award and faculty awards from Amazon and JP Morgan.