"ALFRED -- A Simulated Playground for Connecting Language, Action, and Perception"(via Zoom)

Abstract: Vision-and-Language Navigation has become a popular task in the grounding literature, but the real world includes interaction, state changes, and long-horizon planning (actually, the real world requires motors and torques, but let's ignore that for the moment).  We present ALFRED (Action Learning From Realistic Environments and Directives), a benchmark dataset designed to facilitate more complex embodied language understanding.  In this talk, I'll discuss the benchmark itself and subsequent pieces of work enabled by the environment and annotations.  Our goal is to provide a playground for moving embodied language+vision research closer to robotics, enabling the community to uncover abstractions and interactions among planning, reasoning, and action taking.

Bio: Yonatan Bisk is an Assistant Professor in the Language Technologies Institute at Carnegie Mellon University.  He received his PhD from the University of Illinois at Urbana-Champaign, where he worked on CCG induction with Julia Hockenmaier.  Having pursued CCG syntax instead of semantics for years, battling with Yoav over how best to approach language learning, he has conceded the fight and now focuses on language grounding, where his primary research question is: What knowledge can't be learned from text?