Grounded Language Understanding with Realistic Agents 

Abstract: Natural language understanding in grounded interactive scenarios is tightly coupled with the actions the system generates and its observations of the environment. The system's actions, or its interface, define the output space, while sensory observations ground instruction meaning. How we define the output space and the type of environment the agent observes determine the complexity of the problem and the type of reasoning required. While mapping instructions to actions has been studied extensively, the majority of work has focused on simple discrete actions and has been developed in lab environments. Outside of the lab and with real robotic agents, new questions of scalability arise: how can we use demonstrations to learn to bridge the gap between the high-level concepts of language and low-level robot controls? How do we design models that continuously observe and control? And how can we reason about complex real-life observations? In this talk I will present our recent work on grounded language understanding in realistic scenarios. First, I will describe our approach to learning to map instructions and observations to continuous control of a realistic quadcopter drone. Second, I will briefly present our recent study of instructional and spatial language with real-life observations using Google StreetView. Both parts of the talk use new publicly available evaluation benchmarks.