Abstract: The design, evaluation, and (successful) deployment of an artificial intelligence system depend on a complex set of interactions between the system's designers and builders, its users, and broader society. These human-robot interactions are pervasive, in the sense that they are present in almost any non-trivial application of artificial intelligence, and crucial, in that they often determine the success or failure of an A.I. system as much as, if not more than, the quality of the underlying A.I. technology. The talk will present research on one of the most important of these interactions: the relationship between an A.I. agent, e.g., a robot, and a (human) principal, the person on whose behalf that agent is supposed to act. This interaction is characterized by a shared goal, represented by the person's value or utility function, and by partial information about that goal, since the robot does not initially know it. To generate value for the principal, the A.I. system must first align itself with the principal's utility function, and so we refer to this as the Principal-Agent Value Alignment Problem. The talk will begin with a brief overview of principal-agent problems as they are studied in economics and discuss analogies with the A.I. version of the problem. It will then provide an overview of Inverse Reward Design and Cooperative Inverse Reinforcement Learning, two related models that allow us to investigate algorithmic and theoretical approaches to understanding and solving the principal-agent value alignment problem in artificial intelligence.
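For attendees who would like the formal setup in advance, the following is a minimal sketch of the two models mentioned above. It follows the published CIRL and IRD formulations at a high level, but the notation here (S, A^H, A^R, Theta, w, and so on) is simplified and should be read as illustrative rather than exact.

\[
  % CIRL: a two-player Markov game with identical payoffs, in which
  % only the human observes the reward parameters \theta; the robot
  % starts with only the prior P_0 over \theta.
  M = \langle S, \{A^{H}, A^{R}\}, T, \{\Theta, R\}, P_0, \gamma \rangle,
  \qquad
  J(\pi^{H}, \pi^{R}) = \mathbb{E}\Big[ \sum_{t \ge 0} \gamma^{t}\, R(s_t, a^{H}_t, a^{R}_t; \theta) \Big].
\]

Both players maximize the same expected return J, but only the human knows theta, so value alignment amounts to closing that information gap through interaction. IRD addresses one channel of that interaction, the reward function the designer writes down: the proxy reward \tilde{w}, specified for a training environment \tilde{M}, is treated as noisy evidence about the true reward w rather than as the objective itself,

\[
  P(w \mid \tilde{w}, \tilde{M}) \;\propto\; P(\tilde{w} \mid w, \tilde{M})\, P(w),
\]

so the robot maintains a posterior over what the designer intended instead of optimizing the proxy literally.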

Bio: Dylan is a Ph.D. student at UC Berkeley, advised by Anca Dragan, Pieter Abbeel, and Stuart Russell. His research focuses on the value alignment problem in artificial intelligence. He aims to design algorithms and systems that learn about and pursue the intended goals of their users, designers, and society in general. His work lies at the intersection of cognitive science, human-robot interaction, economics, and sequential decision-making. His recent work has focused on algorithms for human-robot interaction when preferences are unknown, reliability engineering for learning systems, and bridging the gap between social science and human-robot interaction.