Adaptive Discretization for Decision Making in Large Continuous Spaces

Abstract: In this talk, I will present a sequence of two works that explore adaptive discretization for decision making in continuous state and action spaces. In the first work, I will present a Q-learning policy with adaptive, data-driven discretization for episodic RL on continuous state-action spaces. We recover the regret guarantees of prior algorithms for continuous state-action spaces, and experiments demonstrate how our algorithm automatically adapts to the underlying structure of the problem, resulting in better performance compared to both heuristics and Q-learning with uniform discretization. In the second work, we consider the question of learning the metric amongst actions when it is unknown, for a nonparametric contextual multi-armed bandit problem. Suppose that there is a large set of actions, yet there is a simple but unknown structure amongst them, e.g., a finite number of action types, or rewards that are smooth with respect to an unknown metric space. We present an algorithm which learns data-driven similarities amongst the arms in order to implement adaptive partitioning of the context-arm space for more efficient learning. We will discuss regret bounds and show simulations that illustrate the potential gains from learning data-driven similarities.
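As a rough illustration of the adaptive-discretization idea (this is a toy sketch, not the speaker's actual algorithm or analysis), consider a bandit over a one-dimensional action space [0, 1]: each cell of a partition keeps a running value estimate and a visit count, and a cell is split into finer cells once it has been sampled often enough relative to its width, so refinement concentrates where the data does. The split threshold and the UCB-style exploration bonus below are illustrative choices.

```python
# Toy sketch of adaptive discretization over a 1-D action space [0, 1].
# Cells are split once their visit count exceeds a threshold tied to
# their width, so frequently sampled regions get a finer partition.
import math
import random

class Cell:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi
        self.q = 0.0    # running reward estimate for this cell
        self.count = 0  # number of samples observed in this cell

    @property
    def width(self):
        return self.hi - self.lo

def select_cell(cells, t):
    # Optimistic selection: estimate plus an exploration bonus that
    # shrinks with the cell's visit count (an illustrative choice).
    return max(cells, key=lambda c: c.q + math.sqrt(2 * math.log(t + 1) / (c.count + 1)))

def update(cells, cell, reward, split_factor=4.0):
    cell.count += 1
    cell.q += (reward - cell.q) / cell.count  # running average
    # Split once visits exceed split_factor / width^2, refining the
    # partition only where samples concentrate (illustrative rule).
    if cell.count >= split_factor / cell.width ** 2:
        cells.remove(cell)
        mid = (cell.lo + cell.hi) / 2
        for lo, hi in [(cell.lo, mid), (mid, cell.hi)]:
            child = Cell(lo, hi)
            child.q = cell.q  # children inherit the parent's estimate
            cells.append(child)

# Toy run: reward peaks at a = 0.7, so cells near 0.7 get refined.
random.seed(0)
cells = [Cell(0.0, 1.0)]
for t in range(2000):
    cell = select_cell(cells, t)
    a = random.uniform(cell.lo, cell.hi)
    update(cells, cell, 1.0 - abs(a - 0.7))

finest = min(cells, key=lambda c: c.width)
print(len(cells), finest.lo, finest.hi)
```

After the run, the partition is finest around the reward peak, while flat regions of the action space remain coarse; the second work in the abstract goes further by learning the similarity structure itself rather than assuming a known metric.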

Bio: Christina Lee Yu is an Assistant Professor at Cornell University in Operations Research and Information Engineering. Prior to Cornell, she was a postdoc at Microsoft Research New England. She received her PhD in 2017 and her MS in 2013, both in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology, in the Laboratory for Information and Decision Systems. She received her BS in Computer Science from the California Institute of Technology in 2011. She received an honorable mention for the 2018 INFORMS Dantzig Dissertation Award. Her recent interests include matrix and tensor estimation, multi-armed bandits, and reinforcement learning.