Reinforcement Learning for Zero-Sum Markov Games Using Function Approximation and Correlated Equilibrium

Abstract: We consider on-policy reinforcement learning for two-player zero-sum Markov games with simultaneous moves, a setting that generalizes both matrix games and Markov decision processes. To ensure exploration, our algorithm must construct upper and lower confidence bounds for both players in each round. We show that this can be done by finding a Coarse Correlated Equilibrium (CCE) of an appropriately chosen general-sum matrix game, which can be solved in polynomial time. The algorithm applies to the offline setting, where one aims to find the Nash equilibria of the Markov game, and to the online setting, where one plays against an arbitrary opponent and aims to minimize regret. For both settings, we provide sample complexity and regret guarantees.
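
As a concrete illustration of the computational subroutine mentioned above (a sketch under assumptions, not the paper's exact algorithm), a CCE of a general-sum bimatrix game can be found by solving a small linear program over joint action distributions. The Python sketch below uses SciPy's linprog; the function name and the welfare-maximizing objective are illustrative choices, since any feasible point of the LP is a CCE.

```python
import numpy as np
from scipy.optimize import linprog

def coarse_correlated_equilibrium(A, B):
    """Find a CCE of the general-sum bimatrix game with payoff matrices
    A (row player) and B (column player) via a linear program over joint
    distributions pi(i, j).

    A CCE requires that neither player can gain by committing to a fixed
    action before the joint action is drawn:
        sum_{i,j} pi(i,j) A[i,j] >= sum_j (sum_i pi(i,j)) A[i',j]  for all i'
        sum_{i,j} pi(i,j) B[i,j] >= sum_i (sum_j pi(i,j)) B[i,j']  for all j'
    """
    m, n = A.shape
    dim = m * n  # one variable per joint action pair, flattened row-major

    # Inequality constraints A_ub @ x <= 0 encoding the deviation conditions.
    rows = []
    for i_dev in range(m):                 # row player deviates to action i_dev
        rows.append((A[i_dev, :][None, :] - A).flatten())
    for j_dev in range(n):                 # column player deviates to action j_dev
        rows.append((B[:, j_dev][:, None] - B).flatten())
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))

    # Probabilities sum to one.
    A_eq = np.ones((1, dim))
    b_eq = np.array([1.0])

    # Any feasible point is a CCE; here we (arbitrarily) maximize total payoff.
    c = -(A + B).flatten()

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * dim, method="highs")
    return res.x.reshape(m, n)

# Example: matching pennies viewed as a general-sum game (B = -A).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
pi = coarse_correlated_equilibrium(A, -A)
print(np.round(pi, 3))  # prints a valid CCE (the uniform joint distribution is one)
```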

Our results generalize to Markov games with continuous and unbounded state spaces. Using linear function approximation, we develop an optimistic variant of least-squares minimax value iteration, for which we provide performance guarantees that depend only on the dimension of the linear features. This setting highlights the analytical challenges that arise from the non-Lipschitzness of CCEs of general-sum games.
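
To give a sense of the function-approximation step, here is a schematic sketch of one optimistic least-squares update in the style of LSVI-UCB, extended to produce both upper and lower confidence bounds; the function name and parameters (phi_query, beta, lam, v_max) are illustrative and not the paper's notation.

```python
import numpy as np

def optimistic_q_estimates(Phi, targets, phi_query, beta, lam=1.0, v_max=1.0):
    """One optimistic least-squares update with linear features.

    Phi:       (T, d) feature vectors phi(s_t, a_t, b_t) from past transitions
    targets:   (T,) regression targets r_t + V_next(s'_t)
    phi_query: (K, d) features at which to evaluate Q (e.g. all joint actions
               at the current state)
    beta:      confidence-width multiplier controlling the exploration bonus

    Returns upper and lower confidence bounds on Q at the query points.
    """
    d = Phi.shape[1]
    Lambda = lam * np.eye(d) + Phi.T @ Phi        # regularized Gram matrix
    w = np.linalg.solve(Lambda, Phi.T @ targets)  # ridge-regression weights

    # Elliptical bonus sqrt(phi^T Lambda^{-1} phi) at each query point.
    Lambda_inv_phi = np.linalg.solve(Lambda, phi_query.T)            # (d, K)
    bonus = beta * np.sqrt(np.sum(phi_query.T * Lambda_inv_phi, axis=0))

    q_mean = phi_query @ w
    q_upper = np.clip(q_mean + bonus, -v_max, v_max)  # optimistic bound (max player)
    q_lower = np.clip(q_mean - bonus, -v_max, v_max)  # pessimistic bound (min player)
    return q_upper, q_lower
```

At a given state, the resulting upper- and lower-bound Q values over joint actions define the general-sum matrix game whose CCE can be computed as in the earlier sketch, and the value estimates and policies are then read off from that CCE.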

Based on this paper.

Bio: Yudong Chen is an assistant professor in the School of Operations Research and Information Engineering (ORIE) at Cornell University. His research interests include machine learning, high-dimensional and robust statistics, convex and non-convex optimization, and reinforcement learning. Before joining Cornell, he was a postdoctoral scholar in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He obtained his Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin, and his M.S. and B.S. from Tsinghua University.