Reinforcement Learning for Zero-Sum Markov Games Using Function Approximation and Correlated Equilibrium
Abstract: We consider on-policy reinforcement learning for two-player zero-sum Markov games with simultaneous moves, a setting that generalizes both matrix games and Markov decision processes. To ensure exploration, our algorithm must construct upper and lower confidence bounds for both players in each round. We show that this can be done by finding the Coarse Correlated Equilibrium (CCE) of an appropriate general-sum matrix game, which can be solved in polynomial time. The algorithm applies to the offline setting, where one aims to find the Nash Equilibria of the Markov game, and to the online setting, where one plays against an arbitrary opponent and aims to minimize regret. For both settings, we provide sample complexity and regret guarantees.
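To make the polynomial-time CCE step concrete, here is a minimal sketch (not the authors' code) of computing a CCE of a general-sum matrix game by linear programming; the payoff matrices, the welfare-maximizing objective, and the function name are illustrative assumptions.

```python
# Minimal sketch: a CCE of a two-player general-sum matrix game via an LP.
# A is the row player's payoff matrix, B the column player's (both m x n).
import numpy as np
from scipy.optimize import linprog

def coarse_correlated_equilibrium(A, B):
    """Return a distribution x over joint actions (i, j) that is a CCE of (A, B)."""
    m, n = A.shape
    # Deviation constraints: no player gains by unilaterally switching
    # to a fixed action, ignoring the recommended joint action.
    rows = []
    for i_dev in range(m):  # row player deviates to action i_dev
        rows.append((A[i_dev, None, :] - A).reshape(-1))
    for j_dev in range(n):  # column player deviates to action j_dev
        rows.append((B[:, j_dev, None] - B).reshape(-1))
    A_ub = np.array(rows)
    b_ub = np.zeros(len(rows))
    # Probability simplex: entries sum to 1 (nonnegativity is the default bound).
    A_eq = np.ones((1, m * n))
    b_eq = np.array([1.0])
    # Among all CCEs, pick one maximizing total payoff (any feasible point is a CCE).
    c = -(A + B).reshape(-1)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    return res.x.reshape(m, n)

# Example: a small 2x2 general-sum game (prisoner's-dilemma payoffs).
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = np.array([[3.0, 5.0], [0.0, 1.0]])
print(coarse_correlated_equilibrium(A, B))
```

Any feasible point of this LP is a CCE; the welfare-maximizing objective merely selects one of them.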
Our results generalize to Markov games with continuous and unbounded state spaces. Using linear function approximation, we develop an optimistic variant of least-squares minimax value iteration, for which we provide performance guarantees that depend only on the linear dimension. This setting highlights the analytical challenges that arise from the non-Lipschitzness of CCEs of general-sum games.
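As a rough illustration of the least-squares component (a standard LSVI-style form under a linear-features assumption, not the paper's exact update or bonus), one round of optimistic estimation can be sketched as:

```python
# Sketch: ridge regression on features plus an elliptical confidence bonus,
# yielding upper/lower bounds on action values. Names and the bonus scale
# beta are illustrative assumptions, not the paper's specification.
import numpy as np

def optimistic_least_squares(Phi, targets, lam=1.0, beta=1.0):
    """Phi: (num_samples, d) features phi(s, a, b); targets: r + V_next(s')."""
    d = Phi.shape[1]
    Lambda = lam * np.eye(d) + Phi.T @ Phi            # regularized Gram matrix
    w_hat = np.linalg.solve(Lambda, Phi.T @ targets)  # ridge-regression weights
    Lambda_inv = np.linalg.inv(Lambda)

    def q_bounds(phi):
        """Upper/lower confidence bounds on Q for a query feature vector phi."""
        bonus = beta * np.sqrt(phi @ Lambda_inv @ phi)
        mean = phi @ w_hat
        return mean + bonus, mean - bonus

    return q_bounds
```

The resulting upper and lower bounds on the action values would then define, at each state, the general-sum matrix game whose CCE is computed as sketched above.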
Based on this paper.
Bio: Yudong Chen is an assistant professor in the School of Operations Research and Information Engineering (ORIE) at Cornell University. His research interests include machine learning, high-dimensional and robust statistics, convex and non-convex optimization, and reinforcement learning. Before joining Cornell, he was a postdoctoral scholar in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. He obtained his Ph.D. in Electrical and Computer Engineering from the University of Texas at Austin, and his M.S. and B.S. from Tsinghua University.