Offline Policy Evaluation for Reinforcement Learning under Unmeasured Confounding (via Zoom)
Abstract: In the context of reinforcement learning (RL), offline policy evaluation (OPE) is the problem of evaluating the value of a candidate policy using data that was previously collected from some existing logging policy. This is of crucial importance in many application areas such as medicine, healthcare, or robotics, where the cost of actually executing a potentially bad policy could be catastrophic. Unfortunately, in many of the applications that inspire OPE, we may reasonably expect the available logged data to be affected by unmeasured confounding, in which case standard OPE methods may be arbitrarily biased.
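To make the setup concrete, below is a minimal sketch of the standard (unconfounded) importance-sampling OPE estimator, which reweights logged returns by the ratio of evaluation-policy to logging-policy action probabilities; the function name, trajectory format, and policy interfaces are illustrative assumptions, not code from the talk. When the logged actions also depended on unmeasured variables, the recorded propensities are the wrong denominators and this estimator can be arbitrarily biased.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-trajectory importance-sampling estimate of the value of pi_e.

    trajectories: iterable of lists of (state, action, reward) tuples
                  logged under the behavior (logging) policy pi_b.
    pi_e, pi_b:   callables returning the probability of taking `action`
                  in `state` under the evaluation / logging policy.
    Assumes every confounder is captured in `state`; under unmeasured
    confounding the weights below are misspecified.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (state, action, reward) in enumerate(traj):
            weight *= pi_e(action, state) / pi_b(action, state)  # reweight logged action
            ret += (gamma ** t) * reward                          # discounted return
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```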
In this talk I will present some of my recent work on OPE under unmeasured confounding. First, I will discuss an infinite-horizon stationary setting, where the confounding occurs i.i.d. at each time step. In this setting, we may correct for the effects of confounding as long as we can infer an accurate latent variable model of the confounders. Then, I will discuss an episodic setting, where the confounding may be modeled using a Partially Observable Markov Decision Process (POMDP). Even in this more challenging setting, we may still account for confounding via a sequential reduction to contextual-bandit-style policy evaluation, using the recently proposed proximal causal inference framework. Finally, I will provide a high-level discussion of the open challenges surrounding RL with unmeasured confounders.
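As a rough illustration of the i.i.d. per-step confounding setting mentioned above (the notation is mine, not the speaker's): an unobserved variable influences both the logged action and the dynamics, while only the observed state is recorded.

```latex
% Illustrative notation, not the speaker's: i.i.d. per-step confounding.
% u_t is unobserved and affects both the logged action and the dynamics,
% so the recorded propensity \pi_b(a_t \mid s_t) is only a marginal quantity.
\[
  u_t \overset{\text{i.i.d.}}{\sim} p(u), \qquad
  a_t \sim \pi_b(\,\cdot \mid s_t, u_t), \qquad
  (r_t, s_{t+1}) \sim P(\,\cdot \mid s_t, a_t, u_t),
\]
\[
  \pi_b(a_t \mid s_t) \;=\; \mathbb{E}_{u \sim p(u)}\!\bigl[\pi_b(a_t \mid s_t, u)\bigr].
\]
% Because u_t also drives r_t and s_{t+1}, dividing by this marginal
% propensity does not remove the confounding bias.
```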
The talk is based on joint work with Nathan Kallus, Lihong Li, and Ali Mousavi.
Bio: Andrew is a fifth-year PhD student in the Computer Science department at Cornell University, supervised by Nathan Kallus. His current research focuses on the intersection of causal inference, machine learning, and econometrics, with particular interest in causal inference under unmeasured confounding, reinforcement learning, and efficiently solving high-dimensional conditional moment problems. Previously, during his Master's at the University of Melbourne, Andrew conducted research in Natural Language Processing and Computational Linguistics.