Date: August 25, 2025

Time: 3:45-5 p.m.

Location: Gates 310 or via Zoom

Speaker: Alexander Terenin, Cornell University

 

Photo: Alexander Terenin.

 

Bio: Alexander Terenin is an Assistant Research Professor (a position equivalent to a three-year postdoctoral research fellowship) at Cornell University. He received his PhD from Imperial College London in 2022. His current research focuses on non-discrete decision-making under uncertainty.

 

Abstract: In day-to-day life, humans often learn by trying random things and seeing what works. This is true both for categorical decisions - for example, finding the best restaurant in a new city - and for fine motor control - for example, in cooking or sports. Given this ubiquity, it is of fundamental scientific importance to ask: is there a theoretical framework that can algorithmically describe how an abstract agent should learn by taking random actions?

The online learning game is a minimax game which captures this question, and is known by many names, including no-regret learning and prediction with expert advice. It is arguably the simplest setting of this kind: many other decision problems - including adversarial bandits, partial monitoring games, equilibrium computation, and variants of reinforcement learning - can be solved by reduction to an online learning oracle. Most approaches are based on mirror descent variants such as exponential weights, which are rooted in convex optimization and well understood only for discrete action spaces.
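For reference, the snippet below is a minimal sketch of the exponential weights (Hedge) algorithm for the discrete version of this game. The bounded synthetic losses and the learning rate are illustrative assumptions, not material from the talk.

```python
import numpy as np

def exponential_weights(losses, eta):
    """Hedge / exponential weights over a finite set of experts.

    losses: array of shape (T, K) with per-round losses in [0, 1] for each
            of K experts (an illustrative assumption about the adversary).
    eta:    learning rate.
    Returns the (T, K) array of probability distributions played.
    """
    T, K = losses.shape
    cumulative = np.zeros(K)              # cumulative loss of each expert
    plays = np.zeros((T, K))
    for t in range(T):
        # Mirror descent with entropic regularization on the simplex:
        # weights decay exponentially in cumulative loss.
        weights = np.exp(-eta * cumulative)
        plays[t] = weights / weights.sum()
        cumulative += losses[t]           # losses revealed after playing
    return plays

# Illustrative usage on synthetic bounded losses.
rng = np.random.default_rng(0)
T, K = 1000, 5
losses = rng.uniform(size=(T, K))
dists = exponential_weights(losses, eta=np.sqrt(2.0 * np.log(K) / T))
```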

In this talk, we study the non-discrete online learning game. Given that this game involves an adversary, and that most algorithms are convex-analytic in nature, our setting at first appears to have little to do with Bayesian decision-making. In spite of this, we show that a Bayesian approach - specifically, Thompson sampling over the space of the adversary's future actions, as opposed to the more obvious state space - admits a no-regret online learning guarantee. The proof includes a novel probabilistic argument, which can be used more generally to analyze correlated follow-the-perturbed-leader algorithms.
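To illustrate the follow-the-perturbed-leader family referenced above, the sketch below plays a perturbed leader whose correlated Gaussian perturbations are drawn from a squared-exponential Gaussian process evaluated on a finite grid of actions. The kernel, grid, and parameters are assumptions chosen purely for illustration; this is not the specific construction or analysis presented in the talk.

```python
import numpy as np

def gaussian_perturbed_leader(losses, actions, lengthscale=0.2, scale=1.0, seed=0):
    """Follow-the-perturbed-leader with correlated Gaussian perturbations.

    Each round, a perturbation is drawn from a Gaussian process (squared-
    exponential kernel, evaluated on the finite grid `actions`), and the
    learner plays the grid point minimizing cumulative loss plus noise.
    Kernel and parameters are illustrative assumptions only.
    """
    rng = np.random.default_rng(seed)
    K = len(actions)
    diff = actions[:, None] - actions[None, :]
    cov = scale**2 * np.exp(-0.5 * (diff / lengthscale) ** 2)  # SE kernel
    cov += 1e-8 * np.eye(K)                                    # numerical jitter
    cumulative = np.zeros(K)
    played = []
    for loss in losses:                 # loss: this round's losses on the grid
        perturbation = rng.multivariate_normal(np.zeros(K), cov)
        played.append(int(np.argmin(cumulative + perturbation)))
        cumulative += loss
    return played

# Illustrative usage: a grid of actions in [0, 1] and random bounded losses.
grid = np.linspace(0.0, 1.0, 50)
rng = np.random.default_rng(1)
loss_sequence = rng.uniform(size=(200, grid.size))
choices = gaussian_perturbed_leader(loss_sequence, grid)
```

The correlation in the perturbation is what distinguishes this sketch from classical coordinate-wise follow-the-perturbed-leader.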

Concretely, we show that, against a bounded Lipschitz adversary, certain Gaussian process priors with good game-theoretic properties achieve low excess regret, and therefore low minimax regret. This yields the first perturbation-based algorithm for non-discrete online learning, and can be seen as a first step towards extending adversarial bandits and other fundamentally important bandit-based learning algorithms to non-discrete action spaces.