Relaxations and Regret Bounds for Online Problems

Abstract: Many sequential optimization problems can be represented and solved via Markov Decision Process (MDP) formulations. Beyond the well-known ``curse of dimensionality'', optimal strategies in finite-horizon settings often admit only complicated representations. To overcome these challenges, researchers often seek approximations that are computationally tractable and produce solutions with provable optimality-gap bounds.

A natural relaxation stems from considering an offline controller, or ``prophet'', who can use future information to make decisions. We offer a relaxation framework, coupled with optimality-gap guarantees, based on (1) generalizing the Bellman equations to \emph{Bellman inequalities} and (2) using these inequalities to derive both a relaxation and an online algorithm.
We apply this approach to resource allocation problems and prove constant (horizon-independent) regret bounds.
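
To make the idea concrete, the following is a minimal sketch in standard finite-horizon MDP notation (the notation is our assumption for illustration, not the paper's formal development): with states $s$, actions $a$, rewards $r(s,a)$, transition kernel $P(\cdot \mid s,a)$, and horizon $T$, the exact Bellman equations and their relaxed counterpart read
\begin{align*}
  % Exact Bellman (dynamic programming) recursion for the optimal value V:
  V_t(s) &= \max_{a}\Big\{ r(s,a) + \mathbb{E}_{s' \sim P(\cdot\mid s,a)}\big[V_{t+1}(s')\big] \Big\},
  \qquad V_{T+1} \equiv 0, \\
  % Bellman inequalities: any W satisfying these dominates V:
  W_t(s) &\ge \max_{a}\Big\{ r(s,a) + \mathbb{E}_{s' \sim P(\cdot\mid s,a)}\big[W_{t+1}(s')\big] \Big\}.
\end{align*}
Any $W$ satisfying the inequalities upper-bounds the optimal value, $W_t(s) \ge V_t(s)$, and hence yields a relaxation; under these assumptions, one natural way to obtain an online algorithm is to act greedily with respect to $W$, with the gap $W_t(s) - V_t(s)$ controlling the regret.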