Small-loss bounds for online learning with partial information

Abstract: I will discuss the problem of online learning with graph-based feedback in the adversarial (non-stochastic) setting. At each round, a decision maker selects an action from a finite set of alternatives and receives feedback according to the graph-based feedback model introduced by Mannor and Shamir: playing an action reveals the losses of its neighbors in an observation graph. This model encapsulates, as special cases, important partial-information paradigms such as bandits, online routing, and contextual bandits. A key challenge in such partial-feedback settings is that, in order to keep up with the non-stochasticity of the environment, the learner often needs to explore all actions, including suboptimal ones. I will address this hurdle with a general black-box reduction that attains small-loss regret guarantees while avoiding this over-exploration.
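To make the feedback model concrete, the following is a minimal sketch of an Exp3-style learner under an observation graph, where playing action `a` reveals the losses of all actions in `graph[a]` and revealed losses are importance-weighted by their probability of being observed. This is only an illustration of the baseline graph-feedback setup, not the black-box reduction discussed in the talk; the function name, parameters, and the assumption that every action is observable under every play distribution are mine.

```python
import math
import random

def exp3_graph(T, K, graph, losses, eta=0.1, seed=0):
    """Exponential-weights learner with graph-structured feedback.

    graph[j]  : set of actions whose losses are revealed when j is played
                (assumed to include j itself, and every action is assumed
                to be observable from at least one action).
    losses[t] : list of K losses in [0, 1] chosen by the adversary at round t.
    Returns the learner's cumulative realized loss.
    """
    rng = random.Random(seed)
    weights = [1.0] * K
    total_loss = 0.0
    for t in range(T):
        s = sum(weights)
        p = [w / s for w in weights]
        a = rng.choices(range(K), weights=p)[0]  # sample an action
        total_loss += losses[t][a]
        # Importance-weighted loss estimates over the observed neighborhood:
        # q_i is the probability that action i's loss is revealed this round.
        for i in graph[a]:
            q_i = sum(p[j] for j in range(K) if i in graph[j])
            est = losses[t][i] / q_i
            weights[i] *= math.exp(-eta * est)
    return total_loss

if __name__ == "__main__":
    # Toy instance: action 0 always incurs loss 0, the rest incur loss 1.
    K, T = 3, 200
    losses = [[0.0, 1.0, 1.0] for _ in range(T)]
    full_info = {j: set(range(K)) for j in range(K)}  # everyone sees everyone
    bandit = {j: {j} for j in range(K)}               # only the played action
    print(exp3_graph(T, K, full_info, losses, seed=1))
    print(exp3_graph(T, K, bandit, losses, seed=1))
```

The two example graphs recover the extreme cases mentioned in the abstract: the complete graph with self-loops gives full information, while the graph of isolated self-loops gives the bandit setting.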