Policy Optimizer for Exponential Models
Author: Adith Swaminathan <firstname.lastname@example.org>
Department of Computer Science
Policy Optimizer for Exponential Models is a simple gradient-based optimizer for learning structured output models (such as Conditional Random Fields) under the Counterfactual Risk Minimization principle. This release includes the code and data needed to reproduce all experiments from the ICML '15 and NIPS '15 papers [1,2].
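As a minimal sketch (not the released code), the training objective can be written out for multi-label outputs y in {0,1}^L, assuming the joint feature map phi(x, y) = x (outer product) y. Under that assumption the exponential model pi_w(y|x) = exp(w . phi(x, y)) / Z(x) factorizes over labels. Each logged example is assumed to be a tuple (x, y, loss, propensity). Following the clipped, variance-regularized form of [1], each importance weight is clipped at M and the empirical risk is penalized by lambda * sqrt(Var(u) / n). The function names, the tuple layout, and the defaults for `clip_m` and `lam` are hypothetical; only the objective is shown, not the gradient optimizer.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical logged-example layout: (features x, logged labels y, loss delta, propensity p)
Example = Tuple[Sequence[float], Sequence[int], float, float]


def log_prob(w: List[List[float]], x: Sequence[float], y: Sequence[int]) -> float:
    """log pi_w(y | x) for an exponential model with phi(x, y) = x (outer) y.

    With this feature map the partition function factorizes over labels,
    so each label contributes y_l * s_l - log(1 + exp(s_l)) with s_l = w_l . x.
    """
    total = 0.0
    for w_l, y_l in zip(w, y):
        s = sum(wj * xj for wj, xj in zip(w_l, x))  # score for label l being on
        log_z = max(s, 0.0) + math.log1p(math.exp(-abs(s)))  # stable log(1 + e^s)
        total += (s if y_l else 0.0) - log_z
    return total


def crm_objective(w: List[List[float]], data: List[Example],
                  clip_m: float = 100.0, lam: float = 1.0) -> float:
    """Clipped importance-weighted risk plus a variance penalty (CRM-style)."""
    u = []
    for x, y, loss, prop in data:
        ratio = math.exp(log_prob(w, x, y)) / prop  # pi_w(y|x) / p
        u.append(loss * min(clip_m, ratio))          # clip the importance weight at M
    n = len(u)
    mean_u = sum(u) / n
    var_u = sum((ui - mean_u) ** 2 for ui in u) / (n - 1)  # sample variance
    return mean_u + lam * math.sqrt(var_u / n)
```

With all-zero weights and two labels, every label vector has probability 0.25, so logged propensities of 0.25 give importance ratios of 1 and the objective reduces to the mean loss plus lambda times its standard error.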
Here you will find tarballs of multi-label classification datasets from the LibSVM repository, processed to be compatible with scikit-learn. Coming soon: a suitably anonymized fraction of the query log from the arXiv search system, collected with correct randomization and logging. This will remove the need for the Supervised-to-Bandit experimental methodology when evaluating future algorithms for batch learning from bandit feedback.
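The Supervised-to-Bandit methodology turns a fully labeled dataset into logged bandit feedback: a logging policy samples a label vector for each example, and only the sampled vector, its loss against the true labels, and its propensity are kept. The sketch below assumes Hamming loss and a hypothetical logging policy that switches label l on independently with a fixed probability `p_on[l]`; that policy is a stand-in for whatever logger an experiment actually uses, and the function name and output layout are illustrative only.

```python
import random
from typing import List, Sequence, Tuple


def hamming_loss(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """Number of labels on which the prediction disagrees with the truth."""
    return sum(a != b for a, b in zip(y_true, y_pred))


def supervised_to_bandit(labels: List[Sequence[int]], p_on: Sequence[float],
                         replays: int = 1, seed: int = 0
                         ) -> List[Tuple[int, List[int], int, float]]:
    """Simulate bandit feedback from full labels (hypothetical logging policy).

    Returns (example index, sampled labels, loss, propensity) tuples; the true
    labels are used only to compute the loss and are then discarded.
    """
    rng = random.Random(seed)
    logged = []
    for _ in range(replays):  # optionally replay the dataset several times
        for i, y_true in enumerate(labels):
            # Sample each label independently from the assumed logging policy.
            y = [1 if rng.random() < p else 0 for p in p_on]
            # Propensity = probability the logging policy assigns to y.
            prop = 1.0
            for y_l, p in zip(y, p_on):
                prop *= p if y_l else 1.0 - p
            logged.append((i, y, hamming_loss(y_true, y), prop))
    return logged
```

The example index is kept so that the features x_i can be looked up when the logged tuples are fed to a counterfactual estimator.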
Usage: python MultiLabelExperiment.py [exptNum]
Optionally, you may specify a second flag to parallelize the validation runs; this additionally requires the pathos multiprocessing library. Three broad classes of counterfactual estimators are implemented: Vanilla (dominated by the other two estimators), Majorized (uses stochastic optimization, so it could scale to large datasets), and Self-Normalized (more robust when used within the Counterfactual Risk Minimization principle, but limited to the L-BFGS optimizer).
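To illustrate why the Self-Normalized estimator is more robust, the sketch below compares it with the Vanilla (inverse propensity scoring) estimator. Both take per-example losses delta_i and importance ratios pi_w(y_i|x_i) / p_i. Scaling all ratios by a constant changes the Vanilla estimate but leaves the self-normalized one unchanged, which is one way to see the propensity overfitting that the Self-Normalized estimator guards against [2]. The function names are hypothetical, and the Majorized variant is not sketched here.

```python
from typing import Sequence


def vanilla_ips(losses: Sequence[float], ratios: Sequence[float]) -> float:
    """Vanilla estimator: (1/n) * sum_i delta_i * ratio_i."""
    return sum(d * r for d, r in zip(losses, ratios)) / len(losses)


def self_normalized(losses: Sequence[float], ratios: Sequence[float]) -> float:
    """Self-normalized estimator: sum_i delta_i * ratio_i / sum_i ratio_i."""
    return sum(d * r for d, r in zip(losses, ratios)) / sum(ratios)
```

For example, with losses [1.0, 3.0] and ratios [1.0, 1.0] both estimators return 2.0; halving the ratios to [0.5, 0.5] drops the Vanilla estimate to 1.0, while the self-normalized estimate stays at 2.0.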
[1] A. Swaminathan and T. Joachims. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback. ICML, 2015.
[2] A. Swaminathan and T. Joachims. The Self-Normalized Estimator for Counterfactual Learning. NIPS, 2015.