POEM

Policy Optimizer for Exponential Models

Author: Adith Swaminathan <adith@cs.cornell.edu>
Cornell University
Department of Computer Science

Version: 0.02
Date: 06.06.2015

Overview

Policy Optimizer for Exponential Models (POEM) is a simple gradient-based optimizer for learning structured output models (such as Conditional Random Fields) using the Counterfactual Risk Minimization (CRM) principle. This release includes the code and data needed to reproduce all experiments from the ICML '15 and NIPS '15 papers [1,2].
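
At a high level, CRM trains a stochastic policy pi_w on logged bandit feedback by minimizing a clipped importance-sampled risk estimate together with an empirical-variance penalty. The following is only a sketch of the objective in the form given in [1], where delta_i is the logged loss, pi_0 the logging policy, M a clipping constant, and lambda a variance-regularization constant chosen by validation:

    w^{CRM} = \arg\min_{w} \; \hat{R}^{M}(w) + \lambda \sqrt{ \widehat{\mathrm{Var}}(u(w)) / n },
    \qquad
    \hat{R}^{M}(w) = \frac{1}{n} \sum_{i=1}^{n} u_i(w),
    \quad
    u_i(w) = \delta_i \, \min\!\Big( M, \frac{\pi_w(y_i \mid x_i)}{\pi_0(y_i \mid x_i)} \Big),

with \widehat{\mathrm{Var}}(u(w)) the sample variance of the u_i(w).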

Data

Here you will find tarballs of multi-label classification datasets from the LibSVM repository, processed to be compatible with scikit-learn.

[Data]
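
As a quick sanity check, files in this format can be read with scikit-learn's load_svmlight_file. The sketch below is illustrative only; "scene_train" is a placeholder for whichever extracted dataset file you use.

    # Minimal sketch: load one extracted LibSVM multi-label file with scikit-learn.
    # "scene_train" is a hypothetical file name; substitute your own.
    from sklearn.datasets import load_svmlight_file

    X, Y = load_svmlight_file("scene_train", multilabel=True)
    print(X.shape)   # sparse feature matrix: (num_samples, num_features)
    print(Y[:5])     # per-sample tuples of label indices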

Code

Usage: python MultiLabelExperiment.py [exptNum]

Optionally, you may specify a second flag to parallelize the validation runs; this requires the PathOS multi-processing library. Three broad classes of counterfactual estimators are implemented: Vanilla (dominated by the other two estimators), Majorized (uses stochastic optimization, so it can scale to large datasets), and Self-Normalized (more robust when used with the Counterfactual Risk Minimization principle, but limited to the L-BFGS optimizer). A sketch contrasting the vanilla and self-normalized estimates follows below.
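
For intuition, the vanilla and self-normalized estimators differ only in how the importance weights are aggregated. The snippet below is a toy sketch of the two estimates and does not mirror the internals of MultiLabelExperiment.py.

    import numpy as np

    def vanilla_ips(losses, importance_weights):
        # Vanilla inverse-propensity-score estimate: mean of weighted losses.
        return np.mean(losses * importance_weights)

    def self_normalized_ips(losses, importance_weights):
        # Self-normalized estimate: weighted losses divided by the sum of weights,
        # which makes the estimate invariant to additive shifts of the loss.
        return np.sum(losses * importance_weights) / np.sum(importance_weights)

    # Toy usage with hypothetical logged bandit feedback:
    losses = np.array([0.2, 0.8, 0.5])
    weights = np.array([1.5, 0.3, 0.9])   # pi_w(y|x) / pi_0(y|x) per logged sample
    print(vanilla_ips(losses, weights), self_normalized_ips(losses, weights))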

[ICML'15]

[NIPS'15]

References

[1] A. Swaminathan and T. Joachims. Counterfactual Risk Minimization: Learning from Logged Bandit Feedback. ICML, 2015. [arXiv][slides][poster]

[2] A. Swaminathan and T. Joachims. The Self-Normalized Estimator for Counterfactual Learning. NIPS, 2015. [proceedings][slides][poster]