Overview
Online metrics measured through A/B tests have become
the gold standard for many evaluation questions. But can
we get the same results as A/B tests without actually fielding
a new system? And can we train systems to optimize
online metrics without subjecting users to an online learning
algorithm? This tutorial summarizes and unifies the
emerging body of methods on counterfactual evaluation and
learning. These counterfactual techniques provide a well-founded
way to evaluate and optimize online metrics by exploiting
logs of past user interactions. In particular, the
tutorial unifies the causal inference, information retrieval,
and machine learning view of this problem, providing the
basis for future research in this emerging area of great potential
impact.
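To make the core idea concrete: the simplest counterfactual estimator, inverse propensity scoring (IPS), reweights each logged interaction by the ratio of the target policy's action probability to the logging policy's recorded propensity. The sketch below is illustrative only; the function name and toy data are not from the tutorial materials.

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse Propensity Scoring estimate of a target policy's expected
    reward from logged interactions.  Each logged interaction recorded the
    reward and the probability with which the logging policy chose the
    action; reweighting by the target policy's probability for that same
    action yields an unbiased estimate (given non-zero propensities)."""
    weights = np.asarray(target_probs) / np.asarray(logging_probs)
    return float(np.mean(np.asarray(rewards) * weights))

# Toy log: the logging policy chose each action uniformly (p = 0.5);
# the target policy would deterministically pick the first and third actions.
rewards       = [1.0, 0.0, 1.0, 0.0]
logging_probs = [0.5, 0.5, 0.5, 0.5]
target_probs  = [1.0, 0.0, 1.0, 0.0]
print(ips_estimate(rewards, logging_probs, target_probs))  # → 1.0
```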
Slides and Videos
The slides for the tutorial are in four parts; PDFs exported from PowerPoint are provided below.
The video recording of the tutorial is in two parts and embedded below; to view it in a separate tab, follow the links for Part I and Part II.
Code and Data for News Recommendation Demo
Download all required files for the demo here: Code_Data.zip
Dependencies
All evaluation approaches and most learning approaches are written in Python 3.
The one exception -- learning via ERM with Inverse Propensity Scoring (or Doubly Robust) estimators -- uses Vowpal Wabbit.
Recommended installation process:
- Download and install Anaconda with Python 3 (e.g., Python 3.5).
- Ensure the following Python packages are installed: NumPy, SciPy, scikit-learn, and joblib. With Anaconda, just use:
conda install [package]
- Install Vowpal Wabbit. On Windows, use cygwin and follow these instructions.
- Unzip the Code_Data.zip downloaded from above, and navigate to Code_Data/Code/Synth_TestBed/.
To run the evaluation experiments, use
python EvalExperiment.py [-h]
Allowed values for the -e and -f options (the policy to evaluate, and the policy that serves as the scorer for Plackett-Luce randomization) are
ridge,lasso,lstsq,tree
Allowed values for the -v option (the metric to measure/optimize) are
Revenue,Constant
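For background on the -f option: the designated policy's scores drive Plackett-Luce randomization, which draws a ranking by repeatedly picking the next item with probability proportional to its exponentiated score. A standard way to sample such a ranking in one shot is the Gumbel-max trick, sketched below; this is illustrative and not taken from EvalExperiment.py.

```python
import numpy as np

def plackett_luce_sample(scores, rng):
    """Sample a ranking from a Plackett-Luce model with weights
    proportional to exp(score).  Equivalent one-shot form of the
    sequential sampling scheme: perturb each score with independent
    Gumbel noise and sort in decreasing order of perturbed score."""
    gumbel = rng.gumbel(size=len(scores))
    return np.argsort(-(np.asarray(scores) + gumbel))

rng = np.random.default_rng(0)
ranking = plackett_luce_sample([2.0, 1.0, 0.0], rng)
print(ranking)  # a permutation of [0, 1, 2]; item 0 is most likely first
```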
To plot the results of running evaluation experiments, use
python plot_sigir.py --mode [estimate/error] --path [../../Logs/ssynth...pkl]
To run the learning experiments, use
python OptExperiment.py [-h]
If you wish to also observe Vowpal Wabbit results for learning, supply the "-d" option to the OptExperiment script [WARNING: Dumps massive dataset files.]
Additionally, run:
vw -d vw_train.dat --cb_adf -f cb.model --passes 20 --cache_file cb.cache
vw -t -d vw_test.dat -i cb.model -p test.predict
python vw_helper.py -d vw_test2.dat -p test.predict
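For reference, the files consumed by --cb_adf use Vowpal Wabbit's action-dependent-features input format: a "shared" line of context features, then one line per candidate action, with a cost:probability label on the logged action's line (preceded by an action-id field that adf mode ignores), and a blank line separating examples. The helper function and feature names below are made up for illustration.

```python
def cb_adf_example(shared_feats, action_feats, chosen, cost, prob):
    """Format one contextual-bandit example for VW's --cb_adf input:
    a 'shared' line with the context features, then one line per action;
    only the logged (chosen) action's line carries a cost:probability
    label.  Costs are negated rewards, so lower is better."""
    lines = ["shared | " + shared_feats]
    for i, feats in enumerate(action_feats):
        label = "0:{}:{}".format(cost, prob) if i == chosen else ""
        lines.append("{} | {}".format(label, feats))
    return "\n".join(lines) + "\n\n"   # blank line separates examples

print(cb_adf_example("user_age:25", ["topic_sports", "topic_news"],
                     chosen=1, cost=-1.0, prob=0.5))
```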
Note that Vowpal Wabbit has several tunable parameters and implements an efficient grid-search and cross-validation strategy. Have fun playing with multiple hyper-parameters!
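The Doubly Robust estimator mentioned above combines a reward-model prediction with an importance-weighted correction, and remains unbiased if either the logged propensities or the reward model are accurate. A minimal sketch, with hypothetical function and argument names not taken from the demo code:

```python
import numpy as np

def dr_estimate(rewards, logging_probs, target_probs,
                rhat_logged, rhat_target):
    """Doubly Robust estimate of a target policy's expected reward:
    the reward model's prediction for the target policy's action
    (rhat_target), plus an importance-weighted correction using the
    model's prediction at the logged action (rhat_logged).  When the
    reward model is exact the correction vanishes; when the propensities
    are exact the correction removes the model's bias."""
    w = np.asarray(target_probs) / np.asarray(logging_probs)
    correction = w * (np.asarray(rewards) - np.asarray(rhat_logged))
    return float(np.mean(np.asarray(rhat_target) + correction))

# With a perfect reward model, the correction term is zero everywhere:
print(dr_estimate(rewards=[1.0, 0.0], logging_probs=[0.5, 0.5],
                  target_probs=[1.0, 0.0], rhat_logged=[1.0, 0.0],
                  rhat_target=[1.0, 1.0]))  # → 1.0
```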
References
\bibitem{Aslam2009}
J.~A. Aslam, E.~Kanoulas, V.~Pavlu, S.~Savev, and E.~Yilmaz.
Document selection methodologies for efficient and effective
learning-to-rank.
In {\em SIGIR}, pages 468--475, 2009.
\bibitem{AtheyImbens}
S.~Athey and G.~Imbens.
{Recursive Partitioning for Heterogeneous Causal Effects}.
{\em ArXiv e-prints}, 2015.
\bibitem{beygelzimer09offset}
A.~Beygelzimer and J.~Langford.
The offset tree for learning with partial labels.
In {\em KDD}, pages 129--138, 2009.
\bibitem{BeygelzimerKDD2010}
A.~Beygelzimer and J.~Langford.
Learning through exploration.
In {\em KDD}, 2010.
\bibitem{Bottou2013}
L.~Bottou, J.~Peters, J.~Q. Candela, D.~X. Charles, M.~Chickering,
E.~Portugaly, D.~Ray, P.~Y. Simard, and E.~Snelson.
Counterfactual reasoning and learning systems: The example of
computational advertising.
{\em Journal of Machine Learning Research}, 14(1):3207--3260, 2013.
\bibitem{Carterette2010}
B.~Carterette, E.~Kanoulas, V.~Pavlu, and H.~Fang.
Reusable test collections through experimental design.
In {\em SIGIR}, pages 547--554, 2010.
\bibitem{Carterette2010Tutorial}
B.~Carterette, E.~Kanoulas, and E.~Yilmaz.
Low cost evaluation in information retrieval.
In {\em SIGIR}, pages 903--903, 2010.
\bibitem{CarteretteKY12}
B.~Carterette, E.~Kanoulas, and E.~Yilmaz.
Advances on the development of evaluation measures.
In {\em SIGIR}, pages 1200--1201, 2012.
\bibitem{Cesa-Bianchi2006}
N.~Cesa-Bianchi and G.~Lugosi.
{\em Prediction, Learning, and Games}.
Cambridge University Press, New York, NY, USA, 2006.
\bibitem{Interleaving2012}
O.~Chapelle, T.~Joachims, F.~Radlinski, and Y.~Yue.
Large-scale validation and analysis of interleaved search evaluation.
{\em {ACM} Transactions on Information Systems}, 30(1):6, 2012.
\bibitem{Chuklin_SIGIR2015_Tutorial1}
A.~Chuklin, I.~Markov, and M.~de~Rijke.
An introduction to click models for web search.
In {\em SIGIR}, pages 1113--1115, 2015.
\bibitem{Cortes2010}
C.~Cortes, Y.~Mansour, and M.~Mohri.
Learning bounds for importance weighting.
In {\em NIPS}, pages 442--450, 2010.
\bibitem{Dudik2014}
M.~Dud{\'\i}k, D.~Erhan, J.~Langford, and L.~Li.
Doubly robust policy evaluation and optimization.
{\em Statistical Science}, pages 485--511, 2014.
\bibitem{Dudik2011}
M.~Dud{\'{i}}k, J.~Langford, and L.~Li.
Doubly robust policy evaluation and learning.
In {\em ICML}, pages 1097--1104, 2011.
\bibitem{Workshop2015WWW}
N.~Gupta, E.~Koh, and L.~Li.
Workshop on online and offline evaluation of web-based services.
In {\em WWW Companion}, 2015.
\bibitem{HL}
J.~M. Hernández-Lobato, N.~Houlsby, and Z.~Ghahramani.
Probabilistic matrix factorization with non-random missing data.
In {\em ICML}, pages 1512--1520, 2014.
\bibitem{Hofmann2014Tutorial}
K.~Hofmann.
Online experimentation for information retrieval.
In {\em RuSSIR}, pages 21--41, 2014.
\bibitem{Hofmann2013}
K.~Hofmann, A.~Schuth, S.~Whiteson, and M.~de~Rijke.
Reusing historical interaction data for faster online learning to
rank for {IR}.
In {\em WSDM}, pages 183--192, 2013.
\bibitem{kohavi2009controlled}
R.~Kohavi, R.~Longbotham, D.~Sommerfield, and R.~M. Henne.
Controlled experiments on the web: survey and practical guide.
{\em Data Mining and Knowledge Discovery}, pages 140--181, 2009.
\bibitem{LangfordNIPS2013}
J.~Langford.
Learning to interact.
In {\em NIPS}, 2013.
\bibitem{Langford2008}
J.~Langford, A.~Strehl, and J.~Wortman.
Exploration scavenging.
In {\em ICML}, pages 528--535, 2008.
\bibitem{langford2008epoch}
J.~Langford and T.~Zhang.
The epoch-greedy algorithm for multi-armed bandits with side
information.
In {\em Advances in Neural Information Processing Systems}, 2008.
\bibitem{LihongWSDM2015}
L.~Li.
Offline evaluation and optimization for interactive systems.
In {\em WSDM}, 2015.
\bibitem{LiCKG15}
L.~Li, S.~Chen, J.~Kleban, and A.~Gupta.
Counterfactual estimation and optimization of click metrics in search
engines: {A} case study.
In {\em WWW Companion}, pages 929--934, 2015.
\bibitem{Li2011}
L.~Li, W.~Chu, J.~Langford, and X.~Wang.
Unbiased offline evaluation of contextual-bandit-based news article
recommendation algorithms.
In {\em WSDM}, pages 297--306, 2011.
\bibitem{Li2015}
L.~Li, R.~Munos, and C.~Szepesvari.
Toward minimax off-policy value estimation.
In {\em AISTATS}, 2015.
\bibitem{Marlin/Zemel/09}
B.~M. Marlin and R.~S. Zemel.
Collaborative prediction and ranking with non-random missing data.
In {\em RecSys}, pages 5--12, 2009.
\bibitem{Mary2014}
J.~Mary, P.~Preux, and O.~Nicol.
Improving offline evaluation of contextual bandit algorithms via
bootstrapping techniques.
In {\em ICML}, pages 172--180, 2014.
\bibitem{Rosenbaum1983}
P.~Rosenbaum and D.~Rubin.
The central role of the propensity score in observational studies for
causal effects.
{\em Biometrika}, 70(1):41--55, 1983.
\bibitem{Rubinstein2008}
R.~Rubinstein and D.~Kroese.
{\em Simulation and the Monte Carlo Method}.
Wiley, 2 edition, 2008.
\bibitem{mnar}
T.~Schnabel, A.~Swaminathan, A.~Singh, N.~Chandak, and T.~Joachims.
Recommendations as treatments: Debiasing learning and evaluation.
{\em Preprint}, 2016.
\bibitem{Strehl2010}
A.~L. Strehl, J.~Langford, L.~Li, and S.~Kakade.
Learning from logged implicit exploration data.
In {\em NIPS}, pages 2217--2225, 2010.
\bibitem{Sutton1998}
R.~S. Sutton and A.~G. Barto.
{\em {Reinforcement Learning: An Introduction}}.
The MIT Press, 1998.
\bibitem{Swaminathan2015}
A.~Swaminathan and T.~Joachims.
Counterfactual risk minimization: Learning from logged bandit
feedback.
In {\em ICML}, 2015.
\bibitem{Swaminathan15b}
A.~Swaminathan and T.~Joachims.
The self-normalized estimator for counterfactual learning.
In {\em NIPS}, pages 3213--3221, 2015.
\bibitem{slates}
A.~Swaminathan, A.~Krishnamurthy, A.~Agarwal, M.~Dud\'{i}k, and J.~Langford.
Off-policy evaluation and optimization for slate recommendation.
{\em ArXiv e-prints}, 2016.
\bibitem{Vapnik1998}
V.~Vapnik.
{\em Statistical Learning Theory}.
Wiley, Chichester, GB, 1998.
\bibitem{Yang2014Tutorial}
H.~Yang, M.~Sloan, and J.~Wang.
Dynamic information retrieval modeling.
In {\em SIGIR}, 2014.
\bibitem{Evangelosetal}
E.~Yilmaz, E.~Kanoulas, and J.~A. Aslam.
A simple and efficient sampling method for estimating {AP} and
{NDCG}.
In {\em SIGIR}, pages 603--610, 2008.
\bibitem{Zadrozny2003}
B.~Zadrozny, J.~Langford, and N.~Abe.
Cost-sensitive learning by cost-proportionate example weighting.
In {\em ICDM}, pages 435--, 2003.
Acknowledgements
We gratefully acknowledge support under NSF Awards
IIS-1247637, IIS-1217686, and IIS-1513692, as well as a gift
from Bloomberg.