
Syllabus
 08/26: Introduction [slides] [slides 6up]
 Examples of machine learning problems that require counterfactual reasoning.
 Overview of course.
 Administrative issues and course policies.
 09/02: Canceled
 09/09: Online Learning from User Interactions through Interventions [slides] [slides 6up]
 Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. In COLT, 2009. (paper) [TJ]
 P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning, ICML, 2012. (paper) [TJ]
 09/16: Counterfactual Model for Online Systems [slides] [slides 6up]
 Imbens, Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences, 2015. Chapters 1, 3, 12.
 09/23: System Evaluation via Counterfactual Estimation [slides] [slides 6up]
 L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297-306, 2011. (paper) [Davis]
 L. Li, J.Y. Kim, I. Zitouni. Toward Predicting the Outcome of an A/B Experiment for Search Relevance. In WSDM, 2015. (paper) [Moontae]
 09/30: Causal Reasoning for Online Systems [slides] [slides 6up]
 L. Bottou, J. Peters, J. Q. Candela, D. X. Charles, M. Chickering, E. Portugaly, D. Ray, P. Y. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14(1):3207-3260, 2013. (paper) [Adith]
 10/07: Batch Learning from Bandit Feedback 1 [slides] [slides 6up]
 A. Swaminathan, T. Joachims, Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization, JMLR Special Issue in Memory of Alexey Chervonenkis, 16(1):1731-1755, 2015. (paper) [TJ]
 A. Swaminathan and T. Joachims. The self-normalized estimator for counterfactual learning. In NIPS, pages 3213-3221, 2015. (paper) [TJ]
 10/14: Batch Learning from Bandit Feedback 2 [slides] [slides 6up]
 A. Beygelzimer and J. Langford. The offset tree for learning with partial labels. In KDD, pages 129-138, 2009. (paper) [Michael L]
 S. Athey and G. Imbens. Recursive Partitioning for Heterogeneous Causal Effects. ArXiv e-prints, 2015. (paper) [Aman]
 10/21: Online Contextual Bandits and Variance Reduction [slides] [slides 6up]
 M. Dudik, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In ICML, pages 1097-1104, 2011. (paper) [Ashudeep]
 J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS, 2008. (paper) [Ziteng]
 10/28: Observational Data [slides] [slides 6up]
 J. Langford, A. Strehl, and J. Wortman. Exploration scavenging. In ICML, pages 528-535, 2008. (paper) [Pantelis]
 Alex Strehl, John Langford, Sham Kakade, Lihong Li. Learning from Logged Implicit Exploration Data. NIPS, pages 2217-2225, 2010. (paper) (paper) [Angela]
 11/04: Missing Data and Selection Bias [slides] [slides 6up]
 T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims. Recommendations as treatments: Debiasing learning and evaluation. In ICML, 2016. (paper) [Tobias]
 B. M. Marlin and R. S. Zemel. Collaborative prediction and ranking with non-random missing data. In RecSys, pages 5-12, 2009. (paper) [Wouter]
 11/11: Click Models [slides] [slides 6up]
 A. Chuklin, I. Markov, and M. de Rijke. Click Models for Web Search. Morgan & Claypool, 2015. (paper) [Canceled]
 11/18: ERM Learning with Behavior Propensity Model [slides] [slides 6up]
 T. Joachims, A. Swaminathan, T. Schnabel, Unbiased Learning-to-Rank with Biased Feedback, In WSDM, 2017. (paper) [TJ]
 12/02: Wrap-up [slides] [slides 6up]


Reference Material
We will mostly read original research papers, but the following books provide entry points for the main topics of the class:
 Imbens, Rubin, "Causal Inference for Statistics, Social, and Biomedical Sciences", Cambridge University Press, 2015. (online via Cornell Library)
 Morgan, Winship "Counterfactuals and Causal Inference", Cambridge University Press, 2007.
Other sources for general background on machine learning are:
 Kevin Murphy, "Machine Learning: A Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
 Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library)
 Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
 Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
 Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
 Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
 Devroye, Gyoerfi, Lugosi, "A Probabilistic Theory of Pattern Recognition", Springer, 1997.
 Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
 Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
 Vapnik, "Statistical Learning Theory", Wiley, 1998.
Bias in Human Feedback
 Ruben Sipos, Arpita Ghosh, Thorsten Joachims. Was This Review Helpful to You? It Depends! Context and Voting Patterns in Online Content. WWW, 2014. (paper)
 O. Chapelle, T. Joachims, F. Radlinski, Yisong Yue, Large-Scale Validation and Analysis of Interleaved Search Evaluation, ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012. (paper)
 T. Joachims, L. Granka, Bing Pan, H. Hembrooke, F. Radlinski, G. Gay. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2 (April), 2007. (paper)
 O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW Conference, 2009. (paper)
 A. Chuklin, I. Markov, and M. de Rijke. Click Models for Web Search. Morgan & Claypool, 2015. (paper)
 S. Wager, N. Chamandy, O. Muralidharan, A. Najmi. Feedback Detection for Live Predictors. In NIPS, 2015. (paper)
Online Learning with Interactive Control
 Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. In COLT, 2009. (paper)
 P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning, ICML, 2012. (paper)
 K. Hofmann, A. Schuth, S. Whiteson, and M. de Rijke. Reusing historical interaction data for faster online learning to rank for IR. In WSDM, pages 183-192, 2013. (paper)
 R. Kohavi, R. Longbotham, D. Sommerfield, and R. M. Henne. Controlled experiments on the web: survey and practical guide. Data Mining and Knowledge Discovery, pages 140-181, 2009. (paper)
 J. Langford and T. Zhang. The epoch-greedy algorithm for multi-armed bandits with side information. In NIPS, 2008. (paper)
 F. Lattimore, T. Lattimore, M. Reid. Causal Bandits: Learning Good Interventions via Causal Inference, NIPS, 2016. (paper)
Batch Learning from Controlled Interventions
 L. Li, W. Chu, J. Langford, and X. Wang. Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms. In WSDM, pages 297-306, 2011. (paper)
 A. Beygelzimer and J. Langford. The offset tree for learning with partial labels. In KDD, pages 129-138, 2009. (paper)
 S. Athey and G. Imbens. Recursive Partitioning for Heterogeneous Causal Effects. ArXiv e-prints, 2015. (paper)
 A. Swaminathan and T. Joachims. Counterfactual risk minimization: Learning from logged bandit feedback. In ICML, 2015. (paper)
 A. Swaminathan, T. Joachims, Batch Learning from Logged Bandit Feedback through Counterfactual Risk Minimization, JMLR Special Issue in Memory of Alexey Chervonenkis, 16(1):1731-1755, 2015. (paper)
 A. Swaminathan and T. Joachims. The self-normalized estimator for counterfactual learning. In NIPS, pages 3213-3221, 2015. (paper)
 A. Swaminathan, A. Krishnamurthy, A. Agarwal, M. Dudik, and J. Langford. Off-policy evaluation and optimization for slate recommendation. Arxiv Preprint, 2016. (paper)
 M. Dudik, J. Langford, and L. Li. Doubly robust policy evaluation and learning. In ICML, pages 1097-1104, 2011. (paper)
 L. Bottou, J. Peters, J. Q. Candela, D. X. Charles, M. Chickering, E. Portugaly, D. Ray, P. Y. Simard, and E. Snelson. Counterfactual reasoning and learning systems: The example of computational advertising. Journal of Machine Learning Research, 14(1):3207-3260, 2013. (paper)
 C. Cortes, Y. Mansour, and M. Mohri. Learning bounds for importance weighting. In NIPS, pages 442-450, 2010. (paper)
 L. Li, S. Chen, J. Kleban, and A. Gupta. Counterfactual estimation and optimization of click metrics in search engines: A case study. In WWW Companion, pages 929-934, 2015. (paper)
 J. Mary, P. Preux, and O. Nicol. Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques. In ICML, pages 172-180, 2014. (paper)
 J. Schulman, S. Levine, P. Moritz, M. Jordan, P. Abbeel. Trust Region Policy Optimization. In ICML, 2015. (paper)
Batch Learning from Observational Feedback
 T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims. Recommendations as treatments: Debiasing learning and evaluation. In ICML, 2016. (paper)
 B. M. Marlin and R. S. Zemel. Collaborative prediction and ranking with non-random missing data. In RecSys, pages 5-12, 2009. (paper)
 J. M. Hernández-Lobato, N. Houlsby, and Z. Ghahramani. Probabilistic matrix factorization with non-random missing data. In ICML, pages 1512-1520, 2014. (paper)
 T. Joachims, A. Swaminathan, T. Schnabel, Unbiased Learning-to-Rank with Biased Feedback, Arxiv Preprint, 2016. (paper)
 L. Li, J.Y. Kim, I. Zitouni. Toward Predicting the Outcome of an A/B Experiment for Search Relevance. In WSDM, 2015. (paper)
 D. Liang, L. Charlin, J. McInerney, D. Blei. Modeling User Exposure in Recommendation. In WWW, 2016. (paper)
