CS6784 Advanced Topics in Machine Learning, T. Joachims, Cornell University

	Advanced Topics in Machine Learning CS6784 Spring 2014 Prof. Thorsten Joachims Cornell University, Department of Computer Science

	Time and Place
First lecture: January 23, 2014
Last lecture: May 6, 2014
Tuesday, 1:25pm - 2:40pm in Hollister 306
Thursday, 1:25pm - 2:40pm in Hollister 306
	Syllabus
CS6784 is an advanced machine learning course for students who have already taken CS 4780, CS 6780, or an equivalent machine learning class. It gives in-depth coverage of currently active research areas in machine learning and connects to open research questions, providing starting points for future work. In particular, the course will focus on recent work in the following areas:
Structured Output Prediction: In conventional classification and regression, the prediction is a single number. Many application problems, however, require the prediction of complex multi-part objects like trees (e.g. natural language parsing), alignments (e.g. protein threading), rankings (e.g. search engines), and paths (e.g. navigation assistants). How can one tractably model and learn to make such complex predictions?
Learning with Humans in the Loop: Much of the data used for machine learning is gathered by observing human behavior (e.g. search engine logs, purchase data, fraud detection). However, it is known that this data is biased (e.g. users can click only on results that were presented). How can one account for these biases during learning? Alternatively, how can the learning algorithm address these biases by actively interacting with the human instead of being a passive observer?
Learning Representations: There are large classes of objects that do not come with representations that reflect their semantic properties well (e.g. songs, movies, words, people). However, there exists data on how these objects interact (e.g. playlists of songs, user ratings of movies, sentences of words, interactions among people). Can we learn meaningful representations of the objects from these interactions?
The content of the course will reflect a balance of learning methods, algorithms, and their theoretical understanding, focusing on approaches with practical relevance.
	Staff
Prof. Thorsten Joachims (homepage), office hours Thursdays 3:00-4:00 in Gates 418
Joshua Moore (homepage)
	Course Material
01/23: Introduction (slides). Overview of course topics; course administration and grading; warm-up assignment.
01/28: Generative vs. Discriminative Supervised Learning (slides). Template for project idea pitch and project guidelines. Generative models, conditional probabilistic models, decision models. Maximum-likelihood estimation and empirical risk minimization. Naive Bayes, logistic regression, and support vector machines.
01/30: Generative Hidden Markov Models (slides). Representation and assumptions of HMMs; maximum-likelihood estimation of HMMs; most probable configurations and Viterbi algorithm. Reading: Koller, Friedman, Getoor, Taskar, "Graphical Models in a Nutshell". (paper)
02/04: Project Pitches. See Piazza for the slides.
02/06: I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, Support Vector Machine Learning for Interdependent and Structured Output Spaces, ICML, 2004. (paper) (slides)
02/11: Ben Taskar, Carlos Guestrin and Daphne Koller. Max-Margin Markov Networks. NIPS, 2004. (paper) (slides)
02/11: D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng. Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. CVPR, 2005. (paper) (slides)
02/20: Matthew Blaschko, Christoph Lampert. Learning to Localize Objects with Structured Output Regression. ECCV, 2008. (paper) (slides)
02/20: John Lafferty, Andrew McCallum, Fernando Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001. (paper) (slides)
02/25: Nathan Ratliff, Andrew Bagnell, Martin Zinkevich. Maximum Margin Planning. ICML, 2006. (paper) (slides)
02/25: J. Weston, O. Chapelle, A. Elisseeff, B. Schoelkopf and V. Vapnik, Kernel Dependency Estimation, NIPS, 2002. (paper) (slides)
02/27: T. Joachims. A Support Vector Method for Multivariate Performance Measures. ICML, 2005. (paper) (slides)
02/27: Yisong Yue, T. Joachims. Predicting Diverse Subsets Using Structural SVMs. ICML, 2008. (paper) (slides)
03/04: T. Joachims, L. Granka, Bing Pan, H. Hembrooke, F. Radlinski, G. Gay. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2 (April), 2007. (paper) (slides)
03/06: B. Carterette, P. Bennett, D. Chickering, S. Dumais. Here or There: Preference Judgments for Relevance. ECIR, 2008. (paper) (slides)
03/06: O. Chapelle, T. Joachims, F. Radlinski, Yisong Yue, Large-Scale Validation and Analysis of Interleaved Search Evaluation, ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012. (paper) (slides)
03/11: Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. JCSS, 2012. (paper) (slides)
03/13: P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning, ICML, 2012. (paper) (slides)
03/18: E. Agichtein, E. Brill, S. T. Dumais and R. Ragno. Learning user interaction models for predicting web search preferences. SIGIR, 2006. (paper) (slides)
03/18: O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW Conference, 2009. (paper) (slides)
03/20: Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona, Serge Belongie. Visual Recognition with Humans in the Loop. ECCV, 2010. (paper) (slides)
03/20: Seyda Ertekin, Haym Hirsh, Cynthia Rudin. Selective Sampling of Labelers for Approximating the Crowd. AAAI Fall Symposium, 2012. (paper) (slides)
03/25: Chris Piech, Jonathan Huang, Zhenghao Chen, Chuong Do, Andrew Ng, Daphne Koller. Tuned Models of Peer Assessment in MOOCs. EDM, 2013. (paper) (slides)
03/27: Shuo Chen, Joshua Moore, Douglas Turnbull, Thorsten Joachims, Playlist Prediction via Metric Embedding, ACM Conference on Knowledge Discovery and Data Mining (KDD), 2012. (paper) (slides)
03/27: J. Moore, Shuo Chen, T. Joachims, D. Turnbull, Taste over Time: the Temporal Dynamics of User Preferences, Conference of the International Society for Music Information Retrieval (ISMIR), 2013. (paper) (slides)
04/08: Yoshua Bengio, Rejean Ducharme, Pascal Vincent, Christian Jauvin. A Neural Probabilistic Language Model. JMLR, Vol 3, 2003. (paper) (slides) (slides)
04/08: Jason Weston, Samy Bengio, Nicolas Usunier. WSABIE: Scaling Up To Large Vocabulary Image Annotation. IJCAI, 2011. (paper) (slides)
04/10: Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013. (paper) (slides)
04/10: Richard Socher, Brody Huval, Christopher Manning, Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. EMNLP, 2012. (paper) (slides)
04/15: D. Blei, A. Ng, M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research (JMLR), 3(5):993-1022, 2003. (paper) (slides)
04/15: Prem Gopalan, Jake Hofman, David Blei. Scalable Recommendation with Poisson Factorization. Online report, 2013. (paper) (slides)
04/17: Steffen Rendle, Lars Schmidt-Thieme. Pairwise Interaction Tensor Factorization for Personalized Tag Recommendation. WSDM, 2010. (paper) (slides)
	Reference Material
Structured Output Prediction:
I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, Support Vector Machine Learning for Interdependent and Structured Output Spaces, ICML, 2004. (paper)
Ben Taskar, Carlos Guestrin and Daphne Koller. Max-Margin Markov Networks. NIPS, 2004. (paper)
D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng. Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. CVPR, 2005. (paper)
Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. ICML, 2000. (paper)
John Lafferty, Andrew McCallum, Fernando Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001. (paper)
Chun-Nam John Yu, T. Joachims, R. Elber, J. Pillardy. Support Vector Training of Protein Alignment Models. Journal of Computational Biology, 15(7): 867-880, September 2008. (paper)
Brooke Cowan, Ivona Kucerova, and Michael Collins, A Discriminative Model for Tree-to-Tree Translation, EMNLP 2006. (paper)
Yisong Yue, T. Finley, F. Radlinski, T. Joachims. A Support Vector Method for Optimizing Average Precision. SIGIR, 2007. (paper)
Yisong Yue, T. Joachims. Predicting Diverse Subsets Using Structural SVMs. ICML, 2008. (paper)
Matthew Blaschko, Christoph Lampert. Learning to Localize Objects with Structured Output Regression. ECCV, 2008. (paper)
Rajhans Samdani, Dan Roth. Efficient Decomposed Learning for Structured Prediction. ICML, 2012. (paper)
A. Fix, T. Joachims, S. Park, R. Zabih. Structured learning of sum-of-submodular higher order energy functions. ICCV, 2013. (paper)
Nathan Ratliff, Andrew Bagnell, Martin Zinkevich. Maximum Margin Planning. ICML, 2006. (paper)
Ulf Brefeld, Tobias Scheffer, Semi-Supervised Learning for Structured Output Variables, ICML, 2006. (paper)
J. Weston, O. Chapelle, A. Elisseeff, B. Schoelkopf and V. Vapnik, Kernel Dependency Estimation, NIPS, 2002. (paper)
Hal Daume, John Langford, Daniel Marcu, Search-based Structured Prediction, Machine Learning, 2009. (paper)
Matthew Richardson, Pedro Domingos, Markov Logic Networks, Machine Learning, Vol. 62, Number 1-2, pp. 107-136, 2006. (paper)
Kuzman Ganchev, Joao Graca, Jennifer Gillenwater, Ben Taskar. Posterior Regularization for Structured Latent Variable Models. JMLR, 10, 2010. (paper)
Chun-Nam Yu, Thorsten Joachims. Learning Structural SVMs with Latent Variables. ICML 2009. (paper)
Machine Learning with Humans in the Loop:
T. Joachims, L. Granka, Bing Pan, H. Hembrooke, F. Radlinski, G. Gay. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2 (April), 2007. (paper)
Ben Carterette, Rosie Jones. Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks. NIPS, 2007. (paper)
F. Radlinski, M. Kurup, T. Joachims. How Does Clickthrough Data Reflect Retrieval Quality? CIKM, 2008. (paper)
O. Chapelle, T. Joachims, F. Radlinski, Yisong Yue, Large-Scale Validation and Analysis of Interleaved Search Evaluation, ACM Transactions on Information Systems (TOIS), 30(1):6.1-6.41, 2012. (paper)
O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW Conference, 2009. (paper)
E. Agichtein, E. Brill, S. T. Dumais and R. Ragno. Learning user interaction models for predicting web search preferences. SIGIR, 2006. (paper)
D. Beeferman, A. Berger. Agglomerative clustering of search engine query logs. KDD, 2000. (paper)
Alex Strehl, John Langford, Sham Kakade, Lihong Li. Learning from Logged Implicit Exploration Data. NIPS, 2010. (paper)
Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. JCSS, 2012. (paper)
Abner Guzman-Rivera, Dhruv Batra, Pushmeet Kohli. Multiple Choice Learning: Learning to Produce Multiple Structured Outputs, NIPS, 2012. (paper)
B. Carterette, P. Bennett, D. Chickering, S. Dumais. Here or There: Preference Judgments for Relevance. ECIR, 2008. (paper)
P. Shivaswamy, T. Joachims. Online Structured Prediction via Coactive Learning, ICML, 2012. (paper)
A. Jain, B. Wojcik, T. Joachims, A. Saxena. Learning Trajectory Preferences for Manipulators via Iterative Improvement. NIPS, 2013. (paper)
Seyda Ertekin, Haym Hirsh, Cynthia Rudin. Selective Sampling of Labelers for Approximating the Crowd. AAAI Fall Symposium, 2012. (paper)
Steve Branson, Catherine Wah, Florian Schroff, Boris Babenko, Peter Welinder, Pietro Perona, Serge Belongie. Visual Recognition with Humans in the Loop. ECCV, 2010. (paper)
Chris Piech, Jonathan Huang, Zhenghao Chen, Chuong Do, Andrew Ng, Daphne Koller. Tuned Models of Peer Assessment in MOOCs. EDM, 2013. (paper)
Ruben Sipos, Arpita Ghosh, Thorsten Joachims. Was This Review Helpful to You? It Depends! Context and Voting Patterns in Online Content. WWW, 2014. (paper)
Learning Representations:
Shuo Chen, Joshua Moore, Douglas Turnbull, Thorsten Joachims, Playlist Prediction via Metric Embedding, ACM Conference on Knowledge Discovery and Data Mining (KDD), 2012. (paper)
J. Moore, Shuo Chen, T. Joachims, D. Turnbull, Taste over Time: the Temporal Dynamics of User Preferences, Conference of the International Society for Music Information Retrieval (ISMIR), 2013. (paper)
Geoffrey Hinton, Sam Roweis. Stochastic Neighbor Embedding. NIPS, 2002. (paper)
David Gleich, Matthew Rasmussen, Kevin Lang, and Leonid Zhukov. The world of music: User ratings; spectral and spherical embeddings; map projections. Online report, 2006. (paper)
John Platt. Fast Embedding of Sparse Music Similarity Graphs. NIPS, 2004. (paper)
Steffen Rendle, Lars Schmidt-Thieme. Pairwise Interaction Tensor Factorization for Personalized Tag Recommendation. WSDM, 2010. (paper)
Prem Gopalan, Jake Hofman, David Blei. Scalable Recommendation with Poisson Factorization. Online report, 2013. (paper)
Thomas Hofmann. Probabilistic Latent Semantic Indexing. SIGIR, 1999. (paper)
D. Blei, A. Ng, M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research (JMLR), 3(5):993-1022, 2003. (paper)
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013. (paper)
Ding Zhou, Shenghuo Zhu, Kai Yu, Xiaodan Song, Belle Tseng, Hongyuan Zha, Lee Giles. Learning Multiple Graphs for Document Recommendations. WWW, 2008. (paper)
Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby. Euclidean Embedding of Co-occurrence Data. JMLR, Vol 8, 2007. (paper)
Yoshua Bengio, Rejean Ducharme, Pascal Vincent, Christian Jauvin. A Neural Probabilistic Language Model. JMLR, Vol 3, 2003. (paper)
Andriy Mnih, Geoffrey Hinton. Three new graphical models for statistical language modelling. ICML, 2007. (paper)
Richard Socher, Brody Huval, Christopher Manning, Andrew Y. Ng. Semantic compositionality through recursive matrix-vector spaces. EMNLP, 2012. (paper)
Eric Huang, Richard Socher, Christopher Manning, Andrew Y. Ng. Improving word representations via global context and multiple word prototypes. ACL, 2012. (paper)
Jason Weston, Samy Bengio, Nicolas Usunier. WSABIE: Scaling Up To Large Vocabulary Image Annotation. IJCAI, 2011. (paper)
Andriy Mnih, Yee Whye Teh. A fast and simple algorithm for training neural probabilistic language models. ICML, 2012. (paper)
	Communication
Piazza discussion forum: This forum is our main platform for announcements, questions, and discussions. It is the best way to reach the course staff and the other students in the class.
CMT Reviewing System: We use this system for most submissions and peer reviewing.
CMS Course Management System: We use CMS for the quiz results.
	Prerequisites
The prerequisites for the class are:
Knowledge of machine learning at the level of CS 4780/5780, CS 4758/6758, or CS 6780.
Programming skills at the level of CS 2110.
Knowledge of linear algebra at the level of MATH 2940.
Knowledge of probability theory at the level of STSCI 3080.
	Academic Integrity
This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit will be the student's own work. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated.