CS 6784 - Advanced Topics in Machine Learning

CS 6784 - Spring 2010
Cornell University
Department of Computer Science

 
Time and Place
First lecture: January 26, 2010
  • Tuesday, 1:25pm - 2:40pm in Upson 215
  • Thursday, 1:25pm - 2:40pm in Upson 215
Instructor
Thorsten Joachims, tj@cs.cornell.edu, 4153 Upson Hall.
Office Hours
Wednesdays, 3:00pm - 4:00pm, Upson 4153
Description
CS 6784 is an advanced machine learning course for students who have already taken CS 4780, CS 6780, or an equivalent machine learning class. It gives in-depth coverage of currently active research areas in machine learning and connects to open research questions, providing starting points for future work. In particular, the course will focus on recent work in the following areas:
  • Structured Output Prediction: In conventional classification and regression, the prediction is a single number. Many application problems, however, require the prediction of complex multi-part objects like trees (e.g. natural language parsing), alignments (e.g. protein threading), rankings (e.g. search engines), and paths (e.g. navigation assistants). How can one tractably model and learn to make such complex predictions?
  • Humans in the Loop: Much of the data used for machine learning is gathered by observing human behavior (e.g. search engine logs, purchase data, fraud detection). However, it is known that this data is biased (e.g. users can click only on results that were presented). How can one learn despite these biases? Or how can the learning algorithm gather unbiased data by not being a passive observer, but by actively interacting with the human?
  • Understanding Archives: We are capturing and archiving more and more data (e.g. email, blogs, photos). While search engines give good microscopic access to individual data items, much work is needed to get a more macroscopic view of the content of an archive. How can machine learning help understand and summarize content, trends, dependencies, and idea flows in such archives?

The content of the course will reflect a balance of learning methods, algorithms, and their theoretical understanding, putting an emphasis on approaches with practical relevance.

Course Material
  • 01/26: Introduction and Administration (PDF)
  • 01/28: Review, Notation and Terminology (PDF) (template slide for "pitch" PPT, ODP, PDF)
  • 02/02: Primer on Hidden Markov Models (PDF)
    • Reading on HMMs: [Manning/Schuetze/99] Chapter 9 (online)
    • Optional background on general Graphical Models: (PDF)
  • 02/04: Project pitches (PDF)
  • 02/09: [Tsochantaridis/etal/04] (PDF)
  • 02/11: [Yu/etal/08] (PDF)
  • 02/16: [Taskar/etal/04] (PDF)
  • 02/18: [Anguelov/etal/05] (PDF)
  • 02/18: [Weston/etal/02] (PDF)
  • 02/23: [McCallum/etal/00] (PDF)
  • 02/23: [Lafferty/etal/01] (PDF)
  • 02/25: [Brefeld/Scheffer/06] (PDF)
  • 03/02: [Xu/etal/06] (PDF)
  • 03/02: [Cowan/etal/06] (PDF)
  • 03/04: [Yue/etal/07, Yue/Joachims/08] (PDF)
  • 03/09: [Blaschko/Lampert/08] (PDF)
  • 03/09: [Abbeel/Ng/04] (PDF)
  • 03/11: [Daume/etal/09] (PDF)
  • 03/16: [Richardson/Domingos/06] (PDF)
  • 03/30: [Joachims/etal/07] (PDF)
  • 04/01: [Carterette/Jones/07] (PDF)
  • 04/01: [Radlinski/etal/08] (PDF)
  • 04/06: [Chapelle/Zhang/09] (PDF)
  • 04/06: [Agichtein/etal/06] (PDF)
  • 04/08: [Beeferman/Berger/00] (PDF)
  • 04/08: [Langford/etal/08] (PDF)
  • 04/13: [Yue/etal/09] (PDF)
  • 04/20: [Pohl/etal/07] (PDF)
  • 04/20: [Knoll/etal/09] (PDF)
  • 04/22: [Shaparenko/Joachims/07] (PDF)
  • 04/27: [Blei/etal/03] (PDF)
  • 04/29: [Kleinberg/02] (PDF)
Reading


Structured Output Prediction

  • 02/09: I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, Support Vector Machine Learning for Interdependent and Structured Output Spaces, ICML, 2004. (paper)
  • 02/11: Chun-Nam John Yu, T. Joachims, R. Elber, J. Pillardy. Support Vector Training of Protein Alignment Models. Journal of Computational Biology, 15(7): 867-880, September 2008. (paper)
  • 02/16: Ben Taskar, Carlos Guestrin and Daphne Koller. Max-Margin Markov Networks. NIPS, 2004. (paper) [Lu] (30 min)
  • 02/18: D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng. Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data. CVPR, 2005. (paper) [Sarah] (20 min)
  • 02/18: J. Weston, O. Chapelle, A. Elisseeff, B. Schoelkopf and V. Vapnik, Kernel Dependency Estimation, NIPS, 2002. (paper) [Alex] (20 min)
  • 02/23: Andrew McCallum, Dayne Freitag, and Fernando Pereira. Maximum entropy Markov models for information extraction and segmentation. ICML, 2000. (paper) [Ruogu] (20 min)
  • 02/23: John Lafferty, Andrew McCallum, Fernando Pereira, Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. ICML, 2001. (paper) [Guozhang,Rohit] (20 min)
  • 02/25: Ulf Brefeld, Tobias Scheffer, Semi-Supervised Learning for Structured Output Variables, ICML, 2006. (paper) [Jean-Baptiste] (20 min)
  • 03/02: Linli Xu, Dana Wilkinson, Finnegan Southey, Dale Schuurmans. Discriminative Unsupervised Learning of Structured Predictors. ICML, 2006. (paper) [Kent,Mark] (20 min)
  • 03/02: Brooke Cowan, Ivona Kucerova, and Michael Collins, A Discriminative Model for Tree-to-Tree Translation, EMNLP 2006. (paper) [Martin,Ruben] (20 min)
  • 03/04: Yisong Yue, T. Finley, F. Radlinski, T. Joachims. A Support Vector Method for Optimizing Average Precision. SIGIR, 2007. (paper)
  • 03/04: Yisong Yue, T. Joachims. Predicting Diverse Subsets Using Structural SVMs. ICML, 2008. (paper)
  • 03/09: Matthew Blaschko, Christoph Lampert. Learning to Localize Objects with Structured Output Regression. ECCV, 2008. (paper) [Adarsh,Yimeng] (20 min)
  • 03/09: Pieter Abbeel and Andrew Y. Ng, Apprenticeship Learning via Inverse Reinforcement Learning, ICML, 2004. (paper) [Vasu,Dane] (20 min)
  • 03/11: Hal Daume, J. Langford, and Daniel Marcu, Search-based Structured Prediction, Machine Learning, 2009. (paper) [Michael,Sudip] (45 min)
  • 03/16: Matthew Richardson, Pedro Domingos, Markov Logic Networks, Machine Learning, Vol. 62, Number 1-2, pp. 107-136, 2006. (paper) [Yue,Joel] (45 min)

Learning with Humans in the Loop

  • 03/30: T. Joachims, L. Granka, Bing Pan, H. Hembrooke, F. Radlinski, G. Gay. Evaluating the Accuracy of Implicit Feedback from Clicks and Query Reformulations in Web Search, ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2 (April), 2007. (paper)
  • 04/01: Ben Carterette, Rosie Jones. Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks. NIPS, 2007. (paper) [CongCong] (20 min)
  • 04/01: F. Radlinski, M. Kurup, T. Joachims. How Does Clickthrough Data Reflect Retrieval Quality? CIKM, 2008. (paper)
  • 04/06: O. Chapelle and Y. Zhang. A dynamic Bayesian network click model for web search ranking. WWW Conference, 2009. (paper) [Michaela,Vikram] (20 min)
  • 04/06: E. Agichtein, E. Brill, S. T. Dumais and R. Ragno. Learning user interaction models for predicting web search preferences. SIGIR, 2006. (paper) [Christie,Jacob] (20 min)
  • 04/08: D. Beeferman, A. Berger. Agglomerative clustering of search engine query logs. KDD, 2000. (paper) [Cangmin,Ronan] (20 min)
  • 04/08: John Langford, Alexander Strehl, and Jennifer Wortman. Exploration Scavenging, ICML, 2008. (paper) [Nikos,Devin] (20 min)
  • 04/13: Yisong Yue, J. Broder, R. Kleinberg, T. Joachims. The K-armed Dueling Bandits Problem. COLT, 2009 (preprint of journal version). (paper)

Understanding Archives

  • 04/20: S. Pohl, F. Radlinski, T. Joachims. Recommending Related Papers Based on Digital Library Access Records. JCDL, 2007. (paper)
  • 04/20: S. Knoll, A. Hoff, D. Fischer, S. Dumais, and E. Cutrell. Viewing Personal Data Over Time. CHI 2009 Workshop on Interacting with Temporal Data, 2009. Also draw on the references therein. (paper) [Jimmy] (20 min)
  • 04/22: B. Shaparenko, T. Joachims, Information Genealogy: Uncovering the Flow of Ideas in Non-Hyperlinked Document Databases, KDD, 2007. (paper)
  • 04/27: D. Blei, A. Ng, M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research (JMLR), 3(5):993–1022, 2003. (paper) [Zhaoyin,Ainur] (45 min)
  • 04/29: J. Kleinberg. Bursty and Hierarchical Structure in Streams. KDD, 2002. (paper) [Amir] (20 min)
Other Reference Material
  • T. Mitchell, "Machine Learning", McGraw Hill, 1997.
  • B. Schoelkopf, A. Smola, "Learning with Kernels", MIT Press, 2001. (online)
  • C. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
  • R. Duda, P. Hart, D. Stork, "Pattern Classification", Wiley, 2001.
  • T. Hastie, R. Tibshirani, and J. Friedman, "The Elements of Statistical Learning", Springer, 2001.
  • N. Cristianini, J. Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online)
  • C. Manning, H. Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online)
  • E. Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
Communication
TBD
Prerequisites
CS 4780 or CS 6780 or an introductory machine learning class. Basic knowledge of linear algebra, calculus, and probability theory. If you are unsure whether you fulfill the prerequisites, contact the instructor.
Grading
This is a project-focused class that is part lecture and part seminar. A key component of the class is a semester-long research project. The class can be taken either for a letter grade or pass/fail. Auditing is not allowed unless you have a very good reason. Grades will be determined as follows:
  • Letter grade: project (50%), paper presentation (25%), quizzes and feedback (15%), discussion (10%)
  • Pass/Fail: paper presentation (50%), quizzes and feedback (30%), discussion (20%)
Academic Integrity
This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit must be the student's own work. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated.