Topics in Machine Learning
Learning to Predict Structured Objects

COM S 778 - FALL 2006
Cornell University
Department of Computer Science

 
Time and Place
  First lecture: August 29th, 2006
Last lecture: November 28th, 2006
  • Tuesday, 1:25pm - 2:40pm in Upson 5126
Instructor
  Thorsten Joachims, tj@cs.cornell.edu, 4153 Upson Hall.
Syllabus
  Over the last decade, much of the research on discriminative learning has focused on problems like classification and regression, where the prediction is a single univariate variable. But what if we need to predict complex objects like trees, orderings, or alignments? Such problems arise, for example, when a natural language parser needs to predict the correct parse tree for a given sentence, when predicting which ranking to return in response to a query, when predicting the alignment between two proteins, or when one needs to optimize a performance measure like the F1-score. Over the recent years, the first discriminative learning algorithms have been developed for these structured prediction problems, leading to an emerging area of research. The course will review the state of the art, tie existing approaches together, and identify new applications and open questions.
Preparing the Presentation
  Your presentations should be a self-contained summary of the paper, adding background and examples as necessary. Plan to speak for about 20 minutes. Please schedule a meeting with me so that I can give you feedback on your presentation on or before the Friday preceding your presentation.
Prerequisites
  The course assumes basic knowledge of machine learning as covered in either COM S 478 or COM S 578. If you did not take either of these classes, please come talk to me.
Grading
  This is a 3-credit course which can either be taken Pass/Fail or for letter grade (no audit). The course will consist of some faculty lectures providing background and paper presentations by students. Every student has to present a paper, and everybody is expected to have read the assigned paper before the lecture. There will be short quizzes about the paper to motivate careful reading. Pass/Fail will be determined based on the presentation, class participation, and the quizzes. Students taking the class for letter grade will do a class project in addition.
Academic Integrity
  This course follows the Cornell University Code of Academic Integrity. Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated.
References
 

Background References on Machine Learning

Ø      Author: Tom Mitchell
Title: "Machine Learning"
Publisher: McGraw Hill, 1997

Ø     Author: Ethem Alpaydin
Title: "Introduction to Machine Learning"
Publisher: MIT Press, 2004


Ø      Authors: Christopher Manning and Hinrich Schutze
Title: "Foundations of Statistical NLP"
Publisher: MIT Press, 1999

Ø      Authors: R. Duda, P. Hart, D. Stork
Title: "Pattern Classification"
Publisher: Wiley, 2001

Ø      Authors: T. Hastie, R. Tishirani, and J. Friedman
Title: "The Elements of Statistical Learning"
Publisher: Springer, 2001

Ø     Author: B. Schoelkopf and A. Smola
Title: "Learning with Kernels"
Publisher: MIT Press, 2001


Overviews and Tutorials

Ø      Author: Thomas G. Dietterich
Title: Machine Learning for Sequential Data: A Review
Journal::
In Structural, Syntactic, and Statistical Pattern Recognition; Lecture Notes in Computer Science, Vol. 2396, T. Caelli (Ed.), pp. 15–30
Publisher: Springer-Verlag, 2002


Ø      Author: Ben Taskar
Title: Max-Margin Methods for NLP: Estimation, Structure, and Applications
Proceedings:
Tutorial at the Conference of the Association for Computational Linguistics (ACL05), Ann Arbor, MI, June 2005

Ø      Author: Thorsten Joachims
Title: Learning to Predict Trees, Sequences, and other Structured Outputs
Proceedings:
Tutorial at NESCAI, 2006

Ø      Author: H. Wallach
Title: Conditional Random Fields: An Introduction.

Predicting Sequences

Ø      ***Authors: Andrew McCallum, Dayne Freitag, and Fernando Pereira
Title: Maximum entropy Markov models for information extraction and segmentation
Proceedings: International Conference on Machine Learning (ICML 2000), pages 591-598, Stanford, California, 2000


Ø     Authors: Yasemin Altun, Thomas Hofmann, Mark Johnson
Title: Loss Functions and Optimization Methods for Discriminative Learning of Label Sequences
Proceedings: Empirical Methods in Natural Language Processing (EMNLP), 2003


Ø     Authors: Yasemin Altun, Thomas Hofmann & Mark Johnson
Title: Discriminative Learning for Label Sequences via Boosting
Proceedings: Advances in Neural Information Processing Systems (NIPS*15), 2003


Ø     Authors: Jospeh Bockhorst and Mark Craven
Title: Markov Networks for Detecting Overlapping Elements in Sequence Data.
Proceedings: Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.


Neural Networks

Ø     ***Authors: Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner
Title: Gradient-based learning applied to document recognition.
Journal: IEEE, 86(11):2278{2324} 1998


Ø      ***Authors: LeCun and Huang
Title: Loss Functions for Discriminative Training of Energy-Based Models
Proceedings: AI-Stats, 2005


Structural SVMs and Maximum Margin Markov Networks

Ø      Authors: Yasemin Altun and Thomas Hofmann
Title: Large Margin Methods for Label Sequence Learning.
Proceedings: 8th European Conference on Speech Communication and Technology (EuroSpeech), 2003.


Ø     Authors: Yasemin Altun, Ioannis Tsochantaridis and Thomas Hofmann
Title: Hidden Markov Support Vector Machines
Proceedings: 20th International Conference on Machine Learning (ICML), 2003

 

Ø      Author: T. Joachims
Title: Learning to Align Sequences: A Maximum-Margin Approach
Technical Report, 2003

 

Ø      ***Author: Ben Taskar, Carlos Guestrin and Daphne Koller
Title: Max-Margin Markov Networks.
Proceedings: In Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

 

Ø      Authors: Simon Lacoste-Julien
Title: Combining SVM with graphical models for supervised classification: an introduction to Max-Margin Markov Networks
CS281A Project Report, UC Berkeley, 2003

 

Ø     Authors: Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun
Title:
Support Vector Machine Learning for Interdependent & Structured Output Spaces,
Proceedings: International Conference on Machine Learning (ICML), 2004.

 

Ø      *** Authors: Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher Manning
Title: Max-Margin Parsing.
Proceedings: EMNLP 2004.

 

Ø      Author: Ben Taskar
Title: Learning Structured Prediction Models: A Large Margin Approach.
Publisher: Stanford University, CA, December 2004.

Ø     Author: Peter Bartlett, Michael Collins, Ben Taskar, and David McAllester
Title: Exponentiated gradient algorithms for large-margin structured classification.
Proceedings: NIPS04, 2005.

 

Ø     Authors: T. Finley and T. Joachims
Title:
Supervised Clustering with Support Vector Machines,
Proceedings: International Conference on Machine Learning (ICML), 2005.

 

Ø      Authors: T. Joachims, T. Galor, and R. Elber
Titles:
Learning to Align Sequences: A Maximum-Margin Approach,
Book:
New Algorithms for Macromolecular Simulation, B. Leimkuhler, LNCS Vol. 49, Publisher: Springer, 2005.

 

Ø      Authors: I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun
Title:
Large Margin Methods for Structured and Interdependent Output Variables,
Journal: Journal of Machine Learning Research (JMLR),
6(Sep):1453-1484, 2005.

 

Ø     Author: B. Taskar, S. Lacoste-Julien, and D. Klein,
Title: A Discriminative Matching Approach to Word Alignment
Proceedings: Empirical Methods in Natural Language Processing (EMNLP05),

 

Ø     Author: B. Taskar, V. Chatalbashev, D. Koller and C. Guestrin.
Title: Learning Structured Prediction Models: A Large Margin Approach.  
Proceedings: International Conference on Machine Learning (ICML05), 2005.

 

Ø      ***Author: D. Anguelov, B. Taskar, V. Chatalbashev, D. Koller, D. Gupta, G. Heitz, A. Ng.
Title: Discriminative Learning of Markov Random Fields for Segmentation of 3D Scan Data.  
Proceedings: International Conference on Computer Vision and Pattern Recognition (CVPR05), San Diego, CA, June 2005.

 

Ø     Authors: Ryan McDonald, Koby Crammer, and Fernando Pereira
Title: Online large-margin training of dependency parsers.
Proceedings: Annual Meeting of the Association for Computational Linguistics (ACL 2005), July 2005.

 

Ø     Author: B. Taskar, S. Lacoste-Julien, and M. Jordan,
Title: Structured Prediction via the Extragradient Method
Proceedings: Neural Information Processing Systems Conference (NIPS05), Vancouver, British Columbia, 2006. [Longer version]


Perceptron and Reranking

Ø     Authors: Brian Roark, Murat Saraclar, Michael Collins, and Mark Johnson
Title: Discriminative language modeling with conditional random fields and the perceptron algorithm.
Proceedings: ACL 2004.

Ø      *** Author: Michael Collins
Title: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms.
Proceedings: EMNLP 2002.
(This paper includes theorems and proofs which apply to the algorithms in the ACL 2002 papers.)

 

Ø     Authors: Michael Collins and Terry Koo
Title: Discriminative Reranking for Natural Language Parsing. (gzipped version)
Journal: Computational Linguistics 31(1):25-69.

 

Ø     Authors: Michael Collins and Nigel Duffy
Title: New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron
Proceedings: ACL 2002.

 

Ø      Author: Michael Collins
Title: Ranking Algorithms for Named-Entity Extraction: Boosting and the Voted Perceptron.
Proceedings: ACL 2002.

 

Ø     Author: Michael Collins
Title: Discriminative Reranking for Natural Language Parsing.
Proceedings: ICML 2000.


Gaussian Process

Ø      Authors: Yasemin Altun, Thomas Hofmann and Alexander J. Smola
Title: Gaussian process classification for segmenting and annotating sequences.
Proceedings: International Conference on Machine Learning (ICML 2004), 2004.


Conditional Random Fields

Ø      ***Authors: John Lafferty, Andrew McCallum, Fernando Pereira
Title: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data.
Proceedings: International Conference on Machine Learning (ICML-2001), 2001.

 

Ø      Authors: Fei Sha and Fernando Pereira
Title: Shallow Parsing with Conditional Random Fields.
Proceedings: Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL-03), 2003.

 

Ø     Author: Andrew McCallum
Title: Efficiently Inducing Features of Conditional Random Fields.
Proceedings: Conference in Uncertainty in Articifical Intelligence (UAI-2003), 2003.

 

Ø     Authors: David Pinto, Andrew McCallum, Xing Wei and W. Bruce Croft
Title: Table Extraction Using Conditional Random Fields.
Proceedings: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003), 2003.

 

Ø     Authors: Andrew McCallum and Wei Li
Title: Early Results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons.
Proceedings: Conference on Natural Language Learning (CoNLL), 2003.

 

Ø     Authors: Andrew McCallum, Khashayar Rohanimanesh and Charles Sutton
Title: Dynamic Conditional Random Fields for Jointly Labeling Multiple Sequences.
Proceedings: Workshop on Syntax, Semantics, Statistics; 16th Annual Conference on Neural Information Processing Systems (NIPS 2003), 2004.

 

Ø     Authors: Kevin Murphy, Antonio Torralba and William T.F. Freeman
Titles: Using the forest to see the trees: a graphical model relating features, objects and scenes.
Proceedings: Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

 

Ø      Authors: Sanjiv Kumar and Martial Hebert
Title: Discriminative Fields for Modeling Spatial Dependencies in Natural Images.
Proceedings: Advances in Neural Information Processing Systems 16 (NIPS 2003), 2004.

 

Ø      Author: Burr Settles
Title: Biomedical Named Entity Recognition Using Conditional Random Fields and Rich Feature Sets.
Proceedings: International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA), 2004.

 

Ø     Authors: Charles Sutton, Khashayar Rohanimanesh and Andrew McCallum
Title: Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data.
Proceedings: International Conference on Machine Learning (ICML 2004), 2004.

 

Ø     Authors: John Lafferty, Xiaojin Zhu and Yan Liu
Title: Kernel conditional random fields: representation and clique selection.
Proceedings: International Conference on Machine Learning (ICML 2004), 2004.

 

Ø     Authors: Xuming He, Richard Zemel, and Miguel Á. Carreira-Perpiñán
Title: Multiscale conditional random fields for image labelling.
Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2004), 2004.

 

Ø     Author: Yasemin Altun, Alex J. Smola, Thomas Hofmann
Title:
Exponential Families for Conditional Random Fields.
Proceedings: Conference on Uncertainty in Artificial Intelligence (UAI-2004), 2004.

 

Ø     Authors: Michelle L. Gregory and Yasemin Altun
Title: Using Conditional Random Fields to Predict Pitch Accents in Conversational Speech.
Proceedings: Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.

 

Ø      Author: Brian Roark, Murat Saraclar, Michael Collins and Mark Johnson
Title: Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm.
Proceedings: Annual Meeting of the Association for Computational Linguistics (ACL 2004), 2004.

 

Ø     Author: Trausti T. Kristjansson, Aron Culotta, Paul Viola and Andrew McCallum
Title: Interactive Information Extraction with Constrained Conditional Random Fields.
Proceedings: National Conference on Artificial Intelligence (AAAI 2004), 2004.

 

Ø     Author: Thomas G. Dietterich, Adam Ashenfelter and Yaroslav Bulatov
Title: Training Conditional Random Fields via Gradient Tree Boosting.
Proceedings: International Conference on Machine Learning (ICML 2004), 2004.

 

Ø      ***Author: Fuchun Peng and Andrew McCallum
Title: Accurate Information Extraction from Research Papers using Conditional Random Fields.
Proceedings: Human Language Technology Conference and North American Chapter of the Association for Computational Linguistics (HLT/NAACL-04), 2004.

 

Ø      Author: Ariadna Quattoni, Michael Collins and Trevor Darrel
Title: Conditional Random Fields for Object Recognition.
Proceedings: Advances in Neural Information Processing Systems 17 (NIPS 2004), 2005.

 

Ø      Authors: Yuan Qi, Martin Szummer and Thomas P. Minka
Title: Bayesian Conditional Random Fields.
Proceedings: International Workshop on Artificial Intelligence and Statistics (AISTATS 2005), 2005.

 

Ø      Authors: Aron Culotta, David Kulp and Andrew McCallum
Title: Gene Prediction with Conditional Random Fields.
Technical Report UM-CS-2005-028. University of Massachusetts, Amherst, 2005.

 

Ø      Authors: Yang Wang and Qiang Ji
Title: A Dynamic Conditional Random Field Model for Object Segmentation in Image Sequences.
Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), Volume 1, 2005.

 

Ø     Author: Ryan McDonald and Fernando Pereira
Title: Identifying gene and protein mentions in text using conditional random fields
Journal: BMC Bioinformatics, 6(Suppl 1):S6, May 2005.

 

Ø     Author: Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan
Title:
Identifying Sources of Opinions with Conditional Random Fields and Extraction Patterns.
Proceedings: HLT-EMNLP, 2005.

 

Ø     Authors: S.V.N. Vishwanathan, Nicol N. Schraudolph, Mark W. Schmidt, Kevin P. Murphy,
Title: Accelerated Training of Conditional Random Fields with Stochastic Gradient Methods,
Proceedings: ICML, 2006.

 

Post-Processing to Enforce Dependencies

Ø      ***Authors: Punyakanok, V., Roth, D., Yih, W., & Zimak, D. (2004).
Title: Semantic role labeling via integer linear programming inference.
Proceedings: COLING-2004.

 

Ø      Authors: D. Roth and W. Yih
Title: Integer Linear Programming Inference for Conditional Random Fields.
Proceedings: International Conference on Machine Learning (ICML)  (2005) pp. 737--744 

 

Ø      Authors: V. Punyakanok, D. Roth, W. Yih, and D. Zimak
Title: Learning and Inference over Constrained Output.
Proceedings: International Joint Conference on Artificial Intelligence (IJCAI)  (2005) pp. 1124--1129

 

Learning a Search Heuristic

Ø      ***Authors: Hal Daume and Daniel Marcu
Title: Learning as Search Optimization: Approximate Large Margin Methods for Structured Prediction
Proceedings: International Conference on Machine Learning (ICML), 2005.

 

Ø      Authors: Hal Daume, John Langford, and Daniel Marcu
Title: Search-based Structured Prediction
Unpublished.

 

Kernel Dependency Estimation

Ø      ***Authors: J. Weston, O. Chapelle, A. Elisseeff, B. Schoelkopf and V. Vapnik
Title: "Kernel Dependency Estimation"
Proceedings: NIPS 2002.

 

Ø      Authors: G. Bakir, J. Weston and B. Schölkopf
Title: "Learning to find Pre-Images"
Proceedings: NIPS, 2003

 

Ø      Authors: J. Kwok, I. Tsang
Title: "The Pre-Image Problem in Kernel Methods"
Journal: IEEE Transactions on Neural Networks, Vol. 15, No. 6, 1517 - 1525, 2004.

 

Ø      Authors: J. Weston, B. Schoelkopf and O. Bousquet
Title: "Joint Kernel Maps"
Proceedings: From the 8th International Work-Conference on Artificial Neural Networks, LNCS 3512, 176-191.

 

Ø      Authors: Corinna Cortes, Mehryar Mohri and Jason Weston
Title: "A General Regression Technique for Learning Transductions"
Proceedings: ICML, 2005

 

Ø      Author: G. Bakir
Title: "Extensions to Kernel Dependency Estimation"
Thesis, 2006.

 

Semi-Supervised Learning and Transduction

Ø      Authors: John Lafferty, Yan Liu and Xiaojin Zhu.
Title: Kernel Conditional Random Fields: Representation, Clique Selection, and Semi-Supervised Learning.
Journal: Technical Report CMU-CS-04-115, Carnegie Mellon University, 2004.

 

Ø     Author: Terry Koo and Michael Collins
Title: Hidden-Variable Models for Discriminative Reranking.
Proceedings: EMNLP 2005.

 

Ø     ***Authors: Ulf Brefeld, Tobias Scheffer.
Title: Semi-Supervised Learning for Structured Output Variables,
Proceedings: ICML, 2006.

 

Ø      ***Authors: Linli Xu, Dana Wilkinson, Finnegan Southey, Dale Schuurmans
Title: Discriminative Unsupervised Learning of Structured Predictors
Proceedings: ICML, 2006.

 

Reinforcement Learning

Ø      ***Authors: Pieter Abbeel and Andrew Y. Ng.
Title: Apprenticeship learning via inverse reinforcement learning
Proceedings: International Conference on Machine Learning, 2004.

 

Ø     Authors: Nathan D. Ratliff, J. Andrew Bagnell, Martin A. Zinkevich,
Title: Maximum Margin Planning,
Proceedings: ICML, 2006.

 

Loss Functions

Ø      Authors: Lijuan Cai, Thomas Hofmann,
Title: Hierarchical Document Categorization with Support Vector Machines,

Proceedings: ACM 13th Conference on Information and Knowledge Management, 2004.

 

Ø     Author: T. Joachims
Title:
A Support Vector Method for Multivariate Performance Measures,
Proceedings: International Conference on Machine Learning (ICML), 2005.

 

Ranking

Ø     Authors: William W. Cohen, Rob Schapire, Yoram Singer
Title: Learning to Order Things in J. Artif. Intell. Res. (JAIR) 10: 243-270 (1999). (Originally published as: William W. Cohen, Robert E. Schapire, Yoram Singer (1997): Learning to Order Things
Proceedings: NIPS 1997.

 

Ø     Authors: Ralf Herbrich, Thore Graepel, Klaus Obermayer
Title: Support Vector Learning for Ordinal Regression
Proceedings: International Conference on Artificial Neural Networks 97--102

 

Ø      Authors: Ralf Herbrich, Thore Graepel, Klaus Obermayer
Title: Large Margin Rank Boundaries for Ordinal Regression
Book: Advances in Large Margin Classifiers 115--132, 2000.

Ø      Authors: Koby Crammer and Yoram Singer,
Title: PRanking with Ranking
Proceedings: Neural Information Processing Systems (NIPS), 2001.

 

Ø     Author: T. Joachims
Title:
Optimizing Search Engines Using Clickthrough Data,
Proceedings: ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.

 

Ø      Author: Yoav Freund, Raj D. Iyer, Robert E. Schapire, Yoram Singer:
Title: An Efficient Boosting Algorithm for Combining Preferences
Journal: Journal of Machine Learning Research 4: 933-969 (2003).

 

Ø      ***Authors: Marie desJardins, Eric Eaton, Kiri L. Wagstaff,
Title: Learning User Preferences for Sets of Objects,
Proceedings: ICML, 2006.

 

Alignment and Translation

Ø      Author: T. Joachims
Title: Learning to Align Sequences: A Maximum-Margin Approach
Technical Report, 2003

 

Ø      Author: T. Joachims, T. Galor, and R. Elber
Title: Learning to Align Sequences: A Maximum-Margin Approach
Book: New Algorithms for Macromolecular Simulation, B. Leimkuhler, LNCS Vol. 49, Springer, 2005.

 

Ø     Authors: Luke S. Zettlemoyer and Michael Collins
Title: Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars.
Proceedings: UAI 2005.

 

Ø      ***Authors: Brooke Cowan, Ivona Kucerova, and Michael Collins
Title: A Discriminative Model for Tree-to-Tree Translation.
Proceedings: EMNLP 2006.

 

Ø      ***Author: P. Liang, Alexandre Bouchard-Cote, D. Klein and B. Taskar
Title: An End-to-End Discriminative Approach to Machine Translation,
Proceedings: Association for Computational Linguistics (ACL06), Sydney, Australia, July 2006.

 

Ø      ***Author: P. Liang, B. Taskar, and D. Klein.
Title: Alignment by Agreement,
Proceedings: Human Language Technology conference and North American chapter of the Association for Computational Linguistics (HLT-NAACL06), New York, June 2006.

 

Ø      ***Authors: Andrew McCallum, Kedar Bellare, and Fernando Pereira
Title: A conditional random field for discriminatively-trained finite-state string edit distance.
Proceedings: Conference on Uncertainty in Artificial Intelligence (UAI 2005), July 2005.

 

Ø     ***Authors: Chuong B. Do, Samuel S. Gross, and Serafim Batzoglou
Title: CONTRAlign: Discriminative Training for Protein Sequence Alignment,
Proceedings: RECOMB, 2006.

 

Ø     Authors: Chun-Nam Yu, T. Joachims and R. Elber
Title: Training Protein Threading Models Using Structural SVMs,
Proceedings: ICML Workshop on Learning in Structured Output Spaces, 2006.