NSF-Project IIS-0412894

Cornell University

Department of Computer Science

Over the last decade, research on discriminative learning methods like Support Vector Machines (SVMs) and Boosting has raised the state of the art in machine learning not only with respect to prediction accuracy, but also in terms of theoretical understanding and robustness. However, so far almost all of this research has been limited to problems of classification and regression. But what if the object we want to predict is not a single class or a real number, but a complex object like a tree, a sequence, or a set of dependent labels? Such problems are ubiquitous, for example, in natural language parsing, information extraction, and text classification.

The project will extend highly successful learning methods--in particular large-margin methods like support vector machines (SVMs)--to the problem of predicting such multivariate and interdependent outputs. Specifically, the project will produce methods that can handle three types of dependencies: structure, correlation, and inductive dependencies. The intellectual merit of this project is the development of methods, their underlying theory, and efficient algorithms that can handle and exploit dependencies in complex outputs. Broader impact will come from applied work in several domains (e.g., bioinformatics, computational linguistics), as well as from making software implementations of the algorithms publicly available for teaching and research in applied fields.
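The core idea behind the large-margin structured-output methods referenced above (e.g., Tsochantaridis et al., 2005) is that prediction becomes an argmax over all candidate output structures of a linear score w . Psi(x, y), where Psi is a joint feature map over input and output. The following is a minimal toy sketch of this decision rule for a tiny sequence-labeling task; the feature design, weights, and exhaustive search are illustrative assumptions, not the project's actual software.

```python
import itertools
import numpy as np

# Toy structured prediction: choose the output structure y that
# maximizes a linear score w . Psi(x, y). All names and numbers
# here are illustrative, not taken from the project's code.

LABELS = [0, 1]  # binary tags for a toy sequence-labeling task

def joint_features(x, y):
    """Joint feature map Psi(x, y) for a length-n sequence.

    Features: per-label sums of the inputs x[i] assigned that label
    (emission features), plus counts of each label transition."""
    emit = np.zeros(len(LABELS))
    trans = np.zeros((len(LABELS), len(LABELS)))
    for i in range(len(x)):
        emit[y[i]] += x[i]
        if i > 0:
            trans[y[i - 1], y[i]] += 1.0
    return np.concatenate([emit, trans.ravel()])

def predict(w, x):
    """Exhaustive argmax over all label sequences (fine at toy sizes;
    real structural SVMs use dynamic programming or other
    combinatorial algorithms for this step)."""
    best_y, best_score = None, -np.inf
    for y in itertools.product(LABELS, repeat=len(x)):
        score = float(w @ joint_features(x, y))
        if score > best_score:
            best_y, best_score = y, score
    return best_y

# Hand-picked weights: positive inputs favor label 1, negative favor
# label 0, and same-label transitions get a small bonus.
w = np.array([-1.0, 1.0,   # emission weights for labels 0, 1
              0.5, 0.0,    # transition weights 0->0, 0->1
              0.0, 0.5])   # transition weights 1->0, 1->1
print(predict(w, [2.0, 1.5, -3.0]))  # -> (1, 1, 0)
```

Because the score couples adjacent labels through the transition features, the prediction for each position depends on the whole sequence--exactly the kind of interdependent output the project targets. Learning then amounts to finding a w for which the correct structure wins this argmax by a margin.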

- Thorsten Joachims (PI)
- Rich Caruana (Co-PI)
- Thomas Finley (Ph.D. Student)
- Chun-Nam Yu (Ph.D. Student)

[Joachims/06a] T. Joachims, Training Linear SVMs in Linear Time, Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2006. (KDD Best Paper Award)

[Yu/Joachims/06a] C.-N. Yu and T. Joachims, Training Protein Threading Models Using Structural SVMs, ICML Workshop on Learning in Structured Output Spaces, 2006.

[Joachims/05a] T. Joachims, A Support Vector Method for Multivariate Performance Measures, Proceedings of the International Conference on Machine Learning (ICML), 2005. (ICML Best Paper Award)

[Finley/Joachims/05a] T. Finley and T. Joachims, Supervised Clustering with Support Vector Machines, Proceedings of the International Conference on Machine Learning (ICML), 2005. (ICML Outstanding Student Paper Award)

[Joachims/Hopcroft/05a] T. Joachims and J. Hopcroft, Error Bounds for Correlation Clustering, Proceedings of the International Conference on Machine Learning (ICML), 2005.

[Tsochantaridis/etal/05a] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun, Large Margin Methods for Structured and Interdependent Output Variables, Journal of Machine Learning Research (JMLR), 6(Sep):1453-1484, 2005.

[Tsochantaridis/etal/04a] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, Support Vector Machine Learning for Interdependent and Structured Output Spaces, Proceedings of the International Conference on Machine Learning (ICML), 2004.

This material is based upon work supported by the National Science Foundation under CAREER Award No. 0412894. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation (NSF).