CS 678 Spring 2006

COURSE: COM-S 678 (096-006) Spring 2006
TITLE: Advanced Topics in Machine Learning
INSTRUCTOR: Rich Caruana (caruana@cs.cornell.edu)
SCHEDULE: MWF 1:25-2:15 Upson 109


Final Project Reports are due MONDAY, MAY 15.

The schedule below has been updated using the information from the sign-up sheet in Monday's class.  Please let me know ASAP if there is a problem.

Course Description:

This graduate-level course is aimed at students who already have a solid foundation in machine learning and want to delve deeper into topics currently under study in the research community. The course will focus on three areas of active research in machine learning: Ensemble Learning, Inductive Transfer, and Semi-Supervised Learning and Clustering.

ENSEMBLE LEARNING is a large collection of methods that yield improved generalization by training more than one model on each problem (e.g. training 100 SVMs instead of just one SVM) and then combining the predictions these models make by averaging, voting, or other methods. Ensemble learning methods include bagging, boosting, Bayesian averaging, random forests, error-correcting output codes, probing, and ensemble selection. These methods differ in how they train different models, how they combine model predictions, and in what kinds of guarantees they provide.
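As a rough illustration of the train-many-then-combine idea, here is a minimal sketch of bagging with majority voting. The base learner (a 1-D threshold "stump"), the function names, and the toy data are all assumptions made for this example; they are not taken from the course materials.

```python
import random
from collections import Counter


def train_stump(points):
    """Fit a 1-D threshold rule that minimizes training errors."""
    xs = sorted({x for x, _ in points})
    # Candidate thresholds: midpoints between consecutive distinct x values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for t in candidates:
        # Try both orientations: (label below t, label above t).
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((hi if x > t else lo) != y for x, y in points)
            if best is None or errors < best[0]:
                best = (errors, t, lo, hi)
    _, t, lo, hi = best
    return lambda x: hi if x > t else lo


def bagging(points, n_models, rng):
    """Train each model on a bootstrap sample (drawn with replacement)."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(train_stump(sample))
    return models


def vote(models, x):
    """Combine the models' predictions by majority vote."""
    return Counter(m(x) for m in models).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data (hypothetical): label is 1 when x >= 5.
    data = [(float(x), int(x >= 5)) for x in range(11)]
    models = bagging(data, n_models=25, rng=random.Random(0))
    print(vote(models, 2.0), vote(models, 8.0))
```

Boosting, Bayesian averaging, and ensemble selection differ from this sketch in how the individual models are trained and how their predictions are weighted.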

INDUCTIVE TRANSFER (a.k.a. multitask learning, lifelong learning, learning-to-learn, representation learning) is a class of learning algorithms that yield improved generalization by learning groups of related problems in series or in parallel. Generalization performance improves when what is learned for some of the learning problems is transferred to the other learning problems. Most of the research in inductive transfer focuses on how to transfer learned structure between related problems and how to characterize learning problems for which inductive transfer is most beneficial.
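A toy illustration of the transfer idea, not a method from the reading list: several related tasks each estimate a mean, and tasks with little data borrow strength from the pooled data of all tasks. The function name, the `strength` parameter, and the shrinkage rule are assumptions chosen only to keep the sketch short.

```python
def shrunken_task_means(task_samples, strength=5.0):
    """Estimate one mean per task, pulling small tasks toward the pooled mean.

    task_samples: dict mapping task name -> list of observed values.
    strength: hypothetical knob; larger values mean more sharing across tasks.
    """
    all_values = [v for values in task_samples.values() for v in values]
    pooled_mean = sum(all_values) / len(all_values)
    estimates = {}
    for task, values in task_samples.items():
        n = len(values)
        local_mean = sum(values) / n
        # Tasks with many samples trust their own data; small tasks
        # lean on what was learned from the other tasks.
        w = n / (n + strength)
        estimates[task] = w * local_mean + (1 - w) * pooled_mean
    return estimates


if __name__ == "__main__":
    samples = {"big_task": [10.0] * 20, "small_task": [0.0]}
    print(shrunken_task_means(samples))
```

Whether this kind of sharing helps depends on how related the tasks really are, which is the question much of the inductive transfer literature addresses.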

SEMI-SUPERVISED LEARNING (a.k.a. clustering with side information, meta-clustering, transduction, ...) is a collection of methods where partial supervision in the form of labels or constraints is provided for some training cases, but is missing for most training cases. In these problems learning often consists of clustering the data using the side information to guide the algorithm towards clusterings that are consistent with the auxiliary information. Research in semi-supervised learning is important because many real datasets have only partial labels or side information, yet is somewhat handicapped by the difficulty of objectively evaluating cluster quality.
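One simple way to let a few labels guide a clustering is to initialize k-means centroids from the labeled cases and then cluster all of the data. The sketch below is in the spirit of seeding-based approaches, but it is a simplified assumption-laden version, not the exact algorithm of any paper listed on this page; all names and the toy data are hypothetical.

```python
def mean(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(points, seeds, n_iter=20):
    """Cluster `points`, using labeled `seeds` only to place initial centroids.

    points: list of tuples (the full data set, mostly unlabeled).
    seeds: dict mapping cluster label -> list of labeled example points.
    """
    labels = sorted(seeds)
    centroids = {c: mean(seeds[c]) for c in labels}
    assignment = []
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        assignment = [
            min(labels, key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Recompute each centroid from its members (keep it if empty).
        for c in labels:
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = mean(members)
    return assignment, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (9, 9), (10, 9), (9, 10)]
    seeds = {"a": [(0, 0)], "b": [(10, 10)]}
    print(seeded_kmeans(data, seeds)[0])
```

Constraint-based methods instead restrict which points may share a cluster, and metric-learning methods change the distance function; both appear in the reading list.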

Because most ensemble, inductive transfer, and semi-supervised learning methods are too new to be included in textbooks, much of the course will focus on reading recent research papers once a general introduction to each area has been given. Grading will be based on class participation and a significant final project. Possible final projects include applying one of the methods in a novel way to a novel problem, comparing competing methods on a set of test problems, implementing/testing a potential improvement to a method, or developing/testing a new method. Successful projects may yield publishable research.

Prerequisites: grade of B+ or higher in 478 or 578, or equivalent with permission of the instructor.

If you want to take the course please email me, or just show up for the first class next Monday at 1:25 in Upson 109.



Jan 23: Administrivia and Introduction
Jan 25: Ensemble Selection
Jan 27: no class
Jan 30: Ensemble Selection
Feb 01: Ensemble Selection
Feb 03: Bias/Variance Decomposition
Feb 06: Bias/Variance, Bagging
Feb 08: Bagging and Boosting
Feb 10: Boosting
Feb 13: Multitask Learning
Feb 15: Multitask Learning
Feb 17: Multitask Learning
Feb 20: Bauer & Kohavi: Empirical Comparison of Ensemble Methods (Karan Singh & Nick Hamatake)
Feb 22: Random Forests (Robert Young & 
Feb 24: Meta Clustering and Semi-Supervised Learning
Feb 27: Kuncheva & Whitaker: Measures of Diversity (TJ & Justin Wick)
Mar 01: Domingos: Bayesian Averaging (Daria Sorokina & Yunpeng)
Mar 03: Valentini & Dietterich: Bias-Variance Analysis of Support Vector Machines for the Development of SVM-Based Ensemble Methods (David Michael & Tim)
Mar 06: Stacking with MDTs (Yunsong Guo & Nam Nguyen)
Mar 08: O'Sullivan, Langford, Caruana: Feature Boosting (Caruana) & Ensemble Selection Code and Models
Mar 10: Collins & Schapire: Logistic Regression, AdaBoost and Bregman Distances (James Lenfestey & Yisong Yue)
Mar 13: J Wu, JM Rehg, MD Mullin: Rare Event Detection Cascade by Direct Feature Selection (Soo Yeon Lee & Michael Schmidt)
Mar 15: Ensemble Selection Tutorial
Mar 17: TG Dietterich, G Bakiri: Solving Multiclass Learning Problems via Error-Correcting Output Codes (Art Munson & Alex Niculescu)
Mar 18-26: Spring Break
Mar 27: TG Dietterich: Ensemble Methods in Machine Learning  - Multiple Classifier Systems, 2000 (Muthiah Chettiar)
Mar 29: Ando & Zhang: "A Framework for Learning Predictive Structures from Multiple Tasks and Unlabeled Data"
http://www-cs-students.stanford.edu/~tzhang/papers/jmlr05_semisup.pdf  (TJ & Justin Wick)

Mar 31: Thrun. Is learning the nth thing any easier than learning the first? In Advances in NIPS, 640-646, 1996. (Dave Michael & Robert Jung)
Apr 03: paper review
Apr 05: Combining Labeled and Unlabeled Data with Co-Training.  (Muthiah & Yunpeng)
Apr 07: Silver & McCracken: Selective Transfer of Task Knowledge Using Stochastic Noise, Advances in AI, 2003 (Nam Nguyen & Yunsong) http://www.springerlink.com/link.asp?id=eb5jrhahbx3uq32g
Apr 10: Bakker & Heskes: Task Clustering and Gating for Bayesian Multitask Learning. (Art Munson & Daria Sorokina) http://www.ai.mit.edu/projects/jmlr/papers/volume4/bakker03a/bakker03a.pdf
Apr 12: Rosenstein, Marx, Kaelbling, Dietterich: To Transfer or Not to Transfer (Karan Singh & Nick Hamatake)
Apr 14: Ben-David, S., and Schuller, R., "Exploiting Task Relatedness for Multitask Learning" Proceedings of COLT 2003. (Alex Niculescu & Tim Harbers)
Apr 17: Kai Yu & Volker Tresp: Learning to Learn and Collaborative Filtering (Aaron & Yisong) http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/yu+tresp-2005.pdf
Apr 19: 
Apr 21: Alex Niculescu: Multitask Bayes Net Structure Learning (Alex)
Apr 24: Caruana, Elhawary, Nguyen, Smith: Meta Clustering (Nguyen)
Apr 26: Kiri Wagstaff, Claire Cardie, Seth Rogers, and Stefan Schroedl: Constrained K-means Clustering with Background Knowledge (Michael Schmidt, Robert Jung, TJ)
Apr 28: Eric P. Xing, Andrew Y. Ng, Michael I. Jordan and Stuart Russell: Distance Metric Learning with Application to Clustering with Side-Information in NIPS 03 (Yunsong, Yunpeng, Yisong)
May 01: Linli Xu, James Neufeld, Bryce Larson, Dale Schuurmans: Maximum Margin Clustering (NIPS 2004) http://books.nips.cc/papers/files/nips17/NIPS2004_0834.pdf (Tim Harbers, Aaron, Soo Yeon Lee)
May 03: S Basu, A Banerjee, RJ Mooney: Semi-supervised Clustering by Seeding http://www.cs.utexas.edu/users/ml/papers/semi-icml-02.ps.gz (Karan, Nick, Dave Michael)
May 05: Learning from Labeled and Unlabeled Data using Graph Mincuts www.cs.cmu.edu/~shuchi/papers/mincut.ps (David Siegel, Muthiah, Daria)

Lecture Notes

Ensemble Selection Slides (ensemble.selection.slides)

Bias/Variance, Bagging, Boosting Slides (bias.variance.bagging.boosting.slides)

Performance Measures (background info for those not familiar with ROC, Precision/Recall, ...) (performance.measure.slides)

Inductive Transfer and Multitask Learning Slides (mtl.slides)

Meta Clustering and Semi-Supervised Learning (semisup.slides)

Empirical Comparison by Bauer & Kohavi  (bauer.slides)

Ensemble Selection Compact 5k Libraries (ES.small.libs.tar).  Ensemble selection code (shotgun.dist.tar.gz)

Semi-Supervised Learning/Clustering Papers

X Zhu: Semi-Supervised Learning Literature Survey

G Fung, OL Mangasarian: Semi-supervised support vector machines for unlabeled data classification, Optimization Methods and Software, 2001


Alexander Strehl, Joydeep Ghosh: Cluster Ensembles: A Knowledge Reuse Framework for Combining Partitionings

Xiaojin Zhu, Zoubin Ghahramani, John Lafferty: Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions

Transductive Learning via Spectral Graph Partitioning

Adaptation of Maximum Entropy Capitalizer: Little Data Can Help a Lot

D Zhou, O Bousquet, TN Lal, J Weston, B Scholkopf: Learning with local and global consistency, NIPS*2004

RK Ando, T Zhang: A High-Performance Semi-Supervised Learning Method for Text Chunking 

A Blum, J Lafferty, MR Rwebangira, R Reddy: Semi-Supervised Learning Using Randomized Mincuts, ICML*2004

Kristen P. Bennett: Semi-Supervised Support Vector Machines

Jing Gao: Semi-Supervised Clustering with Partial Background Information

X. Z. Fern, C. E. Brodley: Solving Clustering Ensemble Problems by Bipartite Graph Partitioning (ICML 2004)

Lafferty and Zhu: Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning 

Sugato Basu, Mikhail Bilenko, and Raymond J. Mooney: A Probabilistic Framework for Semi-Supervised Clustering (Best Research Paper Award KDD-2004) 

Integrating constraints and metric learning in semi-supervised clustering 

T. Finley and T. Joachims: Supervised Clustering with Support Vector Machines in ICML05 (distinguished student paper award)

Inductive Transfer Papers

Neil D. Lawrence, John C. Platt, "Learning to Learn with the Informative Vector Machine". International Conference on Machine Learning, Paper 65, 2004. http://delivery.acm.org/10.1145/1020000/1015382/p178-lawrence.pdf?key1=1015382&key2=8406218311&coll=GUIDE&dl=GUIDE&CFID=63282663&CFTOKEN=21052308

Joshua Tenenbaum, Thomas Griffiths. (2001) "Structure learning in human causal induction". Advances in Neural Information Processing Systems 13. Cambridge, MA: MIT Press.

Composition of Conditional Random fields for transfer learning http://www.cs.umass.edu/~mccallum/papers/transfer-emnlp05.pdf

Semiparametric Latent Factor Models http://www.cs.berkeley.edu/~jordan/papers/teh-seeger-jordan04.pdf

Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement Schmidhuber, Zhao, and Wiering, 1997, Journal of Machine Learning http://www.springerlink.com/link.asp?id=l550222682001578

Jonathan Baxter. "A model of inductive bias learning". JAIR, 12, 149-198, 2000. http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume12/baxter00a.pdf

Learning to Learn and Collaborative Filtering Kai Yu & Volker Tresp http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/yu+tresp-2005.pdf

Inductive Transfer using Kernel Multitask Latent Analysis Z. Xiang and K. P. Bennett http://iitrl.acadiau.ca/itws05/Papers/ITWS17-XiangBennett_REV.pdf

Learning Multiple Tasks with Kernel Methods Theodoros Evgeniou, Charles A. Micchelli and Massimiliano Pontil http://delivery.acm.org/10.1145/1090000/1088693/6-615-evgeniou.pdf?key1=1088693&key2=6965218311&coll=GUIDE&dl=GUIDE&CFID=63282663&CFTOKEN=21052308


Pengcheng Wu, Thomas Dietterich, Improving SVM Accuracy by Training on Auxiliary Data Sources (ICML 2004).

"Learning Gaussian Processes from Multiple Tasks" http://www.machinelearning.org/proceedings/icml2005/papers/128_GaussianProcesses_YuEtAl.pdf

"Regularized multi-task learning." http://www.cs.berkeley.edu/~russell/classes/cs294/f05/papers/evgeniou+pontil-2004.pdf

Christophe Giraud-Carrier, Ricardo Vilalta, Pavel Brazdil. (2004) "Introduction to the Special Issue on Meta-Learning". Mach. Learn. 54, 3 (Mar. 2004), 187-193.

"Transfer in Variable-Reward Hierarchical Reinforcement Learning" Neville Mehta, Sriraam Natarajan, Prasad Tadepalli, Alan Fern http://iitrl.acadiau.ca/itws05/Papers/ITWS14-Mehta-vrhrl_REF._pdf

Transfer Learning of Object Classes: From Cartoons to Photographs Geremy Heitz, Gal Elidan, Daphne Koller http://iitrl.acadiau.ca/itws05/Papers/ITWS06-Hietz_REV.pdf

Benefitting from the variables that variable selection discards Rich Caruana, Virginia R. de Sa https://portal.acm.org/poplogin.cfm?dl=ACM&coll=GUIDE&comp_id=944972&want_href=delivery%2Ecfm%3Fid%3D944972%26type%3Dpdf&CFID=63303065&CFTOKEN=46354793&td=1138150069390

Ensemble Learning Papers

Leo Breiman. "Bagging Predictors," Machine Learning, 24, 123-140, 1996.

Rich Caruana, Alex Niculescu, Geoff Crew, and Alex Ksikes, "Ensemble
Selection from Libraries of Models," The International Conference on
Machine Learning (ICML'04), 2004.

Eric Bauer and Ron Kohavi
An Empirical Comparison of Voting Classification Algorithms: Bagging, Boosting, and Variants

Random Forests by Leo Breiman, Machine Learning Vol 45 No.1 Oct 2001.

Ensemble Methods in Machine Learning TG Dietterich - Multiple
Classifier Systems, 2000

Robert E. Schapire. "The Boosting Approach to Machine Learning: An
Overview," MSRI Workshop on Nonlinear Estimation and Classification.

Friedman, and Popescu, "Predictive Learning via Rule Ensembles."
<http://www-stat.stanford.edu/~jhf/ftp/RuleFit.pdf> (Feb. 2005)

O'Sullivan, Langford, Caruana, and Blum. "Feature Boosting" ICML2000.

A comparison of stacking with MDTs to bagging, boosting, and other stacking
methods. http://ai.ijs.si/bernard/mdts/pub03.pdf
In this paper, we present an integration of the algorithm MLC4.5 for
learning meta decision trees (MDTs) into the Weka data mining
suite. MDTs are a method for combining multiple classifiers. Instead
of giving a prediction, MDT leaves specify which classifier should be
used to obtain a prediction. The algorithm is based on the C4.5
algorithm for learning ordinary decision trees. An extensive
performance evaluation of stacking with MDTs on twenty-one data sets
has been performed. We combine base-level classifiers generated by
three learning algorithms: an algorithm for learning decision trees, a
nearest neighbor algorithm and a naive Bayes algorithm. We compare
MDTs to bagged and boosted decision trees, and to combined classifiers
with voting and three different stacking methods: with ordinary
decision trees, with naive Bayes algorithm and with multi-response
linear regression as a meta-level classifier. In terms of performance,
stacking with MDTs gives better results than other methods except when
compared to stacking with multi-response linear regression as a
meta-level classifier; the latter is slightly better than MDTs.
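
The key idea in the abstract above, that a leaf names a classifier rather than a class, can be sketched as follows. This is a hand-built toy tree, not the MLC4.5 learning algorithm; the class names, the base classifiers, and the choice to test the raw input at internal nodes are all simplifying assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    # A leaf stores the name of the base classifier to use, not a class label.
    classifier_name: str


@dataclass
class Node:
    test: Callable[[float], bool]
    if_true: "Tree"
    if_false: "Tree"


Tree = Union[Leaf, Node]


def select_classifier(tree, x):
    """Walk the tree and return the name of the classifier chosen for x."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(x) else tree.if_false
    return tree.classifier_name


def predict(tree, classifiers, x):
    """Delegate the actual prediction to the selected base classifier."""
    return classifiers[select_classifier(tree, x)](x)


if __name__ == "__main__":
    # Two hypothetical base classifiers with different decision thresholds.
    classifiers = {
        "threshold_at_0": lambda x: "pos" if x > 0 else "neg",
        "threshold_at_5": lambda x: "pos" if x > 5 else "neg",
    }
    tree = Node(
        test=lambda x: x < 3,
        if_true=Leaf("threshold_at_0"),
        if_false=Leaf("threshold_at_5"),
    )
    print(predict(tree, classifiers, 1.0), predict(tree, classifiers, 4.0))
```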

Ludmila I. Kuncheva, Christopher J. Whitaker: Measures of Diversity in
Classifier Ensembles and Their Relationship with the Ensemble Accuracy
(in Machine Learning Vol 51/2 (2003))

Solving Multiclass Learning Problems via Error-Correcting Output Codes
TG Dietterich, G Bakiri - Arxiv preprint cs.AI/9501101, 1995

"Bayesian Averaging of Classifiers and the Overfitting Problem"

"Logistic Regression, AdaBoost and Bregman Distances"

Learning a Rare Event Detection Cascade by Direct Feature Selection
(looks like an interesting modification of Viola/Jones technique)
J Wu, JM Rehg, MD Mullin - Advances in Neural Information Processing
Systems, 2004

Bias-Variance Analysis of Support Vector Machines for the Development
of SVM-Based Ensemble Methods
<http://portal.acm.org/citation.cfm?id=1016783&coll=Portal&dl=GUIDE&CFID=63305426&CFTOKEN=19334353> by Giorgio Valentini, Thomas
G Dietterich, Journal Of Machine Learning Research, Vol 5, Dec 2004, MIT Press.

Learning Ensembles from Bites: A Scalable and Accurate Approach Nitesh
V. Chawla, Lawrence O. Hall, Kevin W. Bowyer, W. Philip Kegelmeyer

Theoretical Views of boosting and applications