What: Advanced Topics in Empirical Machine Learning
When: MWF 1:25pm-2:15pm
Where: Upson 111
Who: Rich Caruana
Why: Time to write a textbook on empirical machine learning
Jan 22: Administrivia and Introduction (Caruana)
Jan 24: Empirical Comparison of Learning Methods (Caruana) (slides)
Jan 26: Caruana & Niculescu-Mizil: Empirical Comparison of Learning Methods
(Caruana)
Jan 29: Niculescu-Mizil & Caruana: Predicting Good Probabilities with
Supervised Learning (Caruana)
Jan 31: Platt: Probabilistic Outputs for SVMs and Comparison to Regularized
Likelihood Methods (Nikos Karampatziakis) (slides)
Feb 02: Data Sets and Learning Methods for High-D Empirical Study
Feb 05: Drish: Obtaining Calibrated Probability Estimates from SVMs (Amit Belani)
(slides)
Feb 07: Zadrozny & Elkan: Transforming Classifier Scores into Accurate
Multiclass Probability Estimates (Myle Ott) (slides)
Feb 09: Data Sets and Learning Methods for High-D Empirical Study
Feb 12: Provost & Fawcett: Analysis and Visualization of Classifier
Performance: Comparison under Imprecise Class and Cost Distributions (Ramazan
Bitirgen) (slides)
Feb 14: classes canceled due to snow
Feb 16: Fawcett & Niculescu-Mizil: Technical Note: PAV and the ROC Convex
Hull (Lars Backstrom)
Feb 19: Empirical Comparison of Learning Methods (Caruana) (same slides as
above)
Feb 21: Empirical Comparison of Learning Methods (Caruana) (same slides as
above)
Feb 23: Class Project
Feb 26: Margineantu, D. D. and Dietterich, T. G. (2002): Improved class
probability estimates from decision tree models (Michael Friedman)
Feb 28: Dietterich, T. G. (1998): Approximate Statistical Tests for Comparing
Supervised Classification Learning Algorithms. Neural Computation, 10 (7)
1895-1924. Postscript
preprint. (Revised December 30, 1997).
Mar 02: Class Project
Mar 05: Model Compression (Caruana)
Mar 07: Data Mining in Metric Space (Caruana)
Mar 09: Statlog
Mar 12: Statlog
Mar 14: Lowd & Domingos: Naive Bayes Probability Estimation (Peter Majek)
Mar 16: Results of Class Project
Mar 19: Spring Break
Mar 21: Spring Break
Mar 23: Spring Break
Mar 26: George Forman: An Extensive Empirical Study of Feature Selection Metrics
for Text Classification JMLR 3(Mar):1289-1305, 2003 (Artit)
Mar 28: Saul & Roweis: An Introduction to Locally Linear Embedding (Sergei
Fotin)
Mar 30: PCA Tutorial: http://www.dgp.toronto.edu/~aranjan/tuts/pca.pdf
(Ainura)
Apr 02: Tatti: Distances between Data Sets Based on Summary Statistics (Nam
Nguyen)
Apr 04: Kari Torkkola: Feature Extraction by Non-Parametric Mutual Information
Maximization JMLR 3(Mar):1415-1438, 2003 (Chun-Nam)
Apr 06: Zhou, Foster, Stine, Ungar: Streaming Feature Selection Using
Alpha-Investing KDD 2005 (Amit Belani)
Apr 09: Tishby, Pereira, Bialek: The Information Bottleneck Method,
Conference on Communication, Control, and Computing 1999 (Fan Yanga)
Apr 11: Friedman, Hastie, Tibshirani: Additive Logistic Regression: A
Statistical View of Boosting Annals of Statistics (2000) www.cse.psu.edu/~zha/CSE598/paper1.pdf
(Daria Sorokina) NOTE: It's a long paper, and you are reading it on short
notice, so read the intro, skim the paper, and be sure to take a look at the
interesting discussion at the end. You might also want to look at the text The
Elements of Statistical Learning by Hastie, Tibshirani, and Friedman, sections
10.1-10.6 and 10.9-10.13.
Apr 13: project discussion
Apr 16: project discussion
Apr 18: project discussion
Apr 20: Hyvarinen & Oja: Independent Component Analysis: Algorithms and
Applications sections 1, 2, 3, 4.1, 4.2 (skim 4.2.1), 4.3, 7.1 (Art Munson)
Apr 23: Galbraith & van Norden: The Resolution and Calibration of
Probabilistic Economic Forecasts (Myle Ott)
Apr 25: 5-minute project summaries
Apr 27: Breiman: Prediction Games and Arcing Algorithms, and Reyzin &
Schapire: How Boosting the Margin Can Also Boost Classifier Complexity (Nikos
Karampatziakis)
Apr 30:
May 02:
Survey of Empirical Methods (empirical.caruana.678.07.pdf)