CS578: Empirical Methods in Machine Learning & Data Mining
Computer Science Department
Cornell University
Fall 2006



Time and Place

2:55 PM to 4:10 PM
Tuesdays & Thursdays
205 Thurston Hall



Course Staff (email addresses are @cs.cornell.edu)

Name                   Role                  Office Hours                      Office
Rich Caruana           Instructor            Tue 4:30-5:00, Wed 10:30-11:30    Upson 4157
Art Munson             Teaching Assistant    Mon 10:00-11:00, Fri 1:30-2:30    Upson 5156
Yisong Yue             Teaching Assistant    Tue 4:30-5:30, Wed 11:00-12:00    Upson 4154
Alex Niculescu-Mizil   Teaching Assistant    Thu 10:30-11:30                   Upson 5154
Melissa Totman                               M-F 9:00-4:00                     Upson 4147


General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining, and to their application to real-world learning and decision-making tasks. The course will also cover empirical methods for comparing learning algorithms, understanding and explaining their differences, exploring the conditions under which each is most appropriate, and getting the best possible performance out of them on real problems.
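To illustrate the empirical-comparison theme, here is a minimal sketch (with made-up numbers, not course data) of comparing two classifiers' per-fold cross-validation accuracies with a paired t-test:

```python
# Minimal illustrative sketch: paired t-test on hypothetical 10-fold
# cross-validation accuracies for two classifiers A and B.
import math

acc_a = [0.81, 0.79, 0.84, 0.80, 0.82, 0.78, 0.83, 0.81, 0.80, 0.82]
acc_b = [0.78, 0.77, 0.80, 0.79, 0.78, 0.76, 0.81, 0.77, 0.79, 0.78]

# Per-fold differences; the test asks if their mean differs from zero.
diffs = [a - b for a, b in zip(acc_a, acc_b)]
n = len(diffs)
mean = sum(diffs) / n
var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
t = mean / math.sqrt(var / n)                        # paired t, df = n - 1

print(f"mean difference = {mean:.4f}, t = {t:.2f} (df = {n - 1})")
# Compare |t| to the critical value from a t-table (2.262 for df = 9
# at the 0.05 level) to decide whether the difference is significant.
```

With these hypothetical accuracies, t is well above 2.262, so the difference would be judged significant at the 0.05 level.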

Textbook:
Machine Learning by Tom Mitchell

Optional references:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Pattern Classification (2nd edition) by Richard Duda, Peter Hart, and David Stork
Pattern Recognition and Machine Learning by Christopher Bishop

Grading policies:

Academic integrity policy


Lecture Notes

  1. Intro Lecture (CS578.06_INTRO_lecture.4up.pdf)
  2. Decision Tree Lecture (and t-test mini-lecture) (CS578.06_DT_lecture.ppt.pdf)
  3. UNIX Introduction files
  4. Performance Measures Lecture (performance_measures.pdf)
  5. Experimental Design Information
  6. KNN Lecture (CS578_knn_lecture.pdf)
  7. Feature Selection / Missing Value Lecture (CS578_featsel_missing_lecture.pdf)
  8. Bagging, Boosting, Random Forests, and Ensemble Learning (CS578.bagging.boosting.lecture.pdf)
  9. SVM Lecture (long notes) (short notes, which you are responsible for)
  10. Clustering Lecture (responsible up to slide 34, "Mean Point Happiness", for Prelim 2) (cs578_clustering_lecture.pdf)



Assignments

Homework 3

Download HW3 here: 578.hw3.2006.tar.gz

Homework 2

Perf code for calculating ROC performances: http://kodiak.cs.cornell.edu/kddcup/software.html 
Download HW2 here: cs578.hw2.tar.gz
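The Perf tool linked above computes ROC area among other metrics; for intuition only (this is an illustrative sketch, not the Perf implementation), AUC can be computed from scores and 0/1 labels via the rank formulation:

```python
# Illustrative sketch (not the Perf tool): ROC area (AUC) from
# predicted scores and true 0/1 labels, via the Mann-Whitney U
# statistic. Assumes no tied scores for simplicity.
def auc(labels, scores):
    # Indices sorted by ascending score; rank = position + 1.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pos = sum(labels)
    neg = len(labels) - pos
    # Sum of (1-based) ranks of the positive examples.
    rank_sum = sum(r + 1 for r, i in enumerate(order) if labels[i] == 1)
    # U = rank_sum - pos*(pos+1)/2; AUC = U / (pos * neg).
    return (rank_sum - pos * (pos + 1) / 2) / (pos * neg)

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
print(auc(labels, scores))  # 8/9 ~= 0.889: positives mostly outrank negatives
```

An AUC of 1.0 means every positive is scored above every negative; 0.5 is chance-level ranking.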

Homework 1

Download HW1 here: cs578.hw1.tar
IND decision tree code for MacOS: ind.macos10.3.tar
UNIXSTAT utility code for MacOS: unixstat.macos10.3.tar
Tips for installing IND / unixstat on Cygwin


Final Project


ML Links
