Empirical Methods in Machine Learning & Data Mining
Computer Science Department
Cornell University
Fall 2007



Time and Place:

2:55 PM to 4:10 PM
Tuesdays & Thursdays
206 Hollister




Office Hours (email addresses are @cs.cornell.edu)

Rich Caruana: Tue 4:30-5:00, Wed 10:30-11:30, Upson 4157
Daria Sorokina (Teaching Assistant): Mon 11:00-12:00, Thu 13:15-14:15, Upson 5156
Ainur Yessenalina (Teaching Assistant): Mon 2:30-3:30, Wed 2:30-3:30, Upson 328
Alex Niculescu-Mizil (Teaching Assistant): Fri 10:30-11:30, Upson 5154
Melissa Totman: M-F 9:00-4:00, Upson 4147


General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining, and to their application to real-world learning and decision-making tasks. The course will also cover empirical methods for comparing learning algorithms, understanding and explaining their differences, exploring the conditions under which each is most appropriate, and getting the best possible performance out of them on real problems.

Textbook:
Machine Learning by Tom Mitchell

Optional references:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. Friedman
Pattern Classification, 2nd edition, by Richard Duda, Peter Hart, and David Stork
Pattern Recognition and Machine Learning by Christopher Bishop

Grading policies:

Academic integrity policy


Lecture Notes

Introduction to Statistics, Machine Learning, and Data Mining
Introduction to Decision Trees
Introduction to Hypothesis Testing
Introduction to Machine Learning Performance Measures
Introduction to Memory Based Learning (KNN, ...)

Missing Values and Feature Selection
SVM lecture: long notes, and short notes (the version you are responsible for)
Bagging and Boosting



Assignments

Homework 1

Note about Homework 1: The prediction column is the last column in the dataset, "amegfi", which takes two values.
Cygwin Installation Tips
IND decision tree code for MacOS: ind.macos10.3.tar
UNIXSTAT utility code for MacOS: unixstat.macos10.3.tar
Download HW1 here: cs578.hw1.tar
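
The note about the prediction column can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: it assumes the data has already been parsed into rows of values (the actual file format inside cs578.hw1.tar is not described on this page), and the function names and toy rows are hypothetical.

```python
from collections import Counter


def split_features_and_label(rows):
    """Treat the last column of every row as the prediction target."""
    features = [row[:-1] for row in rows]
    labels = [row[-1] for row in rows]
    return features, labels


def check_two_valued(labels):
    """Count label values and confirm the target takes exactly two values."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError(f"expected 2 label values, found {len(counts)}")
    return counts


# Hypothetical toy rows; the last entry of each row stands in for "amegfi".
rows = [
    ["1.0", "0.5", "0"],
    ["0.2", "0.9", "1"],
    ["0.7", "0.1", "0"],
]

features, labels = split_features_and_label(rows)
counts = check_two_valued(labels)
```

Checking the label counts up front is also a cheap way to see the class balance before training a decision tree.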

Homework 2

Perf code for calculating ROC performance: http://kodiak.cs.cornell.edu/kddcup/software.html
Download HW2 here: HW2.tar

Homework 3

Download HW3 here: HW3.578.2007.tar


Final Project

Predictions for the final project must be submitted by noon on Wed December 12.
The report for the final project must be handed in by noon on Thu December 13.
Train and test sets for the final project are available at the top of this web page, or here.


ML Links
