Empirical Methods in Machine Learning & Data Mining
CS578
Computer Science Department
Cornell University
Fall 2007

General Information | Lecture Notes | ML Links | Assignments | Project

Announcements:


Time and Place:

2:55 PM to 4:10 PM
Tuesdays & Thursdays
206 Hollister


Role                       Name                   Email (@cs.cornell.edu)   Office Hours                      Office
Instructor                 Rich Caruana           caruana                   Tue 4:30-5:00, Wed 10:30-11:30    Upson 4157
Teaching Assistant         Daria Sorokina         daria                     Mon 11:00-12:00, Thu 1:15-2:15    Upson 5156
Teaching Assistant         Ainur Yessenalina      ainur                     Mon 2:30-3:30, Wed 2:30-3:30      Upson 328
Teaching Assistant         Alex Niculescu-Mizil   alexn                     Fri 10:30-11:30                   Upson 5154
Administrative Assistant   Melissa Totman         mtotman                   M-F 9:00-4:00                     Upson 4147



General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining, and to their application to real-world learning and decision-making tasks. The course will also cover empirical methods for comparing learning algorithms, for understanding and explaining their differences, for exploring the conditions under which each is most appropriate, and for getting the best possible performance out of them on real problems.

Textbooks:
Machine Learning by Tom Mitchell

Optional references:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. Friedman
Pattern Classification (2nd edition) by Richard Duda, Peter Hart, and David Stork
Pattern Recognition and Machine Learning by Christopher Bishop

Grading policies:

Academic integrity policy



Lecture Notes

Introduction to Statistics, Machine Learning, and Data Mining
Introduction to Decision Trees
Introduction to Hypothesis Testing
Introduction to Machine Learning Performance Measures
Introduction to Memory Based Learning (KNN, ...)

Missing Values and Feature Selection
SVM lecture (long notes; the short notes are what you are responsible for)
Bagging and Boosting
Clustering



Assignments

Homework 1

Note about Homework 1: the prediction target is the last column in the dataset, "amegfi", which takes 2 values (see the illustrative sketch after this list).
Cygwin Installation Tips
IND decision tree code for MacOS ind.macos10.3.tar
UNIXSTAT utility code for MacOS unixstat.macos10.3.tar
Download HW1 here: cs578.hw1.tar
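
Purely as an illustration of the note above, and not part of the assignment code: a minimal Python sketch of splitting off the "amegfi" target column. The file name "train.data" and the whitespace-separated, one-example-per-line format are assumptions here; adapt them to whatever files actually ship in cs578.hw1.tar.

    # Illustrative sketch only (not the assignment's tools): separate the
    # "amegfi" target column from the features.  Assumes one example per
    # line, whitespace-separated columns, target in the last column.
    # "train.data" is a placeholder file name.

    features, labels = [], []
    with open("train.data") as f:
        for line in f:
            cols = line.split()
            if not cols:
                continue                  # skip blank lines
            features.append(cols[:-1])    # everything except the last column
            labels.append(cols[-1])       # "amegfi", the 2-valued target

    print("examples:", len(labels), "label values:", sorted(set(labels)))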

Homework 2

Perf code for calculating ROC performance: http://kodiak.cs.cornell.edu/kddcup/software.html (an illustrative sketch of the ROC-area calculation appears after this list)
Download HW2 here: HW2.tar
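
The perf program linked above is what you should use for the homework; the following is only a hedged, deliberately naive Python sketch of the ROC-area (AUC) quantity it reports, assuming binary labels in {0, 1} and real-valued scores where larger means "more likely positive".

    # Illustrative sketch only: ROC area as the probability that a randomly
    # chosen positive example is scored above a randomly chosen negative
    # example (ties count as half).  O(n^2), fine for small datasets.

    def roc_auc(labels, scores):
        pos = [s for s, y in zip(scores, labels) if y == 1]
        neg = [s for s, y in zip(scores, labels) if y == 0]
        wins = 0.0
        for p in pos:
            for n in neg:
                if p > n:
                    wins += 1.0
                elif p == n:
                    wins += 0.5
        return wins / (len(pos) * len(neg))

    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))   # prints 0.75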

Homework 3

Download HW3 here: HW3.578.2007.tar



Final Project

Predictions for the final project must be submitted by noon on Wed December 12.
The report for the final project must be handed in by noon on Thu December 13.
Train and test sets for the final project are available at the top of this web page, or here.



ML Links
