Empirical Methods in Machine Learning & Data Mining
Computer Science Department
Cornell University
Fall 2005



Homework 3 is now available.  It's due at the beginning of class on Tue Nov 22.

Clarification for midterm: XOR of three inputs is defined to be +1 if the number of +1 inputs is odd, and -1 if the number of +1 inputs is even.
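
The definition above can be checked by enumerating all eight input combinations. This is only an illustrative sketch: the function name xor3 and the use of Python are assumptions made for this example and are not part of the midterm.

```python
from itertools import product


def xor3(a, b, c):
    """Three-input XOR over {-1, +1} (hypothetical helper for illustration).

    Returns +1 if the number of +1 inputs is odd, and -1 if it is even.
    """
    num_positive = sum(1 for x in (a, b, c) if x == +1)
    return +1 if num_positive % 2 == 1 else -1


# Build and print the full truth table (2^3 = 8 rows).
truth_table = {inputs: xor3(*inputs) for inputs in product((-1, +1), repeat=3)}
for inputs, output in truth_table.items():
    print(inputs, "->", output)
```

Note that under this definition three +1 inputs give +1 (three is odd), while three -1 inputs give -1 (zero +1 inputs counts as even).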

Small error in the midterm: on the decision tree question that asks you to calculate Info_gain, Gain_Ratio, and Gini_Score, Gini_Score is a typo; you should calculate RMS_Score instead. This should be obvious from the rest of the question, but we note it here just in case.

The midterm handed out in class is missing the last page. Here is the complete text file: trick-or-treat.2005.txt. Very sorry for the confusion. The midterm is due at 2:55pm on Thu Nov 3. Unlike homework, we will not accept late midterms. Good luck. I hope you enjoy it.

Homework Assignment 2 is now available.  It's due at the beginning of class on Thu Oct 20.

There are problems getting IND to compile with the new compiler in recent versions of CYGWIN. You can avoid these problems by telling CYGWIN to use compiler 3.3.3 (instead of the newer 3.4.4) in the CYGWIN Setup Tool. Note that you should also use the Setup Tool to install tcsh, bison, and make.

The Unix/UnixSTAT/Scripting tutorial will be held Tue 7:30-9:30 in Olin Hall 165.  The tutorial is optional and is intended to help people who are not familiar with Unix or writing scripts get up to speed.

Homework Assignment 1 is now available.  It's due at the beginning of class on Tue Sep 20.

We requested that the Mitchell textbook be put on reserve, but it may take a few days for it to be available because the library's copy from last year was not returned.

Time and Place




Email (@cs.cornell.edu)

Office Hours



Rich Caruana
Tue 4:30-5:00, Wed 10:00-11:00
Upson 4157

Teaching Assistant: Cristian Bucila
Thu 11:30-12:00, Fri 2:00-3:00
Upson 322

Teaching Assistant: Lars Backstrom
Mon 11:30-12:00, Wed 3:30-4:35
Upson 4124

Teaching Assistant: Alex Niculescu-Mizil
Mon 12:00-1:00
Upson 5154

Administrative Assistant: Amy Fish
M-F 9:00-4:00
Upson 4146


General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining and their application to real-world learning and decision-making tasks. The course also will cover empirical methods for comparing learning algorithms, for understanding and explaining their differences, for exploring the conditions under which each is most appropriate, and for figuring out how to get the best possible performance out of them on real problems.

Tentative Course Syllabus

Textbook:
Machine Learning by Tom Mitchell

Optional references:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, J. Friedman.
Pattern Classification 2nd edition by Richard Duda, Peter Hart, & David Stork

Grading policies:

Academic integrity policy


Lecture Notes

Unsupervised Learning Clustering slides (cs578_clustering_lecture.4up.pdf)

MTL slides (cs578.mtl.lecture.4up.pdf)

SVM slides (slides_sigir03_tutorial-modified.v3.pdf)

Bagging & Boosting Slides (CS578.bagging.boosting.lecture.pdf)

Special Topics: Missing Values & Feature Selection Slides (missing_featsel_lecture.pdf)

KNN Slides (CS578_knn_lecture.4up.pdf)

Revised Performance Measures Slides (last update 10/18/05) (performance_measures.4up.pdf)

History File from Unix Tutorial (cu.578.05.unix.history.txt)

Revised Decision Tree Slides (last updated 9/15/05) (CS578.05_DT_lecture.2up.pdf)

Introduction to COMS 578 and a Brief History of Statistics, Machine Learning, and Data Mining (CS578.05_INTRO_lecture.pdf)



Assignments

HW1 Decision Tree Assignment (due start of class Tue Sep 20): cs578.hw1.tar
IND download for MacOS 10.3: ind.macos10.3.tar
UnixStat download for MacOS 10.3: unixstat.macos10.3.tar

HW2 Neural Nets Assignment (due start of class Thu Oct 20): hw2.tar.gz
Perf code for calculating ROC performances: http://kodiak.cs.cornell.edu/kddcup/software.html

HW3 KNN Assignment (due start of class Tue Nov 22): hw3.ps, hw3.data.gz


Final Project


ML Links
