Empirical Methods in Machine Learning & Data Mining
Computer Science Department
Cornell University
Fall 2003

General Information    Lecture Notes    ML Links    Assignments    Project


Missing Values and Feature Selection slides are available here (slides.ppt)

The Final Exam is open book and open notes, but you are not allowed to use any devices that might be wireless.  That means no laptops, PDAs, etc. during the exam.

For regrades on the mid-term, it's best to talk to the person who graded that question:

Rich - OG & CA
Alex - SVM & CV
Casey - ANN
Cristi - kNN
Radu - CO & DT

For the final project, class labels are in the first column, not the last column.  The class labels have all been set to "0" for the test data.

Predictions are due midnight Sat Dec 6.  We will not start assessing late penalties until Monday Dec 8.  Penalty is -5 for predictions emailed to us Mon Dec 8, -10 for predictions emailed Tue the 9th, -15 for Wed the 10th, and -20 for Thu the 11th.  No predictions will be accepted after Thu.  The report *must* be handed in by Thursday.  Late reports will not be accepted --- no exceptions!

We will ignore predictions that do not obey the submission format!  No extensions for submitting the wrong format.  (NOTE: send a test email of the predictions to yourself to verify that they are sent as attachments, not as text inside the body of the email.  If possible, send the files gzip'd (Unix) or winzip'd (Windows).)

The report for the project is due Thursday, Dec 11 at noon!  Give the report to my admin Cindy Robinson in Upson 4146, or to one of the TAs.  Unlike the homeworks, the final report is graded on how clear, concise, and well organized it is.

The project is out. Predictions due Dec 6. Text is here (project.ps) and data is here (project.data.tar.gz) and the code for perf5 is here (perf5.c).  

NOTE: when you submit predictions for accuracy for the final project, adjust your predictions so that a 0.5 threshold produces the classification you intended, or apply the threshold yourself and just send us the test cases classified as 0 or 1.
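The thresholding option described above can be sketched as follows. This is a minimal sketch: the function name and example scores are illustrative only, and the real submission must follow the format in the project handout.

```python
# Convert real-valued predictions to 0/1 class labels with a 0.5 cutoff.
# Function name and example scores are hypothetical, for illustration only.

def threshold_predictions(scores, cutoff=0.5):
    """Map each real-valued score to class 1 if >= cutoff, else 0."""
    return [1 if s >= cutoff else 0 for s in scores]

scores = [0.12, 0.87, 0.50, 0.49]
labels = threshold_predictions(scores)
print(labels)  # [0, 1, 1, 0]
```

Note that a score exactly at the cutoff (0.50 above) maps to class 1; decide the tie-breaking direction yourself so the graders' 0.5 threshold does what you intended.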

HW3 is due Nov 25.  Text is here (hw3.ps) and data is here (hw3.data.gz).

Mid-term take-home exam due 5pm Nov 7.  (midterm.txt)

HW2 is now due Fri 10/31/03 at 5pm.  Small extra credit for handing it in on Thursday.  As before, we will be generous with late penalties, so it is better to hand in an excellent HW a little late than a poor HW on time.  Happy Halloween!

RMSE baseline computation
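One common RMSE baseline (an assumption here, not necessarily the one in the linked computation) is to predict the mean of the training targets for every test case:

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length sequences."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)

# Baseline: predict the training-set mean for every test case
# (toy numbers, for illustration only).
train_targets = [0, 1, 1, 0, 1]
test_targets = [1, 0, 1, 1]
mean_pred = sum(train_targets) / len(train_targets)  # 0.6
baseline_rmse = rmse([mean_pred] * len(test_targets), test_targets)
print(round(baseline_rmse, 4))  # 0.4583
```

A model should beat this baseline; if it doesn't, it is doing worse than ignoring the features entirely.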

Radu's Friday Office Hours have changed!

Have a good Fall Break!

Homework 2 is due in 3 weeks at start of class on Thursday 10/30/03.  This is a computationally expensive assignment so don't wait until the last minute to start.  Train the neural nets now while learning about SVMs.  Download data here (hw2.data.gz).  Download assignment here (hw2.ps).  

Measurements that might be useful for HW1, handed out in class Tue Sep 23 (.ps)

Small update to the Cygwin install instructions.  New instructions are here.  If it's already working for you, don't bother.

You should run "gunzip cs578.hw1.tar.gz" before untarring the file.
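The two-step extraction can be sketched as below. A dummy archive is built first so the sketch is self-contained; the real file is cs578.hw1.tar.gz from the course page, and its contents will differ.

```shell
# Build a dummy archive so this sketch runs standalone
# (contents are hypothetical; the real homework archive differs).
mkdir -p hw1 && echo "hello" > hw1/readme.txt
tar -cf cs578.hw1.tar hw1
gzip cs578.hw1.tar
rm -r hw1

# The extraction steps from the announcement:
gunzip cs578.hw1.tar.gz    # yields cs578.hw1.tar
tar -xf cs578.hw1.tar      # unpacks the homework files
cat hw1/readme.txt         # prints "hello"
```

On GNU tar, `tar -xzf cs578.hw1.tar.gz` does both steps in one command.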

Homework 1 is available.  Due in 2 weeks at the start of class on Thursday 9/25/03.
Instructions for installing IND on Windows machines are here.

New Room Starting 9/02/03: Thurston 205

Time and Place

Office Hours

Richard Caruana: Tue 4:30-5:00, Wed 10:30-11:30 (Upson 4157)
Teaching Assistant Alexandru Niculescu-Mizil: Mon 11:30-12:30, Thu 12:00-1:00 (Rhodes 419)
Teaching Assistant Radu Popovici: Mon 3:30-4:30, Fri 5:15-6:15 (Upson 5132)
Teaching Assistant Casey Smith: Wed 3:30-4:30 (Upson 5132)
Administrative Asst. Cindy Robinson: M-F 7:30-3:30 (Upson 4146)


General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining, and their application to real-world learning and decision-making tasks. The course will also cover empirical methods for comparing learning algorithms, for understanding and explaining their differences, for exploring the conditions under which each is most appropriate, and for figuring out how to get the best possible performance out of them on real problems.

Tentative Course Syllabus

Textbooks:
Machine Learning by Tom Mitchell
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, J. Friedman

Optional references:
 Pattern Classification 2nd edition by Richard Duda, Peter Hart, & David Stork

Grading policies:

Academic integrity policy


Lecture Notes

1. Decision Trees (.pdf)

2. Support Vector Machines (.pdf)

3. KNN (.pdf)

4. Missing Values and Feature Selection (.ppt)

5. Bagging-Boosting (.ppt)

6. Performance Measures (.pdf)

7. Clustering and Unsupervised Learning (.pdf) 

8. Multi-Task Learning (.ppt) 



Assignments

Download homework #1.  Due Thu 9/25/03 by start of class.
Instructions for installing IND on a Windows machine are here.


Final Project


ML Links
