Empirical Methods in Machine Learning & Data Mining
CS578
Computer Science Department
Cornell University
Fall 2001

 

Announcements

Aug 31 The syllabus might change in order to cover more data mining.

Sept 11 Homework assignment 1 is available.

Sept 14 The IND package can be installed under Windows. See instructions here.

Sept 17 IND works on Sun. See instructions here.

Oct 5 Homework assignment 2 is available.

Oct 9 The data set for homework 2 is available here.

Oct 29 The midterm text is here. Sorry for the delay.

Dec 19 The class photo is available here.


Time and Place


Personnel

Instructor: Richard Caruana
Office hours: Tuesday 16:30 - 17:30, Wednesday 13:30 - 14:30
Office: Upson 4157

Teaching Assistant: Alexandru Niculescu-Mizil
Office hours: Monday 11:15 - 13:15
Office: Upson 4162


General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining and their application to real-world learning and decision-making tasks. The course will also cover empirical methods for comparing learning algorithms, for understanding and explaining their differences, and for exploring the conditions under which each is most appropriate.

Tentative Course Syllabus

Textbooks:
Machine Learning by Tom Mitchell
Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber
Optional references:
Pattern Classification, 2nd edition, by Richard Duda, Peter Hart, and David Stork

Grading policies:

Academic integrity policy



Lecture Notes

  1. Decision trees.
  2. k-nearest neighbor.
  3. Missing values and feature selection.
  4. Clustering.
  5. PCA, MDS and Canopies.
  6. Fractal dimensions.


Assignments

ASSIGNMENT 1 : Due Thursday October 4, 2001
The assignment description
Download IND package
Instructions for installing the IND package under Windows
Instructions for installing the IND package on Sun

ASSIGNMENT 2 : Due Tuesday October 23, 2001
The assignment description
The data set

Midterm : Due Thursday November 1, 2001
The questions

ASSIGNMENT 3 : Due Thursday November 15, 2001
The assignment description
The data set
The ROC code
The simplified ROC code

ASSIGNMENT 4 (Optional) : Due Wednesday December 19, 2001
The assignment description
Protein data in upper diagonal format


Final Project



ML Links
