CS578 Home Page

Empirical Methods in Machine Learning & Data Mining
CS578
Computer Science Department
Cornell University
Fall 2002


Announcements

Dec 5 Last handout: Notes about exam, final project, and optional 4th hw

Dec 3 Optional make-up homework assignment available: hw4.txt, hw4.data

Nov 15 Final project (due Dec 7) files available: project.txt, train, test, roceasy.c

Nov 7 Homework assignment 3 (postscript, text) available.  Due Thu, Nov 21, 2002

Oct 31 Take home mid term exam now available.  Due 2:55pm Thu Nov 7, 2002

Oct 10  Homework assignment 2 is available; kNN lecture slides are available

Sept 12 Homework assignment 1 is available.


Time and Place

  • Tuesday, Thursday: 2:55pm-4:10pm, Hollister Hall B14
  • Project due: Friday, December 6
  • Midterm Exam: Due back October ??
  • Final Exam: Wednesday, December 18, 9:00-11:30am, Phillips 203.

Personnel

                      Name                         Office Hours                     Office
Instructor            Richard Caruana              Tue 4:30-5:00, Wed 1:30-2:30     Upson 4157
Teaching Assistant    Alexandru Niculescu-Mizil    Mon 1:30-2:30, Thu 12:00-1:00    Rhodes 419




General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining and their application to real-world learning and decision-making tasks. The course also will cover empirical methods for comparing learning algorithms, for understanding and explaining their differences, for exploring the conditions under which each is most appropriate, and for figuring out how to get the best possible performance out of them on real problems.

Tentative Course Syllabus

Textbooks:
Machine Learning by Tom Mitchell
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, J. Friedman.

Optional references:
 Pattern Classification 2nd edition by Richard Duda, Peter Hart, & David Stork

Grading policies:

  • 20% Midterm (take home)
  • 20% Final (open book in class)
  • 30% Assignments (individual)
  • 30% Final Project (group project comparing various learning methods on two test problems)
  • Bonus points for class participation
  • Homework assignments, the take-home midterm, and the final exam must be your own work. For homework, it is OK to talk with other students about the assignment, ask each other questions, and in general learn from each other, but the homework you hand in must be your own work. If other students gave you significant help with your homework, you should briefly acknowledge them in what you hand in.

Academic integrity policy



Lecture Notes

CS578.02_intro_lecture.ppt
CS578.02_DT_lecture.ppt
CS578.02_kNN_lecture.ppt
CS578.02_performance_measures_lecture.ppt
CS578.02_missingvalues_featsel_lecture.ppt
CS578_clustering_lecture.ppt


Assignments

Assignment 1: Due Thursday September 26
Download IND package
Instructions for installing the IND package using CYGWIN under Windows
If you have trouble installing the IND package on a Sun, try this

Assignment 2: Due Tuesday October 29
hw2 handout (same as handout from class)
Download the dataset: hw2.knn.data



Final Project

Nov 15 Final project (due Dec 7) files: project.txt, train, test, roceasy.c

  • You are encouraged to work on the project in groups of 1-4. Please register your group once it is formed. If you wish to work on the project with other students, but cannot find partners, please let us know and we'll try to match you up. It is OK to work on the project alone if you prefer.
  • The final project is a mini competition. We'll hand out a training set and a final test set. For the training set you will know the target values, so you can do supervised learning using decision trees, k-nearest neighbor, artificial neural nets, etc. You are allowed to use any of the learning methods we discuss in class. You can also combine several different learning methods if you wish. For the test set you will not know the target values. Your job is to train models on the training set and use the best models you can build to make predictions for the test set. You then send us your predictions on the test set, and we'll measure your "accuracy". The project with the highest performance on the test set "wins". Part of the grade for the project will be based on how well your models perform on the test set. The final grade for the project also will take into account what methods you tried, how well you tackled each problem, and the quality of the write-up. Read the project handout for more details.
  • The final project is 30% of the course grade. Exceptional projects may get extra credit.
  • Final report. Due Saturday, Dec 7.
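
The workflow described above (fit on labeled training data, estimate performance on held-out data, then predict an unlabeled test set) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python, synthetic two-class data, and a plain k-nearest neighbor classifier. The helper names (`make_toy_data`, `knn_predict`, `accuracy`) and the data are hypothetical and are not the project's files, format, or required method. The project handout remains the reference for what to submit.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(fit_x, fit_y, eval_x, eval_y, k=3):
    """Fraction of eval points whose kNN prediction matches the known label."""
    correct = sum(knn_predict(fit_x, fit_y, x, k) == y for x, y in zip(eval_x, eval_y))
    return correct / len(eval_x)


def make_toy_data(n, seed):
    """Hypothetical data: class 0 clusters near (0, 0), class 1 near (3, 3)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        center = 3.0 * label
        xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        ys.append(label)
    return xs, ys


# Stand-in for the labeled training set (assumption: toy data, not the real file).
train_x, train_y = make_toy_data(200, seed=0)
# Stand-in for the unlabeled test set: the labels are discarded on purpose.
test_x, _ = make_toy_data(50, seed=1)

# Hold out part of the labeled data to estimate accuracy before predicting.
split = 150
fit_x, fit_y = train_x[:split], train_y[:split]
val_x, val_y = train_x[split:], train_y[split:]

# Choose k by validation accuracy (odd values avoid ties with two classes).
candidates = [1, 3, 5, 7]
val_acc = {k: accuracy(fit_x, fit_y, val_x, val_y, k) for k in candidates}
best_k = max(candidates, key=lambda k: val_acc[k])

# Use all labeled data with the chosen k to predict the unlabeled test points.
test_predictions = [knn_predict(train_x, train_y, x, best_k) for x in test_x]

print("validation accuracy by k:", val_acc)
print("chosen k:", best_k)
print("number of test predictions:", len(test_predictions))
```

Holding out part of the training set is one simple way to compare settings or methods without seeing the test targets. Cross-validation is a common alternative when labeled data is limited.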



ML Links

 
