Empirical Methods in Machine Learning & Data Mining
CS578
Computer Science Department
Cornell University
Fall 2005


Announcements:

Homework 3 is now available.  It's due at the beginning of class on Tue Nov 22.

Clarification for midterm: XOR of three inputs is defined to be +1 if the number of +1 inputs is odd, and -1 if the number of +1 inputs is even.
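As an illustration of this definition only (not part of the midterm materials), here is a minimal Python sketch. The function name xor3 and the input validation are assumptions made for this example. Under this definition, with inputs in {-1, +1}, the result happens to equal the product of the three inputs.

```python
from itertools import product


def xor3(a, b, c):
    """Three-input XOR over {-1, +1}, per the clarification above.

    Returns +1 if the number of +1 inputs is odd, -1 if it is even.
    The name `xor3` is hypothetical, chosen for this sketch.
    """
    inputs = (a, b, c)
    if any(x not in (-1, 1) for x in inputs):
        raise ValueError("inputs must each be -1 or +1")
    num_positive = sum(1 for x in inputs if x == 1)
    return 1 if num_positive % 2 == 1 else -1


if __name__ == "__main__":
    # Print the full truth table for all eight input combinations.
    for a, b, c in product((-1, 1), repeat=3):
        print(f"{a:+d} {b:+d} {c:+d} -> {xor3(a, b, c):+d}")
```

Note that zero +1 inputs counts as even, so three -1 inputs give -1.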

Small error in the midterm: on the decision tree question where you are asked to calculate Info_gain, Gain_Ratio, and Gini_Score, Gini_Score is a typo and you should calculate RMS_Score instead. This should be obvious from the rest of the question, but it is noted here just in case.

The midterm handed out in class is missing the last page. Here is the complete text file: trick-or-treat.2005.txt. Very sorry for the confusion. The midterm is due at 2:55pm on Thu Nov 3. Unlike homework, we will not accept late midterms. Good luck. I hope you enjoy it.

Homework Assignment 2 is now available.  It's due at the beginning of class on Thu Oct 20.

There are problems getting IND to compile using the new compiler in recent versions of CYGWIN. You can avoid these problems by telling CYGWIN to use compiler 3.3.3 (instead of the newer 3.4.4) in the CYGWIN Setup Tool. Note that you should also use the Setup Tool to install tcsh, bison, and make.

The Unix/UnixSTAT/Scripting tutorial will be held Tue 7:30-9:30 in Olin Hall 165.  The tutorial is optional and is intended to help people who are not familiar with Unix or writing scripts get up to speed.

Homework Assignment 1 is now available.  It's due at the beginning of class on Tue Sep 20.

We requested that the Mitchell textbook be put on reserve, but it may take a few days for it to be available because the library's copy from last year was not returned.


Time and Place


Personnel

Role                  Name                  Email (@cs.cornell.edu)  Office Hours                     Office
Instructor            Rich Caruana          caruana                  Tue 4:30-5:00, Wed 10:00-11:00   Upson 4157
Teaching Assistant    Cristian Bucila       cristi                   Thu 11:30-12:00, Fri 2:00-3:00   Upson 322
Teaching Assistant    Lars Backstrom        lars                     Mon 11:30-12:00, Wed 3:30-4:35   Upson 4124
Teaching Assistant    Alex Niculescu-Mizil  alexn                    Mon 12:00-1:00                   Upson 5154
Administrative Asst.  Amy Fish              amyfish                  M-F 9:00-4:00                    Upson 4146




General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining, and their application to real-world learning and decision-making tasks. The course will also cover empirical methods for comparing learning algorithms, for understanding and explaining their differences, for exploring the conditions under which each is most appropriate, and for getting the best possible performance out of them on real problems.

Tentative Course Syllabus

Textbooks:
Machine Learning by Tom Mitchell

Optional references:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, J. Friedman.
Pattern Classification 2nd edition by Richard Duda, Peter Hart, & David Stork

Grading policies:

Academic integrity policy



Lecture Notes

Unsupervised Learning Clustering slides (cs578_clustering_lecture.4up.pdf)

MTL slides (cs578.mtl.lecture.4up.pdf)

SVM slides (slides_sigir03_tutorial-modified.v3.pdf)

Bagging & Boosting Slides (CS578.bagging.boosting.lecture.pdf)

Special Topics: Missing Values & Feature Selection Slides (missing_featsel_lecture.pdf)

KNN Slides (CS578_knn_lecture.4up.pdf)

Revised Performance Measures Slides (last update 10/18/05) (performance_measures.4up.pdf)

History File from Unix Tutorial (cu.578.05.unix.history.txt)

Revised Decision Tree Slides (last updated 9/15/05) (CS578.05_DT_lecture.2up.pdf)

Introduction to COMS 578 and a Brief History of Statistics, Machine Learning, and Data Mining (CS578.05_INTRO_lecture.pdf)



Assignments

HW1 Decision Tree Assignment (due start of class Tue Sep 20): cs578.hw1.tar
IND download for MacOS 10.3: ind.macos10.3.tar
UnixStat download for MacOS 10.3: unixstat.macos10.3.tar

HW2 Neural Nets Assignment (due start of class Thu Oct 20): hw2.tar.gz
Perf code for calculating ROC performances: http://kodiak.cs.cornell.edu/kddcup/software.html 

HW3 KNN Assignment (due start of class Tue Nov 22): hw3.ps  hw3.data.gz



Final Project



ML Links
