Announcements:
Homework 3 is now available. It's due at the beginning of class on Tue Nov 22.
Clarification for midterm: XOR of three inputs is defined to be +1 if the number of +1 inputs is odd, and -1 if the number of +1 inputs is even.
Small error in the midterm: on the decision tree question where you are asked to calculate Info_gain, Gain_Ratio, and Gini_Score, Gini_Score is a typo and you should be calculating RMS_Score instead. This should be obvious from the rest of the question, but just in case.
The midterm handed out in class is missing the last page. Here is the complete text file: trickortreat.2005.txt. Very sorry for the confusion. The midterm is due at 2:55pm on Thu Nov 3. Unlike homework, we will not accept late midterms. Good luck. I hope you enjoy it.
Homework Assignment 2 is now available. It's due at the beginning of class on Thu Oct 20.
There are problems getting IND to compile using the new compiler in recent versions of CYGWIN. You can avoid these problems by telling CYGWIN to use compiler 3.3.3 (instead of the newer 3.4.4) in the CYGWIN Setup Tool. Note that you should also use the Setup Tool to install tcsh, bison, and make.
The Unix/UnixSTAT/Scripting tutorial will be held Tue 7:30-9:30 in Olin Hall 165. The tutorial is optional and is intended to help people who are not familiar with Unix or writing scripts get up to speed.
Homework Assignment 1 is now available. It's due at the beginning of class on Tue Sep 20.
We requested that the Mitchell textbook be put on reserve, but it may take a few days for it to be available because the library's copy from last year was not returned.
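The three-input XOR from the midterm clarification can be checked with a short function. This is a minimal sketch, not part of the course materials: the name xor3 is hypothetical, and it assumes inputs are encoded as +1/-1 with the even case mapping to -1.

```python
def xor3(a, b, c):
    """Three-input XOR on +1/-1 values (hypothetical helper, not course code).

    Returns +1 if an odd number of the inputs are +1, and -1 otherwise.
    """
    inputs = (a, b, c)
    for v in inputs:
        if v not in (1, -1):
            raise ValueError("inputs must be +1 or -1")
    # Count how many inputs are +1 and test the parity of that count.
    plus_count = sum(1 for v in inputs if v == 1)
    return 1 if plus_count % 2 == 1 else -1


# Print the full truth table for the eight possible input combinations.
for a in (1, -1):
    for b in (1, -1):
        for c in (1, -1):
            print(a, b, c, "->", xor3(a, b, c))
```

Under this encoding the result always equals the product a*b*c, which gives a quick way to check answers by hand.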


Role                  Name                  Email (@cs.cornell.edu)  Office Hours                    Office
Instructor            Rich Caruana          caruana                  Tue 4:30-5:00, Wed 10:00-11:00  Upson 4157
Teaching Assistant    Cristian Bucila       cristi                   Thu 11:30-12:00, Fri 2:00-3:00  Upson 322
Teaching Assistant    Lars Backstrom        lars                     Mon 11:30-12:00, Wed 3:30-4:35  Upson 4124
Teaching Assistant    Alex Niculescu-Mizil  alexn                    Mon 12:00-1:00                  Upson 5154
Administrative Asst.  Amy Fish              amyfish                  M-F 9:00-4:00                   Upson 4146
Course Description:
This implementation-oriented course presents a broad introduction to
current algorithms and approaches in machine learning, knowledge
discovery, and data mining and their application to real-world learning
and decision-making tasks. The course also will cover empirical methods
for comparing learning algorithms, for understanding and explaining
their differences, for exploring the conditions under which each is
most appropriate, and for figuring out how to get the best possible
performance out of them on real problems.
Textbooks:
Machine Learning by Tom Mitchell
Optional references:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, and J. Friedman
Pattern Classification, 2nd edition, by Richard Duda, Peter Hart, & David Stork
Grading policies:
Lecture Notes:
Unsupervised Learning Clustering slides (cs578_clustering_lecture.4up.pdf)
MTL slides (cs578.mtl.lecture.4up.pdf)
SVM slides (slides_sigir03_tutorialmodified.v3.pdf)
Bagging & Boosting Slides (CS578.bagging.boosting.lecture.pdf)
Special Topics: Missing Values & Feature Selection Slides (missing_featsel_lecture.pdf)
KNN Slides (CS578_knn_lecture.4up.pdf)
Revised Performance Measures Slides (last update 10/18/05) (performance_measures.4up.pdf)
History File from Unix Tutorial (cu.578.05.unix.history.txt)
Revised Decision Tree Slides (last updated 9/15/05) (CS578.05_DT_lecture.2up.pdf)
Introduction to COMS 578 and a Brief History of Statistics, Machine Learning, and Data Mining (CS578.05_INTRO_lecture.pdf)
Assignments:
HW1 Decision Tree Assignment (due start of class Tue Sep 20): cs578.hw1.tar
IND download for MacOS 10.3: ind.macos10.3.tar
UnixStat download for MacOS 10.3: unixstat.macos10.3.tar
HW2 Neural Nets Assignment (due start of class Thu Oct 20): hw2.tar.gz
Perf code for calculating ROC performances: http://kodiak.cs.cornell.edu/kddcup/software.html
HW3 KNN Assignment (due start of class Tue Nov 22): hw3.ps hw3.data.gz