Empirical Methods in Machine Learning & Data Mining
CS578
Computer Science Department
Cornell University
Fall 2006


Announcements:


Time and Place

2:55 PM to 4:10 PM
Tuesdays & Thursdays
205 Thurston Hall


Role                      Name                  Email (@cs.cornell.edu)  Office Hours                      Office
Instructor                Rich Caruana          caruana                  Tue 4:30-5:00, Wed 10:30-11:30    Upson 4157
Teaching Assistant        Art Munson            art                      Mon 10:00-11:00, Fri 1:30-2:30    Upson 5156
Teaching Assistant        Yisong Yue            yyue                     Tue 4:30-5:30, Wed 11:00-12:00    Upson 4154
Teaching Assistant        Alex Niculescu-Mizil  alexn                    Thu 10:30-11:30                   Upson 5154
Administrative Assistant  Melissa Totman        mtotman                  M-F 9:00-4:00                     Upson 4147



General Information

Course Description:
This implementation-oriented course presents a broad introduction to current algorithms and approaches in machine learning, knowledge discovery, and data mining, and to their application to real-world learning and decision-making tasks. The course will also cover empirical methods for comparing learning algorithms, understanding and explaining their differences, exploring the conditions under which each is most appropriate, and getting the best possible performance out of them on real problems.
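One common empirical method for comparing learning algorithms, covered in the decision-tree/t-test lecture, is a paired t-test over per-fold cross-validation scores. A minimal sketch (the accuracy numbers are made up for illustration; this is not code distributed with the course):

```python
import math

def paired_t(acc_a, acc_b):
    """Paired t-statistic for per-fold accuracies of two learners."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)

# Example: hypothetical accuracies from 5 cross-validation folds
t = paired_t([0.81, 0.82, 0.83, 0.84, 0.85],
             [0.80, 0.80, 0.80, 0.80, 0.80])
```

The resulting t-statistic is compared against the t-distribution with n-1 degrees of freedom to decide whether the difference between the two learners is statistically significant.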

Textbooks:
Machine Learning by Tom Mitchell

Optional references:
The Elements of Statistical Learning: Data Mining, Inference, and Prediction by T. Hastie, R. Tibshirani, J. Friedman.
Pattern Classification 2nd edition by Richard Duda, Peter Hart, & David Stork
Pattern Recognition and Machine Learning by Christopher Bishop

Grading policies:

Academic integrity policy



Lecture Notes

  1. Intro Lecture (CS578.06_INTRO_lecture.4up.pdf)
  2. Decision Tree Lecture (and t-test mini-lecture) (CS578.06_DT_lecture.ppt.pdf)
  3. UNIX Introduction files
  4. Performance Measures Lecture (performance_measures.pdf)
  5. Experimental Design Information
  6. KNN Lecture (CS578_knn_lecture.pdf)
  7. Feature Selection / Missing Value Lecture (CS578_featsel_missing_lecture.pdf)
  8. Bagging, Boosting, Random Forests, and Ensemble Learning (CS578.bagging.boosting.lecture.pdf)
  9. SVM lecture (long notes and short notes; you are responsible for the short notes)
  10. Clustering lecture (responsible up to slide 34 "Mean Point Happiness" for Prelim 2) (cs578_clustering_lecture.pdf)



Assignments

Homework 3

Download HW3 here: 578.hw3.2006.tar.gz

Homework 2

Perf code for calculating ROC performances: http://kodiak.cs.cornell.edu/kddcup/software.html 
Download HW2 here: cs578.hw2.tar.gz
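The Perf tool linked above computes ROC performance; as an independent sanity check, the area under the ROC curve can also be computed by hand with the rank (Mann-Whitney) statistic. A minimal sketch, not the Perf tool itself:

```python
def roc_auc(labels, scores):
    """AUC via the Mann-Whitney rank statistic; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # find the run of tied scores starting at position i
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos = len(pos_ranks)
    n_neg = len(labels) - n_pos
    return (sum(pos_ranks) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Example with made-up labels/scores: 3 of the 4 pos/neg pairs are ranked correctly
auc = roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.8, 0.1])
```

The statistic equals the fraction of positive-negative pairs in which the positive example receives the higher score, which is exactly the area under the ROC curve.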

Homework 1

Download HW1 here: cs578.hw1.tar
IND decision tree code for MacOS: ind.macos10.3.tar
UNIXSTAT utility code for MacOS: unixstat.macos10.3.tar
Tips for installing IND / unixstat on Cygwin



Final Project



ML Links
