Machine Learning

COM S 478 - Spring 2006
Cornell University
Department of Computer Science

Time and Place
First lecture: January 24, 2006
Last lecture: May 4, 2006
  • Tuesday, 11:40am - 12:55pm in Upson 205
  • Thursday, 11:40am - 12:55pm in Upson 205

First Prelim Exam: Tuesday, March 14
Second Prelim Exam: Tuesday, April 25

Instructor
Thorsten Joachims, tj@cs.cornell.edu, 4153 Upson Hall.
Mailing List and Newsgroup
[cs478-l@cs.cornell.edu] We'd like you to contact us via this mailing list. The list goes to all the TAs and Prof. Joachims -- you will get the best response time by using it, and all the TAs will see both the question you asked and the answers you receive. This makes everyone's job easier.
[cornell.class.cs478] We will post announcements to this newsgroup, and students can use it to communicate with each other. You can find instructions for accessing the newsgroup at http://www.cit.cornell.edu/services/netnews/
Teaching Assistants
Thomas Finley, twf5@cornell.edu, 4104 Upson Hall.
Soo Yeon Lee, sl363@cornell.edu
Rafael Frongillo, rmf25@cornell.edu
Yan Zhang, yz84@cornell.edu
Office Hours
Mondays, 2:30pm - 4:00pm Thomas Finley 4104 Upson
Mondays, 7:00pm - 8:00pm Soo Yeon Lee 361 Upson (CSUG Lab)
Wednesdays, 11:15am - 12:15pm Yan Zhang 361 Upson (CSUG Lab)
Wednesdays, 2:30pm - 4:00pm Thomas Finley 4104 Upson
Thursdays, 1:30pm - 2:30pm Thorsten Joachims 4153 Upson
Fridays, 12:45pm - 2:15pm Rafael Frongillo 361 Upson (CSUG Lab)
Syllabus
Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior; machine learning techniques have also become key components of many software systems. For example, machine learning techniques are used to create spam filters, to analyze customer purchase data, and to detect fraudulent credit card transactions.

This course will introduce the fundamental set of techniques and algorithms that constitute machine learning as of today, ranging from classification methods like decision trees and support vector machines, through structured models like hidden Markov models and context-free grammars, to unsupervised learning and reinforcement learning. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics (a small illustrative sketch of one of these methods appears after the list):

  • Concept Learning : Hypothesis space, version space
  • Instance-based Learning : K-Nearest Neighbors, collaborative filtering
  • Decision Trees : TDIDT, representation bias vs. search bias
  • Hypothesis Tests : Confidence intervals, resampling estimates
  • Linear Rules : Perceptron, Winnow
  • Support Vector Machines : Optimal hyperplane, kernels
  • Generative Models : Bayes Rule, Naïve Bayes, MAP and Bayesian learning
  • Structured Models : Hidden Markov Models, Viterbi, Markov Random Fields
  • Learning Theory : PAC learning, generalization error bounds, mistake bounds, No-Free-Lunch
  • Clustering : HAC, k-means, Expectation-Maximization, latent semantic indexing
  • Reinforcement Learning : Q-Learning, Temporal difference learning
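
To give a taste of the material, the following is a minimal sketch (in Python) of the perceptron from the Linear Rules topic above. It is purely illustrative, not course-provided code: the function name, toy dataset, learning rate, and epoch limit are assumptions made for this example.

    # Illustrative sketch only -- not course material. A minimal perceptron
    # learner for data labeled y in {-1, +1}.

    def train_perceptron(examples, epochs=20, eta=1.0):
        """examples: list of (x, y) pairs, x a tuple of numbers, y in {-1, +1}."""
        n = len(examples[0][0])
        w = [0.0] * n                 # weight vector
        b = 0.0                       # bias term
        for _ in range(epochs):
            mistakes = 0
            for x, y in examples:
                activation = sum(wi * xi for wi, xi in zip(w, x)) + b
                if y * activation <= 0:        # misclassified (or on the boundary)
                    w = [wi + eta * y * xi for wi, xi in zip(w, x)]  # update rule
                    b += eta * y
                    mistakes += 1
            if mistakes == 0:                  # all points correct: converged
                break
        return w, b

    # Toy linearly separable data: label is +1 exactly when x1 + x2 > 1.
    data = [((0, 0), -1), ((1, 0), -1), ((0, 1), -1), ((1, 1), +1), ((2, 1), +1)]
    w, b = train_perceptron(data)
    print("weights:", w, "bias:", b)

On linearly separable data like this toy set, the perceptron convergence theorem guarantees that the loop terminates after finitely many mistakes; mistake bounds of this kind appear under the Learning Theory topic.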

Reading
  • 01/24: Mitchell, Chapter 1
  • 01/26: Mitchell, Sections 8.1 - 8.2
  • 01/31: Mitchell, Sections 2.1, 2.2, 2.5-2.5.2
  • 02/02: Mitchell, Sections 3.1-3.5
  • 02/07: Mitchell, Sections 5.1-5.2.1
  • 02/09: Mitchell, Sections 2.7, 3.6-3.7
  • 02/14: Mitchell, Chapter 5
  • 02/16: Mitchell, Sections 4.4-4.4.2, 7.5
  • 02/21: Shawe-Taylor/Cristianini, Sections 2-2.1.1
  • 02/23: Schoelkopf/Smola, Sections 7.1-7.2, 7.5
  • 02/28: Schoelkopf/Smola, Section 7.3
  • 03/01: Schoelkopf/Smola, Section 7.4; Shawe-Taylor/Cristianini, Sections 3.1, 3.2, 3.3.2, 3.4
  • 03/07: Schoelkopf/Smola, Sections 7.6, 7.8; Joachims, Section 4
  • 03/09: Duda, Hart & Stork, pages 20-27
  • 03/16: Mitchell, Section 6.9
  • 03/28: Duda, Hart & Stork, Sections 2.4-2.6.1; Mitchell, Section 6.10
  • 04/04: Leeds Online Tutorial on HMMs (except Forward and Forward/Backward Algorithm)
  • 04/11: Mitchell, Chapter 7 (not 7.4.4 and 7.5)
  • 04/18: Manning/Schuetze, Chapter 14 (not 14.1.3, 14.1.4)
Assignments
  • 01/31: Homework 1 (due 02/07) (see CMS)
  • 02/14: Homework 2 (due 02/21) (see CMS)
  • 02/28: Homework 3 (due 03/07) (see CMS)
  • 03/30: Homework 4 (due 04/06) (see CMS)
  • 04/11: Homework 5 (due 04/18) (see CMS)
  • 05/02 and 05/04: Project presentations (see notes on presentations)
  • 05/15: Final project report due via CMS (see template format for report)
Reference Material
The main textbook for the class is

Tom Mitchell, "Machine Learning", McGraw Hill, 1997.

A good additional textbook as a secondary reference is

Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.

In addition, we will provide hand-outs for topics not covered in the book. For further reading beyond the scope of the course, we recommend the following books:

  • Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
  • Shawe-Taylor, Cristianini, "Introduction to Support Vector Machines", Cambridge University Press, 2000.
  • Joachims, "Learning to Classify Text using Support Vector Machines", Kluwer, 2002.
  • Devroye, Gyoerfi, Lugosi, "A Probabilistic Theory of Pattern Recognition", Springer, 1997.
  • Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001.
  • Vapnik, "Statistical Learning Theory", Wiley, 1998.
Prerequisites
Programming skills (e.g. COM S 211 or COM S 312) and basic knowledge of linear algebra and probability theory (e.g. COM S 280).
Grading
This is a 4-credit course. Grades will be determined based on two written exams, a final project, homework assignments, and class participation.
  • 40%: 2 Prelim Exams
  • 15%: Final Project
  • 40%: Homework (~6 assignments)
  • 5%: Class Participation

All assignments are due at the beginning of class on the due date. Late assignments will lose 5 points for each 24-hour period they are late. In addition, no assignments will be accepted after the solutions have been made available.

Roughly: A = 92-100; B = 82-88; C = 72-78; D = 60-68; F = below 60

Academic Integrity
This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit must be the student's own work. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated.