CS 4780 - Machine Learning

CS 4780 - Fall 2009
Cornell University
Department of Computer Science

Time and Place
First lecture: August 27, 2009
Last lecture: December 3, 2009
  • Tuesday, 1:25pm - 2:40pm in Phillips 203
  • Thursday, 1:25pm - 2:40pm in Phillips 203

NOTE: CS4780 is only offered in Fall 2009, not in Spring 2010.

First Prelim Exam: 10/15
Second Prelim Exam: 11/24

Review Session I: Wednesday 10/14, 10:00am - 11:00am, in Upson 315
Review Session II: Sunday 11/22, 5:00pm - 6:00pm, in Upson 315

Instructor
Thorsten Joachims, tj@cs.cornell.edu, 4153 Upson Hall.
Mailing List and Newsgroup
[cs4780-l@cornell.edu] Please contact us through this mailing list. The list reaches all the TAs and the professor, so you will get the fastest response, and the whole course staff will see both your question and the answer you receive.
Teaching Assistants
Mark Verheggen, mark@cs.cornell.edu, Upson 4161.
Office Hours
Monday, 1:00 pm - 2:00 pm Mark Verheggen Upson 328B
Thursday, 3:00 pm - 4:00 pm Thorsten Joachims 4153 Upson
Thursday, 12:15 pm - 1:15 pm Mark Verheggen Upson 328B
Friday, 2:30 pm - 3:30 pm Rick Ducott Upson 328B
Syllabus
Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior, but machine learning techniques have also become key components of many software systems. For example, machine learning techniques are used to create spam filters, to analyze customer purchase data, and to detect fraudulent credit card transactions.

This course will introduce the fundamental techniques and algorithms that constitute machine learning today, ranging from classification methods like decision trees and support vector machines, through structured models like hidden Markov models and context-free grammars, to unsupervised learning and clustering. The course will not only discuss individual algorithms and methods, but will also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics:

  • Concept Learning : Hypothesis space, version space
  • Instance-based Learning : K-Nearest Neighbors, collaborative filtering
  • Decision Trees : TDIDT, Representation bias vs. search bias
  • Hypothesis Tests : Confidence intervals, resampling estimates
  • Linear Rules : Perceptron, Winnow
  • Support Vector Machines : Optimal hyperplane, Kernels
  • Generative Models : Bayes Rule, Naïve Bayes, MAP and Bayesian learning
  • Structured Models : Hidden Markov Models, Viterbi, Markov Random Fields
  • Learning Theory : PAC learning, generalization error bounds, mistake bounds, No-Free-Lunch
  • Clustering : HAC, k-means, Expectation-Maximization, latent semantic indexing
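As a small taste of the "Linear Rules" topic above, here is a minimal sketch of the perceptron algorithm in plain Python (no external libraries); the dataset and parameter names are illustrative, not taken from the course materials:

```python
# Minimal perceptron sketch: learn a linear rule w such that
# sign(w . x) predicts the label y, with labels in {+1, -1}.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron(examples, epochs=10):
    """Cycle through the data, updating w on every mistake."""
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, y in examples:
            if y * dot(w, x) <= 0:          # mistake: move w toward y * x
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w

# Tiny linearly separable toy dataset: label is the sign of the first feature.
data = [([1.0, 0.5], 1), ([2.0, -1.0], 1), ([-1.0, 0.3], -1), ([-2.0, 1.0], -1)]
w = perceptron(data)
print(w)  # a separating weight vector for the toy data
```

On linearly separable data such as this, the mistake-bound analysis covered under "Learning Theory" guarantees that the updates stop after a finite number of mistakes.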


Slides and Handouts
08/27: Introduction (PDF)
09/01: Instance-based Learning (PDF)
09/03: Decision Tree Learning (PDF)
09/15: Assessing Learning Results (PDF)
09/22: Linear Rules and Perceptron (PDF)
09/29: Optimal Hyperplanes and Support Vector Machines (PDF)
10/01: Duality and Leave-one-out (PDF)
10/06: Kernels (PDF)
10/08: Learning Ranking Functions (PDF)
10/20: Generative Models (PDF)
10/29: HMMs and Structured Output Prediction (PDF)
11/10: Statistical Learning Theory (PDF)
11/17: Clustering (PDF)
11/19: Transduction and Co-Training (PDF)
 
Reference Material
The main textbooks for the class are

Tom Mitchell, "Machine Learning", McGraw Hill, 1997.

Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library)

A good secondary reference is

Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.

There will also be supplementary readings for topics not covered in the main textbooks. For further reading beyond the scope of the course, we recommend the following books:

  • Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
  • Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via Cornell Library)
  • Leeds Tutorial on HMMs (online)
  • Joachims, "Learning to Classify Text using Support Vector Machines", Kluwer, 2002.
  • Devroye, Gyoerfi, Lugosi, "A Probabilistic Theory of Pattern Recognition", Springer, 1997.
  • Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
  • Vapnik, "Statistical Learning Theory", Wiley, 1998.
Prerequisites
Programming skills (e.g. COM S 211 or COM S 312), and basic knowledge of linear algebra and probability theory (e.g. COM S 280).
Grading
This is a 4-credit course. Grades will be determined based on two written exams, a final project, homework assignments, and class participation.
  • 40%: 2 Prelim Exams
  • 15%: Final Project
  • 40%: Homework (~5 assignments)
  • 5%: Class Participation

All assignments are due at the beginning of class on the due date. Assignments turned in late will lose 5 points for each 24-hour period they are late. In addition, no assignments will be accepted after the solutions have been made available.

Roughly: A = 92-100; B = 82-88; C = 72-78; D = 60-68; F = below 60

Academic Integrity
This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted for academic credit must be the student's own. Violations of the rules (e.g., cheating, copying, non-approved collaboration) will not be tolerated.