Machine Learning
CS 4780/5780 - Fall 2011

Time and Place  
First lecture: August 25, 2011
Last lecture: December 1, 2011
First Prelim Exam: October 13  
Instructor  
Thorsten Joachims, tj@cs.cornell.edu, 4153 Upson Hall.  
Online Resources  


Teaching Assistants and Consultants  
Karthik Raman, TA, 4126 Upson Hall
Chenhao Tan, TA, 4121 Upson Hall
Adith Swaminathan, TA, 5132 Upson Hall
Igor Labutov, Consultant
Mevlana Gemici, Consultant
Anthony Chang, Consultant
Nic Williamson, Consultant
Heran Yang, Consultant
Boiar Qin, Consultant

Office Hours  
Monday, 9:30am - 10:30am  Adith Swaminathan  Upson 328B, Bay B
Monday, 4:00pm - 5:00pm  Karthik Raman  Upson 328B, Bay B
Monday, 6:00pm - 7:00pm  Chenhao Tan  Upson 4121
Tuesday, 4:30pm - 5:20pm  Thorsten Joachims  Upson 4153
Tuesday, 6:00pm - 7:00pm  Igor Labutov  Upson 328B
Wednesday, 6:00pm - 7:00pm  Chenhao Tan  Upson 4121
Thursday, 6:00pm - 7:00pm  Adith Swaminathan  Upson 328B, Bay B
Friday, 5:00pm - 6:00pm  Karthik Raman  Upson 328B, Bay B
Saturday, 11:00am - 12:00pm  Anthony Chang  Upson 328B
Sunday, 12:30pm - 1:30pm  Mevlana Gemici  Upson 328B, Bay B
Sunday, 4:00pm - 5:00pm  Boiar Qin  Upson 328B, Bay A
Syllabus  
Machine learning is concerned with the
question of how to make computers learn from experience. The ability to
learn is not only central to most aspects of intelligent behavior, but
machine learning techniques have become key components of many software
systems. For example, machine learning techniques are used to create
spam filters, to analyze customer purchase data, or to detect fraudulent
credit card transactions.
This course will introduce the fundamental techniques and algorithms that constitute machine learning today, ranging from classification methods like decision trees and support vector machines, through structured models like hidden Markov models and context-free grammars, to unsupervised learning and clustering. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics:


Slides and Handouts  
08/25: Introduction (PDF)
08/29: Instance-Based Learning (PDF)
09/01: Decision-Tree Learning (PDF)
09/13: Assessing Learning Results (PDF)
09/20: Linear Classifiers and Perceptrons (PDF)
09/27: Support Vector Machines: Optimal Hyperplanes (PDF)
09/29: Support Vector Machines: Duality and Leave-One-Out Error (PDF)
10/04: Support Vector Machines: Kernels (PDF)
10/14: Learning to Rank (PDF)
10/18: Generative Models, Naive Bayes, and Linear Discriminant (PDF)
11/01: Sequences and Hidden Markov Models (PDF)
11/08: Statistical Learning Theory (PDF)
11/15: Clustering (PDF)
11/15: Structured Prediction and Structural SVMs (PDF)

Reference Material  
The main textbooks for the class are
An additional textbook that can serve as a brief secondary reference on many topics in this class is
There will also be additional readings for topics not covered in the main textbooks. For further reading beyond the scope of the course, we recommend the following books:


Prerequisites  
Programming skills (e.g. CS 2110 or CS 3110), and basic knowledge of linear algebra and probability theory (e.g. CS 2800).  
Grading  
This is a 4-credit course. Grades will be
determined based on two written exams, a final project, homework
assignments, and class participation.
To eliminate outlier grades for homeworks and quizzes, the lowest grade is replaced by the second lowest grade in the final grade computation.
All assignments are due at the beginning of class on the due date. Assignments turned in late will be charged a late penalty of 5 points for each 24-hour period for which the assignment is late. However, every student has a budget of 4 late days (i.e. 24-hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. No assignment will be accepted after the solution has been made public, which is typically 5 days after the time it was due. You can submit late assignments in class, in office hours, or to the office of a TA.
Graded homework assignments and prelims can be picked up in Upson 360 (opening hours Monday - Thursday 12 noon - 4:00pm, Friday 1:30pm - 4:00pm).
Regrade requests can be submitted within 7 days after the grades have been made available on CMS. Regrade requests have to be submitted in writing and in hardcopy using this form (or similar). They can be submitted in class, in office hours, or to the office of a TA.
We always appreciate interesting homework solutions that go beyond the minimum. To reward homework solutions that are particularly nice, we will give you "Bonus Points". Bonus points are collected in a special category on CMS. Bonus points are not real points and are not summed up for the final grade, but they can nudge somebody who is right on the boundary to a higher grade.
All assignment, exam, and final grades are roughly on the following scale: A=92-100; B=82-88; C=72-78; D=60-68; F=below 60
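As an informal illustration of the late-day and lowest-grade rules described above, here is a minimal Python sketch. It is not the official grade computation. The function names are hypothetical, and three points are assumptions that the policy text does not state: a started 24-hour period counts as a full period, free late days are spent on the earliest late submissions first, and a score cannot drop below 0.

```python
import math

LATE_PENALTY_PER_PERIOD = 5  # points per 24-hour period (from the policy text)
LATE_DAY_BUDGET = 4          # free late days per semester (from the policy text)


def late_periods(hours_late):
    """Count 24-hour periods charged.

    Assumption: a started period counts in full.
    """
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def apply_late_policy(submissions, budget=LATE_DAY_BUDGET):
    """Return scores after late penalties.

    submissions: list of (raw_score, hours_late) in submission order.
    Assumption: free late days go to the earliest late submissions first,
    and scores are floored at 0.
    """
    remaining = budget
    adjusted = []
    for raw_score, hours_late in submissions:
        periods = late_periods(hours_late)
        free = min(periods, remaining)   # late days covered by the budget
        remaining -= free
        penalty = LATE_PENALTY_PER_PERIOD * (periods - free)
        adjusted.append(max(raw_score - penalty, 0))
    return adjusted


def replace_lowest(scores):
    """Replace the single lowest score by the second lowest score."""
    if len(scores) < 2:
        return list(scores)
    ordered = sorted(scores)
    lowest, second_lowest = ordered[0], ordered[1]
    result = list(scores)
    result[result.index(lowest)] = second_lowest
    return result


# Four homeworks: on time, 30 hours late, 72 hours late, 10 hours late.
adjusted = apply_late_policy([(90, 0), (80, 30), (70, 72), (95, 10)])
print(adjusted)                  # [90, 80, 65, 90]
print(replace_lowest(adjusted))  # [90, 80, 80, 90]
```

In this example the second homework uses two free late days, the third uses the remaining two and is charged for one period, and the fourth is charged for one period because the budget is exhausted.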

Academic Integrity  
This course follows the
Cornell University Code of Academic Integrity. Each student in this
course is expected to abide by the Cornell University Code of Academic
Integrity. Any work submitted by a student in this course for academic
credit will be the student's own work. Violations of the rules (e.g.
cheating, copying, non-approved collaborations) will not be tolerated. We run automatic cheating detection to detect violations of the collaboration rules.