
CS6780  Advanced Machine Learning
Spring 2015
Prof. Thorsten Joachims
Cornell University, Department of Computer Science


Shortcuts: [Slides] [Piazza] [CMS]

Time and Place
First lecture: January 22, 2015
Last lecture: May 5, 2015
Tuesdays/Thursdays, 2:55pm  4:10pm
 Gates 114 (Cornell Ithaca)
 Ursa room (Cornell NYCTech)
Exam: April 16
Project Report: May 11 and May 15


Syllabus
This course gives a graduatelevel introduction to machine learning and statistical pattern recognition and indepth coverage of new and advanced methods in machine learning, as well as their underlying theory. It emphasizes approaches with practical relevance and discusses a number of recent applications of machine learning, such as data mining, computer vision, robotics, text and web data processing. An open research project is a major part of the course.
In particular, the course will cover the following topics:
 Supervised Batch Learning: model, decision theoretic foundation, model selection, model assessment, empirical risk minimization
 Decision Trees : TDIDT, attribute selection, pruning and overfitting
 Statistical Learning Theory : generalization error bounds, VC dimension
 LargeMargin Methods: linear Rules, margin, Perceptron, SVMs
 Kernels: duality, nonlinear rules, nonvectorial data
 Probabilistic Models: generative vs. discriminative, maximum likelihood, Bayesian inference
 Sequence Prediction : hidden Markov model, Viterbi
 Structured Output Prediction : undirected graphical models, structural SVMs, conditional random fields
 Latent Variable Models: kmeans clustering, mixture of Gaussians, expectationmaximization algorithm, matrix factorization, embeddings
 Online Learning : experts, bandits, online convex optimization
 Other topics: neural nets, ensemble methods, sparsity, ...
The prerequisites for the class are: programming skills (at the level of CS 2110) and basic knowledge of linear algebra (at the level of MATH 2940) and probability theory (at the level of MATH 4710) and multivariable calculus (at the level of MATH 1920).
Enrollment is limited to PhD students. Students who have already taken CS 4780/CS 5780 should not take this class.


Slides
 01/22: Introduction [slides] [slides 6up]
 What is learning?
 What is machine learning used for?
 Overview of course, course policies, and contact info.
 01/27: Supervised Batch Learning [slides] [slides 6up]
 Supervised learning for binary classification, multiclass classification, regression, and stuctured output prediction
 Instance space, target function, training examples
 Hypothesis space, consistency, and version space
 Classifying with a decision tree
 Representational power of decision trees
 TDIDT decision tree learning algorithm
 01/29: Empirical Risk Minimization [slides] [slides 6up]
 Independently identically distributed (iid) data
 Risk, empirical risk, and Empirical Risk Minimization (ERM)
 Bayes decision rule and Bayes risk
 Model selection and overfitting
 Model assessment and machine learning experiments
 02/10: Statistical Learning Theory: Generalization Error Bounds [slides] [slides 6up]
 Generalization error bound for finite H and zero error
 Sample complexity
 Probably Approximately Correct (PAC) learning
 Generalization error bound for finite H and nonzero error
 Theoretical characterization of overfitting
 02/12: Linear Classifiers and Perceptrons [slides] [slides 6up]
 Linear classification rules
 Batch Perceptron learning algorithm
 Online Perceptron learning algorithm
 Margin of linear classifier
 Perceptron mistake bound
 02/19: Statistical Learning Theory: Error Bounds and VCDimension [slides] [slides 6up]
 VapnikChervonenkis Dimension
 VC dimension of hyperplanes
 Symmetrization and error bound
 02/24: Support Vector Machines: Optimal Hyperplanes and Soft Margin [slides] [slides 6up]
 Optimal hyperplanes and margins
 Hardmargin Support Vector Machine
 Softmargin Support Vector Machine
 Primal optimization problems
 02/26: Support Vector Machines: Duality and Leaveoneout [slides] [slides 6up]
 Dual Perceptron
 Dual SVM optimization problem
 Connection between primal and dual
 Bounds on the leaveoneout error of an SVM
 03/03: Support Vector Machines: Kernels [slides] [slides 6up]
 Input space and feature space
 Kernels as a method for nonlinear learning
 Kernels for learning with nonvectorial data
 03/10: Regularized Linear Models (slides] [slides 6up]
 Conditional maximum likelihood
 Maximum a posteriori estimates
 Logistic regression
 Ridge regression
 03/12: Generative Models for Classification [slides] [slides 6up]
 Multivariate naive Bayes classifier
 Linear discriminant analysis
 Multinomial naive Bayes classifier
 MLE vs Bayesian
 03/17: Structured Output Prediction: Generative Models [slides] [slides 6up]
 Hidden Markov model (HMM)
 Estimation of HMM parameters
 Viterbi algorithm
 03/19: Structured Output Prediction: Discriminative Learning [slides] [slides 6up]
 Structural SVMs
 Conditional Random Fields
 03/24: Online Learning: Expert Setting [slides] [slides 6up]
 Expert Online Learning Model
 Halving Algorithm and Optimal Mistake Bound
 Weighted Majority Algorithm
 Exponentiated Gradient Algorithm
 03/24: Online Learning: Bandit Setting [slides] [slides 6up]
 Adversarial Bandit Model
 EXP3 Algorithm
 Stochastic Bandit Model
 UCB1 Algorithm
 04/07: ERM Optimization 1 (Yoram Singer) [slides]
 ERM Review
 Convexity, Smoothness, Lipschitzness
 Gradient Descent Algorithm
 04/09: ERM Optimization 2 (Yoram Singer)
 Convergence rate of GD
 Faster convergence for special cases
 Stochastic Gradient Algorithm
 04/14: Unsupervised Learning: kMeans and Mixtures of Gaussians [slides] [slides 6up]
 Flat clustering
 kMeans algorithms
 Mixture of gaussians model
 EMalgorithm for mixture of gaussians
 05/05: Wrapup [slides] [slides 6up]
 Batch learning: ERM vs. Conditional vs. Generative
 Online learning: Experts and Bandits
 Unsupervised learning: Latent variable models
 What we did not cover and other courses


Staff and Office Hours
Please use the CS6780 Piazza Forum for questions and discussions. The forum is monitored by all the TA's and the prof  you will get the best response time. And all the TA's will know the question you asked and the answers you received.
 Prof. Thorsten Joachims (homepage) [Office hours: Friday, 2:303:30pm (Gates 418)]
 Daniel Sedra, TA (homepage) [Office hours: Monday 9:3010:30am (Gates G19), Friday 9:3010:30am (Gates G13)]
 Adith Swaminathan, TA (homepage) [Office hours: Monday 56pm (Gates G11), Thursday 5:306:30pm (Gates G17)]


Assignments and Exams
Homework assignments can be downloaded from CMS.
All assignments are due at the beginning of class on the due date. Assignments turned in late will be charged a 1 percentage point reduction of the cumulated final homework grade for each period of 24 hours for which the assignment is late. However, every student has a budget of 5 late days (i.e. 24 hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. So, if you have perfect scores of 100 on all 4 homeworks and a total of 8 late days, your final homework score will be 97. No assignment will be accepted after the solution was made public, which is typically 35 days after the time it was due. You can submit late assignments in class or following the policy written on the homework handout.
Graded homework assignments and prelims can be picked up in Gates 216 (opening hours Monday  Friday 12:00pm  4:00pm).
Regrade requests can be submitted within 7 days after the grades have been made available on CMS. Regrade requests have to be submitted in writing and in hardcopy using this form (or similar). They can be submitted in class or to Gates 216.


Grading
This is a 4credit course. Grades will be determined based on a written exam, a research project, homework assignments, and class participation.
 35%: Exam
 35%: Research Project
 20%: Homework (~4 assignments)
 10%: Class Participation (e.g., lecture, piazza)
To eliminate outlier grades for homeworks, the lowest grade is replaced by the second lowest grade when grades are cumulated at the end of the semester.
All assignment, exam, and final grades (including + and  of that grade) are roughly on the following scale: A=92100; B=8288; C=7278; D=6068; F= below 60.
For the research project, we will use peer review analogous to how scientific papers are reviewed for conferences and journals. This means that you will read and comment on other students work, which provides input for the TA's and the professor to determine the project grades. The quality of your reviewing also becomes a component of your own grade.
Students taking the class S/U do all work, except for the project, and need to receive at least a grade equivalent to a D to pass the course.


Reference Material
The main textbooks for the class is:
 Kevin Murphy, "Machine Learning  a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
We will also read original research papers and other sources from the following list:
 Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
 Cristianini, ShaweTaylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library)
 Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
 Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
 Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
 Devroye, Gyoerfi, Lugosi, "A Probabilistic Theory of Pattern Recognition", Springer, 1997.
 Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
 Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
 Joachims, "Learning to Classify Text using Support Vector Machines", Kluwer, 2002.
 Leeds Tutorial on HMMs (online)
 Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via Cornell Library)
 Manning, Raghavan, Schuetze, "Introduction to Information Retrieval", Cambridge, 2008. (online)
 Vapnik, "Statistical Learning Theory", Wiley, 1998.


Academic Integrity
This course follows the Cornell University Code of Academic Integrity. Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Collaborations are allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, nonapproved collaborations) will not be tolerated.
