
CS4780/5780  Machine Learning for Intelligent Systems
Fall 2019
Prof. Nika Haghtalab & Prof. Thorsten Joachims
Cornell University, Department of Computer Science



Information on how to enroll for nonCS majors.
Quick links: [Piazza] [Gradescope] [Vocareum]
Time and Place
First lecture: August 29, 2019
Time: Tuesday/Thursday, 2:55pm  4:10pm
Room: Statler Auditorium 185
Midterm Exam: October 24, 7:30pm
Final Exam: December 15, 7:00pm


Course Description
Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior, but machine learning techniques have become key components of many software systems. For examples, machine learning techniques are used to build search engines, to recommend movies, to understand natural language and images, and to build autonomous robots.
This course will introduce the fundamental set of techniques and algorithms that constitute supervised machine learning as of today. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics:
 Supervised Batch Learning: model, decision theoretic foundation, model selection, model assessment, empirical risk minimization
 Instancebased Learning: KNearest Neighbors, collaborative filtering
 Decision Trees: TDIDT, attribute selection, pruning and overfitting
 Linear Rules: Perceptron, logistic regression, linear regression, duality
 Support Vector Machines: Optimal hyperplane, margin, kernels, stability
 Deep Learning: multilayer perceptrons, deep networks, stochastic gradient
 Generative Models: generative vs. discriminative, naive Bayes, linear discriminant analysis
 Structured Output Prediction: predicting sequences, hidden markov model, rankings
 Statistical Learning Theory: generalization error bounds, VC dimension
 Online Learning: experts, bandits, online mistake bounds
The prerequisites for the class are: Programming skills (e.g. CS 2110 or CS 3110), and basic knowledge of linear algebra (e.g. MATH 2940), and multivariable calculus, and probability theory (e.g. CS 2800).


Lectures
 08/29: Introduction [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 1
 What is learning?
 What is machine learning used for?
 Overview of course, course policies, and contact info.
 09/03: InstanceBased Learning [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 19.1, 19.3
 Definition of binary classification, instance space, target function, training examples.
 Unweighted knearest neighbor (kNN) rule.
 Weighted kNN.
 Effect of selecting k.
 Supervised learning for binary classification, multiclass classification, regression, and stuctured output prediction.
 kNN for regression and collaborative filtering.
 09/05: Supervised Learning and Decision Trees [slides] [slides 6up] [whiteboard] [video]
 Hypothesis space, consistency, and version space
 Listtheneliminate algorithm
 Classifying with a decision tree
 Representational power of decision trees
 TDIDT decision tree learning algorithm
 Splitting criteria for TDIDT learning
 09/10: Prediction and Overfitting [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 2.12.2, 18.2
 Training error, Test error, prediction error
 Independently identically distributed (i.i.d) data
 Overfitting
 Occam's Razor
 09/12: Model Selection and Assessment [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 11 (w/o 11.1) and McNemar's Test (ref1) and (ref2)
 Model selection
 Controlling overfitting in decision trees
 Train/validate/test split and kfold crossvalidation
 Statistical tests for assessing learning results
 09/17: Linear Classifiers and Perceptrons [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 99.1 (w/o 9.1.3)
 Linear classification rules
 Linear programming for linear classification
 (Batch) Perceptron learning algorithm
 09/19: Convergence of Perceptron [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 9.1.2
 Margin of linear classifiers
 Convergence of Perceptron
 Online Mistake Bound Learning
 09/24: Optimal Hyperplanes and SVMs [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 15.115.2
 Optimal hyperplanes and margins
 Hardmargin Support Vector Machine
 Softmargin Support Vector Machine
 Primal SVM optimization problem
 09/26: Margin, Duality and Stability [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 15.315.4
 Dual Perceptron
 Dual SVM optimization problem
 Connections between dual and primal
 Bounds on the leaveoneout error of SVMs
 10/01: Kernels [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 16.116.2
 Input space and feature space
 Kernels as a method for nonlinear learning
 Kernels for learning with nonvectorial data
 10/03: Regularized Linear Models [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 13.1, 9.2, 9.3
 Conditional maximum likelihood
 Logistic regression
 Ridge regression
 10/08: Stochastic Gradient Descent [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 14
 Overview of convexity and gradients
 Gradient Descent
 Stochastic Gradient Descent
 10/10: Deep Neural Networks [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 2020.3
 Demos: Step sizes and Tensorflow Playground
 AdaGrad and Momentum for SGD
 Feedforward Neural Networks
 Activation Functions
 Expressive power of neural networks
 10/17: Backpropagation in Neural Networks [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 20.6
 SGD in Multilayer Networks
 Backpropagations
 10/22: Review Session [whiteboard] [video]
 Review of material covered so far
 10/29: Statistical Learning Theory I [slides][slides6up][whiteboard] [video]
 Reading: UML 34
 Psychic Game
 Generalization error bound for finite H and zero training error
 Generalization error bound for finite H and nonzero training error
 Sample Complexity for finite H.
 10/31: Statistical Learning Theory II [slides][slides6up][whiteboard] [video]
 Reading: UML 4 and 6
 Generalization error bound for finite H and nonzero training error
 VC Dimension and Growth Function.
 Sample Complexity for infinite H.
 11/05: Statistical Learning Theory III [slides][slides6up][whiteboard] [video]
 Reading: UML 6
 VC dimension examples
 PAC Learning.
 11/07: Intelligibility in Machine Learning [video]
 Guest lecture by Rich Caruana
 11/12: Generative Models for Classification [slides][slides6up][whiteboard] [video]
 Reading: UML 24.2, 24.3
 Generative modeling vs. discriminative training
 Multivariate Naive Bayes
 Multinominal Naive Bayes
 Linear Discriminant Analysis
 11/14: Structured Output Prediction: Generative Models [slides][slides6up][whiteboard] [video]
 Reading: Murphy 17.3, 17.4.4, 17.5.1
 Hidden Markov Model (HMM)
 Estimation of HMM parameters
 Viterbi algorithm
 11/19: Structured Output Prediction: Discriminative Training [slides][slides6up][whiteboard] [video]
 Reading: Murphy 19.7
 Structural SVMs
 Conditional Random Fields
 Encoder/Decoder Networks
 11/21: Online Learning [slides][slides6up][whiteboard] [video]
 Reading: UML 2121.2.1 and optional reading
 Online Learning Model
 Halving Algorithm
 (Deterministic) Weighted Majority Algorithm
 11/26: Boosting [slides][slides6up][whiteboard] [video]
 12/3: Learning to Act and Causality [slides][slides6up] [video]
 Reading: Imbens/Rubin Chapter 1
 ContextualBandit Feedback
 Policy Learning
 Potential Outcome Model
 ModeltheWorld vs. ModeltheBias
 12/5: Privacy and Fairness [slides][slides6up][whiteboard][video]
 Reading: Dwork and Roth Chapters 12
 Differential Privacy
 Fairness Tradeoffs
 12/10: Fairness in Ranking and Wrapup [slides][slides6up][video]
 Reading: none
 Fairness of exposure
 Endogenous vs. exogenous bias
 Review of main class themes
 Pointers to followup classes


Staff and Office Hours
Please use the CS4780/5780 Piazza Forum as the primary channel for questions and discussions.
Office hours:


Assignments and Projects
Homework assignments are managed on Gradescope, where they can be downloaded and submitted. All assignments are due at noon on the due date. Assignments turned in late will be charged a 1 percentage point reduction of the cumulated final homework grade for each period of 24 hours for which the assignment is late. However, every student has a budget of 5 late days (i.e. 24 hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. So, if you have perfect scores of 100 on all 5 homeworks and a total of 8 late days, your final homework score will be 97. No assignment will be accepted after the solution was made public, which is typically 35 days after the time it was due. Regrade requests can be submitted within 7 days after the grades have been made available using the mechanism specified in the homework handout.
Homework 1 is posted in the week of 09/01, homework 2 in the week of 09/15, homework 3 in the week of 09/30, homework 4 in the week of 11/03, and homework 5 in the week of 11/17.
Programming projects augment the homework assignments with handson experiences. They are managed through Vocareum. You will receive an invite to join Vocareum to sign up. Late submissions are handled analogous to the policy for homework assignments, but you have a separate budget of 5 late days for the projects.


Grading
This is a 4credit course. Grades will be determined based on two written exams, programming projects, homework assignments, a prereq assessment, and class participation.
 50%: Exam
 30%: Homework Assignments
 18%: Programming Projects
 1%: Prereq Assessment
 1%: Class Participation (e.g., lecture, piazza, office hours)
To eliminate outlier grades for homework assignments and programming projects, the lowest homework grade is replaced by the second lowest homework grade when grades are cumulated at the end of the semester.Analogously, lowest programming project grade is replaced by the second lowest programming project grade.
All assignment, exam, and final grades (including + and  of that grade) are roughly on the following scale: A=92100; B=8288; C=7278; D=6068; F= below 60.
Students taking the class S/U do all work and need to receive at least a grade equivalent to a C to pass the course.
Students auditing the course cannot hand in written homeworks and programming projects.


Reference Material
The main textbook for the class is:
 Shai ShalevShwartz, Shai BenDavid, "Understanding Machine Learning  From Theory to Algorithms", Cambridge University Press, 2014. (online)
For additional reading, here is a list of other sources:
 Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
 Kevin Murphy, "Machine Learning  a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
 Cristianini, ShaweTaylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library)
 Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
 Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
 Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
 Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
 Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
 Imbens, Rubin, Causal Inference for Statistical Social Science, Cambridge, 2015. (online via Cornell Library)
 Leeds Tutorial on HMMs (online)
 Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via Cornell Library)
 Manning, Raghavan, Schuetze, "Introduction to Information Retrieval", Cambridge, 2008. (online)
 Vapnik, "Statistical Learning Theory", Wiley, 1998.


Academic Integrity
This course follows the Cornell University Code of Academic Integrity. Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Collaborations are allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, nonapproved collaborations) will not be tolerated. Respectful, constructive and inclusive conduct is expected of all class participants.
