CS4780/5780 Machine Learning for Intelligent Systems, Cornell University

CS4780/5780 - Machine Learning for Intelligent Systems

Fall 2019
Prof. Nika Haghtalab & Prof. Thorsten Joachims
Cornell University, Department of Computer Science


Information on how to enroll for non-CS majors.

Quick links: [Piazza] [Gradescope] [Vocareum]

Time and Place

First lecture: August 29, 2019
Time: Tuesday/Thursday, 2:55pm - 4:10pm
Room: Statler Auditorium 185

Mid-term Exam: October 24, 7:30pm
Final Exam: December 15, 7:00pm

Course Description

Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior, but machine learning techniques have become key components of many software systems. For examples, machine learning techniques are used to build search engines, to recommend movies, to understand natural language and images, and to build autonomous robots. This course will introduce the fundamental set of techniques and algorithms that constitute supervised machine learning as of today. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics:

  • Supervised Batch Learning: model, decision theoretic foundation, model selection, model assessment, empirical risk minimization
  • Instance-based Learning: K-Nearest Neighbors, collaborative filtering
  • Decision Trees: TDIDT, attribute selection, pruning and overfitting
  • Linear Rules: Perceptron, logistic regression, linear regression, duality
  • Support Vector Machines: Optimal hyperplane, margin, kernels, stability
  • Deep Learning: multi-layer perceptrons, deep networks, stochastic gradient
  • Generative Models: generative vs. discriminative, naive Bayes, linear discriminant analysis
  • Structured Output Prediction: predicting sequences, hidden markov model, rankings
  • Statistical Learning Theory: generalization error bounds, VC dimension
  • Online Learning: experts, bandits, online mistake bounds
The prerequisites for the class are: Programming skills (e.g. CS 2110 or CS 3110), and basic knowledge of linear algebra (e.g. MATH 2940), and multivariable calculus, and probability theory (e.g. CS 2800).



  • 08/29: Introduction [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 1
    • What is learning?
    • What is machine learning used for?
    • Overview of course, course policies, and contact info.
  • 09/03: Instance-Based Learning [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 19.1, 19.3
    • Definition of binary classification, instance space, target function, training examples.
    • Unweighted k-nearest neighbor (kNN) rule.
    • Weighted kNN.
    • Effect of selecting k.
    • Supervised learning for binary classification, multi-class classification, regression, and stuctured output prediction.
    • kNN for regression and collaborative filtering.
  • 09/05: Supervised Learning and Decision Trees [slides] [slides 6-up] [whiteboard] [video]
    • Hypothesis space, consistency, and version space
    • List-then-eliminate algorithm
    • Classifying with a decision tree
    • Representational power of decision trees
    • TDIDT decision tree learning algorithm
    • Splitting criteria for TDIDT learning
  • 09/10: Prediction and Overfitting [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 2.1-2.2, 18.2
    • Training error, Test error, prediction error
    • Independently identically distributed (i.i.d) data
    • Overfitting
    • Occam's Razor
  • 09/12: Model Selection and Assessment [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 11 (w/o 11.1) and McNemar's Test (ref1) and (ref2)
    • Model selection
    • Controlling overfitting in decision trees
    • Train/validate/test split and k-fold cross-validation
    • Statistical tests for assessing learning results
  • 09/17: Linear Classifiers and Perceptrons [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 9-9.1 (w/o 9.1.3)
    • Linear classification rules
    • Linear programming for linear classification
    • (Batch) Perceptron learning algorithm
  • 09/19: Convergence of Perceptron [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 9.1.2
    • Margin of linear classifiers
    • Convergence of Perceptron
    • Online Mistake Bound Learning
  • 09/24: Optimal Hyperplanes and SVMs [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 15.1-15.2
    • Optimal hyperplanes and margins
    • Hard-margin Support Vector Machine
    • Soft-margin Support Vector Machine
    • Primal SVM optimization problem
  • 09/26: Margin, Duality and Stability [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 15.3-15.4
    • Dual Perceptron
    • Dual SVM optimization problem
    • Connections between dual and primal
    • Bounds on the leave-one-out error of SVMs
  • 10/01: Kernels [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 16.1-16.2
    • Input space and feature space
    • Kernels as a method for non-linear learning
    • Kernels for learning with non-vectorial data
  • 10/03: Regularized Linear Models [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 13.1, 9.2, 9.3
    • Conditional maximum likelihood
    • Logistic regression
    • Ridge regression
  • 10/08: Stochastic Gradient Descent [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 14
    • Overview of convexity and gradients
    • Gradient Descent
    • Stochastic Gradient Descent
  • 10/10: Deep Neural Networks [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 20-20.3
    • Demos: Step sizes and Tensorflow Playground
    • AdaGrad and Momentum for SGD
    • Feedforward Neural Networks
    • Activation Functions
    • Expressive power of neural networks
  • 10/17: Backpropagation in Neural Networks [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 20.6
    • SGD in Multi-layer Networks
    • Backpropagations
  • 10/22: Review Session [whiteboard] [video]
    • Review of material covered so far
  • 10/29: Statistical Learning Theory I [slides][slides-6up][whiteboard] [video]
    • Reading: UML 3-4
    • Psychic Game
    • Generalization error bound for finite H and zero training error
    • Generalization error bound for finite H and non-zero training error
    • Sample Complexity for finite H.
  • 10/31: Statistical Learning Theory II [slides][slides-6up][whiteboard] [video]
    • Reading: UML 4 and 6
    • Generalization error bound for finite H and non-zero training error
    • VC Dimension and Growth Function.
    • Sample Complexity for infinite H.
  • 11/05: Statistical Learning Theory III [slides][slides-6up][whiteboard] [video]
    • Reading: UML 6
    • VC dimension examples
    • PAC Learning.
  • 11/07: Intelligibility in Machine Learning [video]
    • Guest lecture by Rich Caruana
  • 11/12: Generative Models for Classification [slides][slides-6up][whiteboard] [video]
    • Reading: UML 24.2, 24.3
    • Generative modeling vs. discriminative training
    • Multivariate Naive Bayes
    • Multinominal Naive Bayes
    • Linear Discriminant Analysis
  • 11/14: Structured Output Prediction: Generative Models [slides][slides-6up][whiteboard] [video]
    • Reading: Murphy 17.3, 17.4.4, 17.5.1
    • Hidden Markov Model (HMM)
    • Estimation of HMM parameters
    • Viterbi algorithm
  • 11/19: Structured Output Prediction: Discriminative Training [slides][slides-6up][whiteboard] [video]
    • Reading: Murphy 19.7
    • Structural SVMs
    • Conditional Random Fields
    • Encoder/Decoder Networks
  • 11/21: Online Learning [slides][slides-6up][whiteboard] [video]
    • Reading: UML 21-21.2.1 and optional reading
    • Online Learning Model
    • Halving Algorithm
    • (Deterministic) Weighted Majority Algorithm
  • 11/26: Boosting [slides][slides-6up][whiteboard] [video]
  • 12/3: Learning to Act and Causality [slides][slides-6up] [video]
    • Reading: Imbens/Rubin Chapter 1
    • Contextual-Bandit Feedback
    • Policy Learning
    • Potential Outcome Model
    • Model-the-World vs. Model-the-Bias
  • 12/5: Privacy and Fairness [slides][slides-6up][whiteboard][video]
    • Reading: Dwork and Roth Chapters 1-2
    • Differential Privacy
    • Fairness Tradeoffs
  • 12/10: Fairness in Ranking and Wrap-up [slides][slides-6up][video]
    • Reading: none
    • Fairness of exposure
    • Endogenous vs. exogenous bias
    • Review of main class themes
    • Pointers to followup classes

Staff and Office Hours

Please use the CS4780/5780 Piazza Forum as the primary channel for questions and discussions.

Office hours:


Assignments and Projects

Homework assignments are managed on Gradescope, where they can be downloaded and submitted. All assignments are due at noon on the due date. Assignments turned in late will be charged a 1 percentage point reduction of the cumulated final homework grade for each period of 24 hours for which the assignment is late. However, every student has a budget of 5 late days (i.e. 24 hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. So, if you have perfect scores of 100 on all 5 homeworks and a total of 8 late days, your final homework score will be 97. No assignment will be accepted after the solution was made public, which is typically 3-5 days after the time it was due. Regrade requests can be submitted within 7 days after the grades have been made available using the mechanism specified in the homework handout. Homework 1 is posted in the week of 09/01, homework 2 in the week of 09/15, homework 3 in the week of 09/30, homework 4 in the week of 11/03, and homework 5 in the week of 11/17.

Programming projects augment the homework assignments with hands-on experiences. They are managed through Vocareum. You will receive an invite to join Vocareum to sign up. Late submissions are handled analogous to the policy for homework assignments, but you have a separate budget of 5 late days for the projects.



This is a 4-credit course. Grades will be determined based on two written exams, programming projects, homework assignments, a prereq assessment, and class participation.

  • 50%: Exam
  • 30%: Homework Assignments
  • 18%: Programming Projects
  • 1%: Prereq Assessment
  • 1%: Class Participation (e.g., lecture, piazza, office hours)

To eliminate outlier grades for homework assignments and programming projects, the lowest homework grade is replaced by the second lowest homework grade when grades are cumulated at the end of the semester.Analogously, lowest programming project grade is replaced by the second lowest programming project grade.

All assignment, exam, and final grades (including + and - of that grade) are roughly on the following scale: A=92-100; B=82-88; C=72-78; D=60-68; F= below 60.

Students taking the class S/U do all work and need to receive at least a grade equivalent to a C- to pass the course.

Students auditing the course cannot hand in written homeworks and programming projects.


Reference Material

The main textbook for the class is:

  • Shai Shalev-Shwartz, Shai Ben-David, "Understanding Machine Learning - From Theory to Algorithms", Cambridge University Press, 2014. (online)

For additional reading, here is a list of other sources:

  • Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
  • Kevin Murphy, "Machine Learning - a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
  • Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library)
  • Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
  • Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
  • Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
  • Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
  • Imbens, Rubin, Causal Inference for Statistical Social Science, Cambridge, 2015. (online via Cornell Library)
  • Leeds Tutorial on HMMs (online)
  • Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via Cornell Library)
  • Manning, Raghavan, Schuetze, "Introduction to Information Retrieval", Cambridge, 2008. (online)
  • Vapnik, "Statistical Learning Theory", Wiley, 1998.

Academic Integrity

This course follows the Cornell University Code of Academic Integrity. Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Collaborations are allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated. Respectful, constructive and inclusive conduct is expected of all class participants.