CS4780/5780 - Machine Learning for Intelligent Systems

Fall 2019
Prof. Nika Haghtalab & Prof. Thorsten Joachims
Cornell University, Department of Computer Science


Information on how to enroll for non-CS majors.

Quick links: [Piazza] [Gradescope] [Vocareum]

Time and Place

First lecture: August 29, 2019
Time: Tuesday/Thursday, 2:55pm - 4:10pm
Room: Statler Auditorium 185

Mid-term Exam: October 24, 7:30pm
Final Exam: December 15, 7:00pm

Course Description

Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior, but machine learning techniques have also become key components of many software systems. For example, machine learning techniques are used to build search engines, to recommend movies, to understand natural language and images, and to build autonomous robots. This course introduces the fundamental set of techniques and algorithms that constitute supervised machine learning as of today. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics:

  • Supervised Batch Learning: model, decision theoretic foundation, model selection, model assessment, empirical risk minimization
  • Instance-based Learning: K-Nearest Neighbors, collaborative filtering
  • Decision Trees: TDIDT, attribute selection, pruning and overfitting
  • Linear Rules: Perceptron, logistic regression, linear regression, duality
  • Support Vector Machines: Optimal hyperplane, margin, kernels, stability
  • Deep Learning: multi-layer perceptrons, deep networks, stochastic gradient
  • Generative Models: generative vs. discriminative, naive Bayes, linear discriminant analysis
  • Structured Output Prediction: predicting sequences, hidden Markov models, rankings
  • Statistical Learning Theory: generalization error bounds, VC dimension
  • Online Learning: experts, bandits, online mistake bounds
The prerequisites for the class are: programming skills (e.g. CS 2110 or CS 3110) and basic knowledge of linear algebra (e.g. MATH 2940), multivariable calculus, and probability theory (e.g. CS 2800).



  • 08/29: Introduction [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 1
    • What is learning?
    • What is machine learning used for?
    • Overview of course, course policies, and contact info.
  • 09/03: Instance-Based Learning [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 19.1, 19.3
    • Definition of binary classification, instance space, target function, training examples.
    • Unweighted k-nearest neighbor (kNN) rule.
    • Weighted kNN.
    • Effect of selecting k.
    • Supervised learning for binary classification, multi-class classification, regression, and structured output prediction.
    • kNN for regression and collaborative filtering.
  • 09/05: Supervised Learning and Decision Trees [slides] [slides 6-up] [whiteboard] [video]
    • Hypothesis space, consistency, and version space
    • List-then-eliminate algorithm
    • Classifying with a decision tree
    • Representational power of decision trees
    • TDIDT decision tree learning algorithm
    • Splitting criteria for TDIDT learning
  • 09/10: Prediction and Overfitting [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 2.1-2.2, 18.2
    • Training error, test error, prediction error
    • Independent and identically distributed (i.i.d.) data
    • Overfitting
    • Occam's Razor
  • 09/12: Model Selection and Assessment [slides] [slides 6-up] [whiteboard] [video]
    • Reading: UML 11 (w/o 11.1) and McNemar's Test (ref1) and ref2
    • Model selection
    • Controlling overfitting in decision trees
    • Train/validate/test split and k-fold cross-validation
    • Statistical tests for assessing learning results
  • 09/17: Linear Classifiers and Perceptrons
    • Reading: UML 9-9.1 (w/o 9.1.3)
    • Linear classification rules
    • Linear programming for linear classification
    • (Batch) Perceptron learning algorithm
  • 09/19: Convergence of Perceptron
    • Reading: UML 9.1.2
    • Margin of linear classifiers
    • Convergence of Perceptron
    • Online Mistake Bound Learning
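To make the kNN rule from the 09/03 lecture concrete, here is a minimal sketch of an unweighted k-nearest-neighbor classifier. This is illustrative only, not course-provided code; the toy data and function name are made up.

```python
# Minimal unweighted k-nearest-neighbor classifier (09/03 lecture topic).
# Illustrative sketch; toy data and names are hypothetical, not course code.
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest
    training examples under Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)  # distance to every training point
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]             # majority label among neighbors

# Toy example: two clusters in the plane
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X, y, np.array([0.15, 0.1]), k=3))  # -> 0
```

Weighted kNN (also covered on 09/03) would replace the majority vote with votes weighted by, e.g., inverse distance.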
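The (batch) Perceptron algorithm from the 09/17 lecture can likewise be sketched in a few lines: sweep over the data and update the weight vector on every mistake. On linearly separable data it converges after a bounded number of mistakes, which is the subject of the 09/19 lecture. The toy dataset below is hypothetical.

```python
# Sketch of the (batch) Perceptron learning algorithm (09/17 lecture topic).
# Illustrative only; the toy AND-style dataset is hypothetical.
import numpy as np

def perceptron(X, y, max_epochs=100):
    """y in {-1, +1}; returns weight vector w with the bias folded in
    as the last coordinate via a constant feature."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append constant feature
    w = np.zeros(Xb.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (w @ xi) <= 0:  # mistake (or point on the boundary)
                w += yi * xi        # Perceptron update
                mistakes += 1
        if mistakes == 0:           # full pass with no mistakes: converged
            break
    return w

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1, -1, -1, 1])       # linearly separable (AND-like) labels
w = perceptron(X, y)
```

Since the data above are strictly separable with some margin, the convergence bound from the 09/19 lecture guarantees the loop terminates long before `max_epochs`.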

Staff and Office Hours

Please use the CS4780/5780 Piazza Forum as the primary channel for questions and discussions.

Office hours:


Assignments and Projects

Homework assignments are managed on Gradescope, where they can be downloaded and submitted. All assignments are due at noon on the due date. Assignments turned in late are charged a 1 percentage point reduction of the cumulative final homework grade for each 24-hour period for which the assignment is late. However, every student has a budget of 5 late days (i.e. 24-hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. So, if you have perfect scores of 100 on all 5 homeworks and a total of 8 late days, your final homework score will be 97. No assignment will be accepted after the solutions are made public, which is typically 3-5 days after the due date. Regrade requests can be submitted within 7 days after the grades have been made available, using the mechanism specified in the homework handout. Homework 1 is posted in the week of 09/01, homework 2 in the week of 09/15, homework 3 in the week of 09/30, homework 4 in the week of 11/03, and homework 5 in the week of 11/17.
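The late-day arithmetic above can be sketched as a small calculation; this is a hypothetical helper for illustration, not official grading code.

```python
# Illustrative sketch of the late-day policy described above: 5 free late
# days per semester, then 1 percentage point off the cumulative homework
# grade per additional late day. Hypothetical helper, not official code.
def homework_score(avg_homework_grade, total_late_days, free_late_days=5):
    penalty = max(0, total_late_days - free_late_days)  # 1 point per extra day
    return max(0, avg_homework_grade - penalty)

# Example from the text: perfect 100s with 8 late days -> 97
print(homework_score(100, 8))  # -> 97
```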

Programming projects augment the homework assignments with hands-on experience. They are managed through Vocareum; you will receive an invitation to join Vocareum to sign up. Late submissions are handled analogously to the policy for homework assignments, but you have a separate budget of 5 late days for the projects.



This is a 4-credit course. Grades will be determined based on two written exams, programming projects, homework assignments, a prereq assessment, and class participation.

  • 50%: Exams
  • 30%: Homework Assignments
  • 18%: Programming Projects
  • 1%: Prereq Assessment
  • 1%: Class Participation (e.g., lecture, piazza, office hours)

To eliminate outlier grades for homework assignments and programming projects, the lowest homework grade is replaced by the second-lowest homework grade when grades are aggregated at the end of the semester. Analogously, the lowest programming project grade is replaced by the second-lowest programming project grade.

All assignment, exam, and final grades (including the + and - of that grade) are roughly on the following scale: A = 92-100; B = 82-88; C = 72-78; D = 60-68; F = below 60.

Students taking the class S/U do all work and need to receive at least a grade equivalent to a D to pass the course.

Students auditing the course cannot hand in written homeworks and programming projects.


Reference Material

The main textbook for the class is:

  • Shai Shalev-Shwartz, Shai Ben-David, "Understanding Machine Learning - From Theory to Algorithms", Cambridge University Press, 2014. (online)

For additional reading, here is a list of other sources:

  • Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
  • Kevin Murphy, "Machine Learning - a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
  • Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library)
  • Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
  • Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
  • Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
  • Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
  • Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
  • Imbens, Rubin, Causal Inference for Statistical Social Science, Cambridge, 2015. (online via Cornell Library)
  • Leeds Tutorial on HMMs (online)
  • Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via Cornell Library)
  • Manning, Raghavan, Schuetze, "Introduction to Information Retrieval", Cambridge, 2008. (online)
  • Vapnik, "Statistical Learning Theory", Wiley, 1998.

Academic Integrity

This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit must be the student's own work. Collaboration is allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, non-approved collaboration) will not be tolerated. Respectful, constructive, and inclusive conduct is expected of all class participants.