
CS4780/5780: Machine Learning for Intelligent Systems
Fall 2019
Prof. Nika Haghtalab & Prof. Thorsten Joachims
Cornell University, Department of Computer Science



Information on how to enroll for non-CS majors.
Quick links: [Piazza] [Gradescope] [Vocareum]
Time and Place
First lecture: August 29, 2019
Time: Tuesday/Thursday, 2:55pm - 4:10pm
Room: Statler Auditorium 185
Midterm Exam: October 24, 7:30pm
Final Exam: December 15, 7:00pm


Course Description
Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior, but machine learning techniques have become key components of many software systems. For example, machine learning techniques are used to build search engines, to recommend movies, to understand natural language and images, and to build autonomous robots.
This course will introduce the fundamental set of techniques and algorithms that constitute supervised machine learning as of today. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics:
 Supervised Batch Learning: model, decision theoretic foundation, model selection, model assessment, empirical risk minimization
 Instance-based Learning: k-Nearest Neighbors, collaborative filtering
 Decision Trees: TDIDT, attribute selection, pruning and overfitting
 Linear Rules: Perceptron, logistic regression, linear regression, duality
 Support Vector Machines: Optimal hyperplane, margin, kernels, stability
 Deep Learning: multilayer perceptrons, deep networks, stochastic gradient descent
 Generative Models: generative vs. discriminative, naive Bayes, linear discriminant analysis
 Structured Output Prediction: predicting sequences, hidden Markov models, rankings
 Statistical Learning Theory: generalization error bounds, VC dimension
 Online Learning: experts, bandits, online mistake bounds
The prerequisites for the class are programming skills (e.g. CS 2110 or CS 3110) and basic knowledge of linear algebra (e.g. MATH 2940), multivariable calculus, and probability theory (e.g. CS 2800).


Lectures
 08/29: Introduction [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 1
 What is learning?
 What is machine learning used for?
 Overview of course, course policies, and contact info.
 09/03: Instance-Based Learning [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 19.1, 19.3
 Definition of binary classification, instance space, target function, training examples.
 Unweighted k-nearest neighbor (kNN) rule.
 Weighted kNN.
 Effect of selecting k.
 Supervised learning for binary classification, multiclass classification, regression, and structured output prediction.
 kNN for regression and collaborative filtering.
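As a concrete illustration of the unweighted kNN rule listed above, here is a minimal sketch in Python. The function name and the toy dataset are illustrative, not course-provided code; distances are Euclidean and ties in the vote go to the most common label found first.

```python
# Minimal sketch of an unweighted k-nearest-neighbor classifier.
# Labels are +1 / -1; the data below is a hypothetical toy set.
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k):
    # Indices of training points sorted by Euclidean distance to the query.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    # Majority vote among the k closest labels.
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [-1, -1, -1, +1, +1, +1]
print(knn_predict(train_x, train_y, (4.5, 5.0), k=3))  # prints 1
```

The same skeleton extends to weighted kNN by replacing the vote counts with weights that decrease in distance, and to regression by averaging the k neighbors' target values.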
 09/05: Supervised Learning and Decision Trees [slides] [slides 6up] [whiteboard] [video]
 Hypothesis space, consistency, and version space
 List-then-eliminate algorithm
 Classifying with a decision tree
 Representational power of decision trees
 TDIDT decision tree learning algorithm
 Splitting criteria for TDIDT learning
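One standard splitting criterion for TDIDT-style learners is information gain, the expected reduction in label entropy from splitting on an attribute (this is the ID3 choice; the lecture may cover others such as the Gini index). A minimal sketch, with illustrative names and toy data:

```python
# Sketch of the information-gain splitting criterion for decision trees.
import math
from collections import Counter

def entropy(labels):
    # Shannon entropy (in bits) of a list of class labels.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    # Entropy before the split minus the size-weighted entropy after it.
    n = len(labels)
    by_value = {}
    for v, y in zip(attribute_values, labels):
        by_value.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
perfect_split = ["a", "a", "b", "b"]   # separates the classes exactly
useless_split = ["a", "b", "a", "b"]   # tells us nothing about the label
print(information_gain(labels, perfect_split))  # prints 1.0
print(information_gain(labels, useless_split))  # prints 0.0
```

TDIDT greedily picks the attribute with the highest gain at each node and recurses on the resulting subsets.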
 09/10: Prediction and Overfitting [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 2.1-2.2, 18.2
 Training error, test error, prediction error
 Independent and identically distributed (i.i.d.) data
 Overfitting
 Occam's Razor
 09/12: Model Selection and Assessment [slides] [slides 6up] [whiteboard] [video]
 Reading: UML 11 (w/o 11.1) and McNemar's Test (ref1) and ref2
 Model selection
 Controlling overfitting in decision trees
 Train/validate/test split and k-fold cross-validation
 Statistical tests for assessing learning results
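The k-fold cross-validation procedure from this lecture can be sketched in a few lines. The `train`/`error` interface below is hypothetical, standing in for whatever learner and loss are being evaluated:

```python
# Sketch of k-fold cross-validation: hold out each of k folds in turn,
# train on the rest, and average the validation error across folds.
def k_fold_cv(data, k, train, error):
    n = len(data)
    fold_size = n // k
    total_err = 0.0
    for i in range(k):
        val = data[i * fold_size:(i + 1) * fold_size]      # held-out fold
        tr = data[:i * fold_size] + data[(i + 1) * fold_size:]  # the rest
        model = train(tr)
        total_err += error(model, val)
    return total_err / k
```

For model selection, one would run this for each candidate model (e.g. each pruning level of a decision tree) and pick the one with the lowest average validation error, then retrain it on all the data.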
 09/17: Linear Classifiers and Perceptrons
 Reading: UML 9-9.1 (w/o 9.1.3)
 Linear classification rules
 Linear programming for linear classification
 (Batch) Perceptron learning algorithm
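The batch Perceptron algorithm listed above cycles through the training set and adds (or subtracts) any misclassified example to the weight vector. A minimal sketch, with an illustrative toy dataset and a bias folded in as a constant feature:

```python
# Sketch of the (batch) Perceptron learning algorithm.
# Examples are (x, y) pairs with labels y in {+1, -1}; a constant-1
# feature in x plays the role of the bias term.
def perceptron(examples, max_epochs=100):
    dim = len(examples[0][0])
    w = [0.0] * dim
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in examples:
            # Misclassified (or on the boundary): update w toward y*x.
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # a full pass with no mistakes: converged
            break
    return w

data = [((1.0, 2.0, 1.0), +1), ((1.0, -2.0, 1.0), -1)]
w = perceptron(data)
assert all(y * sum(wi * xi for wi, xi in zip(w, x)) > 0 for x, y in data)
```

If the data are linearly separable, a pass with zero mistakes is guaranteed to occur eventually; the convergence lecture that follows bounds how many updates that can take in terms of the margin.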
 09/19: Convergence of Perceptron
 Reading: UML 9.1.2
 Margin of linear classifiers
 Convergence of Perceptron
 Online Mistake Bound Learning


Staff and Office Hours
Please use the CS4780/5780 Piazza Forum as the primary channel for questions and discussions.
Office hours:


Assignments and Projects
Homework assignments are managed on Gradescope, where they can be downloaded and submitted. All assignments are due at noon on the due date. Assignments turned in late are charged a 1-percentage-point reduction of the cumulated final homework grade for each 24-hour period for which the assignment is late. However, every student has a budget of 5 late days (i.e. 24-hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. So, if you have perfect scores of 100 on all 5 homeworks and a total of 8 late days, your final homework score will be 97. No assignment will be accepted after the solution is made public, which is typically 3-5 days after it was due. Regrade requests can be submitted within 7 days after grades have been made available, using the mechanism specified in the homework handout.
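The late-day arithmetic above can be sketched as follows (illustrative only; the function name and interface are not part of the course tooling):

```python
# Sketch of the late-day policy: each 24-hour period beyond the 5 free
# late days costs 1 percentage point off the cumulated homework grade.
def homework_score(cumulated_grade, total_late_days, free_late_days=5):
    penalty = max(0, total_late_days - free_late_days)
    return cumulated_grade - penalty

print(homework_score(100, 8))  # prints 97, matching the example above
print(homework_score(100, 4))  # prints 100 (within the free budget)
```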
Homework 1 is posted in the week of 09/01, homework 2 in the week of 09/15, homework 3 in the week of 09/30, homework 4 in the week of 11/03, and homework 5 in the week of 11/17.
Programming projects augment the homework assignments with hands-on experience. They are managed through Vocareum. You will receive an invite to join Vocareum to sign up. Late submissions are handled analogously to the policy for homework assignments, but you have a separate budget of 5 late days for the projects.


Grading
This is a 4credit course. Grades will be determined based on two written exams, programming projects, homework assignments, a prereq assessment, and class participation.
 50%: Exams
 30%: Homework Assignments
 18%: Programming Projects
 1%: Prereq Assessment
 1%: Class Participation (e.g., lecture, piazza, office hours)
To eliminate outlier grades for homework assignments and programming projects, the lowest homework grade is replaced by the second-lowest homework grade when grades are cumulated at the end of the semester. Analogously, the lowest programming project grade is replaced by the second-lowest programming project grade.
All assignment, exam, and final grades (including + and - of that grade) are roughly on the following scale: A=92-100; B=82-88; C=72-78; D=60-68; F=below 60.
Students taking the class S/U do all work and need to receive at least a grade equivalent to a D to pass the course.
Students auditing the course cannot hand in written homeworks and programming projects.


Reference Material
The main textbook for the class is:
 Shai Shalev-Shwartz, Shai Ben-David, "Understanding Machine Learning: From Theory to Algorithms", Cambridge University Press, 2014. (online)
For additional reading, here is a list of other sources:
 Tom Mitchell, "Machine Learning", McGraw Hill, 1997.
 Kevin Murphy, "Machine Learning: a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library)
 Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library)
 Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online)
 Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.
 Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004.
 Duda, Hart, Stork, "Pattern Classification", Wiley, 2000.
 Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001.
 Imbens, Rubin, Causal Inference for Statistical Social Science, Cambridge, 2015. (online via Cornell Library)
 Leeds Tutorial on HMMs (online)
 Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via Cornell Library)
 Manning, Raghavan, Schuetze, "Introduction to Information Retrieval", Cambridge, 2008. (online)
 Vapnik, "Statistical Learning Theory", Wiley, 1998.


Academic Integrity
This course follows the Cornell University Code of Academic Integrity, and each student is expected to abide by it. Any work submitted by a student in this course for academic credit will be the student's own work. Collaborations are allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated. Respectful, constructive, and inclusive conduct is expected of all class participants.
