CS4780/5780 Machine Learning for Intelligent Systems, Cornell University

CS4780/5780 Machine Learning for Intelligent Systems, Cornell University

	CS4780/5780 - Machine Learning for Intelligent Systems Fall 2019 Prof. Nika Haghtalab & Prof. Thorsten Joachims Cornell University, Department of Computer Science

	Information on how to enroll for non-CS majors. Quick links: [Piazza] [Gradescope] [Vocareum] Time and Place First lecture: August 29, 2019 Time: Tuesday/Thursday, 2:55pm - 4:10pm Room: Statler Auditorium 185 Mid-term Exam: October 24, 7:30pm Final Exam: December 15, 7:00pm
	Course Description Machine learning is concerned with the question of how to make computers learn from experience. The ability to learn is not only central to most aspects of intelligent behavior, but machine learning techniques have become key components of many software systems. For examples, machine learning techniques are used to build search engines, to recommend movies, to understand natural language and images, and to build autonomous robots. This course will introduce the fundamental set of techniques and algorithms that constitute supervised machine learning as of today. The course will not only discuss individual algorithms and methods, but also tie principles and approaches together from a theoretical perspective. In particular, the course will cover the following topics: Supervised Batch Learning: model, decision theoretic foundation, model selection, model assessment, empirical risk minimization Instance-based Learning: K-Nearest Neighbors, collaborative filtering Decision Trees: TDIDT, attribute selection, pruning and overfitting Linear Rules: Perceptron, logistic regression, linear regression, duality Support Vector Machines: Optimal hyperplane, margin, kernels, stability Deep Learning: multi-layer perceptrons, deep networks, stochastic gradient Generative Models: generative vs. discriminative, naive Bayes, linear discriminant analysis Structured Output Prediction: predicting sequences, hidden markov model, rankings Statistical Learning Theory: generalization error bounds, VC dimension Online Learning: experts, bandits, online mistake bounds The prerequisites for the class are: Programming skills (e.g. CS 2110 or CS 3110), and basic knowledge of linear algebra (e.g. MATH 2940), and multivariable calculus, and probability theory (e.g. CS 2800).
	Lectures 08/29: Introduction [slides] [slides 6-up] [whiteboard] [video] Reading: UML 1 What is learning? What is machine learning used for? Overview of course, course policies, and contact info. 09/03: Instance-Based Learning [slides] [slides 6-up] [whiteboard] [video] Reading: UML 19.1, 19.3 Definition of binary classification, instance space, target function, training examples. Unweighted k-nearest neighbor (kNN) rule. Weighted kNN. Effect of selecting k. Supervised learning for binary classification, multi-class classification, regression, and stuctured output prediction. kNN for regression and collaborative filtering. 09/05: Supervised Learning and Decision Trees [slides] [slides 6-up] [whiteboard] [video] Hypothesis space, consistency, and version space List-then-eliminate algorithm Classifying with a decision tree Representational power of decision trees TDIDT decision tree learning algorithm Splitting criteria for TDIDT learning 09/10: Prediction and Overfitting [slides] [slides 6-up] [whiteboard] [video] Reading: UML 2.1-2.2, 18.2 Training error, Test error, prediction error Independently identically distributed (i.i.d) data Overfitting Occam's Razor 09/12: Model Selection and Assessment [slides] [slides 6-up] [whiteboard] [video] Reading: UML 11 (w/o 11.1) and McNemar's Test (ref1) and (ref2) Model selection Controlling overfitting in decision trees Train/validate/test split and k-fold cross-validation Statistical tests for assessing learning results 09/17: Linear Classifiers and Perceptrons [slides] [slides 6-up] [whiteboard] [video] Reading: UML 9-9.1 (w/o 9.1.3) Linear classification rules Linear programming for linear classification (Batch) Perceptron learning algorithm 09/19: Convergence of Perceptron [slides] [slides 6-up] [whiteboard] [video] Reading: UML 9.1.2 Margin of linear classifiers Convergence of Perceptron Online Mistake Bound Learning 09/24: Optimal Hyperplanes and SVMs [slides] [slides 6-up] [whiteboard] [video] Reading: UML 15.1-15.2 Optimal hyperplanes and margins Hard-margin Support Vector Machine Soft-margin Support Vector Machine Primal SVM optimization problem 09/26: Margin, Duality and Stability [slides] [slides 6-up] [whiteboard] [video] Reading: UML 15.3-15.4 Dual Perceptron Dual SVM optimization problem Connections between dual and primal Bounds on the leave-one-out error of SVMs 10/01: Kernels [slides] [slides 6-up] [whiteboard] [video] Reading: UML 16.1-16.2 Input space and feature space Kernels as a method for non-linear learning Kernels for learning with non-vectorial data 10/03: Regularized Linear Models [slides] [slides 6-up] [whiteboard] [video] Reading: UML 13.1, 9.2, 9.3 Conditional maximum likelihood Logistic regression Ridge regression 10/08: Stochastic Gradient Descent [slides] [slides 6-up] [whiteboard] [video] Reading: UML 14 Overview of convexity and gradients Gradient Descent Stochastic Gradient Descent 10/10: Deep Neural Networks [slides] [slides 6-up] [whiteboard] [video] Reading: UML 20-20.3 Demos: Step sizes and Tensorflow Playground AdaGrad and Momentum for SGD Feedforward Neural Networks Activation Functions Expressive power of neural networks 10/17: Backpropagation in Neural Networks [slides] [slides 6-up] [whiteboard] [video] Reading: UML 20.6 SGD in Multi-layer Networks Backpropagations 10/22: Review Session [whiteboard] [video] Review of material covered so far 10/29: Statistical Learning Theory I [slides][slides-6up][whiteboard] [video] Reading: UML 3-4 Psychic Game Generalization error bound for finite H and zero training error Generalization error bound for finite H and non-zero training error Sample Complexity for finite H. 10/31: Statistical Learning Theory II [slides][slides-6up][whiteboard] [video] Reading: UML 4 and 6 Generalization error bound for finite H and non-zero training error VC Dimension and Growth Function. Sample Complexity for infinite H. 11/05: Statistical Learning Theory III [slides][slides-6up][whiteboard] [video] Reading: UML 6 VC dimension examples PAC Learning. 11/07: Intelligibility in Machine Learning [video] Guest lecture by Rich Caruana 11/12: Generative Models for Classification [slides][slides-6up][whiteboard] [video] Reading: UML 24.2, 24.3 Generative modeling vs. discriminative training Multivariate Naive Bayes Multinominal Naive Bayes Linear Discriminant Analysis 11/14: Structured Output Prediction: Generative Models [slides][slides-6up][whiteboard] [video] Reading: Murphy 17.3, 17.4.4, 17.5.1 Hidden Markov Model (HMM) Estimation of HMM parameters Viterbi algorithm 11/19: Structured Output Prediction: Discriminative Training [slides][slides-6up][whiteboard] [video] Reading: Murphy 19.7 Structural SVMs Conditional Random Fields Encoder/Decoder Networks 11/21: Online Learning [slides][slides-6up][whiteboard] [video] Reading: UML 21-21.2.1 and optional reading Online Learning Model Halving Algorithm (Deterministic) Weighted Majority Algorithm 11/26: Boosting [slides][slides-6up][whiteboard] [video] Reading: UML 10-10.3 Optional readings: Schapire 2001 Survey and Schapire's Tutorial Strong and Weak Learning Adaboost Boosting and Regret Minimization Ensemble Methods and Bagging. 12/3: Learning to Act and Causality [slides][slides-6up] [video] Reading: Imbens/Rubin Chapter 1 Contextual-Bandit Feedback Policy Learning Potential Outcome Model Model-the-World vs. Model-the-Bias 12/5: Privacy and Fairness [slides][slides-6up][whiteboard][video] Reading: Dwork and Roth Chapters 1-2 Differential Privacy Fairness Tradeoffs 12/10: Fairness in Ranking and Wrap-up [slides][slides-6up][video] Reading: none Fairness of exposure Endogenous vs. exogenous bias Review of main class themes Pointers to followup classes
	Staff and Office Hours Please use the CS4780/5780 Piazza Forum as the primary channel for questions and discussions. Office hours:
	Assignments and Projects Homework assignments are managed on Gradescope, where they can be downloaded and submitted. All assignments are due at noon on the due date. Assignments turned in late will be charged a 1 percentage point reduction of the cumulated final homework grade for each period of 24 hours for which the assignment is late. However, every student has a budget of 5 late days (i.e. 24 hour periods after the time the assignment was due) throughout the semester for which there is no late penalty. So, if you have perfect scores of 100 on all 5 homeworks and a total of 8 late days, your final homework score will be 97. No assignment will be accepted after the solution was made public, which is typically 3-5 days after the time it was due. Regrade requests can be submitted within 7 days after the grades have been made available using the mechanism specified in the homework handout. Homework 1 is posted in the week of 09/01, homework 2 in the week of 09/15, homework 3 in the week of 09/30, homework 4 in the week of 11/03, and homework 5 in the week of 11/17. Programming projects augment the homework assignments with hands-on experiences. They are managed through Vocareum. You will receive an invite to join Vocareum to sign up. Late submissions are handled analogous to the policy for homework assignments, but you have a separate budget of 5 late days for the projects.
	Grading This is a 4-credit course. Grades will be determined based on two written exams, programming projects, homework assignments, a prereq assessment, and class participation. 50%: Exam 30%: Homework Assignments 18%: Programming Projects 1%: Prereq Assessment 1%: Class Participation (e.g., lecture, piazza, office hours) To eliminate outlier grades for homework assignments and programming projects, the lowest homework grade is replaced by the second lowest homework grade when grades are cumulated at the end of the semester.Analogously, lowest programming project grade is replaced by the second lowest programming project grade. All assignment, exam, and final grades (including + and - of that grade) are roughly on the following scale: A=92-100; B=82-88; C=72-78; D=60-68; F= below 60. Students taking the class S/U do all work and need to receive at least a grade equivalent to a C- to pass the course. Students auditing the course cannot hand in written homeworks and programming projects.
	Reference Material The main textbook for the class is: Shai Shalev-Shwartz, Shai Ben-David, "Understanding Machine Learning - From Theory to Algorithms", Cambridge University Press, 2014. (online) For additional reading, here is a list of other sources: Tom Mitchell, "Machine Learning", McGraw Hill, 1997. Kevin Murphy, "Machine Learning - a Probabilistic Perspective", MIT Press, 2012. (online via Cornell Library) Cristianini, Shawe-Taylor, "Introduction to Support Vector Machines", Cambridge University Press, 2000. (online via Cornell Library) Schoelkopf, Smola, "Learning with Kernels", MIT Press, 2001. (online) Bishop, "Pattern Recognition and Machine Learning", Springer, 2006. Ethem Alpaydin, "Introduction to Machine Learning", MIT Press, 2004. Duda, Hart, Stork, "Pattern Classification", Wiley, 2000. Hastie, Tibshirani, Friedman, "The Elements of Statistical Learning", Springer, 2001. Imbens, Rubin, Causal Inference for Statistical Social Science, Cambridge, 2015. (online via Cornell Library) Leeds Tutorial on HMMs (online) Manning, Schuetze, "Foundations of Statistical Natural Language Processing", MIT Press, 1999. (online via Cornell Library) Manning, Raghavan, Schuetze, "Introduction to Information Retrieval", Cambridge, 2008. (online) Vapnik, "Statistical Learning Theory", Wiley, 1998.
	Academic Integrity This course follows the Cornell University Code of Academic Integrity. Each student in this course is expected to abide by the Cornell University Code of Academic Integrity. Any work submitted by a student in this course for academic credit will be the student's own work. Collaborations are allowed only if explicitly permitted. Violations of the rules (e.g. cheating, copying, non-approved collaborations) will not be tolerated. Respectful, constructive and inclusive conduct is expected of all class participants.