Syllabus for CS4787

Principles of Large-Scale Machine Learning — Spring 2020

Term: Spring 2020
Instructor: Christopher De Sa
Course website: www.cs.cornell.edu/courses/cs4787/2020sp/
E-mail: [email hidden]
Schedule: MW 7:30-8:45PM
Office hours: Wednesdays 2PM
Room: Hollister Hall B14
Office: Gates Hall 450

Description: CS4787 explores the principles behind scalable machine learning systems. The course will cover the algorithmic and implementation principles that power the current generation of machine learning on big data. We will cover training and inference for both traditional ML algorithms, such as linear and logistic regression, and deep models. Topics will include: estimating statistics of data quickly with subsampling, stochastic gradient descent and other scalable optimization methods, mini-batch training, accelerated methods, adaptive learning rates, methods for scalable deep learning, hyperparameter optimization, parallel and distributed training, and quantization and model compression.

Prerequisites: CS 4780 or equivalent, and CS 2110 or equivalent.

Format: Lectures during the scheduled lecture period will cover the course content. Problem sets will be used to encourage familiarity with the content and develop competence with the more mathematical aspects of the course. Programming assignments will help build intuition and familiarity with how machine learning algorithms run. There will be one midterm exam and one final exam.

Material: The course is based on books, papers, and other texts in machine learning, scalable optimization, and systems. Texts will be provided ahead of time on the website on a per-lecture basis. You are not required to read the texts, but they will provide useful background for the material we discuss.

Grading: Students will be evaluated on the following basis.

30%   Problem sets
40%   Programming assignments
30%   Take-home final exam

TA Office Hours

See the course website for the current TA office hours schedule.

Course Calendar

The course calendar is subject to change.

Wednesday, January 22 Lecture 1. Introduction and course overview. [Notes]
Monday, January 27 Lecture 2. Scaling to complex models by learning with optimization algorithms. Gradient descent, convex optimization and conditioning. [Notes] [Demo Notebook] [Demo HTML]
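
To give a flavor of the topic, here is a minimal NumPy sketch (illustrative only, not from the course materials) of gradient descent on a small strongly convex quadratic, where the conditioning of the matrix governs how large a step size we can safely take:

    import numpy as np

    # Gradient descent on f(w) = (1/2) w^T A w - b^T w. The condition
    # number of A (ratio of largest to smallest eigenvalue, here 3)
    # controls both the admissible step size and the convergence rate.
    A = np.array([[3.0, 0.0], [0.0, 1.0]])
    b = np.array([1.0, 1.0])

    w = np.zeros(2)
    alpha = 1.0 / np.linalg.eigvalsh(A).max()   # step size 1/L for L-smooth f
    for _ in range(100):
        grad = A @ w - b                        # gradient of f at w
        w -= alpha * grad
    print(w, np.linalg.solve(A, b))             # converges to the minimizer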

Problem Set 1 Released.

Wednesday, January 29 Lecture 3. Gradient descent continued. Stochastic gradient descent. [Notes]
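
An illustrative sketch with synthetic data: stochastic gradient descent follows the gradient of one randomly sampled example's loss, an unbiased but noisy estimate of the full gradient:

    import numpy as np

    # SGD for least squares: each step samples a single example and takes
    # a step along the gradient of that example's individual loss.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10) + 0.01 * rng.normal(size=1000)

    w = np.zeros(10)
    for t in range(1, 20001):
        i = rng.integers(len(y))              # sample one training example
        g = (X[i] @ w - y[i]) * X[i]          # gradient of the i-th loss
        w -= (0.1 / np.sqrt(t)) * g           # decaying step size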

Monday, February 3 Lecture 4. Stochastic gradient descent continued. Scaling to huge datasets with subsampling. [Notes] [Demo Notebook] [Demo HTML]

Programming Assignment 1 Released.

Wednesday, February 5 Lecture 5. Adapting algorithms to hardware. Minibatching and the effect of the learning rate. Our first hyperparameters. [Notes] [Demo Notebook] [Demo HTML]
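
An illustrative sketch of minibatching (the batch size and step size below are arbitrary choices): averaging the gradient over B examples per step reduces gradient variance and maps well onto vectorized hardware, at the cost of more work per step, and B itself becomes a hyperparameter:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10)

    w, B, alpha = np.zeros(10), 32, 0.05
    for _ in range(2000):
        idx = rng.integers(len(y), size=B)         # sample a minibatch
        g = X[idx].T @ (X[idx] @ w - y[idx]) / B   # averaged gradient
        w -= alpha * g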

Monday, February 10 Lecture 6. The mathematical hammers behind subsampling. Estimating large sums with samples, e.g. the empirical risk. Concentration inequalities. [Notes] [Demo Notebook] [Demo HTML]
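
As a toy illustration (all numbers below are synthetic), estimating a large sum such as the empirical risk from a small random sample, with a Hoeffding-style confidence radius:

    import numpy as np

    # For i.i.d. samples bounded in [0, 1], Hoeffding's inequality gives
    #   P(|sample_mean - true_mean| >= t) <= 2 exp(-2 n t^2),
    # so setting the right side to 0.05 yields a 95% confidence radius.
    rng = np.random.default_rng(0)
    losses = rng.uniform(size=1_000_000)      # stand-in for per-example losses

    n = 10_000
    sample = rng.choice(losses, size=n)
    t = np.sqrt(np.log(2 / 0.05) / (2 * n))   # 95% confidence radius
    print(sample.mean(), "+/-", t, "true:", losses.mean())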

Problem Set 1 Due.

Problem Set 2 Released.

Wednesday, February 12 Lecture 7. Optimization techniques for efficient ML. Accelerating SGD with momentum. [Notes] [Demo Notebook] [Demo HTML]
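
An illustrative sketch of SGD with (heavy-ball) momentum on synthetic data: a running velocity smooths the noisy stochastic gradients and speeds progress along low-curvature directions:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10)

    w, v = np.zeros(10), np.zeros(10)
    alpha, beta = 0.01, 0.9                   # step size and momentum
    for _ in range(2000):
        i = rng.integers(len(y))
        g = (X[i] @ w - y[i]) * X[i]
        v = beta * v + g                      # accumulate velocity
        w -= alpha * v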

Programming Assignment 1 Due.

Monday, February 17 Lecture 8. Optimization techniques for efficient ML, continued. Accelerating SGD with preconditioning and adaptive learning rates. [Notes]
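
A sketch of one standard adaptive-learning-rate scheme, AdaGrad, on synthetic data: coordinates that have accumulated large squared gradients get smaller effective step sizes:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10)

    w = np.zeros(10)
    r = np.zeros(10)                          # accumulated squared gradients
    alpha, eps = 0.5, 1e-8
    for _ in range(2000):
        i = rng.integers(len(y))
        g = (X[i] @ w - y[i]) * X[i]
        r += g * g
        w -= alpha * g / (np.sqrt(r) + eps)   # per-coordinate scaling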

Programming Assignment 2 Released.

Wednesday, February 19 Lecture 9. Optimization techniques for efficient ML, continued. Accelerating SGD with variance reduction and averaging. [Notes]
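
An illustrative sketch of one of the averaging techniques, Polyak-Ruppert iterate averaging: run SGD as usual but report the running average of the iterates, which suppresses the variance of the final answer without changing the per-step cost:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10)

    w = np.zeros(10)
    w_avg = np.zeros(10)
    for t in range(1, 20001):
        i = rng.integers(len(y))
        g = (X[i] @ w - y[i]) * X[i]
        w -= 0.01 * g
        w_avg += (w - w_avg) / t              # running average of iterates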

Problem Set 2 Due.

Monday, February 24 February break. No lecture.
Wednesday, February 26 Lecture 10. Dimensionality reduction and sparsity. [Notes] [Demo Notebook] [Demo HTML]
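
One classic dimensionality-reduction primitive, sketched below with arbitrary dimensions: a Gaussian random projection, which approximately preserves pairwise distances (the Johnson-Lindenstrauss phenomenon):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10_000))            # 100 points in 10,000 dims
    k = 512
    R = rng.normal(size=(10_000, k)) / np.sqrt(k)
    Z = X @ R                                     # 100 points in 512 dims

    i, j = 0, 1
    print(np.linalg.norm(X[i] - X[j]),            # distances are close
          np.linalg.norm(Z[i] - Z[j]))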

Monday, March 2 Lecture 11. Deep neural networks. Matrix multiply as computational core of learning. [Notes] [Demo Notebook] [Demo HTML]
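
An illustrative sketch (random weights, made-up sizes) of why matrix multiply is the computational core of deep learning: a two-layer fully connected network's forward pass is just two matmuls plus elementwise nonlinearities:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 784))                # a minibatch of 64 inputs
    W1, b1 = 0.01 * rng.normal(size=(784, 256)), np.zeros(256)
    W2, b2 = 0.01 * rng.normal(size=(256, 10)), np.zeros(10)

    H = np.maximum(X @ W1 + b1, 0)                # hidden layer: matmul + ReLU
    logits = H @ W2 + b2                          # output layer: another matmul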

Programming Assignment 2 Due.

Problem Set 3 Released.

Wednesday, March 4 Class cancelled. No lecture.
Monday, March 9 Lecture 12. Automatic differentiation and ML frameworks. [Notes] [Demo Notebook] [Demo HTML]
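
A deliberately minimal toy reverse-mode automatic differentiation engine for scalars (real frameworks record a graph and traverse it in topological order; this sketch just recurses over all paths):

    # Minimal reverse-mode autodiff: each Var remembers its parents and the
    # local partial derivative with respect to each of them.
    class Var:
        def __init__(self, value, parents=()):
            self.value = value
            self.parents = parents
            self.grad = 0.0

        def __add__(self, other):
            return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

        def __mul__(self, other):
            return Var(self.value * other.value,
                       [(self, other.value), (other, self.value)])

        def backward(self, seed=1.0):
            # Accumulate d(output)/d(self), then apply the chain rule.
            # Summing over all paths is correct but inefficient; real
            # frameworks visit each node once in topological order.
            self.grad += seed
            for parent, local in self.parents:
                parent.backward(seed * local)

    x, y = Var(2.0), Var(3.0)
    z = x * y + x             # z = x*y + x
    z.backward()
    print(x.grad, y.grad)     # dz/dx = y + 1 = 4.0, dz/dy = x = 2.0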

Programming Assignment 3 Released.

Wednesday, March 11 Lecture 13. Accelerating DNN training: early stopping and batch normalization. [Notes] [Demo Notebook] [Demo HTML]
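
An illustrative NumPy version of the batch normalization transform (training-mode statistics only; the learned scale gamma and shift beta are passed in as inputs):

    import numpy as np

    def batch_norm(X, gamma, beta, eps=1e-5):
        # Normalize each feature over the minibatch, then rescale and shift.
        mu = X.mean(axis=0)
        var = X.var(axis=0)
        X_hat = (X - mu) / np.sqrt(var + eps)
        return gamma * X_hat + beta

    X = np.random.default_rng(0).normal(size=(32, 4))
    out = batch_norm(X, gamma=np.ones(4), beta=np.zeros(4))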

Monday, March 16 Classes postponed due to COVID-19. No lecture.
Tuesday, March 17 Prelim exam cancelled.
Wednesday, March 18 Classes postponed due to COVID-19. No lecture.
Monday, March 23 Classes postponed due to COVID-19. No lecture.
Wednesday, March 25 Classes postponed due to COVID-19. No lecture.
Monday, March 30 Spring break. No lecture.
Wednesday, April 1 Spring break. No lecture.
Monday, April 6 Lecture 14. Hyperparameter optimization. Grid search. Random search. [Notes] [Slides Notebook] [Slides HTML] [Demo Notebook] [Demo HTML]
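
A sketch of random search over a single hyperparameter; validation_loss below is a hypothetical stand-in for training a model briefly and measuring validation loss:

    import numpy as np

    rng = np.random.default_rng(0)

    def validation_loss(lr):
        # Hypothetical stand-in: a noisy curve with a minimum near lr = 1e-2.
        return (np.log10(lr) + 2) ** 2 + 0.1 * rng.normal()

    # Random search: sample log-uniformly over the range, keep the best.
    best_lr, best_loss = None, float("inf")
    for _ in range(20):
        lr = 10 ** rng.uniform(-5, 0)        # log-uniform in [1e-5, 1]
        loss = validation_loss(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    print(best_lr, best_loss)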

Problem Set 3 Due.

Problem Set 4 Released.

Wednesday, April 8 Lecture 15. Kernels and kernel feature extraction. [Notes] [Slides Notebook] [Slides HTML]
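
An illustrative sketch of kernel feature extraction via random Fourier features (Rahimi-Recht), which approximate the RBF kernel with an explicit finite-dimensional feature map:

    import numpy as np

    def rbf_kernel(x, z, sigma=1.0):
        return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

    def random_fourier_features(X, D=2000, sigma=1.0, seed=0):
        # Map each x to sqrt(2/D) * cos(W x + b), with W ~ N(0, 1/sigma^2)
        # and b ~ Uniform[0, 2*pi]; inner products of these features
        # approximate the RBF kernel: z(x) . z(y) ~ k(x, y).
        rng = np.random.default_rng(seed)
        W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], D))
        b = rng.uniform(0, 2 * np.pi, size=D)
        return np.sqrt(2.0 / D) * np.cos(X @ W + b)

    X = np.random.default_rng(1).normal(size=(5, 20))
    Z = random_fourier_features(X, sigma=5.0)
    print((Z @ Z.T)[0, 1], rbf_kernel(X[0], X[1], sigma=5.0))  # ~ equal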

Programming Assignment 3 Due.

Monday, April 13 Lecture 16. Parallelism. [Notes] [Slides Notebook] [Slides HTML] [Demo Notebook] [Demo HTML]
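
A minimal data-parallel sketch using Python's multiprocessing (purely illustrative; the shard count and least-squares objective are arbitrary choices): each worker computes the gradient on its shard independently, and the results sum to the full gradient:

    import numpy as np
    from multiprocessing import Pool

    def shard_gradient(args):
        Xs, ys, w = args
        return Xs.T @ (Xs @ w - ys)       # least-squares gradient on a shard

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.normal(size=(100_000, 10))
        y = X @ rng.normal(size=10)
        w = np.zeros(10)

        shards = [(Xs, ys, w) for Xs, ys in
                  zip(np.array_split(X, 4), np.array_split(y, 4))]
        with Pool(4) as pool:
            g = sum(pool.map(shard_gradient, shards))  # equals full gradient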

Programming Assignment 4 Released.

Background reading material:

  • A good resource on parallel programming, particularly on GPUs: Chapter 1 of Programming Massively Parallel Processors: A Hands-On Approach, Second Edition (by David B. Kirk and Wen-mei W. Hwu). This book is available through the Cornell library.
  • Classical work providing background on parallelism in computer architecture: Chapters 3, 4, and 5 of Computer Architecture: A Quantitative Approach. This book is available through the Cornell library.
Wednesday, April 15 Lecture 17. Parallelism 2. [Notes] [Slides Notebook] [Slides HTML] [Demo Notebook] [Demo HTML]

Background reading material: same as Parallelism 1.

Monday, April 20 Lecture 18. Memory locality and memory bandwidth. [Notes] [Slides Notebook] [Slides HTML]
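
A small experiment you can run to see memory locality at work (sizes are arbitrary): summing a row-major NumPy array by rows, which reads contiguous memory, versus by columns, which reads with a large stride:

    import numpy as np
    import time

    n = 2000
    A = np.random.rand(n, n)          # C (row-major) order

    start = time.time()
    total = 0.0
    for i in range(n):
        total += A[i, :].sum()        # each row is contiguous in memory
    row_time = time.time() - start

    start = time.time()
    total = 0.0
    for j in range(n):
        total += A[:, j].sum()        # each column is strided: poor locality
    col_time = time.time() - start

    print(row_time, col_time)         # column sums are typically slower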

Problem Set 4 Due.

Problem Set 5 Released.

Background reading material: same as Parallelism 1.

Wednesday, April 22 Lecture 19. Machine learning on GPUs; matrix multiply returns. [Notes] [Slides Notebook] [Slides HTML]
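
An illustrative sketch, assuming PyTorch is installed (with CUDA if a GPU is present): matrix multiply is exactly the kind of large, regular, massively parallel workload GPUs are built for:

    import time
    import torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    A = torch.randn(4096, 4096, device=device)
    B = torch.randn(4096, 4096, device=device)

    start = time.time()
    C = A @ B
    if device == "cuda":
        torch.cuda.synchronize()      # kernels launch asynchronously; wait
    print(device, time.time() - start, "seconds")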

Background reading material:

  • Parallel programming on GPUs: Chapters 2-5 of Programming Massively Parallel Processors: A Hands-On Approach, Second Edition (by David B. Kirk and Wen-mei W. Hwu). This book is available through the Cornell library.
Monday, April 27 Lecture 20. Distributed learning and the parameter server. [Notes] [Slides]
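
A toy, single-process simulation of the parameter-server pattern (everything below is illustrative; in a real system the workers run in parallel and push and pull gradients over the network):

    import numpy as np

    def worker_gradient(w, X, y):
        # Logistic-regression gradient on this worker's shard of the data.
        p = 1 / (1 + np.exp(-X @ w))
        return X.T @ (p - y) / len(y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = (X @ rng.normal(size=10) > 0).astype(float)
    shards = np.array_split(np.arange(1000), 4)   # data split among 4 workers

    w = np.zeros(10)                              # state held by the server
    for step in range(200):
        # Each worker pulls the current w and computes a shard gradient,
        # then the server averages the pushed gradients and updates w.
        grads = [worker_gradient(w, X[s], y[s]) for s in shards]
        w -= 0.5 * np.mean(grads, axis=0)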

Programming Assignment 4 Due.

Programming Assignment 5 Released.

Wednesday, April 29 Lecture 21. Quantized, low-precision machine learning. [Notes] [Slides] [Demo Notebook] [Demo HTML]
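
An illustrative sketch of symmetric uniform quantization to 8-bit integers, with the scale chosen from the tensor's maximum magnitude:

    import numpy as np

    def quantize_int8(w):
        # Symmetric uniform quantization: map floats to ints in [-127, 127].
        scale = np.abs(w).max() / 127.0
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
    q, scale = quantize_int8(w)
    print(np.abs(w - dequantize(q, scale)).max())  # error at most scale / 2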

Background reading material:

  • An example of a blog post illustrating the use of low-precision arithmetic for deep learning.
Monday, May 4 Lecture 22. Deployment and low-latency inference. Deep neural network compression and pruning. [Notes] [Slides] [Demo Notebook] [Demo HTML]
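
A sketch of magnitude-based weight pruning (the sparsity level below is an arbitrary illustrative choice): zero out the smallest-magnitude weights and keep a mask recording which survive:

    import numpy as np

    def magnitude_prune(W, sparsity=0.9):
        # Zero the smallest-magnitude weights, keeping the top (1 - sparsity).
        k = int(W.size * sparsity)
        threshold = np.partition(np.abs(W).ravel(), k)[k]
        mask = np.abs(W) >= threshold
        return W * mask, mask

    W = np.random.default_rng(0).normal(size=(256, 256))
    W_pruned, mask = magnitude_prune(W, 0.9)
    print(mask.mean())                # fraction of weights kept, ~ 0.1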

Problem Set 5 Due.

Problem Set 6/Final Review Released.

Wednesday, May 6 Lecture 23. Online learning and real-time learning. [Notes] [Slides Notebook] [Slides HTML]
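
An illustrative sketch of online learning on a synthetic stream: each example is seen exactly once, and the model is updated immediately with a single logistic-style gradient step:

    import numpy as np

    rng = np.random.default_rng(0)
    w_true = rng.normal(size=5)
    w = np.zeros(5)

    for t in range(1, 10001):
        x = rng.normal(size=5)                    # an example arrives
        y = float(x @ w_true + 0.1 * rng.normal() > 0)
        p = 1 / (1 + np.exp(-x @ w))              # predict
        w -= (1.0 / np.sqrt(t)) * (p - y) * x     # update, then move on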

Monday, May 11 Lecture 24. Machine learning accelerators, and course summary. [Notes] [Slides Notebook] [Slides HTML]

Programming Assignment 5 Due.

Tuesday, May 12 Problem Set 6/Final Review Due.
Sunday, May 17 Take-home final exam (2-day period, starting at 2PM).