Syllabus for CS4787/5777

Principles of Large-Scale Machine Learning — Fall 2022

Term Fall 2022
Instructor Christopher De Sa
Course website www.cs.cornell.edu/courses/cs4787/2022fa/
E-mail [email hidden]
Schedule MW 7:30-8:45PM
Office hours Wednesdays 2PM
Room Kimball Hall B11
Office Gates 426

Description: CS4787 explores the principles behind scalable machine learning systems. The course will cover the algorithmic and implementation principles that power the current generation of machine learning on big data. We will cover training and inference both for traditional ML algorithms, such as linear and logistic regression, and for deep models. Topics will include: estimating statistics of data quickly with subsampling, stochastic gradient descent and other scalable optimization methods, mini-batch training, accelerated methods, adaptive learning rates, methods for scalable deep learning, hyperparameter optimization, parallel and distributed training, and quantization and model compression.

Prerequisites: CS4780 or equivalent, CS2110 or equivalent

Format: Lectures during the scheduled lecture period will cover the course content. Problem sets will be used to encourage familiarity with the content and develop competence with the more mathematical aspects of the course. Programming assignments will help build intuition and familiarity with how machine learning algorithms run. There will be one midterm exam and one final exam, each of which will test both theoretical knowledge and programming implementation of concepts.

Material: The course is based on books, papers, and other texts in machine learning, scalable optimization, and systems. Texts will be provided ahead of time on the website on a per-lecture basis. You are not required to read the texts, but they will provide useful background for the material we are discussing.

Grading: Students taking CS4787 will be evaluated on the following basis.

20% Problem sets
40% Programming assignments
15% Prelim Exam
25% Final Exam

CS5777 has an additional paper-reading component, and students taking CS5777 will be evaluated as follows.

15% Problem sets
35% Programming assignments
10% Paper reading
15% Prelim Exam
25% Final Exam

Inclusiveness: You should expect and demand to be treated by your classmates and the course staff with respect. You belong here, and we are here to help you learn—and enjoy—this course. If any incident occurs that challenges this commitment to a supportive and inclusive environment, please let the instructor know so that we can address the issue. We are personally committed to this, and subscribe to the Computer Science Department's Values of Inclusion.

TA Office Hours

Link to Google calendar.

The course calendar is subject to change.

Course Calendar Plan

Monday, August 22
Lecture 1. Introduction and course overview. [Notes PDF]

Problem Set 1 Released. [Notebook] [HTML]
Wednesday, August 24
Lecture 2. Linear algebra done efficiently: Mapping mathematics to numpy. ML via efficient kernels linked together in python. [Notebook] [HTML]

Background reading material:
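
To preview the idea, here is a minimal numpy sketch (illustrative only, not course code; the data and sizes are arbitrary) contrasting a Python loop with the equivalent vectorized kernel:

    import numpy as np

    # toy linear-model prediction: n examples, d features
    n, d = 1000, 100
    X = np.random.randn(n, d)
    w = np.random.randn(d)

    # naive: one dot product per example, looping in Python
    def predict_loop(X, w):
        return np.array([sum(X[i, j] * w[j] for j in range(X.shape[1]))
                         for i in range(X.shape[0])])

    # vectorized: a single matrix-vector multiply in an optimized kernel
    def predict_vectorized(X, w):
        return X @ w

    assert np.allclose(predict_loop(X, w), predict_vectorized(X, w))

The two functions compute the same values; the vectorized one simply pushes the loops down into compiled code, which is the pattern this lecture develops.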
Monday, August 29
Hybrid Zoom/In Person — Lecture 3. Software for learning with gradients. Numerical differentiation, symbolic differentiation, and automatic differentiation. [Notebook] [HTML]
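
As a preview, this minimal sketch (not course material; the least-squares setup is arbitrary) compares numerical differentiation against the hand-derived gradient that symbolic or automatic differentiation would produce exactly:

    import numpy as np

    def f(w, X, y):
        # least-squares loss: (1/2) ||Xw - y||^2
        r = X @ w - y
        return 0.5 * (r @ r)

    def grad_exact(w, X, y):
        # hand-derived gradient: X^T (Xw - y)
        return X.T @ (X @ w - y)

    def grad_numerical(w, X, y, eps=1e-6):
        # central differences: two loss evaluations per parameter
        g = np.zeros_like(w)
        for i in range(len(w)):
            e = np.zeros_like(w)
            e[i] = eps
            g[i] = (f(w + e, X, y) - f(w - e, X, y)) / (2 * eps)
        return g

    X = np.random.randn(50, 5)
    y = np.random.randn(50)
    w = np.random.randn(5)
    # the two gradients should agree up to truncation and round-off error
    print(np.max(np.abs(grad_exact(w, X, y) - grad_numerical(w, X, y))))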
Wednesday, August 31
Hybrid Zoom/In Person — Lecture 4. Efficient gradients with backpropagation. [Notebook] [HTML]

Background reading material:

Programming Assignment 1 Released. [Instructions] [Starter Code]

Paper Reading 1 Released. [Instructions]
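
To give a flavor of the mechanics, here is a minimal sketch (illustrative only; the architecture, data, and step size are arbitrary) of manual backpropagation through a two-layer ReLU network with squared-error loss:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 4))
    y = rng.normal(size=(32, 1))
    W1 = rng.normal(size=(4, 8))
    W2 = rng.normal(size=(8, 1))

    # forward pass, caching the intermediates the backward pass needs
    Z = X @ W1                     # pre-activations
    H = np.maximum(Z, 0)           # ReLU
    P = H @ W2                     # predictions
    loss = 0.5 * np.mean((P - y) ** 2)

    # backward pass: chain rule from the loss back to each weight matrix
    dP = (P - y) / len(X)
    dW2 = H.T @ dP
    dH = dP @ W2.T
    dZ = dH * (Z > 0)              # derivative of ReLU
    dW1 = X.T @ dZ

    # one gradient descent step (step size 0.1 chosen arbitrarily)
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

Frameworks automate exactly this caching-and-chain-rule pattern, which is why the forward intermediates must be kept in memory.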
Monday, September 5
Labor Day. No Lecture.
Wednesday, September 7
Lecture 5. Machine learning frameworks. [Notebook] [HTML]

Background reading material:

Problem Set 1 Due.

Problem Set 2 Released. [PDF]
Monday, September 12
Lecture 6. Scaling to complex models by learning with optimization algorithms. Learning in the underparameterized regime. Gradient descent, convex optimization and conditioning. [Notebook] [HTML] [Notes PDF]

Background reading material:
Wednesday, September 14
Lecture 7. Gradient descent continued. Stochastic gradient descent. [Notebook] [HTML] [Notes PDF]

Background reading material:

Programming Assignment 1 Due.

Paper Reading 1 Due.
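
For a concrete picture of the SGD update, here is a minimal sketch (illustrative only; synthetic data, and a hand-picked constant step size rather than a tuned one):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.01 * rng.normal(size=1000)

    w = np.zeros(10)
    alpha = 0.01                          # constant step size
    for t in range(10000):
        i = rng.integers(len(X))          # sample one example uniformly
        g = (X[i] @ w - y[i]) * X[i]      # stochastic gradient estimate
        w -= alpha * g                    # SGD update
    print(np.linalg.norm(w - w_true))     # should be small

Each step touches one example instead of the whole dataset, which is what makes the method scale.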
Monday, September 19
Lecture 8. Stochastic gradient descent continued. Scaling to huge datasets with subsampling. [Notebook] [HTML]

Background reading material:

Programming Assignment 2 Released. [Instructions] [Starter Code]
Wednesday, September 21
Lecture 9. Adapting algorithms to hardware. Minibatching and the effect of the learning rate. Our first hyperparameters. [Notebook] [HTML] [Demo Notebook] [Demo HTML]

Background reading material:
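
As a sketch of the idea (illustrative only; sizes and step size are arbitrary), minibatching replaces the single-example gradient with an average over B examples, computed as one matrix kernel:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    w_true = rng.normal(size=10)
    y = X @ w_true + 0.01 * rng.normal(size=1000)

    w = np.zeros(10)
    alpha, B = 0.05, 32                       # step size and batch size
    for t in range(2000):
        idx = rng.integers(len(X), size=B)    # sample a minibatch
        Xb, yb = X[idx], y[idx]
        g = Xb.T @ (Xb @ w - yb) / B          # averaged gradient, one matmul
        w -= alpha * g

The averaged gradient has lower variance than a single-example one, and the matrix-matrix form maps well onto parallel hardware.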
Monday, September 26
Lecture 10. Optimization techniques for efficient ML. Accelerating SGD with momentum. [Notebook] [HTML] [Demo Notebook] [Demo HTML] [Notes PDF]

Background reading material:

Problem Set 2 Due.

Problem Set 3 Released. [PDF]

Paper Reading 2 Released. [Instructions]
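
As a preview of Lecture 10's momentum update, here is a minimal sketch (the quadratic objective and the constants alpha and beta are illustrative, not recommendations):

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.diag(np.linspace(1.0, 100.0, 10))   # badly conditioned quadratic
    w = rng.normal(size=10)
    m = np.zeros(10)
    alpha, beta = 0.01, 0.9
    for t in range(500):
        g = A @ w                 # gradient of (1/2) w^T A w
        m = beta * m + g          # accumulate an exponential average
        w -= alpha * m            # step along the accumulated direction
    print(np.linalg.norm(w))      # should approach 0

The momentum buffer m smooths the trajectory across steps, which is what accelerates progress along the poorly conditioned directions.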
Wednesday, September 28
Lecture 11. Optimization techniques for efficient ML, continued. Accelerating SGD with preconditioning and adaptive learning rates. [Notebook] [HTML] [Notes PDF]

Background reading material:
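
For a taste of adaptive learning rates, here is an AdaGrad-style sketch (illustrative only; the objective and constants are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.diag(np.linspace(1.0, 100.0, 10))
    w = rng.normal(size=10)
    s = np.zeros(10)                          # per-coordinate accumulator
    alpha, eps = 0.5, 1e-8
    for t in range(500):
        g = A @ w
        s += g ** 2                           # accumulate squared gradients
        w -= alpha * g / (np.sqrt(s) + eps)   # per-coordinate step size

Coordinates that consistently see large gradients get their effective step size shrunk, which acts as a simple diagonal preconditioner.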
Monday, October 3
Lecture 12. Optimization techniques for efficient ML, continued. Accelerating SGD with variance reduction and averaging. [Notebook] [HTML] [Notes PDF]

Background reading material:

Programming Assignment 2 Due.

Programming Assignment 3 Released. [Instructions] [Starter Code]
Wednesday, October 5
Lecture 13. Sparsity and dimension reduction. [Notebook] [HTML] [Demo Notebook] [Demo HTML] [Notes PDF]

Background reading material:
Monday, October 10
Indigenous Peoples' Day. No Lecture.
Wednesday, October 12
Lecture 14. Deep neural networks review. The overparameterized regime and how it affects optimization. Matrix multiply as computational core of learning. [Notes PDF]

Background reading material:

Problem Set 3 Due.

Paper Reading 2 Due.

Problem Set 4 Released. [PDF]

Paper Reading 3 Released. [Instructions]
Monday, October 17
Lecture 15. Methods to Accelerate DNN training. Early stopping. Batch normalization. [Demo Notebook] [Demo HTML] [Notes PDF]

Background reading material:

Programming Assignment 3 Due.
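
As a preview, here is a minimal sketch of the batch normalization forward pass in training mode (illustrative only; the activations and the trivial gamma/beta initialization are arbitrary):

    import numpy as np

    def batchnorm_forward(H, gamma, beta, eps=1e-5):
        mu = H.mean(axis=0)                      # per-feature batch mean
        var = H.var(axis=0)                      # per-feature batch variance
        H_hat = (H - mu) / np.sqrt(var + eps)    # normalize over the batch
        return gamma * H_hat + beta              # learned scale and shift

    H = np.random.randn(64, 16) * 5 + 3          # activations with skewed stats
    out = batchnorm_forward(H, np.ones(16), np.zeros(16))
    print(out.mean(axis=0).round(6))             # approximately 0 per feature
    print(out.std(axis=0).round(3))              # approximately 1 per feature

At inference time the batch statistics are replaced by running averages collected during training.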
Wednesday, October 19
Lecture 16. Beyond supervised learning. Semi-supervised learning. Transfer learning. Self-supervised learning. [Notes PDF]

Background reading material:

Programming Assignment 4 Released. [Instructions] [Starter Code]
Monday, October 24
Lecture 17. Foundation models. Attention. Transformers. [Notes PDF]

Problem Set 4 Due.
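
As a preview of the core computation, here is a minimal single-head scaled dot-product attention sketch (illustrative only; no masking, no learned projections, arbitrary sizes):

    import numpy as np

    def attention(Q, K, V):
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                 # pairwise similarities
        scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=-1, keepdims=True)            # softmax over keys
        return P @ V                                  # weighted sum of values

    rng = np.random.default_rng(0)
    Q, K, V = (rng.normal(size=(8, 16)) for _ in range(3))
    print(attention(Q, K, V).shape)                   # (8, 16)

Note that the scores matrix is quadratic in the sequence length, which is the main scalability concern with transformers.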
Wednesday, October 26
Lecture 18. Kernels and kernel feature extraction. [Notebook] [HTML] [Notes PDF]

Background reading material:
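
As a sketch of why scalability is the issue here (illustrative only; gamma and the data are arbitrary), computing a full RBF kernel matrix costs O(n^2 d) time and O(n^2) memory:

    import numpy as np

    def rbf_kernel(X, Z, gamma=1.0):
        # ||x - z||^2 = ||x||^2 + ||z||^2 - 2 x.z, computed vectorized
        sq = (X ** 2).sum(1)[:, None] + (Z ** 2).sum(1)[None, :] - 2 * X @ Z.T
        return np.exp(-gamma * np.maximum(sq, 0))     # clip float round-off

    X = np.random.randn(100, 5)
    K = rbf_kernel(X, X)
    print(K.shape)    # (100, 100): grows quadratically with dataset size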
Thursday, October 27
Prelim Exam. 7:30PM, OLH155, OLH165.
Monday, October 31
Lecture 19. Kernels continued, and Hyperparameter Optimization Recap. [Notebook] [HTML] [Notes PDF]

Background reading material:

Paper Reading 3 Due.
Wednesday, November 2
Lecture 20. Hyperparameter Optimization Recap. (Spill-over from Monday's lecture; same notes/slides as Monday.)

Background reading material:

Programming Assignment 4 Due.

Problem Set 5 Released. [PDF]

Paper Reading 4 Released. [Instructions]
Monday, November 7
Lecture 21. Gaussian Processes and Bayesian Optimization. [Notes PDF]

Background reading material:

Programming Assignment 5 Released. [Instructions] [Starter Code]
Wednesday, November 9
Lecture 22. Parallelism.

Background reading material:
  • Good resource on parallel programming, particularly on GPUs: Chapter 1 of Programming Massively Parallel Processors: A Hands-On Approach, Second Edition (by David B. Kirk and Wen-mei W. Hwu). This book is available through the Cornell library.
  • Classic work providing background on parallelism in computer architecture: Chapters 3, 4, and 5 of Computer Architecture: A Quantitative Approach. This book is available through the Cornell library.
Monday, November 14
Lecture 23. Memory locality and memory bandwidth. [Notebook] [HTML] [Notes PDF]

Problem Set 6 Released. [PDF]
Wednesday, November 16
Lecture 24. Floating-point arithmetic. Quantized, low-precision machine learning. [Notes PDF]

Background reading material:
  • A classic blog post illustrating the use of low-precision arithmetic for deep learning.

Problem Set 5 Due.

Programming Assignment 6 Released. [Instructions] [Starter Code]

Paper Reading 5 Released. [Instructions]
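
As a preview, here is a minimal uniform quantization sketch (illustrative only; a shared symmetric scale, and input assumed nonzero so the scale is well defined):

    import numpy as np

    def quantize(x, num_bits=8):
        # symmetric uniform quantizer with one shared scale for the tensor
        scale = np.abs(x).max() / (2 ** (num_bits - 1) - 1)
        q = np.round(x / scale)
        q = np.clip(q, -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1)
        return q.astype(np.int8), scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    x = np.random.randn(1000).astype(np.float32)
    q, s = quantize(x)
    print(np.max(np.abs(x - dequantize(q, s))))   # at most about s/2

Storing int8 values plus one scale cuts memory traffic by roughly 4x relative to float32, at the cost of this bounded rounding error.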
Monday, November 21
Lecture 25. Distributed learning and the parameter server. [Notes PDF]

Background reading material:

Paper Reading 4 Due.

Programming Assignment 5 Due.
Wednesday, November 23
Thanksgiving Break. No Lecture.
Monday, November 28
Lecture 26. Machine learning on GPUs. ML Accelerators. [Notes PDF]

Background reading material:
  • Parallel programming on GPUs: Chapters 2-5 of Programming Massively Parallel Processors: A Hands-On Approach, Second Edition (by David B. Kirk and Wen-mei W. Hwu). This book is available through the Cornell library.
  • The original TPU paper: In-Datacenter Performance Analysis of a Tensor Processing Unit, ISCA 2017.
Wednesday, November 30
Lecture 27. Deployment and low-latency inference. Real-time learning. Deep neural network compression and pruning. [Notes PDF]

Background reading material:

Problem Set 6 Due.
Monday, December 5
Lecture 28. Recap. Online learning. Scaling: the future of machine learning? [Notes PDF]

Programming Assignment 6 Due.

Paper Reading 5 Due.