Term  Fall 2021  Instructor  Christopher De Sa 
Room  Phillips Hall 219  [email hidden]  
Schedule  MW 7:30pm – 8:45pm  Office hours  W 2:00pm – 3:00pm 
Forum  Ed Discussion  Office  Gates 450 
So you've taken a machine learning class. You know the models people use to solve their problems. You know the algorithms they use for learning. You know how to evaluate the quality of their solutions.
But when we look at a large-scale machine learning application that is deployed in practice, it's not always exactly what you learned in class. Sure, the basic models and the basic algorithms are all there. But they're modified a bit, in a bunch of different ways, to run faster and more efficiently. And these modifications are really important: they are often what makes the system tractable to run on the data it needs to process.
CS6787 is a graduate-level introduction to these system-focused aspects of machine learning, covering guiding principles and commonly used techniques for scaling up learning to large data sets. Informally, we will cover the techniques that lie between a standard machine learning course and an efficient systems implementation: both statistical/optimization techniques based on improving the convergence rate of learning algorithms and techniques that improve performance by leveraging the capabilities of the underlying hardware. Topics will include stochastic gradient descent, acceleration, variance reduction, methods for choosing hyperparameters, parallelization within a chip and across a cluster, popular ML frameworks, and innovations in hardware architectures. An open-ended project in which students apply these techniques is a major part of the course.
Prerequisites: Knowledge of machine learning at the level of CS4780. If you are an undergraduate, you should have taken CS4780 or an equivalent course, since it is a prerequisite. Knowledge of computer systems and hardware on the level of CS 3410 is recommended, but this is not a prerequisite.
Format: About half of the classes will involve traditionally formatted lectures. For the other half of the classes, we will read and discuss two seminal papers relevant to the course topic. These classes will involve student group presentations of the paper contents (each student will sign up in a group to present one paper for 15-20 minutes) followed by breakout discussions about the material. Historically, the lectures have occurred on Mondays and the discussions have occurred on Wednesdays, but due to the non-standard timeline this semester, these course elements will be scheduled irregularly (see schedule below).
Grading: Students will be evaluated on the following basis.
20%  Paper presentation 
10%  Discussion participation 
20%  Paper reviews 
10%  Programming assignments 
40%  Final project 
Paper review parameters: Paper reviews should be about one page (single-spaced) in length. The review guidelines should mirror what an actual conference review would look like (although you needn't assign scores or anything like that). In particular you should at least: (1) summarize the paper, (2) discuss the paper's strengths and weaknesses, and (3) discuss the paper's impact. For reference, you can read the ICML reviewer guidelines. Of course, your review will not be precisely like a real review, in large part because we already know the impact of these papers. You can submit any review up to two days late with no penalty. Students who presented a paper do not have to submit a review of that paper (although you can if you want).
Final project parameters (subject to change): The final project can be done in groups of up to three (although more work will be expected from groups with more people). The subject of the project is open-ended, but it must include:
Monday, August 30 In Person  Lecture #1: Overview. [Slides] [Demo Notebook] [Demo HTML]

Wednesday, September 1 In Person  Lecture #2: Backpropagation & ML Frameworks. [Slides] [Demo Notebook] [Demo HTML]
Presentation signup: due Friday. (Survey link) 
Monday, September 6  Labor Day: No classes. 
Wednesday, September 8 In Person  Lecture #3: Hyperparameters and Tradeoffs. [Slides] [Demo Notebook] [Demo HTML]

Monday, September 13 In Person  Paper Discussion 1a. On the importance of initialization and momentum in deep learning. Ilya Sutskever, James Martens, George Dahl, and Geoffrey Hinton. Proceedings of the International Conference on Machine Learning (ICML), 2013.
Paper Discussion 1b. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Sergey Ioffe, Christian Szegedy. Proceedings of the International Conference on Machine Learning (ICML), 2015.
Wednesday, September 15 In Person  Lecture #4: Kernels and Dimensionality Reduction. [Slides] [Demo Notebook] [Demo HTML]

Monday, September 20 In Person  Paper Discussion 2a. Random features for large-scale kernel machines. Ali Rahimi and Benjamin Recht. In Advances in Neural Information Processing Systems (NeurIPS), 2007.
Paper Discussion 2b. Feature Hashing for Large Scale Multitask Learning. Kilian Weinberger, Anirban Dasgupta, Josh Attenberg, John Langford and Alex Smola. Proceedings of the International Conference on Machine Learning (ICML), 2009.
Due: Review of paper 1a or 1b.
Wednesday, September 22 In Person  Lecture #5: Online Learning and Variance Reduction. [Slides] [Demo Notebook] [Demo HTML]
Released: Programming Assignment 2. 
Monday, September 27 In Person  Paper Discussion 3a. Identifying Suspicious URLs: An Application of Large-Scale Online Learning. Justin Ma, Lawrence K. Saul, Stefan Savage and Geoffrey M. Voelker. Proceedings of the International Conference on Machine Learning (ICML), 2009.
Paper Discussion 3b. Accelerating stochastic gradient descent using predictive variance reduction. Rie Johnson and Tong Zhang. In Advances in Neural Information Processing Systems (NeurIPS), 2013.
Due: Review of paper 2a or 2b.
Wednesday, September 29 In Person  Lecture #6: Hyperparameter Optimization. [Slides] [Demo Notebook] [Demo HTML]

Monday, October 4 In Person  Paper Discussion 4a. Random search for hyperparameter optimization. James Bergstra and Yoshua Bengio. Journal of Machine Learning Research (JMLR), 2012.
Paper Discussion 4b. A System for Massively Parallel Hyperparameter Tuning. Liam Li et al. Proceedings of the 2nd Conference on Machine Learning and Systems, 2020.
Due: Review of paper 3a or 3b.
Wednesday, October 6 In Person  Lecture #7: Adaptive Methods & Non-Convex Optimization. [Slides] [Demo Notebook] [Demo HTML]

Monday, October 11  Indigenous Peoples' Day: No classes. 
Wednesday, October 13 In Person  Paper Discussion 5a. The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia C Wilson, Rebecca Roelofs, Mitchell Stern, Nati Srebro and Benjamin Recht. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
Paper Discussion 5b. Adam: A method for stochastic optimization. Diederik Kingma and Jimmy Ba. Proceedings of the International Conference on Learning Representations (ICLR), 2015.
Due: Review of paper 4a or 4b.
Monday, October 18 In Person  Lecture #8: Parallelism. [Slides] [Demo Notebook] [Demo HTML]
In-class project feedback activity.
Wednesday, October 20 In Person  Paper Discussion 6a. Map-reduce for machine learning on multicore. Cheng-Tao Chu, Sang K Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. In Advances in Neural Information Processing Systems (NeurIPS), 2007.
Paper Discussion 6b. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. Feng Niu, Benjamin Recht, Christopher Re, and Stephen Wright. In Advances in Neural Information Processing Systems (NeurIPS), 2011.
Due: Review of paper 5a or 5b.
Monday, October 25 In Person  Lecture #9: Distributed Learning. [Slides]
Due: Final project proposals. 
Wednesday, October 27 In Person  Paper Discussion 7a. Large scale distributed deep networks. Jeff Dean, et al. In Advances in Neural Information Processing Systems (NeurIPS), 2012.
Paper Discussion 7b. Towards federated learning at scale: System design. Keith Bonawitz, et al. In Proceedings of the 2nd MLSys Conference (MLSys), 2019.
Due: Review of paper 6a or 6b.
Monday, November 1 In Person  Lecture #10: Low-Precision Arithmetic. [Slides] [Demo Notebook] [Demo HTML]

Wednesday, November 3 In Person  Paper Discussion 8a. Deep learning with limited numerical precision. Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Proceedings of the International Conference on Machine Learning (ICML), 2015.
Paper Discussion 8b. BinaryConnect: Training Deep Neural Networks with binary weights during propagations. Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
Due: Review of paper 7a or 7b.
Monday, November 8 In Person  Lecture #11: Inference and Compression. [Slides] [Demo Notebook] [Demo HTML]

Wednesday, November 10 In Person  Paper Discussion 9a. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Song Han, Huizi Mao, and William J Dally. Proceedings of the International Conference on Learning Representations (ICLR), 2016.
Paper Discussion 9b. What is the State of Neural Network Pruning? Davis Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, and John Guttag. Proceedings of the 2nd Conference on Machine Learning and Systems, 2020.
Due: Review of paper 8a or 8b.
Monday, November 15 In Person  Lecture #12: Machine Learning Frameworks II.

Wednesday, November 17 In Person  Paper Discussion 10a. TensorFlow.js: Machine Learning for the Web and Beyond. Daniel Smilkov et al. Proceedings of the 2nd Conference on Machine Learning and Systems, 2019.
Paper Discussion 10b. PyTorch: An Imperative Style, High-Performance Deep Learning Library. Adam Paszke et al. 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), 2019.
Due: Review of paper 9a or 9b.
Monday, November 22 In Person  Lecture #13: Hardware for Machine Learning.

Wednesday, November 24  Thanksgiving Break: No classes. 
Monday, November 29 In Person  Paper Discussion 11a. In-datacenter performance analysis of a tensor processing unit. Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 2017.
Paper Discussion 11b. A Configurable Cloud-Scale DNN Processor for Real-Time AI. Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, et al. In Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA), 2018.
Due: Review of paper 10a or 10b.
Due: Final project abstract. Can be submitted late until Sunday; will discuss in class on Monday.
Wednesday, December 1 In Person  Lecture #15: Large Scale ML on the Cloud. Abstract discussion.
Monday, December 6 In Person  Lecture #16: Final Project Discussion.
Due: Review of paper 11a or 11b.
Tuesday, December 7  Last day of instruction. No CS6787 lecture.
Due: Final project report. 