| Term | Spring 2026 | Instructor | Christopher De Sa |
| Room | Gates 114 | Email | [email hidden] |
| Schedule | MW 10:10am – 11:25am | Office hours | W 2:00pm – 3:00pm |
| Forum | Ed Discussion | Office | Gates 426 |
So you've taken a machine learning class. You know the models people use to solve their problems. You know the algorithms they use for learning. You know how to evaluate the quality of their solutions.
But when we look at a large-scale machine learning application that is deployed in practice, it's not always exactly what you learned in class. Sure, the basic models, the basic algorithms are all there. But they're modified a bit, in a bunch of different ways, to run faster and more efficiently. And these modifications are really important—they often are what make the system tractable to run on the data it needs to process.
CS6787 is a graduate-level introduction to these system-focused aspects of machine learning, covering guiding principles and commonly used techniques for scaling up learning to large data sets. Informally, we will cover the techniques that lie between a standard machine learning course and an efficient systems implementation: both statistical/optimization techniques based on improving the convergence rate of learning algorithms and techniques that improve performance by leveraging the capabilities of the underlying hardware. Topics will include stochastic gradient descent, acceleration, variance reduction, methods for choosing hyperparameters, parallelization within a chip and across a cluster, popular ML frameworks, and innovations in hardware architectures. An open-ended project in which students apply these techniques is a major part of the course.
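For a concrete flavor of the kind of algorithm the course starts from and then scales up, here is a minimal stochastic gradient descent loop for least-squares regression. This is just an illustrative NumPy sketch (the function name and hyperparameter values are ours, not from the course materials):

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.05, epochs=50, seed=0):
    """Minimal SGD for the objective (1/2n) * sum_i (x_i^T w - y_i)^2.

    Illustrative sketch only; lr and epochs are arbitrary demo values.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):        # one pass over shuffled examples
            g = (X[i] @ w - y[i]) * X[i]    # gradient of the i-th example's loss
            w -= lr * g                     # one stochastic gradient step
    return w

# Tiny usage example on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
w_true = np.arange(5.0)
y = X @ w_true
w_hat = sgd_least_squares(X, y)
print(np.linalg.norm(w_hat - w_true))       # should be near zero
```

The systems questions in this course start from exactly this kind of inner loop: how it behaves with minibatching, lower-precision arithmetic, or many cores running it in parallel.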
Prerequisites: Knowledge of machine learning at the level of CS4780. Undergraduates must have taken CS4780 or an equivalent course. Knowledge of computer systems and hardware at the level of CS 3410 is recommended, but this is not a prerequisite.
Format: About half of the classes will be traditional lectures. For the other half, we will read and discuss two seminal papers relevant to the course topic. In these classes, groups of students will present the papers (each student will sign up in a group to present one paper for 15-20 minutes), followed by breakout discussions of the material. Historically, the lectures have occurred on Mondays and the discussions on Wednesdays, but due to the non-standard timeline this semester, these course elements will be scheduled irregularly (see the schedule below).
Grading: Students will be evaluated on the following basis.
| 20% | Paper presentation |
| 10% | Discussion participation |
| 20% | Paper reviews |
| 50% | Final project |
Paper review parameters: Paper reviews should be about one page (single-spaced) in length. Your review should mirror what an actual conference review would look like (although you needn't assign scores or anything like that). In particular, you should at least: (1) summarize the paper, (2) discuss the paper's strengths and weaknesses, and (3) discuss the paper's impact. For reference, you can read the ICML reviewer guidelines. Of course, your review will not be precisely like a real review, in large part because we already know the impact of these papers. You can submit any review up to two days late with no penalty. Students who presented a paper do not have to submit a review of that paper (although you may if you want).
Final project parameters (subject to change): The final project can be done in groups of up to three (although more work will be expected from larger groups). The subject of the project is open-ended, but it must include:
| Wednesday, January 21 | Lecture #1: Overview (No Office Hours Today). [Slides] |
| Monday, January 26 | Lecture #2: Backpropagation & ML Frameworks. Presentation signup: due Monday. (Survey link) |
| Wednesday, January 28 | Lecture #3: Hyperparameters and Tradeoffs. |
| Monday, February 2 | Paper Discussion 1a. Attention is all you need. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. In Advances in Neural Information Processing Systems (NeurIPS), 2017. Paper Discussion 1b. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. Sergey Ioffe and Christian Szegedy. Proceedings of the International Conference on Machine Learning (ICML), 2015. |
| Wednesday, February 4 | Lecture #4: Kernels and Dimensionality Reduction. [Slides] |
| Monday, February 9 | Paper Discussion 2a. Efficient Memory Management for Large Language Model Serving with PagedAttention. Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph E. Gonzalez, Hao Zhang, and Ion Stoica. SOSP '23: Proceedings of the 29th Symposium on Operating Systems Principles, 2023. Paper Discussion 2b. Language models are few-shot learners. Tom Brown et al. In Advances in Neural Information Processing Systems (NeurIPS), 2020. Due: Review of paper 1a or 1b. |
| Wednesday, February 11 | Lecture #5: Adaptive Methods & Non-Convex Optimization. |
| Monday, February 16 | February Break: No classes. |
| Wednesday, February 18 | Paper Discussion 3a. Random features for large-scale kernel machines. Ali Rahimi and Benjamin Recht. In Advances in Neural Information Processing Systems (NeurIPS), 2007. Paper Discussion 3b. Transformers are RNNs: fast autoregressive transformers with linear attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Proceedings of the International Conference on Machine Learning (ICML), 2020. Due: Review of paper 2a or 2b. |
| Monday, February 23 | Lecture #6: Hyperparameter Optimization. |
| Wednesday, February 25 | Paper Discussion 4a. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. Albert Gu and Tri Dao. Conference on Language Modeling (CoLM), 2024. Paper Discussion 4b. Adam: A method for stochastic optimization. Diederik Kingma and Jimmy Ba. Proceedings of the International Conference on Learning Representations (ICLR), 2015. Due: Review of paper 3a or 3b. |
| Monday, March 2 | Paper Discussion 5a. Scaling laws for neural language models. Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. arXiv preprint arXiv:2001.08361, 2020. Paper Discussion 5b. Training Compute-Optimal Large Language Models. Jordan Hoffmann et al. In Advances in Neural Information Processing Systems (NeurIPS), 2022. |
| Wednesday, March 4 | Lecture #7: Parallelism. Due: Review of paper 4a or 4b. |
| Monday, March 9 | Paper Discussion 6a. Map-reduce for machine learning on multicore. Cheng-Tao Chu, Sang K. Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng, and Kunle Olukotun. In Advances in Neural Information Processing Systems (NeurIPS), 2007. Paper Discussion 6b. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. Feng Niu, Benjamin Recht, Christopher Ré, and Stephen Wright. In Advances in Neural Information Processing Systems (NeurIPS), 2011. |
| Wednesday, March 11 | Lecture #8: Distributed Learning. Due: Review of paper 5a or 5b. |
| Monday, March 16 | Paper Discussion 7a. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. Tri Dao, Dan Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. In Advances in Neural Information Processing Systems (NeurIPS), 2022. Paper Discussion 7b. A System for Massively Parallel Hyperparameter Tuning. Liam Li et al. Proceedings of the 2nd Conference on Machine Learning and Systems (MLSys), 2020. |
| Wednesday, March 18 | Lecture #9: Low-Precision Arithmetic. Due: Review of paper 6a or 6b. In-class project feedback activity. |
| Monday, March 23 | Paper Discussion 8a. Large scale distributed deep networks. Jeff Dean et al. In Advances in Neural Information Processing Systems (NeurIPS), 2012. Paper Discussion 8b. Towards federated learning at scale: System design. Keith Bonawitz et al. In Proceedings of the 2nd MLSys Conference (MLSys), 2019. Due: Final project proposals. |
| Wednesday, March 25 | Lecture #10: Inference and Compression. Due: Review of paper 7a or 7b. |
| Monday, March 30 | Spring Break: No classes. |
| Wednesday, April 1 | Spring Break: No classes. |
| Monday, April 6 | Paper Discussion 9a. GPipe: Efficient training of giant neural networks using pipeline parallelism. Yanping Huang, Youlong Cheng, Ankur Bapna, Orhan Firat, Dehao Chen, Mia Chen, HyoukJoong Lee, Jiquan Ngiam, Quoc V. Le, and Yonghui Wu. In Advances in Neural Information Processing Systems (NeurIPS), 2019. Paper Discussion 9b. Efficiently scaling transformer inference. Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Jonathan Heek, Kefan Xiao, Shivani Agrawal, and Jeff Dean. In Proceedings of Machine Learning and Systems (MLSys), 2023. |
| Wednesday, April 8 | Lecture #11: Machine Learning Frameworks II. Due: Review of paper 8a or 8b. |
| Monday, April 13 | Paper Discussion 10a. Deep learning with limited numerical precision. Suyog Gupta, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan. Proceedings of the International Conference on Machine Learning (ICML), 2015. Paper Discussion 10b. LoRA: Low-Rank Adaptation of Large Language Models. Edward J. Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Proceedings of the International Conference on Learning Representations (ICLR), 2022. |
| Wednesday, April 15 | Lecture #12: Hardware for Machine Learning. Due: Review of paper 9a or 9b. |
| Monday, April 20 | Paper Discussion 11a. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. Song Han, Huizi Mao, and William J. Dally. Proceedings of the International Conference on Learning Representations (ICLR), 2016. Paper Discussion 11b. GPTQ: Accurate post-training quantization for generative pre-trained transformers. Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan Alistarh. Proceedings of the International Conference on Learning Representations (ICLR), 2023. |
| Wednesday, April 22 | Lecture #13: Modern Generative AI. Due: Review of paper 10a or 10b. |
| Monday, April 27 | Paper Discussion 12a. In-datacenter performance analysis of a tensor processing unit. Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 2017. Paper Discussion 12b. A Configurable Cloud-Scale DNN Processor for Real-Time AI. Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, et al. In Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA), 2018. Due: Final project abstract draft. Can be submitted late until Tuesday evening; will discuss in class on Wednesday. |
| Wednesday, April 29 | Lecture #14: Large Scale ML on the Cloud. Due: Review of paper 11a or 11b. Abstract discussion. |
| Monday, May 4 | Lecture #15: Final Project Discussion. |