You will take a monolithic implementation of an ML pipeline and optimize it to run across 3 nodes. Everyone in CS5416 (at least those who have been coming to recitation) should already be familiar with the details, since I have mentioned it in the last 2 recitations.

Shared Lectures (Tuesday/Thursday)

 Recitation

CS4414 Friday

CS5416

Homework: first few weeks are shared homeworks

8/26

pdf

1. Introduction, rules for using AI code generation in this course

8/29

C++ classes, names and scopes

std::vector

Balanced binary tree in C++ via repeated insertion of the median.

8/28

pdf

Word Count Files

2.  Multicore concurrency and how it can conflict with the NUMA memory model (false sharing, lock contention, unfair mutex)

9/3

pdf

3.  How did we settle on C++ for this course? Should we all switch to Rust?

9/6

Copy versus reference, const, RAII

9/5

pdf

4.  Building abstractions that simplify advanced systems designs and development.  The file system is an abstraction.  POSIX file I/O API

9/9

pdf

5.  Linux segments, DLLs, page remap, page protections.

9/12

C++ templates

Extending our binary tree into an approximate nearest neighbor tree, used heavily in RAG LLM and LRM systems.

Leaderboard: most performant solution to the ANN task.

9/11

pdf 

6.  Compile time evaluation: constants, constexpr, templates.  Printf as a C++ template.

9/16

pdf

7. Programming with SIMD parallelism

9/19

Multithreaded programming workshop

9/18

pdf

8.   Control flow abstractions, abnormal control flow, and how it can cause memory leaks in C++.  Why std::shared_ptr avoids this issue.

9/23

pdf

9.  Abstractions for safe concurrency and thread-to-thread coordination: circular buffers, readers+writers

9/26

Designing FarmVille

FarmVille.  This is a simple old-style graphical application in which application threads animate little objects in a scenario involving making cupcakes at a bakery that sells to local students and sources ingredients from a local farm and from a regional wholesaler.  The main focus is on designing and implementing the concurrency control needed to avoid collisions.  Leaderboard: solutions whose overheads scale best as the number of concurrent threads grows.

9/25

pdf

10.  Deadlocks, livelocks: four conditions, ordered locking

Prelim 1:  Evening exam, 9/25. OLH155, OLH255.  Designed as a 75-minute exam, but we will have the rooms for at least twice that long.  SDS accommodations: handled at the SDS testing center.

9/30

pdf

11.  Accessing collections, CCL primitives

10/3

Debugging tools: gdb, Valgrind, gprof

10/2

pdf

12.  Theoretical models for distributed computing, the concept of distributed consistency.

10/7

pdf 

13. How inconsistency in a fault-tolerant multicast ended a project to redesign the US air traffic control system.

10/10

Compiler+instruction set+architecture ecosystem

10/9

pdf

14. Can MLs benefit from fault-tolerance consistency models without being recoded from scratch?

 

Fall break: Oct 11-14

10/16

pdf

15. Understanding (and debugging) performance.

10/17

4414: single process optimization track

5416: multiprocess+GPU

4414: single process homework on csug lab

5416: distributed homework on MEng lab + fractus

10/21

pdf

16. Client connectivity to the cloud.  VPNs and VPCs.

10/24

TBD

TBD

 

 

 

10/23

pdf

17.  Facebook CDN and caching

10/28

pdf

18.  Facebook's social network graph, TAO

10/31

 

10/30

pdf

19.   Cloud Microservice Frameworks

11/4

pdf

20. Availability zones and data redundancy support

11/7

 

11/6

pdf

21. Apache technologies - I



CS4414

You will design a RAG system from scratch, encoding documents, indexing them with FAISS, and retrieving top-K context for queries. Then you’ll use llama.cpp to generate a text reply to each query.  The project will conclude with a benchmarking study to identify bottlenecks and assess design choices.

Leaderboard: single machine performance.

 



CS5416

 

Understanding and optimizing performance in PreFMLR: An ML for retrieving documents relevant to text queries at high speeds. This project has been discussed in recitations a few times.

 Leaderboard: distributed performance.

 

11/11

pdf

22. Apache technologies - II

11/14

 

 

11/13

pdf

23. Spark RDD concept, compiling to MapReduce for Big Data analytics

11/18

pdf

24. Vector databases: Approximate document retrieval for RAG MLs

11/26

 

 

11/20

pdf

25. GPU accelerators

11/25

pdf

26. Performance lessons for ML systems

 

 

 

11/26-11/31 Thanksgiving break

 

12/2

pdf

27. More details on RDMA

Lectures 27 and 28 are not included on Prelim 2.  There will be no Friday recitation this week, but we will have coding help sessions for HW3.
Prelim 2:  Evening exam, 12/2. OLH155, OLH255.  SDS accommodated students will take this exam at the same time, but at the ATP center operated by SDS.

12/5

pdf

28. The Rocky path to RoCE deployment at Microsoft.

     

We have no final exam.  Final projects must be submitted no later than midnight 12/10.  If you submit by 12/5, we will post a letter grade by 12/13.  If you delay until midnight 12/10, we will be short on grading staff, and you should anticipate some risk that we miss the deadline for posting your letter grade: your grade would show as an INC and would later be updated to reflect the actual grade once we manage to get things graded.  Factor this in if you are on a visa or a student loan that views an INC as some kind of serious problem.  Basically, we urge you to think of 12/5 as the deadline even though there is no penalty for handing it in later, because we will not have enough grading staff once the final exam period starts (our graders have final projects and exams of their own, and then they leave to go home)!