CS4414: Systems Programming (Fall 2025)
CS4414 is aimed at students who have taken CS3410 and learned C, and who are also proficient in an object-oriented programming language (which could be C++ but could also be Java or Python). The course offers a deeper understanding of how "real world" computing systems are built. In CS4414 you will be working in C++ (which we will teach you, but rather quickly), and will then use C++ on Linux as a way to create high-quality software. This pair of technologies has gained nearly universal adoption at every level, from small devices hidden within "smart things" to the world's largest cloud computing systems. As such, the skills you gain in CS4414 are very broadly relevant, no matter what you plan to do in your career. You will also get a close look at some cloud computing concepts that matter broadly, and especially so for training or deploying machine learning applications.

We expect three groups of students. The majority will not have taken CS4410. We do expect that some will have already taken CS4410 as well as CS3410 and will be seeking additional perspective on the creation of large, complex software systems. The third group will be MEng students (CS5416), who share the T/R lectures with CS4414 but follow a different path in the Friday lectures and in the large project done during the last 4 weeks of the course.

Everyone is expected to have mastered CS3410 (including topics added in Fall 2024 such as memory, data representations, C programming with pointers to memory allocated using malloc/free, threads and synchronization). Even so, we recognize that students will have varied backgrounds, and that some will be less confident in their programming skills. This is normal. You gain confidence by doing things, much like learning to drive a car: you are nervous at first, but get good at it with practice.
In our context, this comes down to becoming a confident programmer with a good understanding of computing environments and tools, and the ability to pick things up with relatively little direct guidance. We will help you get going, but systems programming is different from application development: we work with larger pieces of software, sometimes with code we didn't write ourselves, and that code in turn depends on some layer that we might even want to reimplement. This requires a new mindset, but one you'll find is standard in the rest of your career.

Indeed, the majority of CS4414 students won't become systems programmers, but they are very likely to use ideas from the course in settings such as embedded device programming in robotics, digital agriculture, Internet of Things, or other areas where sensors and actuators are deployed. None of these are topics we will cover in CS4414 itself, yet the skills you gain in our course translate directly to those settings. We will also be learning a lot about how ML technologies use systems, and even how to diagnose and fix performance issues in ML code you didn't write and might not fully understand. In the modern world, this is a fantastically useful skill to acquire, and it will help you get job offers and raises!

Our practical goals include learning to leverage existing Linux tools, to write correct code in C++, and to achieve performance and efficiency. As with any programming language, you really teach yourself by doing, but we will present C++ and Linux in the required section. Assigned readings and homeworks will help you build up hands-on proficiency. C++ and Linux are easy to learn if you are comfortable in some other object-oriented programming language like Java, so we will move quickly (this is not a course for people who struggle with programming, or who have never seen object-oriented code and learned about data structures).
You'll also be reading a famous C++ self-teaching textbook, written by the inventor of the language. This will begin early in the semester, so be ready to work hard in the first few weeks!

On the ML side, we will learn about a common kind of ML (ML used as a service by various applications and "co-pilots") and about where time is spent in such systems. The obvious answer is training them and running them, but can we do a deeper dive without learning a whole course's worth of linear algebra and implementation techniques? We'll see that yes, we can!

CS4414 thinks of coding as a skill and a tool. We do teach you the subtleties of C++ programming, but we assume everyone is already a comfortable programmer (mostly in some other language). The core intellectual material focuses on the way that modern applications are often created by combining two or more programs, which talk to one another over some mixture of pipes, files, mapped files (shared memory), networking (messages sent over TCP), etc. We will discuss security abstractions for isolation and authorization, and the best ways of building applications that use these technologies correctly. All of these are concepts you'll find valuable in your work, no matter where computing might lead you.

Modern computers are based on NUMA processors (chips with multiple CPUs in a single machine), and leveraging NUMA sometimes entails writing programs with multiple threads running in parallel while sharing memory. NUMA computers have a variety of interesting features, including hardware support for parallelism (such as for image processing, computer vision tasks, and machine learning), multiple levels of hardware caching (important for performance), and support for many styles of locking and synchronization. We'll focus on monitors, an approach that is highly flexible, nicely supported in C++, and promotes correctness.
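To make the monitor idea concrete, here is a minimal sketch of a monitor-style bounded buffer in standard C++. It is an illustration only, not code from the course: the names `BoundedBuffer` and `run_demo` are hypothetical, and the capacity of 4 is an arbitrary choice. The pattern is that all shared state is private, every public method takes the same mutex, and threads wait on a condition variable until the condition they need holds.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical monitor-style bounded buffer (illustrative names, not from
// the course materials). All access to items_ happens while holding mutex_.
class BoundedBuffer {
public:
    explicit BoundedBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Blocks while the buffer is full, then appends the item.
    void put(int item) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return items_.size() < capacity_; });
        items_.push(item);
        not_empty_.notify_one();  // wake a consumer waiting for data
    }

    // Blocks while the buffer is empty, then removes and returns the oldest item.
    int take() {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return !items_.empty(); });
        int item = items_.front();
        items_.pop();
        not_full_.notify_one();  // wake a producer waiting for space
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::queue<int> items_;
    const std::size_t capacity_;
};

// Runs one producer and one consumer thread that share the buffer, and
// returns the sum of the values the consumer received.
long long run_demo(int count) {
    BoundedBuffer buffer(4);  // arbitrary small capacity for the sketch
    long long sum = 0;
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) buffer.put(i);
    });
    std::thread consumer([&] {
        for (int i = 0; i < count; ++i) sum += buffer.take();
    });
    producer.join();
    consumer.join();  // sum is only read after the consumer has finished
    return sum;
}
```

Calling `run_demo(100)` from a `main` function returns 5050, the sum of 1 through 100, regardless of how the two threads interleave. The predicate form of `wait` rechecks the condition after every wakeup, which guards against spurious wakeups.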
Late in the semester, we will also look at some distributed systems abstractions that extend the idea of correct synchronization to applications spanning more than one computer, specifically state machine replication, leader election and crash-failure tolerance.

What about the cloud content? This reflects an evolution since prior offerings of CS4414. In Fall 2024 and previously, CS4414 did not require CS3410, and so needed to cover quite a bit of material that we now assume you learned in CS3410: programming in C, using threads and synchronizing them properly, using memory allocation and pointers, and how languages are compiled down to data held in memory and instructions that the computer executes. Because CS4414 won't need to repeat this content, we can move faster than in prior offerings, freeing about six lecture slots. We will fill those slots with material from CS5412, a cloud computing course that Cornell no longer offers. The cloud content will be exactly the cloud ML aspects mentioned above. CS4414 won't become a general-purpose cloud course, but we will learn all about cloud-hosted ML services.

At the start we said that we expect three groups of students. How will the CS5416 students share CS4414 with us and yet do different projects? The short answer is that they will share two lectures each week with us. Our third lecture does a lot more on the performance of a single but complicated program, whereas CS5416 students will be learning more about the performance of programs spread over a set of cloud computing servers. The specific cloud-ML scenarios we will study involve machine learning systems that can answer questions about images: so-called multimodal RAG LLM pipelines. In fact, we plan to have a CS4414 project that looks at performance for such a pipeline, so you will get some experience, but you'll work with it as a single process running on a single machine.
The CS5416 students in the room will be looking at a different version that is split into components, each running in a separate process and perhaps on a separate machine. The details are still being worked out, so the story could evolve between now (this was written in April) and when the course starts (in August).

Prerequisites: CS3410 is required.

Appropriate use of AI tools: Please read our Academic Integrity policy.

Recitation: Required. We teach a lot of the hands-on topics in the recitation.

CS Major: CS4414 is one of the courses that can be used to satisfy the systems-area course requirement. It does not have an associated practicum, however, so it does not satisfy the practicum requirement.

Grading: We compute a weighted curve (a ranking) based on your scores, and assign letter grades based on this ranking. Grades generally range from B- to A, with half or more of the students receiving A or A- grades. At the instructor's discretion a few A+ grades are also awarded, but these are not based purely on the score ranking (attendance and participation are also considered, and prelim performance is given a slightly higher weighting).