CS5412: Topics in Cloud Computing

Gates Hall room G01, 1:00pm-2:15pm on Monday and Wednesday.  Recitation on Wednesday at 7:30pm-8:45pm, same room.

We recommend that you attend all lectures in person, but will also post recorded versions, with closed captions, on the syllabus page.  Students who don't attend in person generally do poorly in this course because the exams focus on material covered in class, and it can be hard to learn from a video (videos are great for catching up or revisiting a topic you found confusing, but people just don't focus well when shown a 75-minute video twice per week).

Syllabus for Fall 2022

 

Prof. Ken Birman, 435 Gates Hall, x5-9199. 

Ken: in person, after class (M, W), Gates Hall, Room 435

TA Office hours:

 

Yifan Wang: after recitation, hence Wednesday 8:45-9:45 PM, Gates G01 or a table outside in the hallway

Tiancheng Yuan: Thursdays, 3-4 PM, Duffield 340

Rahul Sharnappa (skills demos): Tuesdays, 12-1 PM, Rhodes 590

 

Ed Discussions: Find the 5412 discussion board here

Assignments and graded materials: Find them in CMS.

What is this course about?  Cloud Computing is an overarching term that covers modern computing infrastructures to support the web: browsers and web servers, as well as ways of building mobile clients, scalable web services, and very fast infrastructures for serving up content in geographically distributed systems that might include dozens of data centers and millions of computers. Everything we do is cloud-based or uses cloud solutions these days.  CS5412 teaches you to use one of the main clouds (Azure), while also giving you transferable insights about the fundamentals of how these systems work.  That deeper perspective will be equally useful when working with other major cloud platforms.

The cloud is a huge space within which there are many trends.  Two that interest us in CS5412 involve real-time data streams that enter the cloud from sensors or other devices, or from mobile platforms that interact with 5G computing and connectivity hubs.  We will look closely at the "data path" by which one connects a sensor to a cloud computing account, uploads data as new events occur, or continuously, then processes the data using various kinds of tools.  Many data science courses teach you to upload existing data sets and then work with them; our course is more oriented towards event-by-event scenarios, where data needs to be processed as it is generated and may have to trigger some form of immediate ("real-time") reaction.  This makes the course rather hands-on, with a fair amount of programming -- mostly in the form of small functions written in Python or other languages using cloud APIs, but sometimes involving larger coding tasks that you would carry out in your favorite programming language.

When we combine cloud intelligence with IoT and 5G applications, we end up with smart things: smart power grids, smart farms, smart homes, smart cities...  In CS5412 projects, you'll pick a topic from one of these areas (or something related), and will create apps for those kinds of settings.  Every student will do a project (either one of their own choosing or one we suggest) involving prototyping some form of smart something.

Do I have the background for this course?   Cloud computing at Cornell is a fairly hands-on task.  Half your grade is based on exams that reflect material covered in class and are generally pretty conceptual in nature, but half comes from doing a big practical project -- real software that you design and implement yourself.  As a result, we need to be sure that students come into this course comfortable with the kinds of programming tasks and skills needed, because our TAs can't teach that background material.

Let me give an example.  Suppose that you decide you want to do a project on a topic like building a cloud-based AI service to intelligently firewall some sort of organization like a hospital.  You've found trace data on the web and decided to use this as a way to simulate a real hospital, and you plan to build smart policies aimed at avoiding accidental insecure transfer of private patient information.

This would be an awesome project topic, by the way, but on the hard side (and I don't know where you would find the trace file!).   Assuming you do find a trace, it would be very suitable for someone looking to also get MEng project credit.

Anyhow, you have these log files with your trace data, copied from somewhere, and they contain 10M records (representing 10M messages seen by some actual firewall).  Obviously, to "simulate" the real application, you'll need to replay this data, or at least some of it.  So one part of your project would be to build a piece of software to read one message at a time from the trace, then send it to the Azure cloud.  You happen to be a Python wizard, and our TAs tell you about something called Flask, and you end up finding a sample program for uploading data from a Python program in Flask into the cloud, but it demos a line-by-line scenario.  You change it to send messages in whatever format the trace was using.  The cloud responds by saying "block it" or "let it through" for each message.

You can see that this involves lots of programming: coding this Python program, reading data from the trace, figuring out how a Flask container works, building one from the sample, modifying it, rebuilding it, testing it... and now you know how to simulate the world of your firewall.
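The replay step described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the trace is assumed to hold one message per line, and `send` is a hypothetical stand-in for whatever upload call your project ends up using (an HTTP POST to your own cloud endpoint, for example).  The names `replay_trace` and `fake_cloud` are invented for this sketch and are not part of Flask or Azure.

```python
import io
from typing import Callable, Iterable, Iterator, Optional, Tuple


def replay_trace(
    lines: Iterable[str],
    send: Callable[[str], str],
    limit: Optional[int] = None,
) -> Iterator[Tuple[str, str]]:
    """Send trace messages one at a time and yield (message, verdict) pairs.

    `lines` is any iterable of text lines, such as an open trace file.
    `send` is a hypothetical upload function that returns the cloud's verdict.
    `limit` optionally caps how many messages are replayed.
    """
    sent = 0
    for raw in lines:
        message = raw.strip()
        if not message:
            continue  # skip blank lines in the trace
        if limit is not None and sent >= limit:
            break
        yield message, send(message)
        sent += 1


def fake_cloud(message: str) -> str:
    """Local stand-in for the cloud service (assumption: a simple keyword rule)."""
    return "block it" if "patient" in message.lower() else "let it through"


if __name__ == "__main__":
    # A tiny in-memory "trace"; a real run would pass an open trace file
    # (the file name is up to you) in place of this StringIO object.
    trace = io.StringIO("GET /index.html\n\nPOST /export patient=1234\n")
    for message, verdict in replay_trace(trace, fake_cloud):
        print(f"{verdict}: {message}")
```

Passing the upload function in as a parameter lets you test the replay logic on your laptop with a fake, then swap in the real cloud call later without touching the loop.  The `limit` argument is there because you will rarely want to replay all 10M records while debugging.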

Now, was it important that you did this in Python/Flask?  Not at all!  It could be in C with gRPC, C#, C++, or in Java, or JavaScript.  Any would be fine.  But the thing is... our CS5412 TAs won't be teaching you any of those things.  We do teach you how it has to work, but we need you to implement this code.  And because there are so many options, our TAs might never have even tried the option you find most natural.  You will need to learn by reading documentation and downloading a demo, trying it out, and then evolving it into your own version.  This last aspect is what makes cloud computing easy for some people but hard for others -- many people come in with awesome skills using PyTorch in Jupyter Notebook, but either never learned to write a stand-alone Python program that accesses files, or never learned to build applications in an IDE like Eclipse (for the cloud, we actually favor Visual Studio Code, by the way).  In that situation, a student can feel very stressed: the TAs aren't teaching you what to do, other people are powering ahead, and you feel stuck.  We are trying to discourage people who would be in that situation from even taking the class: we want you to come to us prepared to jump in.

Same deal "inside" the cloud.  In your intelligent firewall project you would need an ML model for classifying messages ("safe" / "unsafe"), and would need to apply that model to each of these uploaded messages, in a way that scales out.  Our TAs can show you how to use an idea we will be talking about called a "scalable first tier service" implemented as an "Azure lambda service", but you would still have to build it.  Oddly, the amount of coding can be quite small -- a few lines in your favorite language.  But you do need to be able to search for a demo, compile and run it, then change it to do this stuff.  That requires some level of comfort writing little programs in Python or whatever, building them, testing them...  If you would need months to do even a small program, and tons of help, you might not be ready for cloud computing.
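To give a feel for how small the cloud-side code can be, here is a sketch of a stateless per-message handler.  It does not use the Azure API: `handle_message`, the JSON field names, and the keyword-based `score` function are all assumptions made for illustration, and a real project would call a trained model in place of `score`.  Because the handler keeps no state between calls, many copies can run side by side, which is the property that lets a first tier scale out.

```python
import json

# Assumption: stand-in "model" weights; a real project would load a trained model.
SUSPICIOUS_TERMS = {"patient": 0.6, "ssn": 0.8, "diagnosis": 0.5}
THRESHOLD = 0.5


def score(text: str) -> float:
    """Return a risk score in [0, 1] from keyword weights (illustrative only)."""
    lowered = text.lower()
    total = sum((w for term, w in SUSPICIOUS_TERMS.items() if term in lowered), 0.0)
    return min(total, 1.0)


def handle_message(request_body: str) -> str:
    """Classify one uploaded message; takes and returns JSON text.

    No state is kept between calls, so any number of copies of this
    function can process different messages at the same time.
    """
    message = json.loads(request_body)
    risk = score(message.get("payload", ""))
    label = "unsafe" if risk >= THRESHOLD else "safe"
    action = "block it" if label == "unsafe" else "let it through"
    return json.dumps({"label": label, "action": action, "risk": risk})


if __name__ == "__main__":
    print(handle_message(json.dumps({"payload": "lab results for patient 17"})))
    print(handle_message(json.dumps({"payload": "cafeteria menu for Friday"})))
```

In a real deployment the cloud platform would invoke a function like this once per incoming message; the wrapper code that connects it to the platform comes from whichever demo you start from.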

We will discuss this more in class on Monday.  In fact, "homework 1" will be for you to upload a little form that we'll check, where you tell us where you got the experience that makes you feel comfortable with this.  It could be anywhere -- maybe you learned enough back in high school to do little tasks like this, and have interned since then and used those skills a ton of times.  Maybe you learned here at Cornell.  No problem!  We trust you.  But if you don't have these skills, this class won't be a good choice for you.

What about those demos of "how to do such-and-such on Azure"?  Microsoft has a huge number of them, in open source, with excellent documentation.  If you have the basic skills, they bridge you to the specific ways of using them in the cloud.  And with this you can be insanely productive in the modern cloud!  You write a few lines of code and voila!  An ML model does message classification for you, one you trained using a few cloud commands (and a trace -- with no data to train on, this particular project wouldn't be very feasible).  So with those basic skills, you will do great in this class.

Wait List.  In fall 2022, most students will initially need to go onto a wait list.  Then, during the add/drop period, we will send you an enrollment PIN you can use to finalize your enrollment in the class.  There is no capacity limit in fall 2022, but we do take the background aspect seriously, and some people who enroll but in fact lack the required background may be asked to drop the course.

Attending class.   Many studies show that watching a class on video from home is not effective.  Please attend in person, then use the videos as a catch-up aid.  Don't assume that you can skip class and do just as well working from home.

Videos of lectures.  Ken will post video recordings of all lectures.

Exams.  Grading is 50% exams, 50% project.  The current plan is that exams will be in person, one prelim and one final.  Dates and location to be announced, but we might use recitation slots as exam slots if the University assigns us really weird dates and rooms.  The exams focus on topics we covered in class and we will provide old exams that you can use as study and practice materials.  Exams will switch to being at-home if the COVID situation makes an in-person test unwise.

Projects. Small homeworks and your project add up to 50% of your course grade.   Read the Project Options web page to learn more.

MEng Projects. Some students expand their CS5412 project into an MEng project. See the separate web page about CS5999 and the FAQ.   It is important to understand that an expanded project requires much more effort than a regular project -- you should schedule your semester to include six or eight hours per week of additional effort, for which you will get CS5999 credits (your grade in the main course and in CS5999 will be identical).  To do this, you will need to submit a CS5412 MEng project plan that details the extra effort you plan to invest, and we will be meeting with you from time to time and tracking progress on the project (including those extra aspects that justified it being counted as an MEng project) throughout the semester.  In your final written report, which is required, you will need to document both what you accomplished and also your personal investment of effort, by telling us precisely which parts of the solution you personally created.
