CS5412: Topics in Cloud Computing
(Spring 2020 focus:
Using the Cloud to Create Smart IoT Systems).

Syllabus for Spring 2020

10:10-11:25 TR Gates Hall Room G01

Recitation: 7:30-9:00pm W Gates G01

Prof. Ken Birman, 435 Gates Hall, x5-9199.  Ken has a blog here.

Ken's office hours: Tuesday and Thursday after lunch (1:00-2:00) or by appointment.

TAs: Sagar Jha and Kwang Lim.  

This semester, Sagar will run the recitations, in the evenings, once per week.  Those are optional but many students find them really useful.   We will also have guest speakers from time to time.

Piazza Discussion Group: Log in to access it at piazza.  Assignments and graded materials: Find them in CMS.

What is this course about?  Cloud Computing is an overarching term that covers modern computing infrastructures to support the web: browsers and web servers, but also ways of building mobile clients, scalable web services, and very fast infrastructures for serving up content in geographically distributed systems that might include dozens of data centers and millions of computers. Everything we do is cloud-based or uses cloud solutions these days.

The Internet of Things (IoT) is a more recent trend that will make this connection to the cloud even stronger.  Developers are deploying all sorts of sensors and smart controllable devices, and then linking them to the cloud.  By doing this they can create applications that leverage machine learning or other forms of smart functionality in settings where we can actually control the outside world.  For example, a smart highway would be like an air traffic controller for cars: it would use video cameras to monitor the highway, "understand" what all the cars are doing and what hazards exist, and then use that knowledge to help cars.  A smart home uses tools like Alexa and Siri, but also knows about the physical layout of the home, and can help the residents.  For example, an elderly person worried about falling, or who might sometimes need a hand adjusting the shades or remembering that they left something cooking could benefit from a home that can play a role in doing those kinds of things.  A smart grid optimizes delivery of electrical power.  The list is really endless.

What makes all of these systems so smart is that people are moving decision-making support (machine learning) closer to the IoT sensors and actuators, for reasons of performance. In classic machine learning and AI approaches, the ML and AI software could only work on data that had already been stored into files in the file system, and only ran in batches after long delays. So by shifting technology out from the back end to the edge, we gain dramatic speedups.

Lots of companies are players in this area, and we can't cover all of them.  In spring 2020, we'll mostly hear about the Azure IoT architecture.  This focus will let us drill in on IoT security, which is something Microsoft has viewed as a speciality.  Still, everything we learn about has a parallel in Amazon AWS or Google Cloud.

Our focus on Azure won't mean that you need to learn to program in Windows.  Under the covers, any Windows PC actually has a modern Ubuntu Linux system built right in!  Azure itself lets you skip Windows entirely and just program on Ubuntu Linux.  In fact, even the people who work at Microsoft on IoT solutions use Linux for all sorts of things.  In CS5412, we can stick with those Linux/Ubuntu APIs.  So if you are familiar with Linux, you won't need to learn a new OS or anything like that.  Of course if you prefer Windows for some reason, that works too.

We feel that for the 2020 offering, it would be best to clarify some aspects of the expectations and requirements for taking this course.

Why did we make this decision?   Around 2017-2018, some students started to feel that cloud computing is so important to getting a job that they decided to take the class even knowing that they had completely inadequate background skills.  By 2019 we saw a large group of such students, and a few had really big problems surviving in the class.  They stopped coming to lectures because they couldn't follow them, but didn't have good ways to study the material purely from textbooks or papers or lecture notes from their friends, because this kind of class centers on the lectures.  People who were attending the classes complained about the strange dynamic this was creating, with half of the students not coming.  We agreed, as a class (the people in class, at least), that to attend this course, you do need to be able to follow lectures and come to class.  And as a group, we settled on a plan that I'll use this coming spring.

Now, before explaining this plan, I want to say that all of us understand how important landing a great job is to all our students, and we even understand why you might feel that this particular course is vitally needed.  The core issue that all of us need to appreciate is that without the right background, you simply can't walk in and survive in a really hard graduate course at a top university.  This adds up to two decisions.  First, we felt that we really need to clarify and stress the requirements.  To protect you against making really unwise choices, we plan to enforce them absolutely, with no exceptions.  This will also benefit students who do have the full background, because if everyone in the room starts on an even basis, the class can run in a smoother way, and the lectures can aim at "everyone" with a more uniform and balanced level of detail.  And then we need to use graded quizzes to make sure people actually are following the lectures, by making this a routine part of the experience, and having them lead towards the final exam, which will be based on the same material you'll have seen in the lectures and on those quizzes.

Prerequisites [we always had them].
CS5412 is a hard course with a big software project.  You'll work in teams, but even so, you have to do an equal part of the work.  We require operating systems (either Cornell CS4410, or a solid grade in an equivalent course that covered similar topics), plus some exposure to networks and/or databases (Cornell courses are the ideal way to get that background). 

You should know how to build and run programs in a language like Python or C, on Linux.  Languages like C++ or Java or Scala are even more useful, but not required.  Everyone in the room should have actual hands-on experience, personally building some fairly hefty software systems that had to be developed from scratch, debugged, and then demonstrated successfully.  Do not even consider taking this course if you lack this background.  Speak directly with Professor Birman for explicit approval if unsure.  Taking OS at the same time, or taking OS but never having done any hands-on practical class, is not a way to satisfy requirements.

Class attendance is required, and we use quizzes to verify that you really are there [new in 2020].  We will be using quizzes to enforce a simple but strict policy: unless you are interviewing or sick, we expect you to be in the room at 10:10, every single time, for the whole semester.  To enforce this we will have weekly quizzes that count towards your final grade.  You can miss any two quizzes out of the semester without telling us, but not more (if you do all of them, we'll drop the low two grades, so everyone will have 10 graded quizzes by the end of the semester, counting for 10% of their final grade).  You'll find these quizzes easy if you attended class and paid attention, plus you will be better prepared for the final exam, which is based on very similar material and with a similar style of questions.  In contrast, you will not be able to successfully complete this course if you skip classes.

On the final exam (~40% of your final grade), and on these quizzes (~10%), all questions will be short with written answers.  They won't be true-false, and won't have answers you can just copy from notes [again, nothing new here].  The final exam will be at 2pm on May 16.  Exams are ~50% of your overall grade and consist of approximately 10 small quizzes worth a total of 10% of your overall grade, and the final, which is worth ~40% of your overall grade.  In fact there may be more than 10 quizzes: we generally aim for 12 but then drop the low 2 (which also means missing 2 is fine).  These are approximate percentages and may be adjusted later in the semester to ensure fair weighting of each element.

In CS5412 we genuinely expect you to understand the material, and you cannot do the final exam by just memorizing any form of notes.  We do offer a lot of study materials, sample exams, etc., and we help you understand the content.   Then in the final, you will have to write little mini-essays on questions you have never seen in class.  Your grade will be based on whether your answers show real comprehension.  It will be impossible to do well if you cannot look at a new question, then synthesize knowledge taught in lecture and apply that to solve the question in a brief, clear way focused on the big issues.

Big Data.  In CS5412 we do look at big data technologies, but mostly from the perspective of how the tools were created.  You would need to take an ML course or a data analytics course to learn about actually extracting insight from big data using these modern tools.  So we will hear about MapReduce (best known through Hadoop, and later Spark) as well as TensorFlow and PyTorch, but you won't learn to use them here.  But we do offer extra credit for participating in the Digital Agriculture hackathon.  This is mostly a big-data analytics experience and Microsoft helps you develop the needed hands-on skills and provides a ton of data sets you can use.  We highly recommend this experience -- but it isn't required.

Final.  We will have one cumulative final exam.  There will not be any prelim in 2020, just the one final, and the quizzes.  The final will be ~40% of your grade, so that between that and the quizzes, tests are ~50% of your grade.  The other ~50% will come from a few programming homework assignments during weeks 1-4, and the project grade.
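To make the arithmetic concrete, here is a rough sketch of how the approximate weights above could combine into a course score. This is only an illustration, not the official grading formula: the function names, the 0-100 scale, the treatment of a missed quiz as a 0, and the lumping of homework and project into a single score are all assumptions, and the actual percentages may be adjusted during the semester. Extra credit is left out because this page does not say how those points are applied.

```python
def quiz_average(scores, drop=2):
    """Average of the quiz scores after dropping the `drop` lowest ones.

    Assumption: a missed quiz is recorded as 0, so missing up to two
    quizzes costs nothing once the two lowest scores are dropped.
    """
    if len(scores) <= drop:
        raise ValueError("need more quiz scores than the number dropped")
    kept = sorted(scores)[drop:]  # discard the lowest `drop` scores
    return sum(kept) / len(kept)


def course_grade(quiz_scores, final_exam, project_and_homework,
                 w_quiz=0.10, w_final=0.40, w_project=0.50):
    """Weighted course score on a 0-100 scale.

    The default weights mirror the approximate 10% / 40% / 50% split
    described in the text; they are hypothetical defaults, not fixed rules.
    """
    return (w_quiz * quiz_average(quiz_scores)
            + w_final * final_exam
            + w_project * project_and_homework)


# Hypothetical student: 12 quizzes offered, two of them missed (scored 0).
quizzes = [90, 80, 0, 100, 70, 85, 95, 0, 88, 92, 75, 60]
print(f"quiz average: {quiz_average(quizzes):.1f}")
print(f"course score: {course_grade(quizzes, final_exam=80, project_and_homework=90):.2f}")
```

In this made-up example the two missed quizzes are exactly the two that get dropped, so the quiz average is computed over the ten remaining scores.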

Beyond this we offer extra credit for students who do a digital agriculture project (1 point), verified participation in the Digital Agriculture Hackathon (a weekend event; 1 point) and the BOOM projects fair (1 point).   Some years there is also a chance to participate in an industrial affiliates program (1 point).  A few of these are competitive: you can propose a BOOM project, but you only get the credit if you are accepted, and participate.  Others are pretty much guaranteed, if you want to do them.

You will need to study for the final exam.  We have posted a study guide and will also run a review session.  The prelim from 2018 is here and the one from 2019 is here.  Keep in mind that the class itself isn't identical and that these two exams were done as prelims, a bit earlier in the semester.  In 2020 we will have a final, but the style of the exam will be similar.  There is no makeup date planned for the final, which will occur at the time and place Cornell assigns to us -- we have no control over that.

Projects. Everyone has to work on a project. Not counting extra credit, your project grade will be one half of your course grade!

In Spring 2020 about half the students will work on projects we'll be recommending: Digital agriculture projects coming from Cornell's "smart dairy" classes and from the people using robotics and IoT sensors in greenhouses and on outdoor farms.  We have a few data sets for people who prefer to work on their own, but many of these projects will involve groups of our students meeting with groups from courses in those other areas, in the evening, and cooperatively learning.  For those projects there would be a joint demo with graders present from both courses, and because the team would be larger, you can aim to accomplish much more than by yourself.  Grades will reflect this: the very top grades often go to these larger teams simply because they can do such amazing things as a group.

The other half of the students will do "pure" software development projects.  Those will often have IoT aspects, and will be challenging to create because you will build non-trivial software solutions.  Making it even harder, you'll propose the actual effort, and we won't give a lot of suggestions -- we'll critique your ideas but you will put them out there, convince us, and execute on the plan.

MEng Projects. Some students expand their CS5412 project into an MEng project. This is not a problem!  See the separate web page about CS5999 and the FAQ.  One weird thing to be very aware of is that if you do this, you really are using CS5412 itself as a component of your MEng project, and your project grade will be the same as your CS5412 course grade, not a separate thing limited just to the project.  This is because CS5412 projects draw on CS5412 material, and we can't judge a project without also testing (hence our quizzes and final) to understand how deep your real understanding of the issues is.  So if you want to do a project that would have no connection to a course and exams at all, this isn't the best source for such a project!  But otherwise, you can sign up by just (1) adding the extra credits, and (2) DEFINITELY doing a bigger, more ambitious project that justifies the extra 6 hours per week those extra credits represent (1 credit == 2 hours per week of work through the semester).  A person who invested an extra 60-75 hours on a project would be expected to have done a bit more than a person who didn't invest that much extra time.  So perhaps you would do as much as a bigger team, or add bells and whistles that many groups didn't have time to do, or explore scalability and performance in a much more extensive way.

We will require a CS5412 MEng project plan that details the extra effort you plan to invest and tracks it through the semester.  In your final written report, which is required, you will need to document both what you accomplished and your personal investment of effort as it really played out, to justify the extra credits you received.

Help! I can't enroll. At Cornell, enrollment is prioritized and runs in a series of "tranches".  Some people do need to join a wait list (we open it up during the spring enrollment period), but because plenty of people drop the class, eventually the wait listed students do get a chance.  The important thing is to attend the class, even if you have not yet been able to officially enroll.  If you wait until the enrollment finally opens up, you may have missed the first few weeks of class, and at that point will be so far behind that catching up can just be impossible.  So come to class even if you aren't actually enrolled yet.  Priority is given to students in the CIS unit, and then we open the class for qualified non-CIS students at the end.  It is rare that someone who wanted to attend is unable to get in.
