Homework versus Projects

In CS5412, everyone will do a few individual software development homework assignments. The goal is to gain some hands-on experience with cloud ideas. These assignments take place during the first four weeks of the semester.

How the Projects Work

Your project runs through the entire semester, but the actual implementation work starts in week 5 and continues to the end of the semester, so starting on time is important. About half of the projects we recommend are based on cloud support for intelligent agriculture (smart farming). We would consider allowing a few teams to work on topics not in our list, but we expect that almost all teams will actually do projects from the list. The other half are ideas you bring us, which we vet and perhaps negotiate, but where you own the whole story. Every project must align with CS5412 topics and material.

Projects are usually team based.  Although there are always a few individuals who end up doing projects on their own, CS5412 really encourages teamwork.  Most teams will include 2 or 3 students from CS5412. 

Every project needs to combine real-time data with other kinds of data sets. Azure has many additional data sets covering topics such as climate and weather prediction, river depths, flood levels and flood predictions, air pollution, and farm crop choices, and these data sets often cover many years. But keep in mind that CS5412 isn't about data mining on static data. Every project is about something active: it deals with the outside world and has to cope with real-time events, real-time decision making, failures, consistency challenges, and so on. You will need to pull streaming data into your project automatically and combine it with other data sets in intelligent ways. The hackathon teaches some of this, but it tends to focus on AI and data mining scenarios, often using preloaded Azure data sets. So even if the hackathon teaches you exactly the right things on the AI and data mining side (and it generally does!), you will still need to write code that integrates those mechanisms with your streaming input.
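To make the "streaming plus stored data" requirement concrete, here is a minimal Python sketch, not a prescribed design. It assumes a hypothetical river-gauge feed and a small historical table (the station names, depths and thresholds are all invented for illustration); in a real project the readings would arrive continuously from a cloud ingestion service, and the history would come from a data set stored in Azure. What matters is the shape of the code: each live event is joined with historical context and a decision is made right away, rather than mining a static file after the fact.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List

# Hypothetical historical data set: long-term mean depth and flood-stage depth
# (meters) per river gauge. In a real project this would be loaded from a data
# set stored in Azure rather than hard-coded.
HISTORICAL: Dict[str, Dict[str, float]] = {
    "gauge-A": {"mean_depth": 1.2, "flood_stage": 3.0},
    "gauge-B": {"mean_depth": 0.8, "flood_stage": 2.1},
}


@dataclass
class Reading:
    station: str
    timestamp: int  # seconds since observation started (illustrative)
    depth_m: float


@dataclass
class Alert:
    station: str
    timestamp: int
    depth_m: float
    reason: str


def enrich_and_decide(stream: Iterable[Reading]) -> Iterator[Alert]:
    """Join each live reading with historical context and decide immediately."""
    for r in stream:
        history = HISTORICAL.get(r.station)
        if history is None:
            # Unknown station: surface it rather than silently dropping the event.
            yield Alert(r.station, r.timestamp, r.depth_m, "no historical data")
            continue
        if r.depth_m >= history["flood_stage"]:
            yield Alert(r.station, r.timestamp, r.depth_m, "at or above flood stage")
        elif r.depth_m >= 2 * history["mean_depth"]:
            yield Alert(r.station, r.timestamp, r.depth_m,
                        "at least twice the historical mean")


if __name__ == "__main__":
    # A short list stands in for the continuous stream.
    simulated_stream: List[Reading] = [
        Reading("gauge-A", 0, 1.1),
        Reading("gauge-A", 60, 2.5),
        Reading("gauge-B", 60, 2.3),
        Reading("gauge-C", 120, 0.5),
    ]
    for alert in enrich_and_decide(simulated_stream):
        print(alert.station, alert.timestamp, alert.reason)
```

Because enrich_and_decide is a generator, it processes one event at a time and would work the same on an unbounded stream. A real deployment would also have to cope with late or duplicated readings and with the failure cases mentioned above; the sketch ignores those.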

Everyone should select a project during the first four weeks of the semester. At the end of that period we will ask you to upload a plan that includes (1) your team members, if you work with others, (2) the specific project you will do, (3) your timetable for starting to show some hands-on experience with the key elements, and (4) the split of tasks within your team.

You can come to us with a team proposal of your own, if you know other people in the class.  You can use Piazza to meet team members through their matchmaking feature (a bit like a dating app).  Or you can ask us to assign you to a team.  Any of these options will work.

Plan to spend 4-6 hours per person, per week, on the project. Homework fills the first weeks and the project fills the rest of the semester, and the only exams are short in-class quizzes (a few minutes each week) and a final. So, provided that you start on time, the workload is about average for a Cornell class.

PaaS not IaaS

Each year we see a few students who did a project in some other class (perhaps in Professor Hariharan's course on computer vision) and then ask if they can just submit it as their cloud computing project. Usually they propose to port the project from the computer they originally implemented it on to Azure, and then perhaps run it on photos from one of the Azure data repositories. A project like this is not suitable because it would be too easy. One way to use Azure is to ignore all the cloud features and treat it as a way to "rent a remote Linux computer and log into it using ssh". That works, and in fact works so well that "porting" is as simple as copying the programs over: you ask Azure for a new VM, scp the files into it, ssh in and recompile, and your code should run just as it did when you took Bharath's class. The whole thing takes about half an hour. So first of all, the project becomes too easy. And secondly, this sort of IaaS approach is simply not what the course is about. IaaS is just a fancy term for renting a Linux box from Microsoft.

We want you to think about projects that treat Azure as a platform (hence PaaS, not IaaS: PaaS means "platform as a service", whereas IaaS means "infrastructure as a service"). PaaS entails using Azure's built-in solutions, such as the AI Engine and the Cosmos DB storage system, in serious ways. This invariably means using lambda functions to control the Microsoft service solutions, and that becomes your main coding task: building those lambdas and editing the JSON files that tell Azure when to run them.

So, if you built the world's best animal classifier for images in a computer vision class, just running it on Azure won't be enough. But integrating it more deeply with Azure would be a great project, as long as you show us how to upload new images (ideally automatically, right after they are captured), how to track where they end up, and how to tell your image analysis program where to find the files it should use as inputs. With those extra steps, the project becomes a genuine cloud computing project, and then it would be fine.
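The event-driven wiring described in the last two paragraphs can be illustrated with a small local Python simulation. Everything in it is hypothetical: TRIGGER_CONFIG is an invented JSON format (not the real Azure schema) that plays the role of the JSON files telling Azure when to run your lambdas, on_new_image stands in for a lambda, and classify stands in for an existing image classifier. In Azure itself this wiring is done with the platform's own mechanisms (for example, an Azure Functions blob trigger), and the exact configuration should come from the Azure documentation.

```python
import fnmatch
import json
from typing import Callable, Dict, List

# Hypothetical trigger configuration. It echoes the idea of "JSON files that
# tell Azure when to run your lambdas" but is NOT the real Azure schema.
TRIGGER_CONFIG = json.loads("""
[
  {"event": "upload", "path": "farm-images/*.jpg", "handler": "on_new_image"}
]
""")

# Results keyed by storage path, so we always know where each input ended up.
RESULTS: Dict[str, str] = {}


def classify(image_path: str) -> str:
    """Stand-in for an existing image classifier (hypothetical)."""
    return "cow" if "barn" in image_path else "unknown"


def on_new_image(path: str) -> None:
    """The 'lambda': runs once per uploaded image and receives its storage path."""
    RESULTS[path] = classify(path)


HANDLERS: Dict[str, Callable[[str], None]] = {"on_new_image": on_new_image}


def dispatch(event: str, path: str) -> List[str]:
    """Run every handler whose trigger matches this event; return the names run."""
    fired: List[str] = []
    for trigger in TRIGGER_CONFIG:
        # fnmatchcase keeps matching case-sensitive on every OS; its '*' also
        # matches '/', so images in nested folders are picked up too.
        if trigger["event"] == event and fnmatch.fnmatchcase(path, trigger["path"]):
            HANDLERS[trigger["handler"]](path)
            fired.append(trigger["handler"])
    return fired


if __name__ == "__main__":
    dispatch("upload", "farm-images/barn-cam/0001.jpg")  # matches, gets classified
    dispatch("upload", "farm-images/notes.txt")          # no trigger matches
    print(RESULTS)
```

Keying results by storage path is one simple way to "understand where the images ended up"; a real project might record that mapping in a storage service such as Cosmos DB instead of an in-memory dictionary.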

Most Projects Will Be "Homemade"

The class is too heterogeneous for everyone to do a dairy project. Those projects are cool, and we give extra credit to people who pick one, because they will have external collaborators and it takes time to interact with students in a different unit. But the upshot is that although we have ideas, shown below, most people will form a team to work in some area, and then brainstorm with their teammate(s) to pin down something exciting that becomes their highly individualized, personal project. In fact, you are far more likely to amaze us and get an A+ if you invent a project of your own.

A Few Project Ideas

To be acceptable, a project must be matched to your group size (neither too ambitious nor too easy) and must have a plan to combine continuously streaming data with data already in Azure, within the application. The more ambitious projects generally earn the highest grades, but you can still get an A or A- with an easy project.

If you aren't a great programmer and don't want this course to be a big time overhead, go with an easy project.  True, your grade might not be an A+ (maybe not even an A), but you won't suffer and hate the class.  On the other hand if you are a pretty comfortable builder who isn't afraid to tackle new challenges, go for one of the more ambitious projects!  They are much closer to the real spirit of the course.

MEng (CS5999) Option

Some CS5412 students have historically also signed up for CS5999 credits with Professor Birman, counting toward their required MEng project credit. The CS5999 rule is that for each credit you take, you must put in 2 hours per week on the effort. This is on top of what you would have done for CS5412 in the first place, so it is a substantial extra effort, and we expect to see a "delta" that justifies it.

Details are discussed in the recitations (and in the first lecture of the main class), but in a nutshell, the idea is that you select a CS5412 project that is more ambitious than what you might normally have done. If you do a project with other students, they would all be taking CS5999 for the same number of credits as you, so everyone puts in equal effort. This means that a 2- or 3-person effort would be a pretty significant system. Some people have prototyped startups this way! A typical big example can be found by searching for the news releases about "Remember Me", a startup idea tied to CS5412 a few years ago.

We generally expect CS5999 projects to be unique and original ideas, not just something we suggest, although there are sometimes hard open tasks in Ken's research project that a CS5999 student can tackle. Even if we did provide the idea, working out how to break it down is your job, and we expect it done well!

At each step of the project we will check that your vision really justifies the extra credits, and that you really are putting in several more hours per week than you would if you weren't also signed up for CS5999. Very often, the demos of these projects occur after the normal demo day, to allow a bit more time. But they must be completed by the day Professor Birman submits his grades (he has a deadline), or you would end up with an INC, which we prefer to avoid.

Project grading works just like normal grading for the class: it reflects the prelim grade, not just the project demo. This is because a CS5412 project needs to serve, in part, as proof that you have mastered the ideas of cloud computing and understand issues like cloud fault-tolerance, consistency, scalability, availability, and existing frameworks. We test part of that knowledge on the prelim.

Project Demos

All projects have similar phases: you form a team, the team picks a project, the team develops a plan, and then the team does some preliminary work to get hands-on familiarity with the technology. Mid-semester, you report on your progress and on the rest of the plan. If you are working with a CALS team on a dairy topic, the idea will be reviewed by dairy professors too, not just by CS people. Then you do the harder part of the project (the full version), and finally you demo what you did.

Every project ends with a presentation, using either a short slide set or a single poster-formatted slide, and a demo. You will sign up for a Zoom slot; your whole team will attend, and we will look at the work, ask questions, and watch you run a live demo.