CS5412: Projects

Homework versus Projects

In CS5412 everyone will do a few individual software development homework assignments. The goal is to gain some hands-on experience with cloud ideas. These are during the first four weeks of the semester.

How the Projects Work

We want you to work on your project through the entire semester, but the actual implementation work starts in week 5 and runs to the end of the semester. So starting on time is important. About half of our recommended projects are based on cloud support for intelligent agriculture (smart farming) ideas. We would consider allowing a few teams to work on topics not in our list, but we expect that almost all the teams will actually do projects from the list. The other half are ideas you bring us, that we vet and perhaps negotiate, but where you own the whole story. Every project must align with CS5412 topics and material.

Projects are usually team based. Although there are always a few individuals who end up doing projects on their own, CS5412 really encourages teamwork. Most teams will include 2 or 3 students from CS5412, and the farming teams will have two additional collaborators from the Cornell College of Agriculture and Life Sciences -- people specialized in a farming topic, who can help the team do something real and valuable and "valid" in the sense of making choices that are realistic. The farming experts will typically be students in CALS. We will introduce you to these collaborators.

For farming projects, we provide some data sets. Azure has many additional data sets, focused on things like climate over many years. But keep in mind that CS5412 isn't about data mining on static data -- every project is about something active, that deals with the outside world and has to cope with real-time events and real-time decision making, failures, consistency challenges, etc. The hackathon is a data mining opportunity. Projects center on coding, not scripting or running existing programs.

Everyone should select a project during the first four weeks of the semester, and we will ask you to upload a plan at the end of that period, including (1) team members, if you work with others, (2) the specific project you will do, (3) your timetable for starting to show some hands-on experience with the key elements, (4) the split of tasks within your team.

You can come to us with a team proposal of your own, if you know other people in the class. You can use Piazza to meet team members through its matchmaking feature (a bit like a dating app). Or you can ask us to assign you to a team. Any of these options will work.

Plan to spend 4-6 hours per person, per week, on these projects. We do homework in the first weeks, then the project, and the only exams are in-class quizzes that take a few minutes per week and a final, so the workload is actually pretty much average for a Cornell class, provided that you start on time.

MEng (CS5999) Option

Some CS5412 students have historically taken an additional semester of CS5999 credits with Professor Birman, towards their required MEng project credit. The CS5999 rule is that for each credit you take, you must put 2 hours per week in on the effort, and this adds to what you would have done for CS5412 in the first place, so it is a substantial extra effort and we expect to see a substantial "delta" to justify this.

Details are discussed in the recitations (and first lecture in the main class) but the idea in a nutshell is that you select a project for CS5412 that would be more ambitious than what you might normally have done. If you do a project with other students, they all would be doing CS5999 for the same number of credits as you. Everyone puts in equal effort, and this means that a 2 or 3 person effort would be a pretty significant system. Some people have prototyped startups this way! A typical big example can be found by hunting for the news releases for "Remember Me", an idea for a startup tied to CS5412 a few years ago.

We generally expect that CS5999 projects would be unique and original ideas, not just something we suggest, although there are sometimes hard open tasks in Ken's research project that a CS5999 can tackle. Even if we did provide an idea, the breakdown of how to tackle it is your job, and we expect it done well!

At each step of the project effort we will be checking that your vision really justifies the extra credits, and that you really are doing several hours per week more than if you weren't signed up for CS5999 too. Very often, the demos of these projects would occur after the normal demo day, to give a bit more time. But they need to complete by the day Professor Birman hands in his grades (there is a deadline for him), or you would end up with an INC, which we prefer to avoid.

Project grading is just the same as normal grading for the class and reflects the prelim grade, not just the project demo. This is because a project in CS5412 needs to be in part a proof that you mastered the ideas of cloud computing, and understand issues like cloud fault-tolerance, consistency, scalability, availability, existing frameworks, etc. We test your knowledge of that in part on the prelim.

Slide Set On Projects

We recommend that you look at this PowerPoint (pptx) or PDF file for a reminder of how projects work.

Project Goals

Azure IoT

In Spring 2019, our main goal is for every student to gain familiarity with a real cloud-based IoT platform, which would be the Azure IoT platform in most cases. Azure IoT can be programmed in any language you like (they support 40 main languages and a few additional experimental ones), and has precreated "recipes" that many students would want to consider using as a kind of template for building a secure, elastic, complete solution to the farming scenarios we'll be focused on. Most people take existing Azure IoT micro-services, glue them together, and customize them by coding event handlers in a language like C#, C++, Scala, etc. These handlers can be as short as just a few lines of code, and you usually create them by finding an example that seems close to what you need and then modifying it into the version you want for your project.

You'll end up coding mostly in what the Azure IoT people refer to as the "elastic Function Server" event-triggered layer. This does involve coding, but the amount of coding isn't going to be huge: lots of little event handlers, a bit like building a GUI where the user right-clicks on the image of a rock, this causes an "event" that gets handed to your "rock" object, and then the handler displays an animation of the rock splitting and Gimli the Troll leaping out. You probably wrote code like that in CS2110. So now you'll use that same way of thinking to accept photos from a camera in a cow barn, for example, or wind-speed updates from a sensor in a field where a drone is flying.
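To make the "lots of little event handlers" idea concrete, here is a minimal sketch in plain Python. It does not use the Azure SDK: the `on` decorator, the event names, the event fields, and the 8.0 m/s wind threshold are all invented for illustration, and real Azure trigger bindings will look different.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping an event type to its handlers.
# This only imitates the event-triggered style; it is not an Azure API.
_handlers: Dict[str, List[Callable[[dict], str]]] = {}


def on(event_type: str):
    """Register the decorated function as a handler for one event type."""
    def register(fn: Callable[[dict], str]) -> Callable[[dict], str]:
        _handlers.setdefault(event_type, []).append(fn)
        return fn
    return register


@on("barn_camera.photo")
def handle_photo(event: dict) -> str:
    # A few lines of logic: decide what to do with the new photo.
    return f"store photo {event['photo_id']} from {event['camera']}"


@on("field_sensor.wind")
def handle_wind(event: dict) -> str:
    # The 8.0 m/s limit is an invented threshold, not a real drone rule.
    action = "ground drone" if event["speed_mps"] > 8.0 else "keep flying"
    return f"wind {event['speed_mps']} m/s: {action}"


def dispatch(event_type: str, event: dict) -> List[str]:
    """Hand the event to every handler registered for its type."""
    return [fn(event) for fn in _handlers.get(event_type, [])]
```

Each handler is short and knows about only one kind of event, which is the same shape as the GUI code described above: the platform decides when to call you, and you decide what to do with the event.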

Projects focused on this side of Azure IoT would probably have an emphasis on new hardware (they might include an ECE sensor student creating new sensors for tracking underground water and nutrient movement, for example), or on routing data into Azure's existing micro-services. You would have to learn how those existing solutions work to use them, so you would become an expert Azure "meta-programmer" who puts together existing tools in new ways.

Derecho micro-services (new services that can run on Azure and be used from Azure IoT)

For people who want something a bit more hard-core, consider a project that would create a new micro-service, which could then be used in the Azure IoT ecosystem side by side with the ones from Microsoft. For this you would work in C++ (or some language like Java or Python that can import a library in C++ and call its methods). Cornell has been building some software tools that we are contributing to Azure, namely our Derecho C++ library for building new IoT micro-services, so we will also have one group of projects focused on using Derecho in Azure IoT in this way, but that particular path would only make sense for serious builders with solid C++ experience.

Derecho is really best used from C++ but if you want to work from Java or Python or some other language, you'll take an Ubuntu container with Derecho pre-installed and will write your code to "import" the corresponding library ("dll"). Once you do this, you can make remote calls to any Derecho methods that don't have what C++ calls "templated APIs". A templated API requires you to tell Derecho the types you are using, and because Java and C++ don't have the same concept of types, those calls wouldn't work. But you can "wire down" the Derecho handlers by creating statically typed versions that have (size_t, char*) arguments, meaning "pointers to buffers", and from Java you can pass in that sort of pointer (it is a special kind of reference called a "strong reference" and won't try to move or garbage collect the data while you are using it this way). Derecho's object store would be the ideal subset of the system for this sort of thing, and you can use it in this specific way from Java. Similar advice would apply for any language you like. You'll end up doing a tiny bit of C++ coding to create these call-through APIs for the specific methods you need to use, but those methods will just be a few lines long and mostly, you would work in your favorite coding environment.

For example, suppose that your team includes a person who is an expert on face recognition and you want to support "face recognition for farm animals". You could build a service that accepts requests from the Azure IoT function services. In would come an event from a camera: "got a new photo". The function servers don't run significant logic, but could pull the photo over and store it into your Derecho-based photo classification micro-service. Next your group member who does photo classification gets to show off: she has a deep neural network trained to classify type of animal (pig, cat, cow, goat, etc...) and then in a second step to segment the photo (find the faces of the cows) and for each face, tag it (this is "Sunny", a very good milk producer, and her calf "Bingy", who often gets into trouble and comes in covered in mud...), and then return a meta-data object containing this information.
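The flow just described (store the photo, classify the animal, segment the faces, tag each face, return a meta-data object) can be sketched as follows. Everything here is a placeholder: the class and function names are invented, the "photo" is a made-up byte string of the form `b"cow:Sunny,Bingy"`, and the classify and segment steps simply parse that string where a real service would run a trained neural network. The service would also sit on top of Derecho rather than a Python dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FaceTag:
    name: str
    note: str


@dataclass
class PhotoMetadata:
    photo_id: str
    animal_type: str
    faces: List[FaceTag] = field(default_factory=list)


class PhotoClassificationService:
    """Stand-in for the photo classification micro-service."""

    def __init__(self, known_animals: Dict[str, str]):
        self._photos: Dict[str, bytes] = {}
        self._known = known_animals  # animal name -> note about that animal

    def store(self, photo_id: str, data: bytes) -> None:
        self._photos[photo_id] = data

    def _classify(self, data: bytes) -> str:
        # Placeholder for the neural network: read the fake label.
        return data.decode().split(":", 1)[0]

    def _segment(self, data: bytes) -> List[str]:
        # Placeholder for face segmentation: read the fake face list.
        parts = data.decode().split(":", 1)
        names = parts[1].split(",") if len(parts) > 1 else []
        return [n for n in names if n]

    def analyze(self, photo_id: str) -> PhotoMetadata:
        data = self._photos[photo_id]
        meta = PhotoMetadata(photo_id, self._classify(data))
        for name in self._segment(data):
            meta.faces.append(FaceTag(name, self._known.get(name, "unknown")))
        return meta


def on_new_photo(service: PhotoClassificationService,
                 photo_id: str, data: bytes) -> PhotoMetadata:
    """Function-handler role: no real logic, just hand off to the service."""
    service.store(photo_id, data)
    return service.analyze(photo_id)
```

For instance, calling `on_new_photo(svc, "p1", b"cow:Sunny,Bingy")` on a service that knows both animals returns a `PhotoMetadata` whose `animal_type` is `"cow"` and whose `faces` list holds one tag per animal.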

A professional product of this kind would go further and "route" Sunny to her milking stall, Bingy over to the showers and then into a stall for his vet to have a look at that scratch on his leg, and it might track Sunny's milk production and try to correlate that with other factors such as which feed she is on, whether she spent the day in the fields or in the barn, how active she was, how long she spent ruminating and what temperature it was, and so forth. This would probably require more than one micro-service: think of each micro-service as a specialist focused on some subset of tasks: one to do face recognition and tagging, one to do a quick check for injuries that could need attention, one to check medical records to see if Bingy needs any vaccinations while the vet is dealing with that scratch, one to try and model milk production as a function of various variables we can track, etc. A smart farm might also have specialized services to deal with extracting fertilizer and fresh water from runoff and perhaps generating gas or bio-oil by heating waste to very high pressures and temperatures for brief periods. Farmers are hoping these kinds of ideas could lead to healthier herds with less use of antibiotics, much less pollution, and even generating revenue from spinoff products like the bio-oil (which is a very good basis for making diesel fuel).

Other kinds of farm tasks could involve monitoring fields using drones (a cool topic: optimizing them to fly during days with light winds by "sailing" on the wind, like a sailboat, to reduce battery energy consumed), classifying any problems (drought, damage from fungus or virus, insect damage), planning a remediation (localized irrigation, or spraying, or fertilizer), planning long term actions (maybe planting different seeds that are more drought resistant right at this one spot), etc. So one could imagine a lot of these micro-services, running side by side, specialized in very different roles.

You need the micro-service model for such tasks because function handlers are stateless and normally only run a few lines of code in total: they don't create big files and lack a place to keep machine-learned information like "facebook for Old MacDonald's dairy". So the split of functionality gives us function handlers that mostly route the tasks, and micro-services that do these sorts of tasks.
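The split between stateless handlers and stateful micro-services might look like this in miniature. The names `HerdMemoryService` and `route_milking_event`, and the event fields `cow` and `liters`, are invented for the sketch. A real micro-service would keep its state in replicated, fault-tolerant storage, not in a Python dictionary inside one process.

```python
from typing import Dict, List


class HerdMemoryService:
    """Stateful side: remembers every milk yield reported for each cow."""

    def __init__(self) -> None:
        self._yields: Dict[str, List[float]] = {}

    def record(self, cow: str, liters: float) -> None:
        self._yields.setdefault(cow, []).append(liters)

    def average(self, cow: str) -> float:
        readings = self._yields.get(cow)
        if not readings:
            raise KeyError(f"no readings for {cow}")
        return sum(readings) / len(readings)


def route_milking_event(event: dict, service: HerdMemoryService) -> None:
    """Stateless side: keeps nothing between calls, only routes the data."""
    service.record(event["cow"], float(event["liters"]))
```

Any copy of `route_milking_event` can process any event, which is what lets the function layer scale elastically. The history that makes the data useful lives only in the service.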

Project Demos

All projects will have similar phases: you form a team, the team picks a project, the team develops a plan, and then the team does some preliminary work to get hands-on familiarity with the technology. Then you report on progress and on the rest of the plan, mid-semester. If working with a CALS team on a dairy topic, the idea will be reviewed by dairy professors, too, not just CS people. Then you do the harder part of the project (the full version), and then you demo what you did.

Every project ends with a presentation from a poster or slide set, and with a demo. You will sign up for a slot, your whole team will be there, and we will come see the work and ask questions and watch you run a demo.

Hackathon

Cornell is organizing a spring hackathon with prizes. This year's topic will be digital farming, and the emphasis is on data mining. Now, on the one hand, this isn't a direct fit with CS5412, where we focus on technology. Still, learning to use these tools is really an amazing opportunity. This is why we award extra credit for the hackathon.

BOOM

This is a project fair that runs at the very end of the semester. You compete to show your CS5412 project there, and if selected, you get extra credit (provided that you show up and participate of course).