
How the Projects Work

We want you to work on your project through the entire semester (some people even continue beyond the end of the semester, for extra credit). So starting on time is important.

Everyone should select a project during the first two weeks of the semester, and we will ask you to upload a plan at the end of that period, including (1) your team members, if you work with others, (2) the specific project you will do, (3) a timetable showing when you will start getting hands-on experience with the key elements, and (4) the split of tasks within your team.

Plan to spend 4-6 hours per person, per week, on these projects. This is in lieu of other homework, and there is no final, so the workload is actually pretty much average for a Cornell class, provided that you start on time.

Our initial hope had been to have people use Derecho for their projects, but the platform is not yet stable enough to make this a wise choice.  So we are suggesting that you consider setting up TensorFlow instead and working within that model.  The experience would be very valuable for landing high-paying jobs these days!  There is one Derecho project opportunity mentioned at the bottom of this page, but we will probably limit how many people can work on it.

Project Goals

For spring 2018, we are recommending that students try to build a fog or edge computing application, and that they consider using TensorFlow for the "online" aspects.   We will not be teaching you to use TensorFlow; you would need to watch the online training videos and read the documentation.  But basically, TensorFlow is a Python-based framework (so your code is entirely in Python) with a distributed computing model built into it, based on a flow of events that hold data in the form of a matrix, image, video, or a numerical "tensor".  In fact it treats all such data as some kind of tensor, for uniformity: even a single string can be viewed as a tensor (a 0-dimensional one holding one string value).  So if you are very aggressive about it, anything is a tensor.  TensorFlow is about "flows" of tensor objects from data sources, like sensors, into other subsystems in a cloud.
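To make the "flow of tensors" idea concrete, here is a minimal sketch in the TensorFlow 1.x API (the current release as of spring 2018).  The "sensor readings" and the threshold are made-up stand-ins, not part of any real pipeline:

    import tensorflow as tf

    # A placeholder stands in for data arriving from outside, e.g. a sensor feed.
    readings = tf.placeholder(tf.float32, shape=[None])  # a 1-D tensor of readings
    mean = tf.reduce_mean(readings)                      # a node the data flows into
    too_hot = tf.greater(mean, 42.0)                     # further downstream computation

    with tf.Session() as sess:
        # Feed one batch of "sensor" data through the graph and run it.
        print(sess.run([mean, too_hot], feed_dict={readings: [40.0, 41.5, 45.2]}))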

But TensorFlow is not obligatory.  On Microsoft Azure, a version of C# called Dandelion might be of interest, especially if you work with video content.  On Amazon, you can use TensorFlow, but there are also all sorts of tier-two service options for dealing with flows of events from sensors.  You do need to figure out what you will use on your own, because there are so many choices.  But once you decide, our TAs can help you map the problem you pick to the technology you picked.  The TAs don't know everything about everything, but they are good at this kind of system structuring.

Aim for something simple, then build on that.  For example, start by capturing a sequence of data records, doing something to them, and storing the results (a sketch of this starting point appears below).  Work your way up and out, taking small steps.  This way you will always have something working, even if that something isn't the world's most exciting.  Don't wait until the last minute, swing for the outfield fence...  and strike out entirely!
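If it helps to see the scale of that first step, here is a minimal sketch of "capture, transform, store" in plain Python; the record source and the output file are hypothetical stand-ins:

    import json
    import time

    def capture():
        # Stand-in for a real sensor or event feed.
        for i in range(10):
            yield {"sensor": "cam-1", "seq": i, "value": i * 1.5, "ts": time.time()}

    def transform(record):
        record["value_squared"] = record["value"] ** 2  # "do something" to each record
        return record

    with open("results.jsonl", "w") as out:
        for rec in capture():
            out.write(json.dumps(transform(rec)) + "\n")  # store the results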

Projects can be specialized, but most would share a structure along the lines of the examples below.

A Few Examples

Integration of Social Media with Videos

A first example of a project along these lines would focus on social networking, with Facebook or similar companies in mind.  This is more of an edge example than a fog computing case, but because you want snappy response, it fits the theme just outlined.

Context: the problem arises because there is a lot of interest in ways that video content could be integrated more effectively with social networking content of the kind that Facebook TAO captures and tracks (read the paper if you don't remember what TAO does, or haven't yet seen that lecture).  Right now, the architecture of the classic cloud blocks us from fully exploiting this opportunity.

So suppose that you know things about a set of people, via TAO, and now those people are in a setting with a lot of video capabilities, like a disco, or a concert, or a smart home (see the smart home example below; this could be part of the same project). And suppose that the goal is to build a scalable cloud computing service that can turn any gathering into an amazing "movie" about the event (but not in the sense of snooping on what people are talking about!).

A movie about a party or a concert might center on the dinner, or on the performance, but cut in little snippets of groups of people talking and laughing, or dancing, or doing a wave. You would want the system to tag the people, then use the TAO network to suggest to those people that they might want to share the video with particular friends. For example, if you and your friend Jane were at the party, but your shared friend Tallia was away interviewing, you or Jane could share the video with her.

There are a number of puzzles here, and we would expect projects to focus on a narrower technical story but to think it through carefully. First, you have the technology questions themselves: how to film a whole concert, or a party, or people riding on a terrain park structure. Next is the puzzle of tagging: for privacy, you definitely would want all the people in a video of this kind to agree that it is ok with them if you share it. Then beyond that, you need to figure out how to leverage the TAO social network graph (one small piece of this is sketched below), and last, how to stitch the pieces together into a social-network "enriched" story. And there are questions of scalability: how can the solution be scaled out to run on the cloud?
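As a hint of what "leveraging the graph" might mean in code, here is a tiny sketch.  TAO itself is Facebook-internal, so this stand-in models its friend associations as a plain dictionary; the names come from the example above:

    # People tagged (and consenting) in a clip.
    video_tags = {"you", "Jane"}

    # Hypothetical TAO-style friend associations.
    friends = {
        "you":  {"Jane", "Tallia", "Sammy"},
        "Jane": {"you", "Tallia"},
    }

    # Suggest sharing with mutual friends of the tagged people who were not there.
    mutuals = set.intersection(*(friends[p] for p in video_tags))
    print(mutuals - video_tags)  # {'Tallia'}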

A scenario with fewer privacy issues would be a sports event like a professional soccer or football or baseball game. You might imagine a set of cameras, all around the stadium, that capture the big game from every possible perspective. How would you use cloud computing to enable viewers to use joysticks on their home systems to zoom around and see the event from individually personalized perspectives? Could you and your pals from before you came to Cornell watch this kind of enriched-media game together, just like in old times, but with the network linking you together? "Sammy, he never should have missed that field goal. Look at how easy the shot was from the kicker's perspective!" and then you might zoom right in to show how it looked to that kicker as he struck the ball. And Sammy might come right back: "No, Tom, totally wrong -- the point is that this linebacker over here was charging in and the kicker had to watch him. Look at the same shot once you keep that guy in your sights..."

Someday this will be a big business -- a super big business. Doing a project like this one could launch you on the path to entrepreneurial zillions!

Companies like Facebook, Google (YouTube), Verizon (Yahoo) and others are keenly interested in this whole area.

One concern: videos such as these (lots of them) can only be processed in limited ways by the cameras themselves. So very likely the cloud solution will need a powerful GPU cluster to crunch the data, shared by the first-tier systems that see the streams (or those machines might use their own on-board video cards as GPUs). A convincing project probably needs to at least hint at how you would leverage that kind of hardware, and why it should be feasible. But as a student at Cornell you might not easily be able to demonstrate that aspect. So think this question through before proposing to do this project. (The same concern arises about the next project, too, but it is a bit simpler because the smart cars on a smart highway know a lot about themselves, and you could focus on that case -- in which case maybe videos aren't quite as important for figuring out what in the world is going on!).

Smart Highway: A TensorFlow Application

As a second example, solidly in the fog computing domain, consider something we actually discussed in class: a smart highway would have video sensors and motion sensors to watch the cars on the highway, plus it might have ways for the cars themselves to upload data from on-board cameras. The processing stages would basically identify the cars and other vehicles and then compute their trajectories (their motion paths). Based on this, if a car is following the predicted path, you might record the exact path but wouldn't update the system's knowledge. If the car has departed from predictions, you would store data into the versioned storage layer, then use the back-end Spark/Databricks tools to "relearn" the vehicle trajectories. Finally, if your system predicts that two cars may get closer than a safe limit, you could warn about this via console messages; in a real deployment, those warnings would be relayed back to the highway and to other cars.
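To make the middle stages concrete, here is a hedged sketch of constant-velocity trajectory prediction with a proximity warning.  The positions, the car IDs, and the 3-meter safety limit are all made up; a real system would get its tracks from the vision pipeline:

    import numpy as np

    # Recent (x, y) positions for each tracked car, at fixed time steps.
    tracks = {
        "car-17": np.array([[0.0, 0.0], [1.0, 0.1], [2.0, 0.2]]),
        "car-42": np.array([[5.0, 1.0], [4.0, 0.8], [3.0, 0.6]]),
    }

    def predict_next(track):
        # Constant-velocity extrapolation: next = last + (last - previous).
        return track[-1] + (track[-1] - track[-2])

    SAFE_LIMIT = 3.0  # meters (made-up threshold)
    predicted = {car: predict_next(t) for car, t in tracks.items()}
    cars = sorted(predicted)
    for i in range(len(cars)):
        for j in range(i + 1, len(cars)):
            gap = np.linalg.norm(predicted[cars[i]] - predicted[cars[j]])
            if gap < SAFE_LIMIT:
                print("WARNING: %s and %s predicted within %.1f m" % (cars[i], cars[j], gap))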

With this structure, each step can be developed and demonstrated incrementally; pause and think carefully about each stage (capture, trajectory prediction, relearning, warnings) before you build it.

Now think about scalability: in practice, each data capture layer will be sharded to scale, with perhaps thousands of two-node shards. Will your design scale in this sense?
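One common pattern, sketched here under the assumption of hash-based shard assignment and a made-up shard count: each sensor maps deterministically to a shard, and each shard lives on two nodes for fault tolerance.

    import hashlib

    NUM_SHARDS = 1000  # "perhaps thousands" of shards

    def shard_for(sensor_id):
        digest = hashlib.md5(sensor_id.encode()).hexdigest()
        return int(digest, 16) % NUM_SHARDS

    def nodes_for(shard):
        # A two-node shard: a primary and a replica (hypothetical naming scheme).
        return ["node-%d-a" % shard, "node-%d-b" % shard]

    print(nodes_for(shard_for("camera-mile-12")))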

Note: not every project needs to cover every aspect of this problem. Feel free to slice off a subset of the puzzle and to solve just that portion, but to solve it really well. But no matter what you decide to do, it has to be a real cloud computing scenario, approached realistically, built (not done purely on paper), and validated through experiments. During your demo we want to see some evidence that the solution solves the portion of the problem you selected, scales well, performs well, is fault-tolerant, etc.

Smart highways are going to be a real thing. If you work on this topic, you might consider joining a company working on real smart highway products!

Other Project Ideas:

We won't flesh these out in equal detail, but here are some other concepts you could explore. Or dream up a project of your own (provided we approve it).