CS5412 Spring 2012 Project Ideas


How CS5412 Projects Work:

Below we have a list of projects broken into three categories: easy, medium and hard.  The idea is that an easy project would be safe for a single person who is unsure of his or her skills to do on their own for the class.  The medium and hard projects would be suitable for a pair of students to tackle jointly (they would get the identical grade and so should do equal work).  At the end of the semester you will tell us if you did the work as a team of two.  If a team falls apart, you will need to each finish up separately.

How to tell us what project you are doing:

We need you to tell us what project you will be doing.  By February 15, please upload a one-page (two pages, absolute maximum) document to the CMS system (http://cms.csuglab.cornell.edu) into the "Project Plan" assignment.  Later, on the last day of classes when we review projects, we'll have this original plan with us and will want you to explain how you departed from the plan if the thing you actually do isn't quite what you originally had in mind.

This document should have the following information:

1) The full list of project participants (either just yourself, or if you work as a two-person team, full names and netids for both of you).  Both of you must upload the same document, separately, to your respective CMS accounts.

2) The project title and difficulty level (either from the list below or, if you propose a new project, similar in style). 

3) The short description copied from below or, if you propose a new project, one similar in length and style.

4)  If you will use your project for MEng credit, a sentence saying "This project will be used for CS MEng credit, approved by (Ken/Hussam/Qi/Z/Li)."  Please note the rules given below!  You must respect them or we can't approve the CS MEng credit request.  The name to list can be Professor Birman or one of the TAs, and that person must meet with you, discuss your plan for MEng project credit, and approve your plan.

5) A paragraph on what you will do to carry out the project:  "We will be downloading the Isis2 and Live Distributed Objects systems, building them under Windows, implementing our own architecture for monitoring electric-power "phasor measurement unit" devices (PMUs), placing simulated PMUs on the NSF GENI testbed...." etc.  You can evolve this plan later if needed; the one you file is an initial concept.

6) A paragraph explaining how you will demonstrate the project (on completion, we will have a visual demo and a poster.  The demo will show....)  This can evolve over time too.

7) If you are working as a team of two, who will do what.

MEng Project Credit:

If you wish to use CS5412 for MEng project credit, just sign up for 3 credits, graded, of CS5999 with Professor Birman's code.  We will use the CS5412 grade as the CS5999 grade.  Note that this means your quiz scores in CS5412 actually count towards your CS5999 grade too.

MEng projects done by a single person must be of medium or hard difficulty.  We rarely approve MEng projects for teams of two and when we do, they must always be hard and we must always have a clear explanation in advance of who will do what and why both students would deserve MEng credit for the work.

Due Date:  CS5412 projects are due on the last day of the course, which is set aside as a project demo day.  On request, short extensions of at most 10 days may be granted, but you must request the extension, explain precisely why you need extra time, and get actual permission from Professor Birman or a TA, in writing.  Otherwise, late projects will be reviewed during the same 10 day period but if you didn't get permission to finish late, a penalty to your grade may apply (e.g. A+ work might get an A grade if you finished a week late and didn't have permission to work a week longer).

Grading:  Your MEng project will be graded by doing a demo and also presenting a poster that shows what you did to the grading team composed of Professor Birman and the TAs.   We grade in the range B to A+ for most projects.  Sometimes a very weak effort may receive a B- or lower.  Our aim is to have the median grade be on the B+/A- border: half above and half below.

To get an A+ in CS5412 your project must be one of the very best that the team saw.  We award very few A+ grades.  Sometimes we don't award any; more often, four or five students in the entire class might receive an A+.  The quiz scores also count towards your CS5412 grade.

Extra credit:  An MEng project shown at the BOOM projects fair will receive extra credit (e.g. B work might receive a B+ grade).  However, extra credit will not boost your grade beyond A.

Projects not on our List:  You can suggest a project of your own but it should be similar to the ones on the list and you should tell us which chapters of the textbook you hope to draw on in developing your solution.  We do not allow CS5412 projects to come from completely different courses or areas.  Thus while you might manage to find a project that overlaps between the security class and the cloud computing class (in which case we would probably let you do the one project for both courses), more often it would be hard to pull that off because the courses cover different material.  A CS5412 project, in short, must be based on what we learn in the CS5412 class.

  1. [Easy] Build a distributed web crawler and indexer.  Build a system that uses multiple crawler processes (potentially on tens of machines) to crawl a set of websites (bit-torrents or any other open system also work), and process/organize the data in a way that is easy to search. One example is to create an index page listing the acquired data through the crawl. Students can use Isis2 to ease their development.
  2. [Easy] Integrate the Isis2 system with the Live Objects system, both available for download from codeplex.com.  By creating a new kind of network monitoring "sensor", show how this solution could let a cloud management team easily build applications to monitor the network behavior of cloud-hosted applications.
  3. [Easy]  Experiment to see how fast Facebook updates propagate and how consistent their data is.  You would mix updates coming from Cornell with test systems running on PlanetLab (or vice versa).
  4. [Medium] Develop a RAMCloud-like System. The system need not be a kernel module, but can instead reside in user-space.
  5. [Medium] Develop a DiskCloud-like System. The students can use FUSE to build a distributed storage service that can be mounted as a local file system. Note: this is a reuse of the project in Hakim's distributed storage class.
  6. [Medium] P2P MapReduce. The idea is to build a distributed processing platform based on a P2P substrate. Students should write code to schedule and manage the processes running the different tasks. The platform need not be MapReduce; it just needs to be a platform that can take in jobs and process them in a distributed fashion.
  7. [Medium]  Design a P2P solution for mobile users who want to connect with their friends or avoid some people.  It should run on mobile phones and let you set your status (in a hurry, eager to hang out, etc.) and on that basis, if two people come within some range of each other, it could pop up a notification of the right kind.
  8. [Medium] Build a distributed storage system with a simple client interface like Dropbox. Clients can connect through a web interface or a desktop application. The system should support different users with different files. The system should scale with the addition of new servers and balance load across all servers. Data should be replicated to tolerate server failure. Other interesting features may be added as well.
  9. [Medium] Using gossip, build a system that runs on a cloud and senses DDoS attacks on some of its first-tier applications.  You should invent an instrumentation API to sense the events that you are using as your indication of an attack (so the application would "help" by providing you with information).  Your service should have a visual GUI that an administrator could use to see where hot-spots are arising and maybe even a way to tell a service to shift from a hot-spot to some other node that isn't under attack. 
  10. [Medium] Implement a gossip-based failure detector inside cloud systems. A group of nodes should organize themselves using a gossip protocol, with each node monitoring its neighbors' health. Since failures in the cloud can be server-based, rack-based and even cluster-based, it is worth exploring the group's geographical deployment and optimizing a layered neighbor selection. For example, virtual machines on the same server should be 1st-layer neighbors, and servers in the same rack 2nd-layer neighbors. Then, different fan-out and gossip intervals can be tuned for less cross-layer traffic but fast failure notification across the whole group. (Leave out the virtual host detection feature if this is too hard.)
  11. [Medium/Hard] Build a purely P2P version of Twitter that has strong guarantees of privacy and also anonymity.  It should have a notion of groups that can be created and with access keys that get shared outside of the system.  A user should be able to post private messages that only members of the right groups can access, and the system should be designed so that even if someone was spying on it, they would not be able to tell who posted which message.
  12. [Medium] Distributed Query Processing Service:  Using Cornell's new MiCA gossip programming language, implement the following application.   Individual nodes maintain their own logs, which are collated to form a system-wide log partially ordered using vector clocks.  Queries can be executed on the log, and ideally a query's result should be updated continuously as new events are appended to the log (without recomputing the result from scratch).  For example, every node might periodically log its CPU usage and available disk capacity.  Reasonable queries would be "what's the average CPU usage system-wide?" and "what's the total available disk space in the system?"

  13. [Medium] Distributed Log Service:  Using Cornell's new MiCA gossip programming language, implement cloud tomography:  Infer the shape of the underlying communication network through gossip.

  14. [Medium]: Using Cornell's new MiCA gossip programming language, implement sensor network visualization. Make a general system for visualizing the output of a gossip-based sensor network on Google Maps using PlanetLab. 

  15. [Medium]: Using Cornell's new MiCA gossip programming language, reinvent Facebook.  Create an eventually consistent distributed publish-subscribe system with a social network interface.

  16. [Hard]: Using Cornell's new MiCA gossip programming language, implement Gossip Objects as a MiCA layer.  Gossip Objects improves the performance of probabilistic publish-subscribe by speculatively delivering messages to intermediary nodes which may in turn deliver messages to their intended recipients.

  17. [Medium]: Using Cornell's new MiCA gossip programming language, implement distributed cache optimization.  Memcached is a distributed in-memory cache that stores key-value pairs for rapid lookup.  Create a gossip system that helps memcached nodes coordinate by speculatively caching popular keys and evicting unpopular ones.  (See also Beehive by Ramasubramanian and Sirer.)

  18. [Hard] Using Isis2, design a replicated file system service that brings a restarting file system "close" to synchronization with existing active ones before joining and transfers just the remaining delta of file system state, to minimize restart disruptions.  Demonstrate it on Amazon EC2 or RedCloud or Azure.
  19. [Hard] Using the (delayed) feed of aircraft location data from the FAA, build a high-assurance ATC system.  Carefully justify the assurance properties of the solution.
  20. [Hard] Customized web service that adapts its behavior according to load. Monitor the load on the servers and, if the load is too high, start more instances (e.g. on EC2) to serve requests. If the load is low, shut down instances. Your solution should make sure the instances are consistent with each other.
  21. [Hard] Cloud Geo-Caching: build a storage service to manage data on at least two geographical locations (e.g. different Amazon AWS availability zones) and a local client. The local client can be either a smart phone or a desktop application. The storage service should transparently move the data between the different locations (and perhaps do prefetching for reads) in order to minimize client-perceived latency. The three locations (2 cloud and 1 local) should not be full mirrors of one another. Instead, one of them should be a master and the other 2 should act as caches to improve performance. The master can be changed to a different location if the client relocates (and you need to be able to show that). The space at each of the cache locations is limited.

    For example, assume the client application is a photo viewer+editor. The end-user might add new photos to his album on the local client, and the data will transparently move to the master. If the client views a picture from one album, the storage layer can perhaps pre-fetch all the pictures of that album in order to minimize the latency for future views.

    The use of caching in your project should minimize cost and improve performance over a non-cached system.
  22. [Medium or Hard] Port/Build an application based on Isis running on Android. Mono has an Android version and can support native C#/.Net code. So I am interested in seeing an Android phone/tablet running something based on Isis. I haven't looked into it yet, and have no idea how to implement it, so it might be listed as harder than it should be. The application can run on a cluster of phones, and then with this bunch of wimpy computers we can provide a portable consistent cluster...
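
Illustration for project 12:  the description asks for a log that is partially ordered using vector clocks, and for query results that are updated as new events arrive rather than recomputed from scratch.  The following is a minimal sketch of those two ideas only.  It is written in plain Python rather than MiCA, and every name in it (tick, merge, happened_before, concurrent, RunningAverage) is hypothetical and chosen for illustration; it is not part of MiCA or Isis2, and a real project would still need the gossip layer that exchanges the clocks and log entries.

```python
from typing import Dict, Optional

# Hypothetical representation: node id -> number of events seen from that node.
VectorClock = Dict[str, int]


def tick(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` advanced by one local event at `node`."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Component-wise maximum, applied when a node learns another node's clock."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """True if a <= b in every component and a < b in at least one."""
    nodes = set(a) | set(b)
    no_greater = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    strictly_less = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_greater and strictly_less


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """True if the clocks differ and neither one happened before the other."""
    nodes = set(a) | set(b)
    differ = any(a.get(n, 0) != b.get(n, 0) for n in nodes)
    return differ and not happened_before(a, b) and not happened_before(b, a)


class RunningAverage:
    """Keeps a count and a sum so each new log entry updates the answer in O(1)."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0

    def append(self, value: float) -> None:
        self.count += 1
        self.total += value

    @property
    def value(self) -> Optional[float]:
        return self.total / self.count if self.count else None


if __name__ == "__main__":
    a1 = tick({}, "a")               # first event logged at node "a"
    b1 = tick({}, "b")               # first event logged at node "b"
    b2 = tick(merge(b1, a1), "b")    # "b" hears about a1, then logs again
    print(happened_before(a1, b2), concurrent(a1, b1))

    cpu = RunningAverage()
    for sample in (20.0, 40.0):      # made-up CPU usage readings
        cpu.append(sample)
    print(cpu.value)
```

Two entries whose clocks are concurrent have no defined order, which is why the system-wide log is only partially ordered.  An average can be maintained this way because it depends only on a count and a sum; queries that depend on the order of events would need more state.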