CS5412 Spring 2012 Project Ideas
 
How CS5412 Projects Work:
Below we have a list of projects broken into three categories: easy, medium 
and hard.  The idea is that an easy project would be safe for a single 
person who is unsure of his or her skills to do on their own for the class.  
The medium and hard projects would be suitable for a pair of students to tackle 
jointly (they would get the identical grade and so should do equal 
work).  At the end of the semester you will tell us if you did the work as 
a team of two.  If a team falls apart, each of you will need to finish up 
separately.
How to tell us what project you are doing:
We need you to tell us what project you will be doing.  By February 15, 
please upload a one-page (two pages, absolute maximum) document to the CMS 
system (http://cms.csuglab.cornell.edu) 
into the "Project Plan" assignment.  Later, on the last day of classes when 
we review projects, we'll have this original plan with us and will want you to 
explain how you departed from the plan if the thing you actually do isn't quite 
what you originally had in mind.
This document should have the following information:
1) The full list of project participants (either just yourself, or if you 
work as a two-person team, full names and netids for both of you).  Both of 
you must upload the same document, separately, to your respective CMS accounts.
2) The project title and difficulty level (either from the list below or, 
if you propose a new project, similar in style).  
3) The short description copied from below or, if you propose a new 
project, one similar in length and style.
4)  If you will use your project for MEng credit, a sentence saying 
"This project will be used for CS MEng credit, approved by (Ken/Hussam/Qi/Z/Li)."  
Please note the rules given below!  You must respect them or we can't 
approve the CS MEng credit request.  The name to list can be Professor 
Birman or one of the TAs and that person must meet with you, discuss your plan 
for MEng project credit and approve your plan.  
5) A paragraph on what you will do to carry out the project:  "We 
will be downloading the Isis2 and Live Distributed Objects systems, building them 
under Windows, implementing our own architecture for monitoring electric-power 
'phasor measurement unit' devices (PMUs), placing simulated PMUs on the NSF GENI 
testbed...." etc.  You can evolve this plan later if needed; the one you 
file is an initial concept.
6) A paragraph explaining how you will demonstrate the project (on 
completion, we will have a visual demo and a poster.  The demo will 
show....)  This can evolve over time too.
7) If you are a team of two, who will do what.
MEng Project Credit:
If you wish to use CS5412 for MEng project credit, just sign up for 3 
credits, graded, of CS5999 with Professor Birman's code.  We will use the 
CS5412 grade as the CS5999 grade.  Note that this means your quiz scores in 
CS5412 actually count towards your CS5999 grade too.
MEng projects done by a single person must be of medium or hard difficulty.  
We rarely approve MEng projects for teams of two and when we do, they must 
always be hard and we must always have a clear explanation in advance of who 
will do what and why both students would deserve MEng credit for the work.
Due Date:  CS5412 projects are due on the last day of 
the course, which is set aside as a project demo day.  On request, short 
extensions of at most 10 days may be granted, but you must request the 
extension, explain precisely why you need extra time, and get actual permission 
from Professor Birman or a TA, in writing.  Otherwise, late projects will 
be reviewed during the same 10-day period but if you didn't get permission to 
finish late, a penalty to your grade may apply (e.g. A+ work might get an A 
grade if you finished a week late and didn't have permission to work a week 
longer).
Grading:  Your MEng project will be graded on a demo and a poster that 
shows what you did, both presented to the grading team 
composed of Professor Birman and the TAs.   We grade in the range B to 
A+ for most projects.  Sometimes a very weak effort may receive a B- or 
lower.  Our aim is to have the median grade be on the B+/A- border: half 
above and half below.
To get an A+ in CS5412, your project must be one of the very best that 
the team saw.  We award very few A+ grades.  Sometimes we don't award 
any; more often, four or five students in the entire class might receive an A+.  
The quiz scores also count towards your CS5412 grade.
Extra credit:  An MEng project shown at the BOOM 
projects fair will receive extra credit (e.g. B work might receive a B+ grade).  
However, extra credit will not boost your grade beyond A. 
Projects not on our List:  You can suggest a project of 
your own but it should be similar to the ones on the list and you should tell us 
which chapters of the textbook you hope to draw on in developing your solution.  
We do not allow CS5412 projects to come from completely different courses or 
areas.  Thus while you might manage to find a project that overlaps between 
the security class and the cloud computing class (in which case we would 
probably let you do the one project for both courses), more often it would be 
hard to pull that off because the courses cover different material.  A 
CS5412 project, in short, must be based on what we learn in the CS5412 class.
- [Easy] Build a distributed web crawler and indexer. 
	Build a system that uses multiple crawler 
	processes (potentially on tens of machines) to crawl a set of websites 
	(BitTorrent or any other open system would also work), and process and organize the 
	data so that it is easy to search. One example is to create an index page 
	listing the data acquired through the crawl. Students can use Isis2 to ease 
	their development.
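One way to picture the crawler project is the minimal sketch below. It assumes URLs are divided among crawler processes by hashing the host name, and that "easy to search" means an inverted index from words to URLs. The function names and sample pages are hypothetical; fetching pages and coordinating processes (for example with Isis2) are left out.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlparse


def assign_crawler(url: str, num_crawlers: int) -> int:
    """Pick a crawler by hashing the host, so one site stays on one crawler."""
    host = urlparse(url).netloc.lower()
    digest = hashlib.sha1(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


def build_index(pages: dict) -> dict:
    """Build an inverted index: word -> sorted list of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return {word: sorted(urls) for word, urls in index.items()}


# Hypothetical crawl results standing in for fetched page text.
pages = {
    "http://example.org/a": "cloud computing notes",
    "http://example.org/b": "gossip protocols in cloud systems",
    "http://example.com/": "gossip",
}
index = build_index(pages)
print(index["cloud"])
```

Hashing on the host rather than the full URL keeps politeness limits (one crawler per site) simple, at the cost of uneven load when a few sites dominate the crawl.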
- [Easy] Integrate the 
	Isis2 system with the Live Objects system, both available for 
	download from codeplex.com.  By creating a new kind of network 
	monitoring "sensor", show how this solution could let a cloud management team 
	easily build applications to monitor the network behavior of cloud-hosted 
	applications.
- [Easy]  
	Experiment to see how fast Facebook updates propagate and how consistent 
	their data is.  You would mix updates coming from 
	Cornell with test systems running on PlanetLab (or vice versa).
- [Medium] Develop a 
	RAMCloud-like System. The system need not be a kernel module, but 
	can instead reside in user-space. 
- [Medium] Develop a 
	DiskCloud-like System. The students can use FUSE to build a 
	distributed storage service that can be mounted as a local file system. 
	Note: this is a reuse of the project in Hakim's distributed storage class.
- [Medium] P2P 
	MapReduce. The idea is to build a distributed processing platform 
	based on a P2P substrate. Students should write code to schedule and manage 
	the processes running the different tasks. The platform need not be 
	MapReduce; it just needs to be a platform that can take in jobs and process 
	them in a distributed fashion. 
- [Medium]  Design 
	a P2P solution for mobile users who want to connect with their friends or 
	avoid some people.  It should run on mobile phones and 
	let you set your status (in a hurry, eager to hang out, etc.); on that 
	basis, if two people come within some range of each other, it could pop up a 
	notification of the right kind.
- [Medium] Build a 
	distributed storage system with a simple client interface, like Dropbox.
	Clients can connect through a web interface or a 
	desktop application. The system should support different users with 
	different files. The system should scale with the addition of new servers 
	and balance load across all servers. Data should be replicated to tolerate 
	server failure. Other interesting features may be added as well.
- [Medium] Using gossip, 
	build a system that runs on a cloud and senses DDoS attacks on some of its 
	first-tier applications.  You should invent an 
	instrumentation API to sense the events that you are using as your 
	indication of an attack (so the application would "help" by providing you 
	with information).  Your service should have a visual GUI that an 
	administrator could use to see where hot-spots are arising and maybe even a 
	way to tell a service to shift from a hot-spot to some other node that isn't 
	under attack.  
	
- [Medium] Implement a 
	gossip-based failure detector inside cloud systems. A group of nodes 
	should organize itself using a gossip protocol, with each node monitoring its 
	neighbors' health. Since failures in a cloud can be server-based, 
	rack-based and even cluster-based, it is worthwhile to think about exploiting 
	the group's geographical deployment to optimize a layered neighbor selection. 
	For example, virtual machines on the same server should be 1st-layer 
	neighbors, and servers in the same rack 2nd-layer neighbors. Then, different fan-out 
	and gossip intervals can be tuned for less cross-layer traffic but fast 
	failure notification across the whole group. (Leave out the virtual host 
	detection feature if this is too hard.)
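As a starting point for the failure detector, here is a minimal sketch of the classic heartbeat-counter style of gossip: each node increments its own counter, merges counters received from peers, and suspects any peer whose counter has not increased within a timeout. The class name, method names, and timeout are hypothetical, time is simulated with plain numbers, and the layered neighbor selection described above is not modeled.

```python
class HeartbeatTable:
    """One node's view: node id -> (heartbeat counter, local time it last rose)."""

    def __init__(self, node_id, now=0.0):
        self.node_id = node_id
        self.entries = {node_id: (0, now)}

    def beat(self, now):
        """Increment this node's own heartbeat counter."""
        counter, _ = self.entries[self.node_id]
        self.entries[self.node_id] = (counter + 1, now)

    def digest(self):
        """The counters this node would send in a gossip message."""
        return {node: counter for node, (counter, _) in self.entries.items()}

    def merge(self, remote_counters, now):
        """Absorb a gossip message: keep larger counters, stamped with local time."""
        for node, counter in remote_counters.items():
            known, _ = self.entries.get(node, (-1, now))
            if counter > known:
                self.entries[node] = (counter, now)

    def suspects(self, now, timeout):
        """Peers whose counter has not increased for longer than `timeout`."""
        return sorted(node for node, (_, seen) in self.entries.items()
                      if node != self.node_id and now - seen > timeout)


# Two hypothetical nodes gossip for three rounds, then "b" falls silent.
a, b = HeartbeatTable("a"), HeartbeatTable("b")
for t in (1, 2, 3):
    a.beat(t)
    b.beat(t)
    a.merge(b.digest(), t)
    b.merge(a.digest(), t)
for t in range(4, 11):
    a.beat(t)

print(a.suspects(now=5, timeout=5))
print(a.suspects(now=10, timeout=5))
```

Stamping entries with local time on merge avoids any need for synchronized clocks. A layered design could keep one such table per layer, each with its own gossip interval and timeout.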
	
	
- [Medium/Hard]
	Build a purely P2P version of Twitter that has strong 
	guarantees of privacy and also anonymity.  It should have a notion of 
	groups that can be created, with access keys that are shared outside of 
	the system.  A user should be able to post private messages that only 
	members of the right groups can access, and the system should be designed so 
	that even if someone were spying on it, they would not be able to tell who 
	posted which message.
- [Medium] Distributed Query Processing Service:  Using 
	Cornell's new MiCA gossip programming language, implement the following 
	application.   Individual nodes maintain their own logs, 
	which are collated to form a system-wide log partially ordered using vector 
	clocks.  Queries can be executed on the log, and ideally a query's result 
	should be updated continuously as new events are appended to the log 
	(without recomputing the result from scratch).  For example, every node 
	might periodically log its CPU usage and available disk capacity. 
	 Reasonable queries would be "what's the average CPU usage system-wide?" and 
	"what's the total available disk space in the system?" 
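The two ideas in this project, partial ordering with vector clocks and continuously updated query results, can be sketched in a few lines. This is plain Python rather than MiCA, and the function and class names are hypothetical. A vector clock is modeled as a dict from node id to event count, and the average query is maintained incrementally instead of being recomputed from the whole log.

```python
def vc_leq(a: dict, b: dict) -> bool:
    """True if clock a is component-wise <= clock b (missing entries count as 0)."""
    return all(count <= b.get(node, 0) for node, count in a.items())


def happened_before(a: dict, b: dict) -> bool:
    """True if the event stamped a causally precedes the event stamped b."""
    return vc_leq(a, b) and not vc_leq(b, a)


def concurrent(a: dict, b: dict) -> bool:
    """True if neither event precedes the other, so the log leaves them unordered."""
    return not vc_leq(a, b) and not vc_leq(b, a)


class RunningAverage:
    """System-wide average of each node's latest reading, updated per log entry."""

    def __init__(self):
        self.latest = {}
        self.total = 0.0

    def append(self, node, value):
        # Adjust the total by the change in this node's reading only.
        self.total += value - self.latest.get(node, 0.0)
        self.latest[node] = value

    def result(self):
        return self.total / len(self.latest) if self.latest else None


cpu = RunningAverage()
cpu.append("n1", 40.0)
cpu.append("n2", 60.0)
cpu.append("n1", 80.0)   # n1 logs a newer CPU reading
print(cpu.result())
```

The total-disk-space query works the same way with `self.total` returned directly. Queries that are not simple sums or averages may need a different incremental scheme.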
- [Medium] Distributed Log Service:  Using 
	Cornell's new MiCA gossip programming language, implement cloud 
	tomography:  infer the shape of the underlying 
	communication network through gossip. 
- [Medium]: Using Cornell's new MiCA gossip programming language, implement 
	sensor network visualization. Make a general system for visualizing 
	the output of a gossip-based sensor network on Google Maps using PlanetLab. 
- [Medium]: Using Cornell's new MiCA gossip programming language, reinvent 
	Facebook:  Create an eventually consistent distributed 
	publish-subscribe system with a social network interface. 
- [Hard]: Using Cornell's new MiCA gossip programming language, 
	implement Gossip Objects as a MiCA layer.  Gossip Objects 
	improves the performance of probabilistic publish-subscribe by speculatively 
	delivering messages to intermediary nodes, which may in turn deliver messages 
	to their intended recipients. 
- [Medium]: Using Cornell's new MiCA gossip programming language, 
	implement distributed cache optimization.  Memcached is a distributed 
	in-memory cache that stores key-value pairs for rapid lookup.  Create a 
	gossip system that helps memcached nodes coordinate by speculatively caching 
	popular keys and evicting unpopular ones.  (See also Beehive by 
	Ramasubramanian and Sirer.) 
- [Hard] Using Isis2, design a replicated file system service 
	that brings a restarting file system "close" to synchronization with 
	existing active ones before joining and transfers just the remaining delta 
	of file system state, to minimize restart disruptions.  Demonstrate it 
	on Amazon EC2 or RedCloud or Azure.
- [Hard] Using the 
	(delayed) feed of aircraft location data from the FAA, build a 
	high-assurance air traffic control (ATC) system.  Carefully justify the assurance 
	properties of the solution.
- [Hard] Customized web 
	service that adapts its behavior according to load. Monitor the 
	load on the servers; if the load is too high, the service may start more instances (e.g. on EC2) to 
	serve requests. If the load is low, it may shut down instances. Your solution 
	should make sure the instances are consistent with each other.
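The scaling decision itself can be quite small. The sketch below assumes load is reported as an average utilization between 0 and 1 and that the service adds or removes one instance per decision. The function name, thresholds, and instance limits are all hypothetical, and actually starting or stopping EC2 instances (and keeping them consistent) is not shown.

```python
def desired_instances(current: int, avg_load: float,
                      high: float = 0.75, low: float = 0.25,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the instance count to aim for, given the current average load."""
    if avg_load > high:
        target = current + 1      # overloaded: start one more instance
    elif avg_load < low:
        target = current - 1      # mostly idle: shut one instance down
    else:
        target = current          # within the comfortable band: do nothing
    return max(min_instances, min(max_instances, target))


print(desired_instances(current=3, avg_load=0.9))
```

Keeping a gap between the `low` and `high` thresholds is a simple way to stop the service from oscillating when load hovers near a single cutoff.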
	
	
- [Hard] Cloud Geo-Caching: build 
	a storage service to manage data in at least two geographical locations 
	(e.g. different Amazon AWS availability zones) and a local client. The local 
	client can be either a smart phone or a desktop application. The storage 
	service should transparently move the data between the different locations 
	(and perhaps do prefetching for reads) in order to minimize client-perceived 
	latency. The three locations (two cloud and one local) should not be full 
	mirrors of one another. Instead, one of them should be a master and the 
	other two should act as caches to improve performance. The master can be 
	changed to a different location if the client relocates (and you need to be 
	able to show that). The space at each of the cache locations is limited.
 
 For example, assume the client application is a photo viewer+editor. The 
	end-user might add new photos to his album on the local client, and the data 
	will transparently move to the master. If the client views a picture from 
	one album, the storage layer can perhaps pre-fetch all the pictures of that 
	album in order to minimize the latency for future views.
 
 The use of caching in your project should minimize cost and improve 
	performance over a non-cached system.
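A single cache location in front of the master might look like the following minimal sketch. It assumes least-recently-used (LRU) eviction, which is only one possible policy, and models the master as a plain dict. The class name, method names, and keys are hypothetical, and network transfer and master relocation are left out.

```python
from collections import OrderedDict


class CacheLocation:
    """A space-limited LRU cache that falls back to the master on a miss."""

    def __init__(self, master: dict, capacity: int):
        self.master = master
        self.capacity = capacity
        self.items = OrderedDict()   # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def _store(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.master[key]             # slow path: fetch from the master
        self._store(key, value)
        return value

    def prefetch(self, keys):
        """Pull in keys the client is likely to read next, e.g. the rest of an album."""
        for key in keys:
            if key not in self.items:
                self._store(key, self.master[key])


master = {"album/p1": b"photo1", "album/p2": b"photo2", "album/p3": b"photo3"}
cache = CacheLocation(master, capacity=2)
cache.get("album/p1")                        # miss: fetched from the master
cache.prefetch(["album/p1", "album/p2"])     # p2 arrives before it is requested
cache.get("album/p2")                        # hit, thanks to the prefetch
cache.get("album/p3")                        # miss: evicts p1, the LRU entry
print(cache.hits, cache.misses)
```

Counting hits and misses like this gives you a direct way to compare the cached system against a non-cached baseline in your demo.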
- [Medium or Hard] 
	Port/Build an application based on Isis running on Android. Mono 
	has an Android version and can support native C#/.NET code, so I am 
	interested in seeing an Android phone/tablet running something based on 
	Isis. I haven't looked into it yet and have no idea how to implement it, so it 
	might be listed as harder than it should be. The application can run on 
	a cluster of phones, and then with this bunch of wimpy computers we can 
	provide a portable consistent cluster...