CS6460 - Peer-to-Peer Systems

Emin Gun Sirer

Project 1

The hidden agenda behind the first homework is to get used to the concept of rapidly prototyping a P2P application based on loosely coordinated hosts connected over sockets. Both network programming and rapid prototyping are critical skills; this homework is designed to exercise both.

The concrete goal of the project is to develop an unstructured, Gnutella-like filesharing system.

To keep the user interface simple, you should build two simple programs.

The first program is a long-lived peer (peer), which is responsible for (1) building and maintaining an unstructured mesh, (2) for forwarding queries along edges of the mesh, and (3) responding to queries. So the peer will have to keep track of three things: a list of other peers in the network that it knows about (limited in number to some user-specified parameter N, usually in the 6-10 range), a list of files it would like to provide to other peers, and a list of queries it has forwarded recently.

The peer code should be started with the number for the peer's own control port, the name and port number of a bootstrap peer, and a list of files to share. The bootstrap peer is any peer that is already part of your "615tella" network. If the bootstrap peer name and port are "none:0", then the peer should skip the initial network initiation step and should itself become the seed node for the network.

The peer code will generally be structured as a server loop that accepts incoming connection requests at the peer's control port. The general behavior of a peer is to accept connections on the control port from other peers and respond to them. A newly arriving peer will, for instance, create connection to the control port of the bootstrap node.

Upon contacting the bootstrap node, the newly arriving peer should first insert itself into the network. To do this, it needs to locate a peer which has an empty slot in its peers table. Your wire protocol should support a "LISTPEERS" primitive which, if sent by a node A to B, should elicit B to respond with a list of its peers (ASCII string containing name:port tuples). Your wire protocol should also support "INSERTME name:port" primitive, which should return "1" if the insertion was succesful, or "0" if the insertion failed, typically due to lack of space in the peers list. A node whose first INSERTME failed can pick any of the peers returned by its previous LISTME, forge a connection to that peer, and try inserting itself at that node. This should yield an unstructured graph with constant outdegree.

You will also need to write a second program, called query, which takes a keyword, and the name and port number for a peer already in your 615tella network. It sends a QUERY packet with a query identifier (a 32-bit random number), a time-to-live in hops, and the keyword itself. Every peer node that receives a QUERY packet checks to see if that query has been seen before (if so, the packet is discarded), then checks to see if it matches any files (if so, a RESPONSE packet, carrying the peer name, control port number, and SHA1 hash of the file content, is sent back to the peer from which the QUERY packet was received), and then forwards the QUERY packet to its own peers.

The query program displays the responses it receives to the user as it receives them, or else times out after 10 seconds. Since you should not be spending too much time on the GUI portion of the program, if there are any responses at all, query should go ahead and download the file from any one of the peers. For this, it should create a connection to the named peer's control port, and issue a FETCH request, identifying the file by its SHA1 hash. query should simply exit when it is done fetching the file.

I have deliberately left unspecified the precise query forwarding algorithm. You might want to implement a full breadth-first flood, a depth-first search, or a random walk of the graph formed by peer nodes. Pick whatever you think provides a good tradeoff between implementability and low network load and high query accuracy.

The homework will be evaluated for correctness and code structure, so please pay careful attention to clean, modular and extensible design as you implement the two programs. This is also an opportunity to experiment with rapid prototyping languages such as Python, Ruby, and others. Turn in your code by mailing me a TAR or ZIP file containing your source, and a (very short) README describing how your code works.

Nodes should check their neighbors periodically to see if they have gone down, and open up new slots in the peer connection table. This detection should be done every time a query is forwarded (where a failed node will trigger a socket-level exception) as well as periodically (say, once every minute, even if there are no queries in the network).

Getting Started:

Since all endnodes are identified by host:port tuples, you can run many instances of peers on a single physical host simply by assigning each peer a separate port number. Start by writing a simple client-server program that sends a packet and receives a response, and build on this base to add the packet types described above. You may use the nodes in the CFS cluster (cfs01 through cfs39.cs.cornell.edu) to run your code.

Once your code runs for real, acquire a PlanetLab account and run it on the cornell_cs615 slice.

Extra Credit:

Here are some things you can do for extra credit and to earn a hacker badge: Caution about extra credit work: End-to-end system performance is critical. If the base system doesn't work, you don't get any credit!