
<head>
<title>615 - Peer-to-Peer Systems, Spring 2005</title>
<link rel="stylesheet" type="text/css" href="foo1.css">
</head>

<center>
<font face="verdana">
<h1>615 - Peer-to-Peer Systems</h1>
<h2>Spring 2006</h2>
<h3><a href="http://www.cs.cornell.edu/People/egs/">Emin Gun Sirer</a></h3>
</center>

<center>
<table border=1 width=60% bgcolor=ff8080>
<tr><td>
<center><h3>Homework 2</h3></center>

The hidden agenda behind the second homework is to write code
that interoperates with existing protocols, to perform network
measurements, and to deploy some measurement code on PlanetLab.

<p>
The concrete goal of the project is to compare the network
properties of Gnutella peers to those of PlanetLab hosts.

<p>
The homework consists of two parts.

<h4>Part I: The Gnutella graph, and its peer properties</h4>

<p>
First, you should construct a Gnutella crawler that walks
over the Gnutella network, discovers peers, discovers which files
they are sharing, and performs some sample downloads from these
peers to determine the download bandwidth available from that peer.

<p>
The Gnutella specification and many reference implementations
are online. Your code needs to follow the spec in order to talk to
existing Gnutella clients. You might want to start by developing a
program that simply walks the Gnutella graph and outputs it; feel free
to reuse any of the existing Gnutella walkers and clients as a
starting point. Then modify it to query each node for the list of files
it is sharing. Then query the sizes of the files, and if you find
a suitable file in the 1 to 5 MB range, download it, discard all
received data, and time how long the download takes to complete.
Be sure to record all relevant information, including the time of
day, size of file, download duration, and peer IP address.
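<p>
The download-and-discard step might be sketched as follows. This is only
an illustration, not a required design: <code>timed_discard</code>,
<code>read_chunk</code>, and the default caps are hypothetical names and
values. <code>read_chunk</code> stands for whatever returns the next block
of bytes from the peer (for example, a socket's <code>recv</code> with a
timeout set on the socket, so that a stalled peer cannot block forever).

```python
import time
from datetime import datetime, timezone


def timed_discard(read_chunk, peer_ip, max_seconds=120.0, max_bytes=5 * 1024 * 1024):
    """Read until read_chunk() returns b"", discarding the data and timing the transfer.

    read_chunk is a hypothetical callable supplied by the caller.
    max_seconds and max_bytes are assumed safety caps, not values from the handout.
    """
    started_at = datetime.now(timezone.utc).isoformat()  # time of day, in UTC
    start = time.monotonic()
    total = 0
    complete = False
    while total < max_bytes and time.monotonic() - start < max_seconds:
        chunk = read_chunk()
        if not chunk:  # empty read: the peer finished sending
            complete = True
            break
        total += len(chunk)  # count the bytes, then let the data go
    seconds = time.monotonic() - start
    return {
        "peer_ip": peer_ip,
        "started_at": started_at,
        "bytes": total,
        "seconds": seconds,
        "bytes_per_second": total / seconds if seconds > 0 else None,
        "complete": complete,
    }
```

<p>
Each returned record carries the fields listed above (time of day, size,
duration, peer IP address), so it can be appended to a log file as is.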

<p>
Be sure to write your code defensively. We do not want to bring
down the Gnutella network. You must make sure that you do not
leave any zombie processes behind that will perform pointless
downloads forever.

<p>
You should collect data from at least 300 peers. You should have 
at least 3 data points from each peer. Of course, it's always good
to collect more data, say from millions of peers, but make sure that 
you do not exert undue stress on any given node. Treat them as you
would like to have your home box treated.
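<p>
One way to keep the load on any single peer bounded is a small per-peer
budget that the crawler consults before each download. The sketch below is
an assumption about how you might structure this: the class name
<code>PeerBudget</code> and the default limits (3 downloads per peer, 10
minutes apart) are hypothetical and should be tuned to your own experiment.

```python
import time


class PeerBudget:
    """Caps how often, and how many times, each peer is contacted.

    The defaults are illustrative assumptions, not requirements.
    """

    def __init__(self, max_per_peer=3, min_gap_seconds=600.0, clock=time.monotonic):
        self.max_per_peer = max_per_peer
        self.min_gap_seconds = min_gap_seconds
        self._clock = clock  # injectable so the logic can be tested without waiting
        self._count = {}     # peer -> downloads started so far
        self._last = {}      # peer -> clock value at the last download

    def allow(self, peer):
        """Return True and record the attempt if this peer may be contacted now."""
        now = self._clock()
        used = self._count.get(peer, 0)
        if used >= self.max_per_peer:
            return False  # budget for this peer is exhausted
        last = self._last.get(peer)
        if last is not None and now - last < self.min_gap_seconds:
            return False  # too soon after the previous download
        self._count[peer] = used + 1
        self._last[peer] = now
        return True
```

<p>
Calling <code>allow(peer_ip)</code> before every download, and skipping the
peer when it returns <code>False</code>, keeps a bug elsewhere in the
crawler from hammering one node.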

<p>
This experiment should yield a CDF plot of bandwidth versus 
percentage of nodes. 
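<p>
The CDF itself takes only a few lines to compute. The sketch below assumes
that each peer's several measurements are first reduced to a single value
by taking the median; that reduction is a choice made here for
illustration, and the function names are hypothetical.

```python
from statistics import median


def peer_medians(measurements):
    """Collapse (peer, bandwidth) samples into one median bandwidth per peer."""
    by_peer = {}
    for peer, bandwidth in measurements:
        by_peer.setdefault(peer, []).append(bandwidth)
    return {peer: median(values) for peer, values in by_peer.items()}


def empirical_cdf(values):
    """Return (bandwidth, percent of nodes at or below it) pairs, sorted by bandwidth."""
    xs = sorted(values)
    n = len(xs)
    return [(x, 100.0 * (i + 1) / n) for i, x in enumerate(xs)]
```

<p>
Plotting the first element of each pair against the second gives the
bandwidth CDF.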

<h4>Part II: The PlanetLab peers and their properties</h4>

<p>
<a href="http://www.planet-lab.org">PlanetLab</a> is an
incredibly valuable system for distributed systems experimentation 
that you should all be familiar with.

<p>
Sign up for a PlanetLab account. It'll take 24 hours for it to be
enabled and propagated. Once that happens, you will have the ability
to ssh into any of the nodes on the PlanetLab network, spanning 5
continents and several hundred sites.

<p>
There are two ways in which we will use PL hosts in this homework.
First, we'll use them as participants in a direct experiment. Then 
we'll use them as an experimental platform from which we measure other
nodes on the Internet.

<p>
First, measure the bandwidth from Ithaca to a few hundred PL hosts, 
extract the bandwidth CDF, and compare the results against Gnutella 
peers. Do PL hosts have more or less bandwidth available than Gnutella
peers? 

<p>
Second, use your time-of-day data to determine the
variation between measurements taken during the day (noon-5pm) and in the evening
(7pm-midnight). Use our timezone and ignore the remaining hours.
This should yield four graphs, one for each combination of (day, evening) and
(Gnutella, PlanetLab). You will likely see some natural diurnal 
variation. Is the diurnal variation more or less pronounced on PL hosts? 
Answering this question (whose answer no one knows) should require
nothing more than processing of data you collected earlier.
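<p>
Splitting the recorded samples into the four groups could look like the
sketch below. It assumes each record is a
<code>(source, timestamp, bandwidth)</code> tuple whose timestamp has
already been converted to local (Ithaca) time; the record layout and the
function names are hypothetical.

```python
from datetime import datetime


def time_bucket(ts):
    """Classify a local-time datetime as "day", "evening", or None (ignored)."""
    if 12 <= ts.hour < 17:  # noon up to, but not including, 5pm
        return "day"
    if ts.hour >= 19:       # 7pm up to midnight
        return "evening"
    return None


def split_samples(records):
    """Group bandwidth samples by (source, bucket), dropping the ignored hours."""
    groups = {}
    for source, ts, bandwidth in records:
        bucket = time_bucket(ts)
        if bucket is not None:
            groups.setdefault((source, bucket), []).append(bandwidth)
    return groups
```

<p>
Each of the resulting lists can then be turned into its own CDF.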

<p>
Finally, repeat your Gnutella tracing study from different PlanetLab
hosts. How different are the characteristics of the Gnutella hosts
when you use a different vantage point for your measurements?

<p>
<h4>Extra Credit:</h4>
Here are some things you can do for extra credit and to earn a hacker badge:
<ul>
<li><b>Draw the Gnutella graph</b>. Everybody has one of these, so
should you. Free graphing tools on the Internet make it trivial to
convert the output of your graph walker to a nice plot.

<li><b>Examine the Gnutella graph for vulnerabilities</b>. Draw a plot
of how badly the graph would be impacted (% nodes disconnected)
if certain nodes were taken out (% nodes failed). Is there a 
knee in this curve? If so, are there any common features to the
nodes below the knee, i.e. the most valuable assets in the Gnutella 
network, which, if targeted, would bring the vast majority of the
network down? 

<li><b>Draw a global bandwidth map</b>. Use 
<a href="http://www.cs.cornell.edu/~bwong/octant/">Octant</a> to determine the
approximate location of a Gnutella node on the global map. Octant will
compute an estimated region and a point estimate for any peer that responds
to ICMP ping requests (many peers do not, so expect that this approach will
work for 1 out of 10 nodes or so). For simplicity,
ignore the region, and assume that the node is at the estimated point. 
Divide the globe into grid squares, say 100x100 miles. Color the
grid square that the node resides in based on your bandwidth
measurement to that peer. Keep doing this until a fair portion of the
map is colored in. Take the median when
multiple nodes occupy the same grid square. This should yield a map
of Earth's bandwidth achievable from Ithaca, NY. Do a nice job of 
plotting it, write your name on the lower right corner and I'll
get one framed copy for you and one for my office.

<li><b>Draw other global bandwidth maps</b>. Draw the graph as in the 
previous step, and repeat it from different PlanetLab nodes. How different
are the bandwidth maps when they are collected from Boston, SF, Seattle,
NYC, Ithaca, Urbana-Champaign, Toronto, and Cambridge, UK? Who has the
best connection to the rest of the world?

</ul>
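<p>
For the vulnerability plot, one possible approach is sketched below. It
rests on two assumptions that you are free to change: nodes are taken out
in order of decreasing degree, and a surviving node counts as
"disconnected" if it falls outside the largest remaining connected
component. The graph is assumed to be an undirected adjacency map (node to
set of neighbors, with every neighbor also present as a key), and the
function names are hypothetical.

```python
def largest_component_size(adj, removed):
    """Size of the largest connected component once `removed` nodes are gone."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:  # iterative depth-first search
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


def attack_curve(adj, steps):
    """Return (% nodes failed, % surviving nodes disconnected) for each k in steps."""
    order = sorted(adj, key=lambda node: len(adj[node]), reverse=True)
    n = len(adj)
    curve = []
    for k in steps:
        removed = set(order[:k])  # the k highest-degree nodes
        alive = n - k
        giant = largest_component_size(adj, removed)
        cut_off = alive - giant
        pct_cut_off = 100.0 * cut_off / alive if alive else 0.0
        curve.append((100.0 * k / n, pct_cut_off))
    return curve
```

<p>
On a five-node star, for instance, removing the hub (20% of nodes failed)
leaves three of the four survivors outside the largest component, so 75%
are disconnected.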
<hr size=0>
<address>
<a href="/People/egs/"><img src="email.gif" width="140" height="25"></a>
</address>
</body></html>