Homework 2

The concrete goal of the project is to compare the network properties of Gnutella peers to those of PlanetLab hosts. The homework consists of two parts.

Part I: The Gnutella graph, and its peer properties

First, you should construct a Gnutella crawler that walks over the Gnutella network, discovers peers, discovers which files they are sharing, and performs some sample downloads from these peers to determine the download bandwidth available from each peer. The Gnutella specification and many reference implementations are online. Your code needs to follow the spec in order to talk to existing Gnutella clients.

You might want to start by developing (or adopting - feel free to reuse any of the existing Gnutella walkers and clients as a starting point) a program that will simply walk and output the Gnutella graph. Then modify it to query each node for the list of files it is sharing. Then query the sizes of the files, and if you find a suitable file in the 1 to 5 MB range, download it, discard all received data, and time how long the download took to complete. Be sure to record all relevant information, including the time of day, size of file, download duration, and peer IP address.

Be sure to write your code defensively. We do not want to bring down the Gnutella network. You must make sure that you do not leave any zombie processes behind that will perform pointless downloads forever.

You should collect data from at least 300 peers, with at least 3 data points from each peer. Of course, it's always good to collect more data, say from millions of peers, but make sure that you do not exert undue stress on any given node. Treat them as you would like to have your home box treated.

This experiment should yield a CDF plot of bandwidth versus percentage of nodes.

Part II: The PlanetLab peers and their properties

PlanetLab is an incredibly valuable system for distributed systems experimentation that you should all be familiar with. Sign up for a PlanetLab account. It'll take 24 hours for it to be enabled and propagated. Once that happens, you will have the ability to ssh into any of the nodes on the PlanetLab network, spanning 5 continents and several hundred sites.

There are two ways in which we will use PL hosts in this homework. First, we'll use them as participants in a direct experiment. Then we'll use them as an experimental platform from which we measure other nodes on the Internet.

First, measure the bandwidth from Ithaca to a few hundred PL hosts, extract the bandwidth CDF, and compare the results against Gnutella peers. Do PL hosts have more or less bandwidth available than Gnutella peers?

Second, use your time of day data to determine the variation between measurements taken during the day (noon-5pm) and in the evening (7pm-midnight). Use our timezone and ignore the remaining hours. This should yield four graphs, one for each combination of (day, evening) and (Gnutella, PlanetLab). You will likely see some natural diurnal variation. Is the diurnal variation more or less pronounced on PL hosts? Answering this question (whose answer no one knows) should require nothing more than processing of data you collected earlier.

Finally, repeat your Gnutella tracing study, using different PlanetLab hosts. How different are the characteristics of the Gnutella hosts when you use a different vantage point for your measurements?
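
As a starting point for the data processing described above, here is a minimal Python sketch. It assumes each timed download has been recorded with the peer IP address, a local start time in the course timezone, the file size, and the duration. The names `Sample`, `period`, `per_peer_bandwidth`, and `empirical_cdf` are illustrative and not part of the assignment. Summarizing each peer by the median of its samples is also an assumption; the assignment asks only for at least 3 data points per peer and does not say how to combine them.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Dict, List, Optional, Tuple


@dataclass
class Sample:
    """One timed download. Field names are illustrative assumptions."""
    peer_ip: str
    started: datetime   # start time, already converted to the course timezone
    size_bytes: int
    duration_s: float

    def bandwidth_kbps(self) -> float:
        # Kilobits per second: bytes * 8 bits / 1000 / seconds.
        return self.size_bytes * 8 / 1000 / self.duration_s


def period(t: datetime) -> Optional[str]:
    """Return 'day' for noon-5pm, 'evening' for 7pm-midnight, else None."""
    if 12 <= t.hour < 17:
        return "day"
    if 19 <= t.hour <= 23:
        return "evening"
    return None  # remaining hours are ignored


def per_peer_bandwidth(samples: List[Sample]) -> Dict[str, float]:
    """Summarize each peer by the median of its samples (an assumed choice)."""
    by_peer: Dict[str, List[float]] = {}
    for s in samples:
        by_peer.setdefault(s.peer_ip, []).append(s.bandwidth_kbps())
    return {ip: median(vals) for ip, vals in by_peer.items()}


def empirical_cdf(values: List[float]) -> List[Tuple[float, float]]:
    """Return (value, cumulative percent) pairs, one per input, sorted by value."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, 100.0 * (i + 1) / n) for i, v in enumerate(ordered)]
```

To produce the four graphs, one option is to filter the samples by `period(...)` and by network (Gnutella or PlanetLab), pass each group through `per_peer_bandwidth`, and plot the pairs returned by `empirical_cdf` with bandwidth on one axis and percentage of nodes on the other.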
Extra Credit:

Here are some things you can do for extra credit and to earn a hacker badge: