From: Thomas Roeder <tmroeder@CS.Cornell.EDU>
Date: Mon, 11 Nov 2002 16:37:55 -0500
Subject: 615 PAPER #63

The three papers today present a view of the structure and characteristics of P2P networks. The questions for today's papers ask whether structured or unstructured P2P systems perform better; we are further asked about the design constraints of a P2P system. The Small World paper describes the characteristics a network must have for a decentralized algorithm to find a good short path to a given node, and gives proofs of the properties required. With respect to the proposed questions, it shows that, given the dimensionality of the space, there is a particular number of long-distance contacts that an average node must have to find these short paths with reasonable probability. A design constraint of a P2P system is thus that its methods for establishing contacts must encourage these kinds of links, in these proportions, so that the small-world property can be established. It also shows that an unstructured P2P system might perform at least as well as a structured system by taking advantage of the small-world property. The chapter "Performance" from the O'Reilly P2P book compares Freenet and Gnutella and discusses the applicability of the small-world effect to their structure.
Both are unstructured P2P systems, but with significant differences in their performance characteristics. As above, the small-world property is exploited in Freenet to obtain relatively short path lengths for queries. A structure is suggested for Gnutella that would impose a hierarchy of peers, which could then overcome the problems of its broadcast search. The chapter does suggest that, at least with respect to searchability, the unstructured network labors under a significant disadvantage. The measurement paper from the University of Washington measures Napster and Gnutella in terms of the characteristics of their users. It shows that many of the standard assumptions about the users of P2P systems do not hold in practice. Its main result, however, is that the heterogeneity of these two P2P systems is very high, and that any such system that wishes to improve its performance needs to use information about the characteristics of nodes in the network to direct queries and downloads, preferably by discovering that information itself, since the authors also found a significant amount of misrepresentation in the systems. This result suggests that the small-world property discussed in the two papers above is useful but not sufficient to guarantee that an unstructured system performs better than a structured one. Indeed, the remedies proposed by this paper would be more easily implemented on a more structured system, since the less structure exists, the more often the work of making these measurements must be repeated. The design constraints revealed by this study are particularly striking in terms of the uncooperative nature of the participants and the short periods of time for which they remain on the network. A short characterization of these properties might be that a significant fraction of the members of the network lie about their attributes, are only there to get resources, and stay on the network only for short periods of time.
The users tend to have low enough bandwidths on average that many cannot function as servers.

From: bd39@cornell.edu
Date: Mon, 11 Nov 2002 23:17:59 -0500 (EST)
Subject: 615 PAPER 63

PAPER 63

Small-World Phenomenon

This paper gives a theoretical basis for the performance of peer-to-peer routing systems that exhibit local clustering along with knowledge of routes to topologically distant neighbors. Its theoretical result is that in a k-dimensional space, if each node maintains long-distance links distributed uniformly across distance scales (a link to a node at distance k^i, for i = 0 to log N), then the expected routing length is logarithmic in the number of nodes. It is also surprising that if the distribution is not uniform, performance quickly degrades from logarithmic, for both large and small network diameters. The important aspect with respect to peer-to-peer routing is that routing must be able to traverse the network topology at different scales in order to find paths to destination nodes efficiently.

Performance of Freenet and Gnutella

This paper (a book chapter) gives an overview of a performance evaluation of Freenet and Gnutella. The Freenet portion we have seen before in the Freenet paper itself.
In Freenet, we see that efficient routing depends on the existence of highly connected nodes, which effectively provide shortcut routes. The Gnutella evaluation shows that Gnutella queries can quickly saturate network links. Gnutella also lacks the highly connected small-world nodes of Freenet, and thus queries into the network require contacting a large number of nodes (around 50). Gnutella's bandwidth consumption also scales linearly with the number of nodes in the system, which places an effective cap on the network size and query rate.

Measurement of Peer-to-Peer File Sharing Systems

This paper looks at the distribution of client types in the peer-to-peer networks of Napster and Gnutella. It measures the network bandwidth, latency, number of shared files, and uptime of the peers of these networks. These measurements were taken by crawling both networks with queries and contacting the resulting IP addresses to measure bandwidth, latency, and uptime. The measurements show that (a) the download capacity of the network exceeds its upload capacity, and (b) few of the "peers" fit the high-availability, high-bandwidth profile of a server, while many of the clients are simply freeloaders on the network (especially in Gnutella, where 7% of the peers serve more files than all other users combined). In terms of load balancing, most of the P2P "peers" behave in a client/server manner rather than in a peer-to-peer relationship.

Can unstructured P2P systems perform as well as, or better than, more structured P2P networks? What are the design constraints of P2P systems? From the small-world paper, it can be seen that if an unstructured P2P system has a uniform distribution of links across different routing distance scales, then it can perform just as well as the structured routing schemes we have examined.
It may be difficult to enforce this, however, and from the theoretical result we can also see that performance depends on the uniformity of the choice of routes. The small-world phenomenon also raises an issue of load balancing. Servers that happen to be highly connected would seem to be a performance bottleneck for routing in the system: because they lie on the preferred routes of many nodes, their capacity to service requests could be exhausted by the volume of queries directed to them. The selfish behavior of the users of a P2P network also causes load imbalance in and of itself. There must be some kind of usage-quota system that balances what users give to and take from the system. It seems that P2P systems need a mechanism to guarantee the two P's, which is difficult to maintain given the heterogeneity of the clients interacting in the network. Design constraints in a P2P network include a fixed amount of bandwidth to work with (limiting scaling) and support for many under-provisioned clients and freeloaders alongside very few superpeers (or some way to ensure the system does not behave quite so badly).
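The claim above, that distributing long-range links uniformly across distance scales makes purely local routing efficient, can be illustrated with a toy simulation. This is my own sketch, not code from any of the papers: it uses a 1-D ring with one long-range contact per node, where the analogue of the 2-D inverse-square exponent is r = 1 (Kleinberg's critical exponent equals the lattice dimension), and compares it against uniformly random long links (r = 0).

```python
import random

def build_links(n, r, rng):
    """One long-range contact per node, drawn with P(d) proportional to d**-r.
    r=0 gives uniformly random shortcuts; r=1 is the ring (1-D) analogue of
    Kleinberg's inverse-square law."""
    dists = list(range(1, n // 2 + 1))
    weights = [d ** -r for d in dists]
    ds = rng.choices(dists, weights=weights, k=n)
    return [(u + rng.choice((-1, 1)) * d) % n for u, d in enumerate(ds)]

def ring_dist(a, b, n):
    return min((a - b) % n, (b - a) % n)

def greedy_hops(links, n, s, t):
    """Greedy routing: forward to whichever known contact (the two ring
    neighbours or the long-range link) is closest to the target.
    Decisions are purely local, as in Kleinberg's decentralized algorithm."""
    u, hops = s, 0
    while u != t:
        u = min([(u - 1) % n, (u + 1) % n, links[u]],
                key=lambda v: ring_dist(v, t, n))
        hops += 1
    return hops

rng = random.Random(7)
n, trials = 100_000, 200
mean_hops = {}
for r in (0.0, 1.0):
    links = build_links(n, r, rng)
    samples = [greedy_hops(links, n, rng.randrange(n), rng.randrange(n))
               for _ in range(trials)]
    mean_hops[r] = sum(samples) / trials
    print(f"r = {r}: mean greedy path length = {mean_hops[r]:.0f} hops")
```

With the exponent matched to the dimension, greedy path lengths stay polylogarithmic; with uniform shortcuts they grow roughly as a power of n, even though both networks carry exactly the same number of long-range links.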
From: Syed Shafat Zaman <shafat@CS.Cornell.EDU>
Date: Tue, 12 Nov 2002 00:10:46 -0500
Subject: 615 PAPER 63

The first paper is a discussion of the "small-world phenomenon" from an algorithmic perspective. It studies the correlation between the local structure and the long-range connections of a node in a network, which are the deciding factors in its ability to route messages efficiently to any other node on the network. The second paper makes use of this observation and carries out two case studies involving Freenet and Gnutella. The theoretical analysis, based on the small-world phenomenon, shows that given a critical threshold correlation, it is possible to route messages efficiently regardless of the nature of the underlying graph. For example, in a regular graph, the high level of clustering leads to highly optimized local routing but poor average global path length; at the other end of the spectrum, random graphs exhibit exactly the opposite trends. The intermediate cases, which are more representative of P2P systems such as Freenet or Gnutella, however, show desirable characteristics: locally they show a high degree of clustering, while at a global level the path length drops approximately to that of a random graph.
The second paper presents detailed simulation results confirming this behavior in both Freenet and Gnutella. This essentially implies that unstructured P2P systems, which is what any fully decentralized P2P system is, do have the ability to perform as well as more structured systems. There are a number of constraints that need to be taken into consideration during the design phase of a P2P system. System lifetime dictates the average lifespan of a node on the network, as an end-user machine could go offline, crash, or simply become active at any time. Designers also have to work with the bottleneck link bandwidth, which is the slowest hop between any two nodes on a P2P network; since it is a physical property of the network, it remains constant as long as the same path is used. Freeloaders are another constraint that needs to be addressed: more specifically, there have to be policies defining how to handle nodes that neither share files nor participate in the network by providing information to help routing. Finally, much consideration also has to be given to the trustworthiness of the information a node provides about itself to other users. Should this be determined by the system? Bandwidth information or connection type, for example, can easily be falsified by a node.
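On the last point, measuring rather than trusting self-reports, the measurement study estimated bottleneck bandwidth actively using the packet-pair technique: two back-to-back probe packets are spread apart by the bottleneck link, so their arrival gap reveals its bandwidth. Here is a minimal sketch of that estimator; the gap values are made up purely for illustration, and real tools must also filter cross-traffic far more carefully.

```python
def packet_pair_estimate(packet_bytes, gaps_s):
    """Packet-pair principle: two back-to-back packets leave the bottleneck
    link spaced by (packet size / bottleneck bandwidth), so each observed
    inter-arrival gap yields one bandwidth sample (bytes/second).
    Taking the median discards samples distorted by cross-traffic."""
    samples = sorted(packet_bytes / g for g in gaps_s if g > 0)
    return samples[len(samples) // 2]

# Simulated gaps for 1500-byte probes over a ~1 Mbit/s bottleneck
# (1500 B / 125_000 B/s = 12 ms), with a couple of cross-traffic outliers.
gaps = [0.012, 0.0121, 0.0119, 0.012, 0.031, 0.0122, 0.005]
est = packet_pair_estimate(1500, gaps)
print(f"estimated bottleneck bandwidth: {est * 8 / 1e6:.2f} Mbit/s")
```

A node cannot easily lie to this kind of probe, which is exactly why the authors prefer measured characteristics over advertised ones.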
From: Mark Robson <mr228@cornell.edu>
Date: Tue, 12 Nov 2002 01:17:25 -0500
Subject: 615 PAPER 63

The measurement paper compares the true-P2P system Gnutella and the more centralized Napster. Unfortunately, the paper does little comparison of the two protocols themselves and instead focuses on comparing the characteristics of the two systems' user populations. This does allow the authors to draw some interesting conclusions, however. The primary contribution is to suggest that, since hosts on the Internet differ greatly (by many orders of magnitude) in bandwidth, latency, processing power, etc., future P2P networks should be designed to make optimal use of this (and in fact KaZaA and Grokster do). They also observe that hosts tend to lie about their statistics, and therefore there need to be incentives for hosts to be truthful, or measurements (bandwidth, latency) need to be taken actively. Kleinberg's small-world paper presents background on the small-world conjecture and discusses Milgram's original experiment. Milgram's work claims that the diameter of social networks is small (under six hops) and, furthermore, that a path from any individual to another can be recovered using only "local" knowledge at each person.
This translates to many other networks, and the hope is that we can design overlay networks to behave like small worlds and thus allow for bounded (and small) routing times. Kleinberg discusses how to design and grow networks so that this small-world property emerges. The paper seems to suggest that an unstructured network, if it behaves like a small world, is similar to a structured one and, in theory, might perform just as well. The P2P Performance chapter from O'Reilly quickly dismisses any system that is in any way centralized and compares the "true P2P" systems Gnutella and Freenet. It too discusses the small-world property and its application to these systems. Freenet has the small-world property as one of its design goals, and this chapter discusses it at length. The Gnutella discussion suggests that the network can quickly become flooded and unreliable; furthermore, Gnutella can't really play the small-world card to help its case. This chapter also discusses "free-riding" (removing resources from the network without donating any in return) and even goes so far as to call it a failure mode. Clearly something needs to be done to give nodes an incentive to share. Design constraints that exist in P2P networks are as follows: (1) There is a limited amount of network capacity underneath the overlay network; bandwidth, latency, etc. need to be used efficiently. (2) A certain percentage of hosts are unwilling, or at least reluctant, to share resources with the rest of the network; a scheme needs to be designed to provide the necessary incentive. (3) Some nodes have much more bandwidth, lower latency, more compute cycles, etc. than others; this heterogeneity needs to be exploited by having some nodes do more work, maintain more connections, etc.
This can further be supported by the incentive scheme of #2.

From: Warren Lapine <mp98@cornell.edu>
Date: Tue, 12 Nov 2002 02:24:05 -0500
Subject: 615 Paper 63

Kleinberg's paper shows us that it is possible to route efficiently in a decentralized system even without a strict structural underpinning. Kleinberg's model shows that a message can be routed toward a destination in a grid network in on the order of (log n)^2 hops if one has the following information:

* a set of local contacts within some distance, and a set of long-range contacts chosen at random (with probability proportional to the inverse square of their distance from the source); and
* a correspondence of the message to a location within the network, and the locations of all contacts.

The importance of the inverse square above is intuitive when one considers that the number of nodes within a given distance grows quadratically with that distance. Given this information, Kleinberg's paper shows that the simple algorithm of forwarding the message to the contact closest to the destination achieves an expected routing time polylogarithmic in the number of nodes in the network. It is unlikely that a P2P system will have the same concept of locality as Kleinberg's grid.
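The inverse-square intuition can be checked with a few lines of arithmetic. On a 2-D lattice roughly 4d nodes sit at lattice distance d, so weighting each node by d^-2 leaves about the same total probability in every doubling band of distances [2^j, 2^(j+1)), which is the "uniform over distance scales" property. A small sketch of this calculation (my own illustration, not code from the paper):

```python
def band_mass(r, scales=10):
    """Unnormalised probability mass that one long-range contact lands in the
    distance band [2^j, 2^(j+1)), on a 2-D lattice with ~4*d nodes at lattice
    distance d, each individually weighted by d**-r."""
    return [sum(4 * d * d ** -r for d in range(2 ** j, 2 ** (j + 1)))
            for j in range(scales)]

ratios = {}
for r in (0.0, 2.0, 3.0):
    bands = band_mass(r)
    ratios[r] = bands[-1] / bands[0]   # farthest band vs. nearest band
    print(f"r = {r}: far/near band-mass ratio = {ratios[r]:.4g}")
```

Only r = 2 spreads the link mass nearly evenly across the scales; smaller exponents pile almost all the probability onto the farthest band, larger ones onto the nearest, and in either case greedy routing loses the ability to make progress at every scale.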
What is important to take away from the paper is that a small-world property will only emerge in a system in which nodes maintain, in addition to their local knowledge, a small set of long-range links, and, more importantly, that the frequency of these long-range contacts must correlate closely with some notion of their density (in the grid model, the probability of adding a link to a node at a certain distance was inversely proportional to the number of nodes within that distance). Hence, if we have an unstructured P2P network (like Gnutella or Freenet), we can only expect it to route effectively if it has the above mixture of local and long-range links. But if it does, we can perhaps achieve route lengths logarithmic in the size of the network, as good as Pastry or Chord! Note, however, that neither of the above two systems should be expected to exhibit such a property: even if their link structure follows the small-world relation, they lack a correlation between the "location" of a query and the "location" of a node. It is difficult to see how one would even define such concepts in Gnutella. In Freenet, of course, keyspace specialization should eventually approach this property. The second paper for today, by Theodore Hong, emphasizes this point. Short paths do exist in Freenet between randomly selected pairs of nodes, and the Freenet graph is certainly connected (or close enough that it doesn't matter in any interesting way). Their simulation of Freenet does exhibit very short path lengths (6 Freenet hops), albeit with a lot of communication. If, however, we take away the behavior by which Freenet always routes toward the node closest to the destination in the keyspace, the median path length rises to 50 Freenet hops. In effect, this removes Kleinberg's second requirement above. The third paper for today tackles a very different subject: what is the real behavior of nodes in file-sharing systems?
Such information is important as it tells us what kind of expectations we may place on a P2P system design (e.g., how stable we can expect the network to be in terms of node uptime):

* Bandwidth measurements are fairly encouraging: more than half of file-sharing users seem to have measured bandwidth greater than 64 kbps.
* Uptime is harder to interpret, as quality of service seems to feed back into system-lifetime measurements. Napster gave an arguably better experience than Gnutella even though their end-user modes of operation were much the same (type in a search, get a file). But whereas the best 20% of Napster nodes have an uptime of 83% or more, the best 20% of Gnutella nodes have an uptime of only 45% or more.
* Free riders are definitely a problem: as many as 25% of Gnutella users are free riding. Like the American economy, Gnutella is skewed, with 7% of its users offering more files than all of the other users combined.

From: nbs24@cornell.edu
Date: Tue, 12 Nov 2002 08:35:07 -0500 (EST)
Subject: 615 PAPER 63

The small-world phenomenon: An algorithmic perspective

This paper examines decentralized algorithms by which nodes, knowing only the locations of their direct acquaintances, attempt to transmit a message from a source to a destination along a short path.
The authors prove that in a class of networks generated according to the Watts-Strogatz model, there is no decentralized algorithm capable of constructing paths of small expected length. They define an infinite family of random network models that generalizes the Watts-Strogatz model and demonstrate that it contains a model in which a decentralized algorithm can find short paths with high probability. They also prove that this is the unique model within the family for which decentralized algorithms are effective. The proofs demonstrate that the small-world phenomenon makes it possible for unstructured P2P systems to perform at least as well as more structured ones, provided the system has a uniform distribution of links across distance scales.

P2P and the small world phenomena

This chapter focuses on decentralized peer-to-peer systems and applies some of the concepts of the small-world model to Freenet and Gnutella. Freenet: queries are forwarded from one peer to the next until some peer owning a copy of the file can reply. For Freenet to be a small world, the Freenet graph needs to be connected and short routes must exist between any two arbitrary peers; both properties are shown to be satisfied. This translates into good average performance with poor worst-case performance, good scalability, and fault tolerance. Gnutella: queries are conducted via broadcast, which does not invoke the small-world effect. To increase Gnutella's efficiency, a partly hierarchical model using super-peers is proposed to reduce the effective size of the network; with such a hierarchy, the effort required to do a search grows only logarithmically with the size of the search space.

A Measurement Study of Peer-to-Peer File Sharing Systems

The authors propose that the characteristics of the participating nodes in a P2P system need to be considered in system evaluations. These characteristics include bandwidth, latency, availability, and degree of sharing, among others, at each node.
They perform measurements on Napster and Gnutella and find that peers in these systems are significantly heterogeneous and that peers don't always tell the truth. They find that a significant proportion of peers are just 'free-riders'. The authors suggest that an attempt be made to measure the characteristics of peers directly in order to build more robust systems. The design constraints involve identifying the 'free-riders' and giving them lower priority. Also, generous participants with low bandwidths need to be identified somehow (because they might not admit they have problems) so as not to overload them and thereby slow down the whole network.

Nana B. Sam

From: Kevin Walsh <kwalsh@CS.Cornell.EDU>
Date: Tue, 12 Nov 2002 10:31:59 -0500
Subject: 615 PAPER 63

Can unstructured P2P systems perform as well as, or better than, more structured P2P networks? What are the design constraints of P2P systems (e.g. typical system lifetimes, bandwidths, etc.)? Please submit a combined review under paper number (63).

The small-world phenomenon: An algorithmic perspective. Performance: P2P and the small world phenomena. A measurement study of peer-to-peer file sharing systems.

The first of the three papers deals with a semi-structured network. The two essential features of the network are a high-diameter, dense, structured lattice-type network, overlaid with sparse random long-range links. In this particular proposal, each node has small and equal degree.
Using a reasonable definition of decentralized search algorithms, the paper examines upper and lower bounds on the path lengths that might be found by any search algorithm. It is found that if the long-range links are distributed uniformly at random (a common assumption), then the average path length is Theta(n^(2/3)), i.e., polynomial in the size of the network. However, we may instead choose a long-range link distribution according to d^(-r), where d is the distance between nodes and r is an exponent controlling clustering (high r gives very clustered long-range links, low r gives very distant long-range links). In this case, it is found that for the optimal choice r = 2, the inverse-square distribution, search can be performed efficiently in O((log n)^2) steps. The conclusion is that it is possible to perform efficient decentralized search in composite structured/unstructured networks if one chooses the long-range links according to the right distribution. Compared to fully structured networks, this gives less efficient routing (compare O(log n) for Chord/CAN/Pastry/etc.), but has the advantage of very easy network construction: each node maintains only a very small number of long-range links, and these are randomized according to an easy-to-maintain distribution. There are some missing features in this type of structured/unstructured network, however. First, we know from the other papers that Gnutella-like networks tend to follow power-law degree distributions; a better search should be possible in such a network by preferentially moving toward nodes of high degree. The second paper, Performance, starts with a cute but perhaps meaningless anecdotal review of the small-world phenomenon, then reviews the well-known Watts and Strogatz paper on small-world networks.
The two interesting points here are that very few random long-range links are needed to turn a high-diameter network into a low-diameter network, and that the transition between these two regimes is not noticeable at the local level. As Kleinberg pointed out, however, having a small world is not sufficient: we must also be able to find routes in the small world efficiently. And here, again, the issue of degree distributions is ignored. Via simulation, the paper shows that Freenet can rewire a regular lattice into a small-world-type network. The simulation also indicates that Freenet can find reasonably close-to-optimal paths in the network (a median of 6 hops versus an optimal 2), but the distribution seems heavy-tailed, with some paths as long as 100 hops. The other interesting discussion in this paper is the distribution of node degrees in Freenet: the authors find via simulation that the distribution may follow a power law. They claim that Gnutella, on the other hand, does not have a power-law degree distribution and is essentially a random network (low clustering coefficient). In Freenet, the only process that maintains clustering seems to be the specialization-of-nodes argument, and it is not clear whether this does a good job. Finally, the last paper presents interesting measurements of the nodes that participate in peer-to-peer networks. The authors were able to crawl between 8,000 and 10,000 Gnutella peers depending on the time of day, with a high turnover rate; it is surprising that so many stayed connected all night. Both Gnutella and Napster have a high percentage of low-bandwidth nodes (especially upstream bandwidth). Gnutella users are less likely to be on a modem, perhaps an indication of Gnutella's overhead. Many peers are connected by very high-latency links, which is bad news for poor neighbor-selection strategies. Gnutella servers tended to be less available, even though the machines were reachable via normal TCP/IP.
Overall, in both systems users appear to connect, remain online for a short time, then disconnect.

From: Janet Suzie Yoon <jsy6@postoffice2.mail.cornell.edu>
Date: Tue, 12 Nov 2002 10:47:34 -0500
Subject: 615 PAPER 63

"The Small-World Phenomenon: An Algorithmic Perspective"

In his paper, Kleinberg defines a generalization of the Watts-Strogatz model (a model that encompasses the small-world phenomenon). Start with a set of lattice points in an n x n square. Add a directed edge from a node u
- to every other node within lattice distance k (short-range contacts), and
- to q other nodes v with probability proportional to 1/(x^r), where x is the lattice distance from u to v (long-range contacts).
When r = 0, the long-range contacts are uniformly distributed, as in the Watts-Strogatz model.
As the value of r increases, the long-range contacts become more clustered around node u, and thus more structurally organized but less useful. It is proven that there is exactly one class within this infinite family of models in which a decentralized algorithm can find short paths: r = 2 with k = q = 1. For this class of models, the expected length of the path found by a decentralized algorithm is at most c2 * (log n)^2, where c2 is a constant dependent on the values of r, k, and q but independent of n. Let A_j be the set of nodes whose lattice distance from the target node t is between 2^j and 2^(j+1). The exponent r = 2 is the only one at which the nodes other than t are uniformly distributed over the sets A_0, A_1, ..., A_(log n). Suppose we want to construct a path from node s to node t, and let the current message holder be node u. The algorithm is in phase j if node u is in set A_j. The expected time to break out of any given phase is proportional to log n, and there are at most 1 + log n phases, hence the (log n)^2 bound.

"P2P and the Small World Phenomenon"

This paper does a case study of Freenet and Gnutella. The small-world effect is fundamental to Freenet's scalability, fault tolerance, and performance. The majority of nodes in a small-world network have only a few local connections, and only a few nodes are well connected with long-range contacts. Freenet's scalability and performance are attributed to the shortcuts provided by these few well-connected nodes. One characteristic of a small-world network is a scale-free, power-law distribution of links within the network. This power-law distribution is the reason for Freenet's high level of fault tolerance: random failure will most likely take out a poorly connected node, since such nodes are the majority in the network.
The effect of a poorly-connected node's failure on overall routing is minimal; routing performance is noticeably affected only by the failure of highly-connected nodes. Freenet handles free-loaders by ignoring them. Gnutella does not depend on the small-world effect and instead broadcasts its queries; because of this, Gnutella is not very scalable. Gnutella performs worse than Freenet under random failures, but removing the most-connected nodes is not as severe in Gnutella as in Freenet, since Gnutella's nodes are all roughly equivalent. Gnutella is vulnerable to free-riding due to its lack of state about its peers; free-riding results in higher latency or, even worse, failed queries.

"A Measurement Study of Peer-to-Peer File Sharing Systems"
This paper performs a measurement study of Napster and Gnutella, focusing on characterizing the end-user hosts in these systems by their latency, lifetime, and bottleneck bandwidth. The analysis is based on several crawls of both systems and snapshots of the users over a span of several days. Most current P2P protocols uniformly delegate responsibility to all nodes within the system, yet roughly one third of the peers in Gnutella are unsuitable to act as servers due to their bandwidth capabilities. High latency is a problem in Gnutella since connections are formed in an unstructured manner, despite the fact that latency among communicating peers varies widely. Protocols also tend to assume that peers are cooperative and behave equally in terms of downloads and uploads. There is a discrepancy between the upstream and downstream bandwidth in Gnutella: the download capacity of the system is higher than its upload capacity. Napster has a better download capacity than Gnutella.
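The failure-tolerance contrast described above, where random failures mostly hit poorly-connected nodes while targeted removal of hubs hurts routing, can be illustrated on a synthetic scale-free graph. A stdlib-only sketch, not from the chapter; the preferential-attachment generator and all parameters are assumptions for illustration:

```python
import random
from collections import defaultdict, deque

def scale_free_graph(n, m=2, seed=1):
    """Preferential attachment: each new node links to m existing nodes,
    picked with probability proportional to current degree (duplicate
    picks are collapsed, so a node may get slightly fewer than m links)."""
    random.seed(seed)
    adj = defaultdict(set)
    targets = list(range(m))
    repeated = []                       # node listed once per incident edge
    for new in range(m, n):
        for t in set(targets):
            adj[new].add(t)
            adj[t].add(new)
            repeated += [new, t]
        targets = random.sample(repeated, m)
    return adj

def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen, best = set(removed), 0
    for start in adj:
        if start in seen:
            continue
        q, size = deque([start]), 0
        seen.add(start)
        while q:
            u = q.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    q.append(v)
        best = max(best, size)
    return best

g = scale_free_graph(500)
hubs = sorted(g, key=lambda u: len(g[u]), reverse=True)[:20]
rand = random.sample(list(g), 20)
# Targeted hub removal typically fragments the graph far more than
# removing the same number of random (mostly low-degree) nodes.
print(largest_component(g, rand), largest_component(g, hubs))
```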
Overall, the majority of peers in both systems are uncooperative, and free-riding is a major issue; in practice, the behavior of both systems resembles a client-server model.

In general, unstructured P2P systems do not behave as well as structured P2P systems. As proven in Kleinberg's paper, some structure is needed for more informed, and thus more efficient, routing. Freenet, though also an unstructured system, is somewhat more "organized" than Gnutella, since it depends on its topology being that of a small world; its less unstructured topology results in better scalability and performance. Design constraints are discussed under "A Measurement Study of Peer-to-Peer File Sharing Systems" above.
From vrg3@cornell.edu Tue Nov 12 11:23:25 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACGNOQ13883 for ; Tue, 12 Nov 2002 11:23:24 -0500 (EST) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id LAA18180 for ; Tue, 12 Nov 2002 11:23:22 -0500 (EST) Date: Tue, 12 Nov 2002 11:23:22 -0500 (EST) From: vrg3@cornell.edu X-Sender: vrg3@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 63 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

The Kleinberg paper presents an explanation of the small-world phenomenon, by which decentralized clustered networks (like those of personal human relationships) can contain short paths between nodes due to the presence of the appropriate quantities of long-distance links. It suggests that an unstructured peer-to-peer network can perform as well as a structured one if the appropriate distribution of short- and long-distance links is achieved. The book chapter presents case studies of the performance of Freenet and Gnutella. Freenet relies on its network having the small-world property; there exists a small set of highly connected nodes, allowing for short routes between arbitrary pairs of peers. Gnutella networks do not exploit any kind of small-world effect, since queries are broadcast; this hurts scalability and bandwidth usage. A suggestion is made to improve Gnutella by imposing some hierarchical structure to improve the average path length. The Saroiu paper compares Napster (a structured peer-to-peer network) and Gnutella (an unstructured peer-to-peer network) in terms of the characteristics of the individual nodes.
The first big result of the study is that there is significant heterogeneity among peers in both networks, suggesting that a peer-to-peer system should make use of information about the differences between nodes (in terms of latencies, bandwidths, uptimes, and activity) to work more efficiently. This would be of particular importance in Freenet where the small set of highly connected nodes which serve to provide the short paths would ideally all have high-bandwidth network connections. The second big result is that there is a remarkable amount of selfishness present among the peers of both networks. Many users do not bother to accurately represent their advertised bandwidths, and very, very few users actually provide content for others to download, or stay online long enough to make it useful. The majority of users simply connect, download some files, and then disconnect. It would seem that it is possible for a decentralized network to perform as well as a structured one, but there are many conditions that would have to hold. The small-world property is necessary, and as Freenet has shown, it is possible to achieve. Peers would need to be able to measure the properties of other peers, without having to rely on voluntarily advertised information. Some type of scheme for ensuring that all nodes contribute to the global network would vastly improve performance. 
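The point above, that peers should measure one another's properties directly rather than trusting voluntarily advertised figures, can be sketched as follows. Everything here is hypothetical: `measured_bandwidth`, `recv_probe`, the stand-in `slow_peer`, and all the numbers are illustrative assumptions, not any system's real API:

```python
import time

def measured_bandwidth(recv_probe, probe_size=64 * 1024):
    """Estimate a peer's usable bandwidth by timing an actual transfer
    instead of trusting its self-reported figure. `recv_probe` is a
    hypothetical callable that fetches up to `probe_size` bytes."""
    start = time.perf_counter()
    data = recv_probe(probe_size)
    elapsed = time.perf_counter() - start
    return len(data) / max(elapsed, 1e-9)   # bytes per second

def slow_peer(nbytes):
    """Toy peer that advertises a fast link but delivers ~10 KB/s."""
    time.sleep(0.05)
    return b"x" * 512                        # 512 bytes in ~50 ms

advertised = 1_544_000 / 8                   # claims a T1 line, in bytes/s
measured = measured_bandwidth(slow_peer)
print(measured < advertised)                 # trust the measurement, not the claim
```

A real system would amortize such probes over normal downloads rather than issuing dedicated ones, but the principle of preferring observed behavior over advertised attributes is the same.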
From mvp9@cornell.edu Tue Nov 12 11:37:26 2002 Received: from cornell.edu (cornell.edu [132.236.56.6]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACGbQQ18622 for ; Tue, 12 Nov 2002 11:37:26 -0500 (EST) Received: from zoopark.cornell.edu (syr-24-58-46-186.twcny.rr.com [24.58.46.186]) by cornell.edu (8.9.3/8.9.3) with ESMTP id LAA17802 for ; Tue, 12 Nov 2002 11:37:26 -0500 (EST) Message-Id: <5.1.0.14.2.20021112113637.00ad4470@postoffice.mail.cornell.edu> X-Sender: mvp9@postoffice.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 12 Nov 2002 11:37:19 -0500 To: egs@CS.Cornell.EDU From: mike polyakov Subject: 615 PAPER 63 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed

The papers presented today evaluate unstructured peer-to-peer (P2P) networks from both theoretical and experimental points of view. The Kleinberg paper explores the expected searchability of a family of networks such as those formed by people within the US. Most importantly, he identifies a second parameter of a network, besides network diameter, that controls the distribution of links from a node with respect to distance. At a critical threshold, where the ratio of long-range links to local ones is just right, the network acquires the small-world property, such that any two points can be connected by very few hops. When that parameter deviates from its critical value of 2, there are either too many long links (so a node cannot see what is around it) or not enough of them to make good progress. This provides a theoretical basis and explanation for the good performance of unstructured P2P networks. The chapter from the O'Reilly book deals with Gnutella and Freenet and performs simulations to explore their similarity to the small-world model that Kleinberg described. They find that Freenet does indeed form a small-world graph, which remains highly clustered while its path length drops to log N.
Further, they find that it scales as log N of network size, so that at 200,000 nodes the median path length is about 20. This is not terribly impressive, since 200K is not that many nodes, and 20 hops (and often worse) can take considerable time. A somewhat surprising finding is that the ratio of request path length to characteristic path length (that is, routing performance) remains similar regardless of scale. For Gnutella, the O'Reilly chapter tells a different story. Its graph is random (and thus has a low clustering coefficient), so the characteristic path length is short. Since queries are done breadth-first, the short paths are fully exploited. Thus the performance of searches can scale arbitrarily well, but unfortunately the bandwidth consumed by the queries becomes a hard limitation. Finally, the Saroiu et al. paper makes observations about usage in the actual Gnutella (and Napster) networks. They find orders-of-magnitude differences between users in terms of bandwidth, amount shared, reliability, etc., and recommend that network protocols regulate and take advantage of these differences. Surprisingly, they find that Gnutella DOES form a small world, such that taking only 4% of the nodes out leaves it far less connected. So, how do these unstructured P2P networks compare with structured ones? Kleinberg shows that these networks can organize themselves such that path lengths exist that are as short as those in structured networks, and that with appropriate routing, good performance can be achieved. The experiments show that Freenet and Gnutella possess the routing capability to take advantage of the short paths formed in the graph. However, they are far more brittle than structured networks. In Freenet, finding short paths depends on good decisions, which are occasionally lacking, destroying performance. Also, as in Gnutella, while short paths are likely, they are not at all certain. In other words, there are no guarantees.
Since both systems depend on having a small set of well-connected nodes, they are also more susceptible to attacks and non-random failures. P2P systems have to deal with many constraints. Users may come onto the network only for brief periods (and rarely for long), creating a dynamic environment where few paths remain stable, so information has to be updated constantly. A considerable portion of the users are free riders, causing contributing users to experience poor performance and possibly leading to a spiraling effect where the contributors leave, further degrading the system. There is a great disparity between users in all respects (bandwidth, reliability, amount shared, etc.) that should be taken into account to increase performance. Finally, there are legal and reliability issues creating pressure for maximum decentralization.

From hs247@cornell.edu Tue Nov 12 11:39:48 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACGdlQ19138 for ; Tue, 12 Nov 2002 11:39:47 -0500 (EST) Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id LAA28041; Tue, 12 Nov 2002 11:35:52 -0500 (EST) Date: Tue, 12 Nov 2002 11:35:52 -0500 (EST) From: hs247@cornell.edu Message-Id: <200211121635.LAA28041@travelers.mail.cornell.edu> To: egs@CS.Cornell.EDU Errors-To: hs247@cornell.edu Reply-To: hs247@cornell.edu MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9 Sender: hs247@cornell.edu X-Originating-IP: 128.84.99.101 Subject: 615 Paper 63

The Small-World Phenomenon (applied to networks) was introduced in the first paper. The main contributions of this paper were to identify what the small-world phenomenon is and to classify how some networks fit into this category. It identifies that networks whose nodes have long-range contacts have a high probability of containing paths whose length is bounded by log n (i.e.
six degrees of separation). However, in peer-to-peer networks, we want these paths to be found using only local information. They prove that if long-range contacts are uniformly randomly distributed, these paths are virtually impossible to find; however, if the contacts follow the right distance distribution, nodes can find them using only local information (i.e., with a decentralized algorithm). So, in general, unstructured P2P systems that fit the small-world phenomenon can perform as well as structured networks, if not better. This was shown in the second paper, where the performance of Freenet was analysed. In simulation, if links were picked randomly, a file took up to 50 hops to be found; with Freenet's local routing information, files were found in an average of 5 hops or fewer. Gnutella was also analysed. The model used for the Gnutella simulation was a random (non-small-world) model, which is not supported by the last paper, which states that connections to server-like nodes are preferred. Because of the randomness in Gnutella, this paper states that Gnutella is less susceptible to attacks on highly-connected nodes (again contradicting the last paper). Tradeoffs in performance were identified: the randomness of Gnutella trades efficiency for better worst-case behavior, whereas Freenet trades worst-case performance for scalability and search efficiency. In the last paper, the end users of P2P systems (Napster and Gnutella) were analysed. Up to this point, no evaluation had taken user characteristics into account. The paper identified that these systems are indeed highly heterogeneous. Many users are free-loaders and misreport their connection speeds. With regard to lifetime, most nodes are not up for more than 60 minutes per session.
The upshot of this paper is that future designs should take into account the different user characteristics, including many nodes having low bandwidth, high latency, minimal participation, and misreported information. In particular, there should be an incentive for users to cooperate.

From adam@graphics.cornell.edu Tue Nov 12 12:01:14 2002 Received: from bach.graphics.cornell.edu (bach.graphics.cornell.edu [128.84.247.50]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACH1DQ25919 for ; Tue, 12 Nov 2002 12:01:13 -0500 (EST) Received: from envy.graphics.cornell.edu (envy.graphics.cornell.edu [128.84.247.206]) by bach.graphics.cornell.edu (8.12.1/8.12.1) with ESMTP id gACH170k067949 for ; Tue, 12 Nov 2002 12:01:07 -0500 (EST) Date: Tue, 12 Nov 2002 12:00:35 -0500 (EST) From: Adam Kravetz To: egs@CS.Cornell.EDU Subject: 615 Paper 63 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

The Small-World Phenomenon: six degrees of separation, a common party game, and social theory applied to computer networks and connectivity. This paper gives us three theorems that predict expected delivery time, with lower and upper bounds, all based on the six-degrees idea. Performance: This paper looks at Freenet and Gnutella as case studies. It points out that performance matters in P2P systems; Moore's law doesn't apply to bandwidth, so waiting for faster modems and connections isn't going to be fruitful. Maximizing efficiency on the network is the best way to improve transport, scalability, and the other issues associated with P2P. In the case study of Freenet, it is found that path length scales logarithmically with the size of the network, so in theory it scales well. Gnutella, on the other hand, scales not with path length but with the number of messages sent, giving it poor scalability, because as the network gets bigger, more and more messages are propagated per search. Gnutella scales linearly (bad), but a suggested hierarchical model could help.
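The message blowup behind Gnutella's poor search scaling can be seen with a little arithmetic. A back-of-the-envelope sketch: Gnutella's default TTL was 7, while the average degree of 4 is an assumed figure for illustration:

```python
def flood_messages(avg_degree, ttl):
    """Worst-case message count for a Gnutella-style broadcast query:
    the originator contacts avg_degree neighbours, and every peer that
    receives the query forwards it to its other (avg_degree - 1) links
    until the TTL runs out (ignoring duplicate suppression and cycles)."""
    total, frontier = 0, avg_degree
    for _ in range(ttl):
        total += frontier
        frontier *= avg_degree - 1
    return total

# Each extra TTL hop roughly triples the traffic at degree 4.
for ttl in (5, 6, 7):
    print(ttl, flood_messages(4, ttl))
```

Real networks see fewer messages because floods revisit peers and duplicates are dropped, but the bandwidth cost still grows with every query, which is why the chapter's hierarchical (super-peer) suggestion helps.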
Measurement of P2P: This paper looks at Napster and Gnutella (a structured, centralized network vs. an unstructured, decentralized one). The authors conclude from their studies that they would like to see responsibilities delegated unevenly across the network. I think this suggestion is good, but it should be used as a parameter of an adaptive system: nodes are allowed to grow or fall in importance depending on the resources available at any given time. They also question the common assumption that people are willing to participate in a P2P system, which is not necessarily true; generally, people want to participate only if they feel it is to their advantage. Unstructured P2P systems can work; however, they need highly adaptable, smarter, better-constructed algorithms to control them. The topology of an unstructured P2P system needs to be designed better. I think exploiting small-world network discovery, and trying to build topologies in a distributed fashion that use this information, is important. Moving away from a metric where a "hop" is a link from node A to node B in a P2P system, instead of the 10 router hops that it actually is, is a good first step. Using a set of close neighbors (topologically close on the Internet) may help keep network load down and allow for greater scalability. This, however, falls prey to the problem that information clustering will happen (your local subset of nodes holds roughly similar data), so a random outside-link mechanism (which is what gives six-degrees-type theories their success) would need to be introduced. To obtain a good, scalable, reliable P2P network, a good first idea would be to build extremely lightweight, highly adaptable link systems that attempt to achieve high path optimality. A second part of a P2P system should adapt to the availability of resources in the network, promoting links to higher status based on their ability to carry load (bandwidth, processing power, or battery life).
Finally, a third part of a P2P system would focus on fault tolerance. This sort of hybrid, multi-pronged, layered approach to P2P systems would hopefully provide a more globally scalable system.

From pj39@cornell.edu Tue Nov 12 12:18:13 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACHIDQ01184 for ; Tue, 12 Nov 2002 12:18:13 -0500 (EST) Received: from gourry.cornell.edu (pit089.cs.cornell.edu [128.84.223.189]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id MAA00928 for ; Tue, 12 Nov 2002 12:17:57 -0500 (EST) Message-Id: <5.1.0.14.2.20021112120846.01a8ee80@postoffice2.mail.cornell.edu> X-Sender: pj39@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 12 Nov 2002 12:17:56 -0500 To: egs@CS.Cornell.EDU From: Piyoosh Jalan Subject: 615 PAPER 63 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed

The small-world phenomenon: An algorithmic perspective
P2P and the small world phenomenon
A Measurement Study of Peer-to-Peer File Sharing Systems

This time we are asked to read the above three papers and comment on the two questions below. Can unstructured P2P systems perform as well as, or better than, more structured P2P networks? The Small-World Phenomenon paper by Kleinberg discusses the small-world phenomenon, presenting it from an algorithmic perspective and proving all the theorems proposed in the paper. The small-world phenomenon means that any two nodes in a network are likely to be connected through a short sequence of intermediate acquaintances. According to the survey by Prof. Milgram of Harvard, it took a median of 5.5 intermediaries to connect two arbitrary nodes (people, in this case).
Kleinberg's paper states that a class of networks exhibits the small-world phenomenon if the network's links are divided between local and long-range contacts, rather than lying at either extreme. The local contacts are a node's nearest neighbors, and long-range contacts are formed by edges with random endpoints. To summarize, this paper states that a random, unstructured, decentralized network, which resembles an unstructured P2P system, can find short paths if it has both local and long-range contacts. The second paper, a chapter from the O'Reilly P2P book, supports this by presenting detailed simulation results for Freenet and Gnutella. Freenet and Gnutella, both being decentralized (i.e., unstructured) P2P systems, exhibit the small-world phenomenon among their nodes. Thus unstructured P2P systems do have the ability to perform as well as more structured systems. What are the design constraints of P2P systems (e.g., typical system lifetimes, bandwidths, etc.)? There are a number of design constraints of P2P systems that need to be considered, the important ones being:
- Bandwidth of hosts: the speed of the peer's connection to the network. This information can either be reported when the user registers with the network (i.e., the type of connection used: cable, LAN, modem, etc.) or be measured directly.
- System lifetime of a host: the duration that a peer chooses to remain connected to the infrastructure has implications for the degree of redundancy necessary to keep data or index metadata highly available. To delegate a task to a peer, the system must know whether the peer is suitable based on its availability, latency, bandwidth, and other factors. A peer can be in three possible states during its lifetime: offline, inactive, and active.
- Latency: round-trip latency between peers. Nodes acting as servers should have high bandwidth and low latency.
Moreover, to serve as a high-bandwidth server, a node must have a high upstream bottleneck link bandwidth.
- Availability: apart from high bandwidth and low latency, servers should also have high availability. This is important for deciding on the degree of replication necessary.
- Free riding: the system has to consider free riders, who do not share any files themselves and thus cause bottlenecks at the nodes that do share files. There could be an incentive to share files; for example, nodes that share files could get download preference at a node with limited bandwidth and a limited number of allowed users, creating contention among nodes. Addressing free riding also helps with load balancing.
- Fault tolerance: the ability of the system to perform under random failures of nodes and in the face of attacks.
Some other design constraints that need to be considered are:
- the population of end-user hosts that participate in the system,
- the degree of cooperation between hosts,
- the degree of sharing, and
- scalability, if the system is to be scaled to a global level.

From ag75@cornell.edu Tue Nov 12 12:29:35 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACHTYQ03920 for ; Tue, 12 Nov 2002 12:29:35 -0500 (EST) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id MAA09881 for ; Tue, 12 Nov 2002 12:29:32 -0500 (EST) Date: Tue, 12 Nov 2002 12:29:31 -0500 (EST) From: ag75@cornell.edu X-Sender: ag75@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 63 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

The three papers that we read this week deal with the performance of P2P systems.
The first one examines the "small-world phenomenon" (the principle that we are all linked by short chains of acquaintances) and its effect on the design of P2P systems. In this paper the authors prove that no decentralized algorithm, operating with local information only, can construct short paths with non-negligible probability in the previously proposed network models, even though those models are rich in short paths. They then show that there is a unique model within the family for which decentralized algorithms are effective. This leads to the conclusion that, in addition to having short paths, a network should contain latent structural cues that can be used to guide a message towards a target, which means that minimizing the transmission time of a network is not necessarily the same as minimizing its diameter. The second paper compares the performance of Freenet and Gnutella along the lines of performance, fault tolerance, and scalability. While both Freenet and Gnutella are decentralized systems, Freenet relies on the "small-world phenomenon" and Gnutella doesn't. In terms of performance, Freenet has good average-case but bad worst-case performance; Gnutella trades a much greater search effort for optimal paths and better worst-case performance. Since highly-connected nodes are much less of a factor in Gnutella than in Freenet, Gnutella is better at handling attacks but worse at handling failures. Freenet scales better than Gnutella, scaling logarithmically compared with Gnutella's linear scaling. The third paper compares the performance of Gnutella and Napster. It shows that bandwidth, latency, availability, and the degree of sharing vary by between three and five orders of magnitude across the peers in both Gnutella and Napster. It also shows that peers tend to deliberately misreport information if there is an incentive to do so, and that the download capacities of both systems exceed their upload capacities.
Finally, it points out that Gnutella has a large portion of free riders. This means that a large percentage of clients rely on a small percentage of servers, which in turn means that the system is highly vulnerable to targeted attacks. It seems that it should be possible to design an unstructured P2P system that performs as well as a structured one. However, when designing such a system, several things must be kept in mind: the "small-world phenomenon" is your friend, so use it; different peers have significant differences in their abilities, and the system should take that into account; and freeloaders are a problem because they not only take up system resources but can also make the system vulnerable to failures and attacks, so there have to be strong incentives for people not to freeload.

From xz56@cornell.edu Tue Nov 12 12:37:22 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACHbMQ05957 for ; Tue, 12 Nov 2002 12:37:22 -0500 (EST) Received: from Xin (dhcp-ece-171.ece.cornell.edu [132.236.232.171]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with SMTP id MAA26600 for ; Tue, 12 Nov 2002 12:37:21 -0500 (EST) Message-ID: <003501c28a72$1fec2630$abe8ec84@Xin> From: "Xin Zhang" To: "Emin Gun Sirer" Subject: 615 PAPER 63 Date: Tue, 12 Nov 2002 12:37:07 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106

The small-world phenomenon itself says that people in the world are all linked by short chains of acquaintances. Applied to networks, it provides a basic structure for peer-to-peer systems at the network level and helps assure the connectivity of nodes in a P2P system.
The model of Watts and Strogatz has, in fact, already been used in the Chord routing scheme, in which the structure parameter r (introduced in the paper as a measure of how "networked" the lattice is: 0 gives a uniform distribution over long-distance contacts, 2 gives a uniform distribution over "distance scales" in a 2-D lattice) equals 0. This paper shows that in most of the models in this family, no decentralized algorithm can construct short paths. Decentralized algorithms are only effective in the model with r at its optimum value of 2, namely where the long-distance contacts are distributed uniformly over distance scales. The "Performance" chapter focuses on querying performance (in terms of time and bandwidth consumed), fault tolerance, and scaling, with case studies of Freenet and Gnutella. For Freenet to work, two conditions must hold: connectivity and being a small world. The only difference between Freenet and Milgram's small-world experiment is that the target in Freenet can be any one of many peers holding the desired file, and if Freenet is a small world, short paths are assured. Another paper has shown that the Web is a small world with a pathlength of 19. The authors use a similar simulation to show that Freenet can also be a small-world network. Fault tolerance is also measured, and the simulation shows that failure of fewer than 30% of the nodes does not cause serious degradation. Freenet is also effective in combating free riders, since each node maintains references. As in the small-world model, Freenet scales logarithmically with the size of the network. Gnutella, however, uses a simple broadcast model and does not invoke the small-world effect. It offers faster search and better worst-case guarantees, but it is vulnerable to free riders and scales linearly, which is not as good as Freenet. 
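The r=2 result discussed above can be illustrated with a toy simulation (hypothetical code, not Kleinberg's; as a simplification, each node's single long-range contact is resampled at every step from the d^-r distance distribution rather than fixed in advance):

```python
import random

def lattice_distance(u, v):
    """Manhattan distance between two points on the n x n lattice."""
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def long_range_contact(u, n, r, rng):
    """Sample one long-range contact for node u with probability
    proportional to d(u, v)^-r, via rejection sampling."""
    while True:
        v = (rng.randrange(n), rng.randrange(n))
        d = lattice_distance(u, v)
        if d == 0:
            continue
        # Accept with probability d^-r (which is at most 1 for d >= 1).
        if rng.random() < d ** (-r):
            return v

def greedy_route(source, target, n, r, rng, max_steps=10_000):
    """Kleinberg-style greedy routing: at each step, move to whichever
    neighbour (the four lattice neighbours plus one sampled long-range
    contact) is closest to the target. Returns the hop count."""
    cur, hops = source, 0
    while cur != target and hops < max_steps:
        x, y = cur
        neighbours = [(x + dx, y + dy)
                      for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                      if 0 <= x + dx < n and 0 <= y + dy < n]
        neighbours.append(long_range_contact(cur, n, r, rng))
        cur = min(neighbours, key=lambda v: lattice_distance(v, target))
        hops += 1
    return hops
```

Averaging `greedy_route` over many trials at increasing n should qualitatively reproduce the paper's dichotomy: hop counts grow slowly at r=2 but much faster at r=0, even though both families contain short paths.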
The measurement study raised the issue of significant heterogeneity and lack of cooperation across peers in current peer-to-peer systems (Gnutella and Napster in this paper) by showing that upstream and downstream bandwidth, latencies, and the number of shared, downloaded and uploaded files differ greatly among peers. So when designing future robust p2p protocols, special care should be taken with the assumption that all peers are homogeneous and willing to cooperate. From ashieh@CS.Cornell.EDU Tue Nov 12 12:51:11 2002 Received: from zinger.cs.cornell.edu (zinger.cs.cornell.edu [128.84.96.55]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACHpBQ09809 for ; Tue, 12 Nov 2002 12:51:11 -0500 (EST) Received: from localhost (ashieh@localhost) by zinger.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id gACHpBU22794 for ; Tue, 12 Nov 2002 12:51:11 -0500 (EST) Date: Tue, 12 Nov 2002 12:51:11 -0500 (EST) From: Alan Shieh To: Subject: 615 PAPER 63 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Can unstructured P2P systems perform as well as, or better than, more structured P2P networks? An unstructured P2P system can perform as well as a structured one, as the network construction mechanism can be induced to form occasional long-range links such that the network diameter becomes small. However, even with such structure present in the network topology, a local routing algorithm may not in general be able to find the short paths that these long-range links induce. Freenet's DFS mechanism, which performs the same behavior as the simple algorithm described in Jon's paper, plus backtracking, appears capable of finding such routes in the average case after the network has "converged". 
Hence, Freenet appears to induce the structure necessary for efficient routing; however, it provides no guarantees about worst-case routing performance, and its topology is heavily dependent on query pattern and frequency. Structured P2P systems do not suffer from these shortcomings so long as the network has stabilized. However, maintaining the necessary invariants imposes organizational overhead when nodes join and die. What are the design constraints of P2P systems (e.g. typical system lifetimes, bandwidths, etc.)? The design constraints of P2P systems depend on the target application. For instance, Tapestry targets a utility infrastructure model, and hence a distribution of node capability biased towards more powerful and reliable nodes. Moreover, these nodes are assumed to lie both in the center and at the "edge" of the Internet, for otherwise it would not make sense to have evaluated the performance of TCP on top of such a Tapestry routing fabric. However, the majority of P2P applications are targeted at end-users. These users, who are typically on the edge of the network and constrained by an asymmetric, possibly slow last-mile connection, are the most likely early adopters of P2P filesharing (for social reasons such as the demand for "music sharing"), and they form the vast majority of the online community. Even with widespread adoption in the corporate world or among infrastructure companies, end-users will be the rule rather than the exception. Thus, P2P file systems must explicitly take into account the vast disparities of capabilities within the system, as an ad hoc approach is unlikely to map functionality onto the appropriate nodes. The important constraints are system lifetime, bandwidth, and trustworthiness. 
For reasons of reliability, a P2P system should map index state and critical high-degree routing tables or hotspots onto systems well-endowed in the above areas, for otherwise underpowered systems would choke on high query and routing traffic, and freeloading nodes could drop queries and route requests. The Saroiu paper suggests mechanisms for ensuring these qualities in a network; see below for a description. ** The small-world phenomenon: An algorithmic perspective This paper establishes theoretical bounds on the performance of local routing algorithms in a class of random graph models. In many classes of random graphs, the expected network diameter is small, and hence short routing paths exist. However, it is difficult for a routing algorithm to find such paths given only local routing information. The topology in this paper is a 2-D lattice with deterministic local connectivity and non-deterministic global connectivity, where the long-range links obey a power-law distribution (parameter r) on the distance of the hop. It is shown that a routing algorithm which is aware of the location of the target on the lattice, the deterministic local structure, and the routing tables of all nodes which have touched a particular message finds routes with short (polylogarithmic, O(log^2 n)) expected hop counts if and only if r=2; indeed, such a routing algorithm yields this bound for any topology in which links are equally likely to occur at any distance scale (range [2^j, 2^(j+1))). The intuition is that the r=2 distribution allows the local routing algorithm to determine a good route given geographic knowledge of the network. ** P2P and the small world phenomena This work's main contribution is an analysis of the differing failure modes of Gnutella and Freenet. Gnutella clients tend to form a small number of links, yielding a node distribution that is exponential in the out-degree. Freenet tends to form networks whose node distributions follow a power law in the out-degree. 
Thus, Freenet is considerably more resilient against random failure than Gnutella. However, Gnutella is considerably more resilient against directed attacks on high-degree nodes, the intuition being that the exponential distribution tends to make every node about equally useful, and hence a directed attack does not differ much from random failures. The author also notes that a freeriding node in Freenet does not tend to pollute query operations, as routing would never be established through such a node. However, Gnutella does establish links to nodes that only issue queries, and hence such non-sharing nodes may waste slots and decrease the number of useful nodes covered by the TTL horizon. ** A Measurement Study of Peer-to-Peer File Sharing Systems This paper evaluates the distribution of node capabilities and trustworthiness in the Gnutella and Napster networks. It finds that, as expected, most nodes are indeed operated by users on the edge of the Internet, and hence connected to the Internet via slow or asymmetric links. Moreover, nodes are very likely to lie about their available bandwidth, and the vast majority do not share files, as doing either honestly in both of these systems strictly worsens one's performance, and users are likely to log off the network once the desired files are found. Hence, P2P systems must either independently verify the bandwidth of each node in the system, or provide incentives for honesty. Moreover, the clients should be designed such that users are encouraged to remain online and to share files. Without such mechanisms, the social (irate ISPs) and performance consequences favor freeloading attitudes, and mainly the altruism of a small part of the online community keeps these systems useful. 
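The contrast between power-law and exponential degree distributions under failure versus attack can be illustrated with a toy robustness experiment (hypothetical code, not the chapter's methodology): grow a preferential-attachment graph, whose degrees follow a power law, then compare the surviving giant component after random node removals against a targeted attack on the highest-degree nodes.

```python
import random
from collections import defaultdict, deque

def preferential_attachment_graph(n, m, rng):
    """Barabasi-Albert-style graph: each new node links to m existing
    nodes chosen with probability proportional to degree, which yields
    a power-law degree distribution."""
    edges = set()
    targets = list(range(m))   # seed nodes for the first arrival
    repeated = []              # node list weighted by current degree
    for new in range(m, n):
        for t in set(targets):
            edges.add((min(new, t), max(new, t)))
            repeated.extend([new, t])
        targets = [rng.choice(repeated) for _ in range(m)]
    return edges

def largest_component(alive, adj):
    """Size of the largest connected component among 'alive' nodes (BFS)."""
    seen, best = set(), 0
    for s in alive:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

def attack(edges, n, fraction, targeted, rng):
    """Remove a fraction of nodes (highest-degree first if targeted,
    uniformly at random otherwise); return the giant component as a
    fraction of the surviving nodes."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    k = int(n * fraction)
    if targeted:
        removed = set(sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:k])
    else:
        removed = set(rng.sample(range(n), k))
    alive = set(range(n)) - removed
    return largest_component(alive, adj) / max(len(alive), 1)
```

On such a graph, removing the hubs should fragment the network far more than removing the same number of random nodes, mirroring the Freenet analysis above; an exponential-degree graph would show much less difference between the two removal strategies.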
From sc329@cornell.edu Tue Nov 12 13:01:59 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACI1xQ12858 for ; Tue, 12 Nov 2002 13:01:59 -0500 (EST) Received: from sangeeth.cornell.edu (syr-24-58-36-135.twcny.rr.com [24.58.36.135]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA07618 for ; Tue, 12 Nov 2002 13:01:54 -0500 (EST) Message-Id: <5.1.0.14.2.20021112125455.033eda00@postoffice2.mail.cornell.edu> X-Sender: sc329@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 12 Nov 2002 13:01:54 -0500 To: egs@CS.Cornell.EDU From: Sangeeth Chandrakumar Subject: 615 PAPER 63 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed submitted by sangeeth Chandrakumar The paper on the small-world phenomenon presents an algorithmic perspective on the principle that we are all linked by short chains of acquaintances. Over many trials, it was shown that the average number of intermediate steps in a successful chain from an arbitrary source to an arbitrary destination lies between five and six, a quantity which is popularly referred to as the "six degrees of separation". The paper associates this theory with decentralized networks. It has been proven from random graph theory that, with high probability, there exist paths between every pair of nodes whose lengths are bounded by a polynomial in log N. The main contribution of the paper is to show that the correlation between local structure and long-range connections provides fundamental cues for finding paths through the network. When the correlation is near a critical threshold, the structure allows individuals to guide a message efficiently towards a target. As the correlation drops below this value, the model still has short chains, but individuals are unable to find them. 
The performance paper tries to quantify p2p systems in terms of bandwidth, latency, fault tolerance, and scalability. Computer networks bear a strong resemblance to social networks and can be represented by graphs in a similar way. This chapter reviews general graph theory and relates the performance of Freenet and Gnutella to the small-world model. Two important characteristics of a graph are its pathlength and its clustering coefficient. In a regular graph model, the pathlength is on the order of N, and the clustering coefficient is high, since most of the neighbors of a given node are also connected to each other. In a random graph, however, the pathlength becomes proportional to the logarithm of N, and the clustering coefficient is low. Freenet's routing is depth-first: it depends on each neighbor's routing table to send a query towards its destination. Simulations indicate that Freenet networks do evolve small-world characteristics: starting from a regular model and randomly adding a few links, the Freenet network is shown to take on the characteristics of a random graph while retaining high clustering. Freenet has good average performance but poor worst-case performance, and these results continue to hold as the number of nodes increases. The network is shown to have good fault tolerance until about 30% of all nodes fail. But its power-law characteristic, in which a few nodes remain highly connected, makes the network susceptible to attacks on those nodes. Gnutella queries perform a breadth-first search on the network graph, so the network graph must be connected for a request to eventually reach some peer holding the desired data. A random-model Gnutella network is also shown to have small-world characteristics. Moreover, Gnutella is shown to respond equally to failure and attack. But Gnutella's cost scales linearly with the number of nodes. 
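The pathlength/clustering tradeoff described above can be reproduced with a small stdlib-only sketch (illustrative code, not from the chapter): build a regular ring lattice, then rewire a fraction of its edges at random, Watts-Strogatz style, and observe the average path length collapse while clustering stays comparatively high.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Regular ring where each node links to its k nearest neighbours
    on each side: high clustering, long paths."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def rewire(adj, p, rng):
    """Watts-Strogatz-style rewiring: each edge is redirected to a
    random new endpoint with probability p, creating shortcuts."""
    n = len(adj)
    for u in list(adj):
        for v in list(adj[u]):
            if u < v and rng.random() < p:
                w = rng.randrange(n)
                if w != u and w not in adj[u]:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj

def avg_path_length(adj):
    """Mean shortest-path length over all reachable pairs, via BFS."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

def clustering(adj):
    """Mean fraction of each node's neighbour pairs that are linked."""
    coeffs = []
    for u, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        coeffs.append(links / (k * (k - 1) / 2))
    return sum(coeffs) / len(coeffs)
```

With, say, 100 nodes and a 20% rewiring probability, only a handful of shortcuts are needed to bring the average path length down sharply, which is exactly the small-world transition the chapter invokes for Freenet.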
Modifying a pure decentralized network into a partly hierarchical structure is shown to exhibit better properties both in scalability and in path lengths. The third paper evaluates the Gnutella and Napster networks for characteristics such as bandwidth bottlenecks, IP-level latency, node availability, and degree of cooperation. The measurements were done using a network crawler over a few days on each set of peers. The authors came up with the following observations: 1. A peer tends to have higher downstream than upstream bandwidth. Gnutella users tend to have higher downstream bottlenecks than Napster users, probably because more Gnutella users are tech-savvy and have better links to the Internet. 2. In a p2p system where connections are forged in an ad hoc way, a substantial fraction of connections will suffer from high latency. 3. The median session duration of users was found to be about 60 minutes. 4. A substantial number of "free riders" exist in both networks. 5. Although highly resilient in the face of random breakdowns, Gnutella is vulnerable to well-orchestrated, targeted attacks. So a well-designed p2p system should take the following into consideration: - p2p systems should delegate different degrees of responsibility to different hosts, based on the hosts' physical characteristics and degree of trust; - a robust system should attempt to directly measure the physical characteristics of peers in the system; - a robust system should account for host heterogeneity, relying on self-inspection and adaptation to exploit the differences in host characteristics. 
From liuhz@CS.Cornell.EDU Tue Nov 12 14:06:24 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACJ6OQ03364 for ; Tue, 12 Nov 2002 14:06:24 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615 PAPER 63 Date: Tue, 12 Nov 2002 14:06:24 -0500 Message-ID: <706871B20764CD449DB0E8E3D81C4D4302CEE6F3@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 63 Thread-Index: AcKKfpgsu1XIm5ApTl+K1+AWkVn1CA== From: "Hongzhou Liu" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id gACJ6OQ03364 Three papers are presented today. Jon Kleinberg's paper introduces the small-world phenomenon and summarizes some previous work in the area. While previous work focuses on the question of why there should exist short chains of acquaintances linking together arbitrary pairs of strangers, this paper sets out to answer another question: why should arbitrary pairs of strangers be able to find such short chains? The paper defines an infinite family of random network models that simulates a small world and proves that in most cases, decentralized algorithms are unable to find such short chains effectively; only in a unique model within this family are decentralized algorithms effective, finding such short chains in O(log^2 n) expected time (where n is the number of nodes in the network), which may be the best routing performance we can expect in a p2p system. Thus unstructured p2p systems, if designed carefully, can perform at least as well as structured p2p systems with respect to routing performance. 
The O'Reilly P2P book also investigates the small-world model and compares two p2p systems, Freenet and Gnutella, which can be considered two instances of the small-world model. As expected, both Freenet and Gnutella are able to deliver messages between any pair of nodes using a short path. However, there are still some differences between the two systems. The lookup path length in Gnutella is shorter than that in Freenet, because Gnutella uses BFS search. Gnutella, however, has a much higher overhead than Freenet, because it uses broadcast to flood queries, and nodes forward queries they cannot satisfy from their local store. The broadcast property also limits Gnutella's scalability, because broadcast in a large network requires high bandwidth. The chapter suggests modifying Gnutella into a hierarchical model to improve its scalability, which of course harms Gnutella's autonomy to some degree. Thus we see another tradeoff between structured and unstructured p2p systems: structured systems can guarantee that lookup scales logarithmically, but this implies a considerable loss of the autonomy promised by peer-to-peer. Regarding scalability, unstructured p2p systems can perform as well as structured systems, and they can outperform structured systems in self-organization. The chapter also compares Freenet's and Gnutella's fault tolerance. It states that Freenet behaves better under random failure, but Gnutella can better cope with targeted attacks. However, the chapter makes an unreasonable assumption here: that an attacker can easily locate the highly linked nodes in Freenet and attack them. Locating these nodes, I think, should be very difficult given Freenet's anonymity schemes, so such targeted attacks require real effort in Freenet. The paper from UW evaluates two popular peer-to-peer file-sharing systems in various aspects. 
The measurements in this paper show that there is significant heterogeneity and a lack of cooperation across peers participating in these systems, which, however, few current (relative to the paper) peer-to-peer architectures take into account. The paper also points out some general guidelines that every p2p system designer should pay attention to: - The set of nodes in p2p systems is heterogeneous with respect to many characteristics: bandwidth, latency, lifetime, shared data, etc. P2P systems should delegate different degrees of responsibility to different nodes, based on the hosts' physical characteristics and their degree of trust or reliability. - Many peers in p2p systems have quite short lifetimes, typically less than one hour. P2P systems should adapt automatically to such changes. - A large fraction of peers in the system can be dishonest; they may, for example, misreport their bandwidth for their own benefit. Thus P2P systems should be able to measure these quantities themselves. - Many nodes in p2p systems are free riders. P2P systems should have schemes that exclude free riders and encourage nodes to contribute their own resources to the system. 
From ks238@cornell.edu Tue Nov 12 14:38:26 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACJcPQ14154 for ; Tue, 12 Nov 2002 14:38:26 -0500 (EST) Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id OAA09836; Tue, 12 Nov 2002 14:38:15 -0500 (EST) Date: Tue, 12 Nov 2002 14:38:15 -0500 (EST) From: ks238@cornell.edu Message-Id: <200211121938.OAA09836@travelers.mail.cornell.edu> To: egs@CS.Cornell.EDU Errors-To: ks238@cornell.edu Reply-To: ks238@cornell.edu MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9 Sender: ks238@cornell.edu X-Originating-IP: 128.84.99.155 Subject: 615 Paper 63 The focus of these three papers is on decentralized P2P systems and searching through them. We first start with Kleinberg's study of the "Small World Phenomenon", which is an algorithmic approach to routing in a decentralized P2P system. This notion was founded by Milgram, whose series of societal experiments led him to conclude that the average distance (measured in the number of intermediate people between a starting person and a target person) is approximately 5.5. Kleinberg takes this notion and applies it to the study of decentralized networks and their inherent structure. He argues that while searching in a network that exhibits small-world properties, one must look for the node which exhibits the largest jump from one cluster of nodes to another - in other words, the person who serves as a "gateway" to an alternate cluster of people in an area close to the target. When such nodes can be contacted, routes of O((log n)^2) hops are possible, where n is the total number of nodes in the network. This efficient bound is proved in the paper. In the next paper we see an evaluation of Gnutella and Freenet. 
In Freenet we see a system which assumes a connected network, whereby traversing certain nodes in the graph will eventually allow us to reach the target node. It also assumes that short routes exist between two peers, so that they can communicate with a relatively small number of hops. Because Freenet exhibits the small-world properties discussed by Kleinberg, the number of hops per route remains small. The second system studied is Gnutella, which does not assume a small world. The key point of Gnutella is that it employs a broadcasting scheme which "saturates" the network with requests in an effort to find the shortest route. Each node tries to maintain three or more simultaneous links that it broadcasts to, and through iterative broadcasting at each hop, the target is eventually found. Gnutella is good at finding optimal paths but is weak in that it costs so much to reach its results. The final paper is a measurement study of the Napster and Gnutella file-sharing systems and of the characteristics which reduce their performance. In the paper, the two systems were crawled with a separate crawler for each system. Based on these crawls, latency, lifetime, and bottleneck-bandwidth measurements were recorded. From these measurements several observations were made. First, the authors concluded that more peers were acting as clients than as servers; in other words, more peers were downloading files than sharing them. In addition, another problem was the low bandwidth that certain nodes exhibited, which could bottleneck a file transfer. To conclude, the primary contribution of this paper was to demonstrate the relevance of individual node properties to the actual performance of a P2P system. These papers touch on some very interesting and hot areas of current research. One of the key questions that should be addressed about such systems is security. 
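The broadcast-versus-depth-first distinction above can be sketched in code (a hypothetical illustration; the real Gnutella and Freenet wire protocols are more involved, and Freenet picks the neighbour with the closest key rather than an arbitrary one): flooding finds the target quickly but pays for many duplicate messages, while a depth-first probe sends one message at a time and backtracks on failure.

```python
def flood_search(adj, source, has_file, ttl):
    """Gnutella-style search: forward the query to every neighbour,
    decrementing a TTL at each hop. Duplicate copies to already-seen
    nodes still cost messages. Returns (hit_found, messages_sent)."""
    seen = {source}
    frontier = [source]
    messages = 0
    hit = has_file(source)
    for _ in range(ttl):
        nxt = []
        for u in frontier:
            for v in adj[u]:
                messages += 1          # every forwarded copy is a message
                if v not in seen:
                    seen.add(v)
                    hit = hit or has_file(v)
                    nxt.append(v)
        frontier = nxt
    return hit, messages

def dfs_search(adj, source, has_file, max_hops):
    """Freenet-style search: forward the query to one unvisited
    neighbour at a time, backtracking when a branch dead-ends.
    Returns (hit_found, messages_sent)."""
    visited = set()
    messages = 0

    def walk(u, budget):
        nonlocal messages
        visited.add(u)
        if has_file(u):
            return True
        if budget == 0:
            return False
        for v in adj[u]:
            if v not in visited:
                messages += 1          # one forward per probe
                if walk(v, budget - 1):
                    return True
        return False

    return walk(source, max_hops), messages
```

On the same topology, both searches locate the file, but the flooding count grows with the number of edges inside the TTL horizon rather than with the path length, which is the overhead argument the review makes against Gnutella.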
P2P systems have an immense number of possible security breaches, since the network's performance is contingent on the reliability of other nodes. Another interesting topic is caching schemes that, based on the frequency of requests, could mitigate the costs of inherently expensive long-distance queries. From vivi@CS.Cornell.EDU Tue Nov 12 14:46:19 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gACJkJQ16270 for ; Tue, 12 Nov 2002 14:46:19 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615paper63 Date: Tue, 12 Nov 2002 14:46:18 -0500 Message-ID: <47BCBC2A65D1D5478176F5615EA7976D11AFA4@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615paper63 Thread-Index: AcKKhCtgUj/rKuQjSr+vmV8jujzcYA== From: "Vivek Vishnumurthy" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id gACJkJQ16270 The first paper among today's set (Kleinberg) presents the small-world model and how one can build a network that matches the small-world model (the Watts-Strogatz model), in that the average distance between any two nodes in the network is small (logarithmic in the number of nodes). The paper also gives the necessary and sufficient condition for the distance found by a purely distributed algorithm in practice to be small on average. The second paper (Theodore Hong) shows how a regular graph with high clustering and high path lengths can be transformed, with very few changes, into a graph that fits the small-world model while retaining the high clustering. It then shows that the performance of the P2P systems Freenet and Gnutella matches the small-world model. 
It evaluates both in terms of average path length during normal operation, random failure, and malicious attacks. Both Freenet and Gnutella perform well under normal operation (good hop counts). Freenet is robust under random failures, because of the widely varying number of links of different nodes. But its performance degrades rapidly, for this very reason, under targeted attacks, where the most highly connected nodes are removed from the system. This paper finds that Gnutella resists both random failures and targeted attacks reasonably well, but it pays in terms of the total number of messages sent per query. (That Gnutella resists targeted attacks is contradicted by the next paper.) The third paper (Saroiu et al.) performs a measurement study of Napster and Gnutella. It finds that the two systems are widely heterogeneous, and conventional P2P systems that assume uniform node capacities and delegate responsibilities uniformly have to be adapted to such systems. It also finds that a substantial percentage of the peers misreport their capacities (since there is an incentive to lie). Thus it concludes that the system should be built in such a way that (i) capacities (bandwidths, etc.) are measured directly, rather than relying on each peer to report its own capacity, or (ii) there is an incentive to be truthful, rewarding peers that contribute to the functioning of the network. It finds that Gnutella is highly susceptible to targeted attacks. These papers suggest that unstructured P2P systems can perform well under normal modes of operation, but face one common problem: drastic performance degradation under targeted attacks. A more structured P2P system (e.g., a centralized system) could resist such an attack by preferentially protecting the more important nodes.