From asr32@cornell.edu Tue Feb 21 01:15:52 2006
From: Ari Rabkin <asr32@cornell.edu>
Date: Tue, 21 Feb 2006 00:42:31 -0500
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Sybil: A Sybil attack occurs when a single principal acquires many distinct identities in order to control a disproportionate fraction of the network and do damage. A Sybil attack, properly defined, is a building block for other attacks. To prove strong security properties, it would be very useful to be able to show that distinct machines in a P2P network are owned by distinct principals. Alas, the central result of this paper is that there are no good strategies for limiting Sybil attacks. In particular, techniques for directly proving over the network that two hosts are distinct have an error margin proportional to the difference in resources between a "powerful" host and a "weak" one. Distinct hosts must be tested simultaneously, rendering distributed challenges difficult. Vouching schemes are also problematic. The Sybil paper unfortunately offers a fairly shallow analysis.
It implicitly assumes that all authentication must be done across the network, through messages; that every malicious node has a valid identity; and that identities are either accepted or not, with no middle ground.

Eclipse: In an eclipse attack, the attacker attempts to inject faulty nodes into every entry of the target host's routing table, with an eye to preventing proper overlay operation. In the end, the target host may have all its network operations mediated by hostile nodes. To make this attack harder, the authors propose to bound the degree of nodes in the overlay network. They propose a protocol in which nodes can anonymously query their neighbors to verify that each neighbor's table includes them and has a legal degree; if not, that neighbor is dropped. This defense is only partial. The anonymous query protocol has a number of weaknesses: if an attacker controls a node and its entire anonymity set, the protocol fails, and an attacker can also cause innocent nodes to be marked as suspicious by falsifying their responses. Even when the protocol works properly, an attacker's fraction of routing-table entries can still be substantially higher than its fraction of overall nodes.

Secure routing on structured overlays: This paper proposes a general scheme for reliable delivery of messages on peer-to-peer networks in the presence of malicious nodes. Messages are sent redundantly, to each of several root nodes for a given object. The authors propose using two routing tables: one optimized for speed, the other constrained deterministically to limit an attacker's flexibility. This allows something otherwise impossible: very reliable behavior even with a substantial fraction of the network compromised. The protocol fails when more than around 25% of nodes are compromised.
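As a back-of-the-envelope illustration (my own sketch, not the paper's analysis), the value of redundant routes can be checked in a few lines of Python: if a fraction f of nodes is compromised and each of r independent routes traverses h hops, delivery succeeds as long as at least one route is attacker-free.

```python
# Illustrative model (not the paper's exact analysis): a route of h hops
# is attacker-free with probability (1 - f)^h when a fraction f of nodes
# is compromised; delivery succeeds if any of r independent redundant
# routes is attacker-free.

def delivery_success(f, hops=5, routes=4):
    """Probability that at least one of `routes` independent
    h-hop routes contains no compromised node."""
    p_clean = (1 - f) ** hops
    return 1 - (1 - p_clean) ** routes

for f in (0.1, 0.25, 0.4):
    print(f"f={f:.2f}  success={delivery_success(f):.3f}")
```

Even this crude model shows why reliability degrades sharply past roughly a quarter of the network compromised: the chance of any single route being clean collapses geometrically with hop count.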
When it does work, the proposed protocol is quite expensive, with substantial time and message costs; an attacker can easily force its use every time, which would significantly degrade the network. Another weakness is that the paper offers no suggestions for cheaper protocols when the threat level is less severe.

Ari Rabkin
asr32@cornell.edu
Risley Hall 454 3-2842
The resources of civilization are not yet exhausted. --William Gladstone

From asg46@cornell.edu Tue Feb 21 10:47:56 2006
From: "Abhishek Santosh Gupta" <asg46@cornell.edu>
Date: Tue, 21 Feb 2006 10:47:55 -0500 (EST)
To: egs+summary@cs.cornell.edu
Subject: paper 9 - security

SECURITY

SYBIL ATTACK
Exploiting redundancy in a system requires the ability to determine whether two entities that appear to be different are actually distinct. Forging multiple identities in this way is termed a Sybil attack. The tempting idea of having a chain of identities vouch for other identities fails if the initial bootstrap is compromised.
DIRECT VALIDATION
The only direct means by which two entities can convince a third that they are distinct is by performing some task that a single entity could not. However, the difficulty of this task is limited by the weakest valid entity in the system. Thus, if p is the ratio of the resources of a faulty entity f to those of a minimally capable entity, then f can represent/forge floor(p) entities (assuming the challenges are issued concurrently to all entities). If the challenges are not concurrent across all the identities forged by the malicious entity, it can represent any number of forged entities.

INDIRECT IDENTITY VALIDATION
A group of identities validates another identity. However, a group of faulty entities can vouch for other faulty entities.

DEFENDING AGAINST ECLIPSE ATTACKS ON OVERLAY NETWORKS
If an attacker controls a large fraction of the neighbors of correct nodes, it can "eclipse" the correct nodes and prevent correct overlay operation. Strong structural constraints on the overlay are used to defend against this attack. When an eclipse attack is launched, the average indegree of attacker nodes will be higher than the average indegree of correct nodes (this assumes a defense against the Sybil attack is in place); thus, correct nodes can identify faulty ones by their indegree values. However, faulty nodes can consume the indegree of correct nodes and prevent other nodes from pointing to them, so the outdegree of each node must also be bounded below a certain threshold. This requires an efficient auditing scheme to prevent entities from lying about their indegrees and outdegrees. The auditing scheme requires each node x to maintain a back-pointer list; x forwards packets only from nodes in its back-pointer list. x periodically challenges members of its back-pointer set for their neighbor sets. If x is not included in such a set, or the size of the set is greater than the threshold, x removes that node from its back-pointer list.
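The audit rule just described can be sketched in a few lines of Python (names, the threshold value, and the challenge interface are illustrative, not taken from the paper):

```python
# Sketch of the degree-audit rule described above: node x challenges each
# node in its back-pointer set for that node's claimed neighbor set, and
# prunes any node that omits x or exceeds the degree bound.

OUTDEGREE_BOUND = 16  # assumed threshold; the scheme leaves this a parameter

def audit_back_pointers(x, back_pointers, neighbor_set_of):
    """Return the back-pointer entries x keeps after auditing.

    `neighbor_set_of(n)` models the (anonymously issued) challenge:
    it returns node n's claimed neighbor set."""
    kept = []
    for n in back_pointers:
        claimed = neighbor_set_of(n)
        # Drop n if it does not list x as a neighbor, or if its
        # claimed out-degree exceeds the allowed bound.
        if x in claimed and len(claimed) <= OUTDEGREE_BOUND:
            kept.append(n)
    return kept

# Toy check: node "b" omits "x"; node "c" claims too many neighbors.
tables = {"a": {"x", "y"},
          "b": {"y"},
          "c": {"x"} | {f"n{i}" for i in range(20)}}
print(audit_back_pointers("x", ["a", "b", "c"], tables.__getitem__))
```

Only node "a" survives the audit: "b" fails the inclusion check and "c" fails the degree bound.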
This requires sender anonymity to be effective: the node being challenged must not know the identity of the challenger, or else it could include x in its neighbor set only when answering x. Challenges are therefore always forwarded via an intermediate node; a single anonymizer node does not suffice, as the anonymizer itself could be malicious and reveal the identity of the sender. Randomizing the period between challenges from the same node prevents the attacker from correlating the arrival time of a challenge with the identity of the challenger.

SECURE ROUTING FOR...
This paper presents attacks that prevent correct message delivery in structured peer-to-peer overlays, and defenses against these attacks. Secure routing requires:
1) secure assignment of node identifiers
2) secure routing table maintenance
3) secure message forwarding

SECURE NODEID ASSIGNMENT
If attackers are allowed to choose their own node IDs, they can create many problems, e.g. partition the overlay into two disjoint sets; this is an instance of the Sybil attack. Solution: use a set of trusted CAs to assign nodeIDs to principals and to sign node certificates that bind a random nodeID to the public key and IP address of the node. Multiple nodeID certificates are allowed per IP to prevent denial-of-service attacks based on hijacking IP addresses. However, this scheme does not work in designs where nodeIDs change over time (e.g. CAN). To prevent an attacker from obtaining a large number of certificates, nodes could be charged a certain amount per certificate, limiting Sybil attacks. (Distributed nodeID generation suffers from the weaknesses described earlier in the Sybil attack paper.)

SECURE ROUTING TABLE MAINTENANCE
Attackers can corrupt a routing table over time by supplying bad routing updates to it.
If they are able to intercept probe messages (probes check liveness or improve routing performance), they can reply on behalf of other nodes so that the probing node builds a distorted picture of the overlay; this has a cascading effect over time. Solution: the authors suggest using two routing tables, one used for routing based on network proximity and another whose entries are constrained. (This is based on the observation that systems with strong structural constraints on the set of nodeIDs that can fill a routing table slot are less vulnerable to this attack.) Thus a node updates its constrained routing table using the node closest to each slot's ID, which acts as the structural constraint.

SECURE MESSAGE FORWARDING
Faulty nodes may choose not to forward packets, lowering system performance. Solution: detect faults, then use diverse routes. Fault detection is based on a timeout mechanism. Once a fault is detected, redundant routing is used: multiple copies of the message are routed along diverse paths. If the nodeId space is uniformly distributed, this is sufficient. Otherwise, as in Pastry and Chord, another technique called neighbor anycast sends copies of the message toward the destination key until they reach a node with the key's root in its neighbor set; that node then uses its detailed knowledge of the neighborhood for efficient delivery. Iterative routing was rejected because at any hop an attacker could supply a faulty node as the next hop.
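The resource-bound argument from the Sybil discussion above can be made concrete with a tiny sketch (illustrative only: it models each challenge as consuming one "unit" of a minimally capable entity's resources):

```python
import math

# Illustrative model of the direct-validation bound: answering a challenge
# costs one unit of a minimally capable entity's resources, so a faulty
# entity with `resources` units can answer *concurrent* challenges on
# behalf of at most floor(resources) identities.

def forgeable_identities(resources, concurrent=True):
    if concurrent:
        # All identities are challenged at once: resources must be split.
        return math.floor(resources)
    # Sequential challenges let the attacker reuse the same resources
    # for each identity in turn: no effective bound.
    return math.inf

print(forgeable_identities(3.7))                    # concurrent challenges
print(forgeable_identities(3.7, concurrent=False))  # sequential challenges
```

This is exactly why the challenges must be issued concurrently: without simultaneity, the floor(p) bound evaporates.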
From lackhand@gmail.com Wed Feb 22 17:03:49 2006
From: "Andrew Cunningham" <lackhand@gmail.com>
Date: Wed, 22 Feb 2006 17:03:48 -0500
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Andrew Cunningham
arc39

_A_Measurement_Study_of_Peer-to-Peer_File_Sharing_Systems_
Stefan Saroiu, P. Krishna Gummadi, Steven D.
Gribble

This is an almost psychological study of the profiles of users of Napster and Gnutella, and of their habits and capabilities. It includes readings on bottleneck bandwidth between the various hosts and the internet at large, how often hosts connect and disconnect from the system, how many hosts share and download, how much they cooperate, and the correlations between these various behaviors. The end result of this study is that there is significant heterogeneity and lack of cooperation across peers participating in these systems, (conjecture:) most likely due to the lack of a reward structure in place to induce cooperation. It is humorous that the authors feel that four days of Napster activity, and eight of Gnutella, suffice to draw conclusions about the systems: they claim that the main users of these systems are edge computers on the internet; the problem is that for such uncoordinated beings, time of year can have a vastly important impact. For instance, the period during which they observed the systems falls during Cornell's end-of-year, finals, and graduation period; user activity is likely to be anything but normal for college students!

Another odd result (under their Napster study) is the discovery that users with high-speed connections to the internet constitute both more of the peers and more than three times the upload traffic of low-speed, modem-connected clients. The authors do not seem to draw the same conclusion from this as I do, which is that some users do choose to identify themselves as possessing high speed; they get more downloads and many more uploads, and behave in all ways slightly utopian. On the down side, there are a large number of unknown-speed hosts; it is at these that the authors' desire for a better reward structure is aimed.

They suggest that, to counter clients' deliberate misrepresentation of connection capability, the software automatically configure itself; this is a stopgap measure, and morally reprehensible besides. Not only does it fail to solve the problem (if you rely on these values, it is in the community's interest to hack together a version of the client that underreports) and make users suspicious of what their computers are doing against them, it is in some sense a breach of confidence. Moreover, it's out of the scope of this paper: better phrasing would be to simply issue a call to arms for a solution, without suggesting a software, versus protocol, solution.

_Measurement,_Modeling,_and_Analysis_of_a_Peer-to-Peer_File-Sharing_Workload_
Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, and John Zahorjan

Using a much larger trace of Kazaa traffic, this paper analyzes various file-sharing behavioral hypotheses. Among these are a debunking of the traditional view of a Zipf-like distribution for content (due to specific filesharing characteristics), and the implications of this datum. Among the implications are optimizations that use the currently untapped locality of operations to improve the Kazaa workload, with quantified bandwidth savings. Fortunately, this paper avoids some of the problems of the previous one, in that the data-gathering period was much longer, albeit for a much more constrained population, and thus took human behavioral patterns into account (as they determined that a large amount of their campus traffic was P2P related, studies during the summer would miss that traffic!).

One thing that was not considered was the kind of data transferred over Kazaa: text, portable document format, and executable. This is relevant because these formats are somewhat mutable, and might tend to combat the non-Zipf nature of Kazaa files; essentially, they form groups and aggregate their downloads for a general popularity metric. Similarly, the odd behavior surrounding new object arrival -- that popular new objects crystallize load and thus lead to better network load behavior -- ignores the actual behavior of users, who will seek odd/archaic files regardless of the general opinion of them. In my experience, of course. The other odd news from this paper is bad news for consistent hashing schemes: to utilize locality-aware request routing, it is beneficial for clients to be aware of files that are within some internet-relevant domain -- perhaps on the same subnet, etc. However, under the ring-based systems we've studied, the nodeIDs of hosts (and of objects) are some sort of hash of, say, the IP address -- and thus likely to be completely disassociated from the ID of the object! This is not an insurmountable obstacle -- with sufficient indirection, nothing is insurmountable -- but it is cause for a pause to think.

Finally, there is little incentive for peers to use some sort of locality-aware caching scheme. It improves performance in a global sense, but requires admitting that one is "officially" hosting a file; a better way would be for the protocol to build some sense of this into itself through some sort of anonymity routine. This is not a well-thought-out concept, but the problem is simple: their specific implementation of the locality heuristic relied on perfect global knowledge in a central location.
From ids3@cornell.edu Wed Feb 22 22:39:41 2006
From: Ivan Stoyanov <ids3@cornell.edu>
Date: Wed, 22 Feb 2006 22:39:41 -0500
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

The motivation behind both papers is similar. They want to develop insights and draw conclusions about peer-to-peer networks by looking directly at real deployments like Napster, Gnutella, and Kazaa.
The authors claim that actual usage patterns, node characteristics, and motivations for participation are very important and should be taken into consideration when making design choices about the system and the related protocols.

Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload

The paper focuses on the properties of the file-sharing phenomenon as delivered by the popular medium Kazaa. Through the data collected, they make two important observations about the characteristics of users and demand.

- Unlike in the case of web sites or DNS, popularity for large multimedia files does not follow a Zipf distribution. This is due to what they call "fetch-at-most-once" behavior, where multimedia files are requested only once because of their immutability (in contrast to web sites). Similarly, movie rentals and theater visits are actions committed once per object. The popularity graph of such objects has a "flat head" when drawn on a log-log scale.

- The primary forces in Kazaa are the creation of new objects and the addition of new users. This is in contrast to the WWW, where constant mutation of objects generates the traffic. Users in Kazaa are "patient": Kazaa is a batch-mode delivery system, as opposed to the "interactive" web. Also, the number of user requests tends to go down as users "age".

A Measurement Study of Peer-to-Peer File Sharing Systems

This paper focuses more strictly on the profile of the users with respect to available bandwidth, latency, availability, and degree of sharing. Not surprisingly, the authors find great heterogeneity there. Their second observation is that when peers have no incentives, they tend to misreport information. One problem is that the data shown seems like too small a sample -- only four and eight days, respectively, for Napster and Gnutella. Since some of the authors are the same in the other paper, they seem to have learned a lesson.
The main problem the paper grapples with is misrepresentation of available bandwidth. There are no good solutions to that -- users will always report what they have incentives to report. A good solution would be an incentive to report bandwidth correctly, or a way to measure it directly. The paper shows interesting graphs regarding the resilience of Gnutella's connectivity: if a random 30% of the nodes are removed, the network is still connected; however, if the top 4% of high-degree nodes are removed, the network partitions into a large number of small networks.

From sh366@cornell.edu Thu Feb 23 00:09:15 2006
From: Huang Shiang-Jia <sh366@cornell.edu>
Date: Thu, 23 Feb 2006 00:09:13 -0500 (EST)
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

The two papers measure the characteristics of Napster and Gnutella and the workload of Kazaa, respectively. Measurement of the characteristics of nodes is important for delegating responsibilities across them in a peer-to-peer system.
Measurement of the forces that drive workload in peer-to-peer systems, mostly multimedia, helps us understand the future of multimedia workloads.

[Comments] Since the paper observes a large fraction of server-like and client-like behaviors in Napster and Gnutella, a large-scale system might classify peers according to their resources, and peers might be required to provide a certain amount of uploads before they can download others' shared files, to make it a real and fair peer-to-peer system.

* Currently proposed peer-to-peer routing protocols assume that nodes participating in the system behave equally (having the same resources...) and are willing to cooperate. This paper presents a measurement study of the nodes in these systems and shows that there is significant heterogeneity and lack of cooperation across them.

* Measurements are performed on Napster and Gnutella. The authors periodically crawl the systems to gather snapshots of the peer population and measure properties of the peers. Snapshots are gathered by issuing queries to the central servers in Napster and by exploiting the ping/pong messages in Gnutella; the metadata of peers from the responses is kept in a list. Characteristics such as latency, lifetime, and bandwidth are measured for each snapshot.

* Latency is measured via TCP round-trip times; the measured latency, however, depends on the location of the host from which it is measured. For lifetime, they measure IP-level uptime as well as application-level uptime. Finally, they use bottleneck link bandwidth as an approximation of available bandwidth, because available bandwidth may fluctuate significantly; since the bottleneck is generally equal to the capacity of the slowest hop, it is a physical property that remains constant over time.
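One standard way to estimate bottleneck link capacity is the packet-pair technique; the sketch below is a generic illustration, not the authors' actual measurement tool. Two packets sent back-to-back leave the bottleneck link spaced by the time the slow link takes to serialize one packet, so capacity can be inferred from the inter-arrival gap.

```python
# Packet-pair estimate of bottleneck capacity (illustrative sketch):
# two back-to-back packets exit the bottleneck link separated by the
# serialization delay of one packet, so
#     capacity ~= packet_size / inter-arrival gap.

def packet_pair_capacity(packet_size_bytes, gap_seconds):
    """Estimate bottleneck capacity in bits per second."""
    return packet_size_bytes * 8 / gap_seconds

# 1500-byte packets arriving 12 ms apart imply roughly a 1 Mbit/s link.
print(packet_pair_capacity(1500, 0.012))
```

Because the estimate depends only on link serialization delay, not on transient cross-traffic, it tracks the stable capacity of the slowest hop rather than the fluctuating available bandwidth.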
* The results show that (1) peers tend to have higher downstream than upstream bandwidth; (2) there are three classes of latencies a peer sees: peers in the same part of a continent, peers in the opposite part of a continent, and trans-oceanic peers; (3) on average, Napster peers have longer uptimes than Gnutella peers; (4) there are free-riders (nodes sharing little or no data but frequently downloading files) in both systems, and this is more significant in Gnutella, where 7% of peers offer more files than all the other peers combined; (5) Napster peers have an incentive to report a smaller bandwidth than the true value; (6) though it has no central servers, Gnutella is vulnerable to attacks directed at its best-connected, popular, high-degree nodes.

* The heterogeneity in latency, bandwidth, and availability implies that a peer-to-peer system should delegate responsibilities (such as storing replicas of popular data) across nodes based on their characteristics. The dishonest reporting implies that a peer-to-peer system should build in incentives for nodes to be honest, or measure the information directly.

* This paper explores the forces that drive current peer-to-peer file-sharing workloads and the implications for their future.

* Peer-to-peer file sharing differs from web content distribution in its data type: multimedia files. The large size and immutability of these data affect both user behavior and object dynamics: (1) Many of these objects are fetched at most once per client. The paper shows that this behavior causes the Kazaa popularity distribution to deviate substantially from a Zipf curve. They compare Kazaa with other non-Zipf workloads, such as the web with proxy caching or media streaming, and use a model to show that the non-Zipf behavior in Kazaa is best explained by the fetch-at-most-once behavior of clients. (2) The popularity of objects is often short-lived; popular objects tend to be recently born, and many requests are for old objects.
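The fetch-at-most-once effect in point (1) can be illustrated with a tiny simulation (the code and its parameters are my own sketch, not the paper's model): clients sample objects Zipf-style but skip objects they already hold, which flattens the head of the observed popularity curve.

```python
import itertools
import random
from collections import Counter

# Tiny simulation (parameters are illustrative) of fetch-at-most-once:
# each client draws objects from a Zipf distribution but never re-fetches
# an object it already holds.

random.seed(1)
N_OBJECTS, N_CLIENTS, REQS_PER_CLIENT = 1000, 500, 50
weights = [1 / rank for rank in range(1, N_OBJECTS + 1)]  # Zipf, alpha = 1
cum = list(itertools.accumulate(weights))

fetched = Counter()
for _ in range(N_CLIENTS):
    have = set()
    while len(have) < REQS_PER_CLIENT:
        obj = random.choices(range(N_OBJECTS), cum_weights=cum)[0]
        if obj not in have:          # fetch-at-most-once
            have.add(obj)
            fetched[obj] += 1

# Under pure Zipf sampling the rank-1 object would see ~5x the requests
# of the rank-5 object; with fetch-at-most-once the head flattens toward
# at most one request per client.
top = [count for _, count in fetched.most_common(5)]
print(top)
```

The top few objects all end up near one fetch per client, reproducing the "flat head" on the popularity curve that the paper attributes to immutability.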
Combining the analysis in their model with other properties presented in the paper (Kazaa is a batch-mode delivery system where users are patient, while the web is an interactive system where users are sensitive to page-fetch latency; new clients generate most of the load in Kazaa, while older clients consume fewer bytes as they age), they conclude that, unlike the web, whose workload is driven by document changes, the primary forces in Kazaa are the creation of new objects and the addition of new clients.

* This paper finally demonstrates an untapped locality existing in the Kazaa workload and presents a locality-aware mechanism for reducing external downloads, implemented by either centralized or decentralized redirectors that point requests at caches on local peers. Its concern is the availability of the content, since the redirectors have no control over the peers.

From asr32@cornell.edu Thu Feb 23 00:47:04 2006
From: Ari Rabkin
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

[this time for real...sent a few
previous paper responses with bad subject lines...fencepost error on the numbering, coupled with carelessness]

A measurement study of peer-to-peer filesharing systems

It would be nice to know what the 'typical' Napster or Gnutella node looks like, and the authors set out to measure this. They built a small toolbox of programs for measuring bandwidth, latency, and so forth for a large sample of Napster and Gnutella nodes, over a short period of time (4-8 days). The authors show that nodes are very diverse in terms of bandwidth, uptime, latency, and files shared, and they provide detailed numeric data on each. For instance, they show that the bulk of nodes on both systems are users connecting through cable modems and DSL links, but that there are sizeable fractions of LAN users, and even of modem users. Unfortunately, precisely because of the short measurement period, the authors may not capture long-term, or even annual, variation in the properties of peer-to-peer networks. These data are perhaps relevant for a year or two; as the overall demographics of the internet shift, it seems likely that these results will no longer be directly applicable. The paper's Figure 2 shows interesting diurnal variation; sadly, the authors do not indicate whether their data uses local time or GMT, making it much less useful. The authors mention that other Gnutella clouds may exist; it would be interesting to know whether different clouds have different geographic distributions.

Measurement and modelling

Whereas the previous study examined peer-to-peer hosts qua hosts, this study concentrates on the data being shared, and the users qua users. The study looked just at the UW campus, and just at one P2P protocol: Kazaa. The authors present population studies on object popularity, and on how Kazaa users change their behavior over time. There were a number of interesting results.
To give one: because users generally download a given file only once, the distribution of requests is not Zipfian; the most popular items are "clamped" in popularity, being requested less often than Zipf's law would suggest. The authors use these results to propose a statistical model of user behavior and file choices. This study is quite narrow in both breadth and duration: it is limited to a few months, and to a heavily-networked university campus. It is not obvious that the results generalize to other network settings, or even that they are stable over time. The authors assert that "fetch at most once" is a fundamentally different cause of non-Zipfian distributions than web caching. This is not necessarily true: the user's hard disk is essentially a very large cache for media files of interest to the user, who may play the same song or movie many times. The fundamental phenomenon producing the non-Zipf distribution here could plausibly still be caching; whether the cache is shared or per-client is not fundamental to analyzing the distribution.

Ari Rabkin
asr32@cornell.edu
Risley Hall 454
3-2842

The resources of civilization are not yet exhausted.
--William Gladstone

From tc99@cornell.edu Thu Feb 23 01:33:16 2006
From: "Theodore Ming Shiuan Chao"
To: egs@cs.cornell.edu
Subject: paper 9

The two papers discussed both measure certain aspects of a few popular P2P file-sharing applications, as those were the most common use of large-scale P2P networks. The first paper measures the bandwidth (reported and/or actual estimates), session durations, and number of shared files in the Gnutella and Napster networks. They found that the two networks share many similar features. For one, a large fraction of the users in both networks were behind broadband connections with downstream bandwidths of 1000 kbps. However, many of the users were behind asymmetric connections with lower upstream bandwidth, resulting in a significant difference between the total available upstream and downstream bandwidth in the network.
They also noticed that the CDF of users' session times resembles a logarithmic curve, with over half the users having session times under an hour and 80% under 3 hours, and that a small percentage of users account for a large percentage of the files shared. They also note some differences between the two networks. One is that Gnutella has a far smaller percentage of users with <64 kbps bandwidth, and another is that the size of files on Gnutella varies much more than on Napster, where nearly all the files are <10MB audio files. They also note that Gnutella can be vulnerable to organized attacks on nodes with especially high degrees, which can cause large-scale partitioning of the network overlay.

The second paper takes a look at Kazaa, and focuses on the distribution of queries over time. The major point to note is that, because of the "fetch-at-most-once" property of the media files shared in the P2P network, the distribution of requests does not follow the Zipf-like distribution that is commonplace in Web traffic. Even assuming an initially Zipf-like distribution, popular objects are downloaded quickly, but once a user has fetched one, there is no need to request it again, so the number of requests for an object decreases as its age increases. An influx of new popular objects can bring the distribution closer to a Zipf distribution, but it still quickly falls off. New clients that do not yet have the objects provide only a marginal gain, as they represent merely a small percentage of the population, unless the influx of clients is on an exponential scale.

While it is likely true that most users in the studied P2P networks are free-riders, the first paper has a flaw in that it doesn't take into account the POPULARITY of the files each user is sharing. If you consider that the most popular files are the most often downloaded, and that by default the program will share the files a user downloads, there is a chance that this could serve as a crude form of replication.
Thus, a user with a small number of files could be serving some of the more popular ones. Conversely, a user with a large number of files might not necessarily be satisfying more queries if the objects it is serving are less popular. A possible extension would be to combine the two studies: examine the popularity and number of files shared by users as a function of time. As objects age, do users replace the files they are sharing with newer popular objects? Or do they keep sharing the old objects, possibly in addition to newer ones? For the second paper, they also only looked at the most popular requests in the first and last 30 days. Six months is quite a long turn-around period for files; a more informative experiment would be to take snapshots of popular files every 30 days. Then they could look at the lifetimes of different file types/sizes, and also see how the request fall-off for popular objects compares to the fall-off for less popular objects.

From pjk25@cornell.edu Thu Feb 23 01:56:12 2006
From: Philip Kuryloski
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

A MEASUREMENT STUDY OF PEER-TO-PEER FILE SHARING SYSTEMS

The authors produce a survey of the characteristics of peers in both the Napster and Gnutella file-sharing networks. To achieve this, they crawl each network to obtain a list of peers, and then use that list to scan and test peers in the network. The Napster crawl was performed by obtaining a list of popular files from the web, querying the Napster network, and recording hosts with copies of the files. The Gnutella crawler used the Gnutella protocol to flood the network with ping messages to gather hostnames. Once a list of peers had been generated, the number of shared files, ping latency, uptime or lifetime, and bandwidth to the internet were measured for individual peers. The distribution of peers across DNS domains was also measured. They find that very few peers have server-like high-bandwidth (>3 Mbps) connections, and that a significant portion of peers have very low bandwidth, modem-grade connections. Also, upstream bandwidth is typically lower than downstream bandwidth. They note that latencies in the Gnutella network vary greatly due to the unstructured nature of the network. In terms of uptime, Napster peers have significantly higher availability than Gnutella peers, although neither network has a large number of peers with server-like uptime (93%+). There are also a large number of freeloading users in both networks, and a significant number of users who misreport their bandwidth. Based on these findings, the authors derive several suggestions for p2p network designers. The great variation in peers suggests that systems should delegate different levels of responsibility to different peers. Also, the system should do its best to measure peer capability rather than rely on peer-supplied values, as these are often misreported.

MEASUREMENT, MODELING, AND ANALYSIS OF A PEER-TO-PEER FILE-SHARING WORKLOAD

This work focuses on a 200-day trace of Kazaa traffic.
The authors begin by broadly contrasting this traffic with web traffic. They find that web traffic is driven by the dynamic nature of pages, while Kazaa traffic is generated by the introduction of new content and by users entering and leaving the system. Generally this is due to the immutable nature of typical Kazaa content, which users are likely to download once, as opposed to a webpage they may visit repeatedly. They also find that Kazaa users are extremely patient with content requests, unlike web page requests. Another interesting finding is that users request less content as they age in the system. In terms of content on the network, there are three natural classifications: small (<10MB), medium (10-100MB), and large (>100MB). The majority of requests are for small files; however, large files generate the most traffic overall. Also, popularity of objects is short-lived, although less so for large files than for small ones. However, older objects receive more requests overall than new ones. Finally, they note that the popularity of Kazaa objects does not follow the same Zipf distribution as Web objects. The authors also suggest locality-aware request routing, mainly forming distributed local caches, to reduce overall network traffic; the most popular objects are replicated more often. In both papers, the authors find that p2p networks are often a far cry from the homogeneous set of nodes envisioned as the substance of the network. In fact, peers vary greatly in capacity, as does object popularity. Most importantly, the types of objects transferred in p2p networks are quite different from those of the web. In essence, a p2p network is highly dynamic, and this is the most notable design challenge and stress upon the network, due not to its content but to the peers themselves.
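The savings promised by such locality-aware redirection can be sketched with a toy replay (the request model and all parameters below are invented for illustration, not taken from the paper): if a redirector sends each request to a local peer that already holds the object, only the first fetch of each distinct object crosses the organizational border.

```python
import random

def external_downloads(num_objects=500, num_requests=5000, seed=1):
    """Replay a Zipf-ish request stream against an organizational border.
    Without a redirector, every request is an external download; with one,
    only the first request for each object leaves the organization (later
    requests are redirected to a local peer's cached copy)."""
    rng = random.Random(seed)
    objects = range(num_objects)
    weights = [1.0 / (rank + 1) for rank in objects]  # Zipf(1) preference
    seen = set()
    external = 0
    for _ in range(num_requests):
        obj = rng.choices(objects, weights=weights)[0]
        if obj not in seen:  # first local fetch of this object must go outside
            seen.add(obj)
            external += 1
    return external, num_requests

ext, total = external_downloads()
print("external fetches with redirection: %d of %d requests" % (ext, total))
```

Under this (optimistic) model, external traffic is capped at one download per distinct object, no matter how heavy the demand; the paper's concern about availability applies because the "cache" here is other peers, who may leave.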
From ryanp@cs.cornell.edu Thu Feb 23 03:38:24 2006
From: "Ryan S. Peterson"
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Two papers out of the University of Washington measure characteristics of early peer-to-peer file sharing systems to get a sense of traffic patterns and client behavior over time. The first paper, by Saroiu, et al., studies Napster and Gnutella using crawlers to scour the networks for an interval of four to eight days. The paper comes away with two major observations about large multimedia-sharing p2p systems. First, although such systems often assume homogeneity in clients' bandwidth, latency, and number of files shared, clients across the network often differ by three to five orders of magnitude on all of those characteristics. Second, to game the system, many clients seem to misreport information about themselves, such as connection speed.
By doing so, clients take on less responsibility for uploading files and can be "free riders," using the service to download files without reciprocating the favor. Without auditing algorithms in place to prevent misrepresentation, approximately 25% of all Napster and Gnutella users are free riders. The second paper, by Gummadi, et al., focuses on Kazaa, a later p2p multimedia-sharing system. The authors tracked Kazaa traffic in and out of the University of Washington for a 200-day interval. Much of the paper is dedicated to comparing query distributions in Kazaa versus the Web. In particular, the paper shows that Kazaa queries do not follow a Zipf distribution as Web queries do. The most popular files are not as popular as a Zipf distribution would predict, making the popularity function "flatten out" at the popular end. The authors attribute this phenomenon to the "download once" model. Since Kazaa distributes immutable multimedia objects, there is little reason to download the same file multiple times. Contrast this with Web pages, which a client might check several times every day. The authors also point out various other statistics about Kazaa users, such as their patience compared with Web users, who often give up on downloading a Website after a matter of seconds. Kazaa users treat the network as a "batch download" program, waiting hours or days before their requested downloads complete. Both papers note the heterogeneity among clients and nodes in the three studied p2p networks. Because of the vast differences in machine capabilities and network conditions throughout the networks, newer systems have begun taking such conditions into account. Peer-to-peer networks, however, despite their dominance of world-wide network traffic, are still not as well understood as traditional Web services.
Ryan

From ns253@cornell.edu Thu Feb 23 10:27:38 2006
From: "Niranjan Sivakumar"
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Niranjan Sivakumar

A Measurement Study of Peer-to-Peer File Sharing Systems
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Both of these papers deal with probing popular file-sharing systems to try to derive some meaningful characteristics. The first paper set up crawlers for Napster and Gnutella. Various findings were made about bandwidth usage and the contributions of different nodes in the networks. It was found that nodes generally had lower upstream than downstream bandwidth. It was also found that there were many leechers on the network.
Despite the appearance of a system where many nodes each carry out similar responsibilities, these systems looked much more like a modification of the traditional client-server model, with much of the load borne by larger, more resourceful nodes and a significantly smaller load on a large number of other nodes. Gnutella showed some resilience to random failure, but it was shown that a coordinated attack would be able to "shatter" the system because of the uneven load distribution. It was also found that there is quite a bit of churn on these systems, worse on Gnutella than on Napster. The second paper considered measurements of Kazaa. First, the authors compared Kazaa usage to web usage. They found that Kazaa users are generally more patient than web users. Also, Kazaa usage differs from web usage because Kazaa users will often obtain an object only once from the system, as opposed to a website that a user may visit many times due to its changing nature. The distribution of object popularity was also considered. It was shown to be non-Zipf, as is web traffic seen through proxy caches, but the reasons differ: the web's distribution is shaped by the proxy caches themselves, whereas Kazaa has no comparable caching system. The study also showed that file-sharing effectiveness decreases with client age, and in turn that the infusion of new objects rejuvenates the system, while new clients joining the network do not have the same effect. The paper also looked at ways to improve performance, and showed that a locality-aware mechanism may help to reduce bandwidth usage. Neither of these papers may seem too groundbreaking when considered in retrospect, as some of the findings have since become common knowledge. However, given the time when the measurements were made, they were probably an important step toward understanding such file-sharing systems.
However, one drawback of these papers may be their temporal nature: they do not look forward much with respect to changing Internet infrastructure, such as the recent development of significantly higher-speed networks in Europe and Asia, where bandwidths may increase significantly and equalize in terms of upstream and downstream capacity.

From km266@cornell.edu Thu Feb 23 11:53:18 2006
From: "Kevin"
Subject: PAPER 9

The first paper (A Measurement Study of Peer-to-Peer File Sharing Systems) is a focused study of the usage patterns of people on the Gnutella and Napster file-sharing networks. The authors developed a web-crawler-like system to scan the p2p infrastructure of the two networks. They gathered information on the number of files shared, reported bandwidth, actual bandwidth (by measuring it themselves), the current number of uploads and downloads, and other information.
They monitored Napster for four days and the Gnutella network for eight days. Their results seemed to show that people on the Gnutella network were almost "better" than those on the Napster network: on average, their latencies were lower, their bandwidths were higher, and they misreported less. The authors' initial explanation was the tech-savvy nature of the people who used Gnutella (who therefore had higher-bandwidth connections). One of the big takeaways of this paper was the heterogeneity of the users. Many users (especially those on slower connections) acted more like clients, while those on faster connections acted more like servers. The authors repeatedly argue that future p2p system designers should create systems that take this into account and do not assume everyone has a similar-speed connection. They also argue that software should not ask users for their bandwidth and other information, but should measure it itself. It seems almost ironic that they first say that high-speed connections act like servers and are (on average) up more frequently and for longer durations, yet they do not trust the end user; the high-speed connections are already doing the network a favor by being up so much. Other problems (such as small sample size) are an issue, but this is resolved in the second paper. The second paper, Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload, was a detailed look at the Kazaa file-sharing network. This paper had a much longer sampling period (over half a year), which allowed the authors to observe human-related cycles (such as college students going home over summer break and coming back in the fall). One of the first things the paper points out is that people are willing to wait for content.
Unlike web surfing (where people normally don't wait more than a few seconds before moving on to the next, possibly competing, site), people who downloaded content through Kazaa had no problem waiting a day, a week, or even longer to get files. People, they discovered, are patient! The next takeaway from this paper was that files on the Kazaa network did not follow a Zipf-like distribution. Unlike DNS queries and other internet systems, the content, when graphed on a log-log plot, was clearly not Zipf. They explain this by the fetch-at-most-once concept. Basically, once you have a media file you no longer need to download it: you have it stored locally, and the external network's version is immutable, so there is no reason to download it again. The phenomenon of "the new song," which is often immediately popular, might temporarily skew the network to be more Zipf-like, but it quickly dies back down as more and more people obtain the file. The authors also measure something else of interest: requests for small files (those under 10 megabytes) greatly outnumber (by over 10x) requests for large (>100 megabyte) files, while the actual transferred bytes show that 65% of the bytes transferred are for big files and closer to 15% for smaller files. All in all, the papers were useful for determining the behaviors of the systems and their users, and should give insight to future p2p developers. While not all of the schemes are immediately realizable (like the locality in the second paper, which will be difficult with hashing), they give good pointers.
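That request/byte asymmetry follows directly from file sizes. A back-of-envelope check, using invented round-number sizes and request counts (not figures from the paper), shows how a 10x request majority for small files can still leave most of the bytes in large transfers:

```python
# Assumed, illustrative numbers: the sizes and request counts below are
# round figures chosen to mirror the shape of the reported statistics,
# not data from the Kazaa trace.
small_requests, small_mb = 1000, 5    # <10 MB objects (e.g. audio)
large_requests, large_mb = 100, 300   # >100 MB objects (e.g. video)

small_bytes = small_requests * small_mb   # 5,000 MB
large_bytes = large_requests * large_mb   # 30,000 MB
total = small_bytes + large_bytes

print("small files: %.0f%% of bytes" % (100 * small_bytes / total))
print("large files: %.0f%% of bytes" % (100 * large_bytes / total))
```

With these assumptions, small files get ten times the requests yet account for roughly a seventh of the bytes, the same qualitative picture as the trace (the remaining gap in the paper's 65%/15% split is taken up by medium-sized files).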
--Kevin

From victoria@cs.hmc.edu Thu Feb 23 11:57:00 2006
From: Victoria Krafft
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

So far, all the peer-to-peer networks we've looked at have made some general assumptions about the hosts which will be running the systems. We've assumed that network bandwidth and latency were the limiting factors, and not worried too much about CPU time or memory usage. However, we had no idea how much bandwidth the average node had, or how much this varied between nodes. These papers study the characteristics of the nodes running file-sharing software. The paper by Saroiu et al. showed that the nodes in both Gnutella and Napster were heterogeneous, with variations of several orders of magnitude. In addition, their work showed that nodes were likely to misrepresent their characteristics, such as available bandwidth. They also found that the 'free rider' problem from economics carries over into peer-to-peer networks; a portion of the nodes in the system only downloaded files, and never uploaded them. Furthermore, 50% of the sessions lasted less than 60 minutes, which gives us a value to use for network churn. All of these facts suggest that the assumptions made when creating and evaluating several popular peer-to-peer networks were not correct.
The second paper, by Gummadi et al., presents a long-term study of the use of the Kazaa peer-to-peer file-sharing system. They note several important characteristics of file-sharing networks. For one thing, unlike web pages, objects in such a network will generally be downloaded at most once by a single node. Kazaa users were willing to wait several days for a file to download, making Kazaa more like a batch-mode system. There is also potential for significantly reduced bandwidth usage if caches or locality awareness were used. Both of these papers present the characteristics of nodes on file-sharing networks; they do not examine any other types of overlay networks. It has also been several years since these papers were published, so the characteristics of nodes in a file-sharing network may have changed over time. The file-sharing networks have also changed as a result of these studies: some networks require their users to maintain a certain upload/download ratio, and there are currently several projects that attempt to provide load balancing for heterogeneous networks.
-- Victoria Krafft

From nsg7@cornell.edu Thu Feb 23 12:00:45 2006
From: "Nicholas S Gerner"
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Saroiu et al.'s "A Measurement Study..." presents measurement studies of the p2p content-distribution systems Gnutella and Napster. A methodology is presented for collecting information related to IP address/port, upstream and downstream bandwidth (as reported and as directly measured), node lifetimes (and statuses throughout those lifetimes), and workload generation. This methodology is used to collect and present data on the node composition of these networks: bandwidth capabilities as reported vs. measured (which can differ significantly for many peers and vary considerably across peers), the distribution of workload generation, and the distribution of nodes which service that workload. The conclusion drawn is that nodes in these networks can vary considerably in bandwidth and in workload generation and servicing.
Nodes may also often misrepresent themselves or provide no benefit to the system (such as serving content). Gummadi et al.'s "Measurement, Modeling..." presents another study of a p2p content-distribution system: Kazaa on the University of Washington campus. Specifically, the study considered the Kazaa traffic leaving and entering the campus network. Statistics similar to those of "A Measurement Study..." were collected over a 200-day period. Additionally, content classes were identified. The presentation here highlights the significant difference between the workload distribution of a p2p content-distribution network and that of the web. The "fetch-at-most-once" behavior of Kazaa is contrasted with the "fetch-repeatedly" behavior of the web; this difference is identified as driving the divergence from a Zipf workload distribution in p2p networks. A model of this "fetch-at-most-once" behavior is presented, which highlights that object and client births drive the workload in a p2p file-sharing application, versus object change, which drives web workloads. Finally, locality-aware routing is considered as a possible approach to the bandwidth consumption experienced by organizational networks such as universities. 
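The divergence from Zipf under fetch-at-most-once behavior can be reproduced with a toy simulation (a sketch with made-up parameters, not the paper's actual model): clients draw requests from a Zipf popularity distribution, but in one variant they never re-download an object they already hold.

```python
import random

def zipf_weights(n, alpha=1.0):
    """Unnormalized Zipf popularity weights for n objects."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]

def simulate(num_clients, num_objects, requests_per_client, fetch_at_most_once):
    """Count downloads per object under Zipf-driven demand.

    With fetch_at_most_once=True a client never re-downloads an object
    it already has, which caps any object's count at num_clients and
    flattens the head of the popularity curve.
    """
    weights = zipf_weights(num_objects)
    counts = [0] * num_objects
    for _ in range(num_clients):
        have = set()
        for _ in range(requests_per_client):
            obj = random.choices(range(num_objects), weights=weights)[0]
            if fetch_at_most_once and obj in have:
                continue  # already fetched once; no repeat download
            have.add(obj)
            counts[obj] += 1
    return sorted(counts, reverse=True)

random.seed(1)
web_like = simulate(200, 500, 50, fetch_at_most_once=False)
kazaa_like = simulate(200, 500, 50, fetch_at_most_once=True)
print("most popular object, web-like:  ", web_like[0])
print("most popular object, kazaa-like:", kazaa_like[0])
```

In the fetch-at-most-once variant the most popular object's download count is bounded by the client population, so the curve's head is clipped relative to the Zipf baseline, which is the qualitative shape the paper reports.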
From tudorm@cs.cornell.edu Thu Feb 23 12:33:47 2006
From: Tudor Marian <tudorm@cs.cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

This paper starts from the premise that p2p systems should take into consideration the characteristics of the peers that participate in the system. It reports on measurements of bottleneck bandwidth, latency, availability, and file-sharing patterns of peers, performed by crawling the Napster and Gnutella networks, and concludes by providing recommendations to p2p system designers. The study showed that most users are behind modest-bandwidth connections and have asymmetric connections with higher downstream than upstream capacity. Moreover, the study revealed that peers tend to misreport their bandwidth capacity in order to avoid upstream bandwidth consumption; they are not willing to cooperate, and the systems provide no incentive for them to do so, to the point where 25% of the peers are free-riders. 
Another key observation is that even in systems designed with symmetry in mind, there is a large discrepancy between peers. The authors propose that effective delegation of responsibility should take accurate peer characteristics into account for the whole system to perform better. The second paper reports on a measurement study of the Kazaa p2p system conducted over a period of 200 days. The most notable observation is that the Kazaa shared-file distribution, unlike the web's, does not obey a power law. This is attributed to the immutable, fetch-at-most-once nature of the relatively large files (objects) shared. The popularity of an object decreases as it gets downloaded, since there is no reason to fetch the same file multiple times, yet this phenomenon is counterbalanced to some extent by the injection of new files into the system. New nodes joining do not have the same effect, however, since they cannot counteract the penalty incurred by aging -- as Kazaa clients age, they demand less from the system. The paper also shows that even though the popularity distribution is not governed by a power law, there is enough locality in the workload for caching to reduce bandwidth consumption. 
Tudor 

From okennedy@cs.cornell.edu Thu Feb 23 12:53:13 2006
From: Oliver Kennedy <okennedy@cs.cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

This week's papers (both written by the same group) attempt to give direction to people creating P2P network applications through statistics gathered from real-world deployments. The first paper uses Napster and Gnutella as case studies, while the second focuses on Kazaa. The interesting conclusions, in no particular order, are: 1) A large portion of P2P users consume more system resources than they provide. 2) A small portion of P2P users provide a large portion of the resources available on the network. 3) People disable their P2P applications when they are not using them. 4) While Gnutella is resilient in the face of most failures, there are a handful of nodes that act as key routers; a concerted attack on them could cripple the Gnutella network. 
5) Downloads of most files distributed by P2P applications do not follow a Zipf distribution, as clients will request each file at most once; this imposes an effective cap on download counts. 6) Large objects represent the largest portion of P2P traffic despite representing the smallest portion of P2P requests; by comparison, small objects are requested frequently but account for only a small portion of the traffic. Many of these points are facts that we take for granted nowadays. Particularly interesting is the note about Gnutella's weaknesses. Their trace demonstrates a very strong tree structure, though they neglect to show where their own node is located. Depending on how the search was conducted, it is possible that these graphs are not complete. Gnutella's failure points may be artifacts of an incomplete perception of the entire state of the network, combined with a semi-deterministic random-walk algorithm. The second paper could have shown a series of graphs demonstrating object downloads as a function of both time and rank. One would expect to see a near-Zipf distribution over a small window, turning into the graph they showed as the window increased. Given the depth of their analysis, confirmation of this behavior would be a useful addition. -Oliver Kennedy 

Computer science isn't about computers any more than astronomy is about telescopes. 
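Point 6 is just arithmetic on the size/request mix; a toy calculation with invented numbers (not figures from the paper) makes the asymmetry concrete:

```python
# Hypothetical request mix: many small (~5 MB audio) objects,
# few large (~700 MB video) objects. Numbers are illustrative only.
small_requests, small_size_mb = 9000, 5
large_requests, large_size_mb = 300, 700

small_mb = small_requests * small_size_mb    # 45,000 MB
large_mb = large_requests * large_size_mb    # 210,000 MB
total_mb = small_mb + large_mb

req_share = large_requests / (small_requests + large_requests)
byte_share = large_mb / total_mb
print(f"large objects: {req_share:.1%} of requests, {byte_share:.1%} of bytes")
```

Even when large objects are only a few percent of requests, their sheer size lets them dominate the byte count, which is why the paper's user-experience and bandwidth optimizations target different object classes.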
-- Dijkstra 

From kash@cs.cornell.edu Fri Feb 24 18:31:36 2006
From: "Ian Kash" <kash@cs.cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

The measurement study reports the characteristics of peers in the Napster and Gnutella networks based on a pair of short traces. They find there is a significant discrepancy between the upstream bandwidth available to well-provisioned and poorly provisioned peers. 
Another interesting observation is that many peers are uncooperative, both by sharing no files and by misreporting the type of connection they have (underreporting capacity to discourage others from making requests). There are several flaws with the methodology of this study. The Gnutella trace ran only 8 days and the Napster trace only 4 days, which means they have data only from a short time period that may be non-representative or contain misleading trends (for example, weekday/weekend cycles will not be adequately captured). For the Napster trace, they gathered peers by searching for popular files. This biases the data because (for example) people with collections of rare items rather than popular ones seem less likely to be free riders (they may well be the only source for the items they hold, so they will be the target of requests no matter what they report), and people who do not share at all will not be captured (and they may be motivated to free-ride by poor connections). Their technique for bottleneck bandwidth measurement raises ethical issues: even if it is only done for a short time, it appears they are essentially launching a DoS attack on the person's connection for that time. In section 3.3.3, their explanation of the disproportionate percentage of downloads by low-bandwidth peers does not make sense. They claim that low-bandwidth peers tend to be free riders and that this explains why they make more than their share of downloads. However, there does not seem to be any reason why the number of uploads provided should have any influence on these peers' desire for downloads. If anything, it would seem that they should be consuming less, because it takes longer for them to complete a given download. Finally, their numbers for the removal of nodes from Gnutella have an error somewhere: they claim they removed 30% of 1771 nodes and were left with 1300, but 70% of 1771 is 1240, which even rounded is closer to 1200 than to 1300. 
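The arithmetic behind that last objection is easy to verify:

```python
# Checking the reviewer's objection: removing 30% of 1771 nodes
# should leave about 1240, not the ~1300 the paper reports.
total_nodes = 1771
removed_fraction = 0.30
remaining = round(total_nodes * (1 - removed_fraction))
print(remaining)  # 1240
```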
The workload study reports the results of a 200-day trace of requests made by users at the University of Washington on the Kazaa network, and some simulations based on this workload. The most interesting result of the trace is that the distribution of requests for large media files is not Zipf, a fact they explain by noting that files are requested at most once, rather than repeatedly as web pages are. They draw a distinction between two main classes of files on Kazaa: small files < 10 MB (typically music) and large files > 100 MB (typically movies). Most requests are for the former, but most bandwidth is used by the latter. This means that to optimize user experience a system should focus on the former, while to optimize bandwidth use it should focus on the latter. They also propose and simulate a system for decreasing the bandwidth used by a file-sharing system within a community sharing an external connection (for example, a university). They dismiss caching because of potential policy and legal issues. Instead they propose directing requests to internal peers whenever possible, effectively using them as a distributed local cache. Their simulations suggest that this can make a significant difference in external bandwidth consumption. 
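The internal-peer redirection idea can be sketched in a few lines (a toy model with invented sizes and a Zipf request mix, not the paper's simulator): once any internal peer holds an object, later internal requests for it cost no external bandwidth.

```python
import random

def external_mb(trace, sizes, use_internal_peers):
    """Total external transfer (MB) for an organization's request trace.

    With use_internal_peers=True, a request for an object already held
    by some internal peer is served internally, like a distributed
    local cache, and consumes no external bandwidth.
    """
    have = set()   # objects already present somewhere inside the org
    total = 0
    for obj in trace:
        if use_internal_peers and obj in have:
            continue  # redirected to an internal peer
        total += sizes[obj]
        have.add(obj)
    return total

random.seed(2)
num_objects = 200
sizes = [random.choice([5, 700]) for _ in range(num_objects)]  # MB, invented
weights = [1 / r for r in range(1, num_objects + 1)]           # Zipf-ish demand
trace = random.choices(range(num_objects), weights=weights, k=5000)

naive = external_mb(trace, sizes, use_internal_peers=False)
redirected = external_mb(trace, sizes, use_internal_peers=True)
print(f"external traffic reduced to {redirected / naive:.1%} of baseline")
```

With redirection, external cost collapses to one transfer per distinct object requested, which is why locality awareness helps most when the workload has heavy repetition across clients.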

From nsg7@cornell.edu Tue Feb 28 11:09:46 2006
From: "Nicholas S Gerner" <nsg7@cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Samsara, PPay, SHARP, and KARMA each support p2p resource exchange. Symmetric storage exchange allows one node to store data at a second if and only if it stores an equal amount for the second. Samsara supports this through claims, which are incompressible sequences of data created and validated by a storage node and stored by the storing node. Periodically, a storage node B storing data for node A can validate that A still stores a claim for B. This claim may later be replaced by actual data from B, as long as B still stores data for A. Samsara extends this notion to dependency chains, where three or more nodes store data in a chain (A stores data on B, who stores data on C), and there is only one claim (storage overhead) per chain, at the root of the chain (so A stores C's claim for B). 
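A claim in this sense can be sketched as data that is cheap for the issuer to regenerate and spot-check but that the holder must actually store (a minimal sketch; Samsara's real claim construction and challenge protocol differ in detail):

```python
import hashlib
import random

def make_claim(secret: bytes, size: int) -> bytes:
    """Generate an incompressible-looking claim from a small secret.

    The issuer stores only the secret and can regenerate any byte of
    the claim on demand; the holder must keep all `size` bytes to be
    able to answer spot-checks.
    """
    out = b""
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(secret + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:size]

# Node A issues a claim that node B must hold on A's behalf.
secret = b"node-A-secret"  # invented value for illustration
claim_held_by_B = make_claim(secret, 4096)

# Later, A challenges B for a random 32-byte slice and verifies it
# against a freshly regenerated copy -- without ever storing the claim.
offset = random.randrange(4096 - 32)
answer = claim_held_by_B[offset:offset + 32]
assert make_claim(secret, 4096)[offset:offset + 32] == answer
```

A holder that silently discards the claim cannot answer such a challenge, which is what makes the probabilistic punishment described below enforceable.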
If a claim fails a validation query, the node performing the query probabilistically ejects a storage block belonging to the node that failed the query: if A fails a validation query on B's claim, B might respond by ejecting A's storage block. While the system is simple, there are several weaknesses. First, if chains are avoided, the storage overhead of the system equals the amount of data stored in the system. Second, dependency chains are only as strong as their weakest link, and a malicious or failed node in a dependency chain can trigger a cascading failure along the chain (in one direction). While the paper points out that dependency cycles are much more resilient to failures, in fact cycles are only resilient to one failure, after which they become a dependency chain. "PPay..." presents a coin-exchange system in which a set of trusted brokers issues coins to nodes. These coins can then be securely transferred from the owning node to another, and delegated from node to node by layering secure delegations on the original coin (or on previous delegation layers). The coins can later be cashed in at the brokers. The system is aimed at micropayments, where the risk of each transaction is very small, so detection of illegal transactions can be deferred for some reasonable period (and the load at the centralized brokers can be handled offline). Additionally, a scheme using PPay coins to support picopayments (even smaller, less risky transactions) is presented, which assumes that a node can extend a fixed amount of credit (divided amongst the other nodes it transacts with). In this scheme, transactions need not be collected on (by using PPay coins) until the fixed amount of credit is exceeded. The centralized nature of the brokers, and the implicit trust in them that all nodes must simultaneously share, goes against the spirit of completely decentralized systems. 
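The layered-delegation idea can be sketched as signatures stacked on a broker-issued coin. This sketch substitutes HMAC for the real public-key signatures (so the verifier must know the signing secret, a deliberate simplification), and all keys and field names are invented:

```python
import hashlib
import hmac
import json

def sign(key: bytes, payload: dict) -> dict:
    """Attach a keyed MAC to a payload (stand-in for a signature)."""
    blob = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload,
            "sig": hmac.new(key, blob, hashlib.sha256).hexdigest()}

def verify(key: bytes, signed: dict) -> bool:
    """Check that `signed` was produced under `key`."""
    blob = json.dumps(signed["payload"], sort_keys=True).encode()
    expected = hmac.new(key, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signed["sig"], expected)

# The broker mints a coin owned by A.
broker_key = b"broker-secret"
coin = sign(broker_key, {"coin_id": 42, "owner": "A"})

# A transfers the coin to B by layering an assignment on top of it;
# the broker can later unwind the layers to detect double-spending.
owner_a_key = b"A-secret"
transfer = sign(owner_a_key, {"coin": coin, "new_owner": "B"})

assert verify(broker_key, coin) and verify(owner_a_key, transfer)
```

Because each layer embeds the signed layer beneath it, the broker only needs to examine a cashed-in coin's full layer stack, offline, to reconstruct its ownership history.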
The picopayment scheme presented is well decoupled from the PPay system and might be adapted to use other payment schemes. "SHARP..." presents a system of resource-claim advertising in which nodes issue tickets that are (probabilistic) claims on resources at the node. Other nodes form a network, listen to each other's advertisements, and forward requests for these claims. The claims are validated by site authorities, which are named in the tickets and trusted by the nodes issuing tickets within that site's domain. Resource managers cash in tickets across domains by contacting the site authority, which in turn issues a concrete claim on the resources listed (or rejects the ticket if it conflicts with a previously cashed ticket). Tickets can be subdivided arbitrarily, so one agent might obtain a ticket for a large block of resources advertised by another node; this large block can then be divided amongst the many nodes on whose behalf the original agent acts. To maximize resource utilization, resources can be overbooked (much like airline flights) to handle tickets which are never claimed (due to holder failure, etc.). A study is presented in which SHARP is used to securely allocate PlanetLab computing resources amongst the PlanetLab constituents (including future resource reservations). SHARP has many roles to be played by different agents and includes some semi-centralization within domains: each domain must have a single site authority, which is authorized to validate all tickets for the domain, including keeping state on currently validated tickets and potential future resource reservations. This model may be appropriate for problems exhibiting nodes to which other nodes are willing to delegate this responsibility; but while we have seen that many p2p systems exhibit nodes with greatly different performance characteristics, it is not clear that such trust clusters exist in general. "Karma..." 
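Ticket subdivision is the mechanical heart of the delegation story; a minimal sketch (field names invented for illustration, not SHARP's actual ticket format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    """A SHARP-style soft claim: a probabilistic promise of `units`
    resource units at `site`, valid until `expires` (illustrative)."""
    site: str
    units: int
    expires: int

    def subdivide(self, units: int) -> tuple["Ticket", "Ticket"]:
        """Split off `units` for delegation; the total claim is conserved."""
        if not 0 < units < self.units:
            raise ValueError("subdivision must be a proper, positive split")
        return (Ticket(self.site, units, self.expires),
                Ticket(self.site, self.units - units, self.expires))

# An agent holding a large ticket delegates part of it onward.
big = Ticket(site="siteA", units=100, expires=1_700_000_000)
delegated, kept = big.subdivide(30)
assert delegated.units + kept.units == big.units
```

Because tickets are only probabilistic promises, the site authority remains free to overbook: the sum of outstanding ticket units may exceed actual capacity, with conflicts resolved at redemption time.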
presents a system for completely distributed credit. A transaction between two nodes A and B involves A, B, and two additional sets of nodes acting as the bank sets for A and B. The bank set of node A is responsible for keeping (by majority rule, to tolerate failures and disagreements both legitimate and malicious) the "karma" balance of A. When A requests resources from B costing some amount of karma (set by B), the bank sets for A and B transfer that amount of karma from A's account to B's account. The transaction involves seven steps, starting with a request from A to B, leading to communication between all members of A's bank set and B's bank set and to and from A and B, and ending with the transfer of resources and a verifiable receipt. This protocol ensures both that B receives the karma it is due and that A receives its resources and a receipt proving that B was paid. Several potential attacks are examined and ruled out by the protocol. No evaluation is presented, and a mechanism to offset inflation (in a distributed manner) is presented but poorly motivated: it is not clear what problems inflation will cause, or that the additional complexity of the anti-inflation protocol is worth the cost inflation presents. Additionally, nodes are granted a certain amount of karma upon entering the system (in exchange for solving a puzzle, verified by the new node's bank set). It is not clear that this is the best choice; a zero-balance initialization might be more natural (of course, a method for introducing currency into the market would then be needed). No analysis of this initialization problem is presented. 
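The bank-set agreement step can be illustrated with a majority vote over replica reports (a sketch of the idea only; KARMA's real protocol involves signed messages exchanged among the full bank sets):

```python
from collections import Counter

def bank_set_balance(replica_reports: list[int]) -> int:
    """Return a node's karma balance as agreed by its bank set.

    Majority rule lets the system tolerate a minority of crashed or
    lying replicas when reading an account balance.
    """
    value, votes = Counter(replica_reports).most_common(1)[0]
    if votes <= len(replica_reports) // 2:
        raise ValueError("no majority among bank-set replicas")
    return value

# Five replicas track A's balance; one is malicious and inflates it.
reports = [40, 40, 40, 40, 9999]
print(bank_set_balance(reports))  # 40
```

The same quorum logic protects the debit-and-credit step of a transfer: a single corrupt bank-set member cannot unilaterally change either party's balance.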
From gp72@cornell.edu Fri May 12 21:38:12 2006
From: "Gopal Parameswaran" <gp72@cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Measurement Study of Peer-to-Peer File Sharing Systems: In this paper the authors analyze two peer-to-peer networks, Napster and Gnutella, taking detailed measurements of bandwidth, the number of files shared, and the latencies between hosts. 
The number of nodes that report lower upload speeds is also studied. Bottleneck bandwidth depends not only on the upload speed of the node but also on the bandwidth of the routing path, which affects the bandwidth measurements. It was also observed that 25% of the nodes chose not to report their bandwidth, which is attributed to the lack of a mechanism penalizing the provision of low upload speeds. The observations show that the Napster and Gnutella networks have highly similar percentage distributions of host bandwidth, though Gnutella nodes in general have more bandwidth than those of Napster, because 50% of users in Napster and 60% of users in Gnutella use broadband connections. The study also shows how many nodes are required to break down the Gnutella network and how the network responds under attack: it was observed that for a node degree of 20, 60% of the nodes would need to be taken down for the overlay to be fragmented. Another interesting takeaway from this paper is the skew shown in Napster downloads, where low-bandwidth nodes account for a higher percentage of downloads than their actual participation in the network, showing that high-bandwidth nodes provide much of the bandwidth for downloading content. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload: In this paper the authors analyze the Kazaa network for the effect of file-sharing workloads, the effect of system parameters, and the impact of locality awareness. The most important observation concerns the behavior of users, who show patience with long downloads spanning days, due to the slow nature of the network. The authors also show that the popularity of objects has a short lifespan, and that popularity keeps transferring to new objects introduced into the system. Thus users' responses to the Kazaa network are widely different from those on the web, where spontaneous responses are expected. 
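The resilience-versus-attack asymmetry can be reproduced on a toy preferential-attachment graph (a sketch only; Gnutella's real topology was measured, not generated): random failures barely dent the largest connected component, while removing the highest-degree "key router" nodes shrinks it sharply.

```python
import random

def scale_free_graph(n, m=2):
    """Toy preferential-attachment overlay: each new node links to m
    existing nodes chosen proportionally to degree, yielding a few
    highly connected hubs, as observed in Gnutella."""
    edges = {i: set() for i in range(n)}
    edges[0].add(1)
    edges[1].add(0)
    stubs = [0, 1]  # node ids, repeated once per incident link
    for i in range(2, n):
        chosen = set()
        while len(chosen) < min(m, i):
            chosen.add(random.choice(stubs))
        for j in chosen:
            edges[i].add(j)
            edges[j].add(i)
            stubs += [i, j]
    return edges

def largest_component(edges, alive):
    """Size of the largest connected component among `alive` nodes."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in edges[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

random.seed(0)
n = 400
g = scale_free_graph(n)
by_degree = sorted(g, key=lambda u: len(g[u]), reverse=True)

alive_random = set(range(n)) - set(random.sample(range(n), 40))  # 10% fail
alive_attack = set(range(n)) - set(by_degree[:40])               # top hubs die

print("after random failures:", largest_component(g, alive_random))
print("after targeted attack:", largest_component(g, alive_attack))
```

The exact removal thresholds in the paper depend on the measured topology and degree (e.g. the degree-20 figure quoted above), but the qualitative gap between random failure and targeted attack survives in even this crude model.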
It was also observed in the study that larger objects tend to age faster than smaller objects, and that the arrival of new objects plays an important role in peer-to-peer systems, just as new content plays an important role in the dynamics of the web; performance in file-sharing systems decreases over time in the absence of newly introduced objects. Another interesting observation made by the authors is that Kazaa does not follow a Zipf distribution.