From asr32@cornell.edu Tue Feb 21 01:15:52 2006
From: Ari Rabkin <asr32@cornell.edu>
Date: Tue, 21 Feb 2006 00:42:31 -0500
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Sybil: A Sybil attack occurs when a single principal acquires many distinct identities in order to control a disproportionate fraction of the network and do damage. A Sybil attack, properly defined, is a building block for other attacks. To prove strong security properties, it would be very useful to be able to show that distinct machines in a P2P network are owned by distinct principals. Alas, the central result of this paper is that there are no good strategies for limiting Sybil attacks. In particular, techniques for directly proving over the network that two hosts are distinct have an error margin proportional to the difference in resources between a "powerful" host and a "weak" one. Distinct hosts must be tested simultaneously, rendering distributed challenges difficult. Vouching schemes are also problematic. The Sybil paper unfortunately offers a fairly shallow analysis.
It implicitly assumes that all authentication must be done across the network, through messages; that every malicious node has a valid identity; and that identities are either accepted or not, with no middle ground.

Eclipse: In an eclipse attack, the attacker attempts to inject faulty nodes into every entry of the target host's routing table, with an eye to preventing proper overlay operation. In the end, the target host may have all its network operations mediated by hostile nodes. To make this attack harder, the authors propose to bound the degree of nodes in the overlay network. They propose a protocol in which nodes can anonymously query their neighbors to verify that each neighbor's table includes them and has a legal degree; if not, that neighbor is dropped. This defense is only partial. The anonymous query protocol has a number of weaknesses: if an attacker controls a node and its entire anonymity set, the protocol fails, and an attacker can also cause innocent nodes to be marked as suspicious by falsifying their responses. Even when the protocol works properly, an attacker's fraction of routing-table entries can still be substantially higher than its fraction of overall nodes.

Secure routing on structured overlays: This paper proposes a general scheme for reliable delivery of messages on peer-to-peer networks in the presence of malicious nodes. Messages are sent redundantly, to each of several root nodes for a given object. The authors propose using two routing tables: one optimized for speed, the other constrained deterministically to limit an attacker's flexibility. This allows something otherwise impossible: very reliable behavior even with a substantial fraction of the network compromised. The protocol fails when more than around 25% of nodes are compromised.
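As a back-of-the-envelope illustration (my own sketch, not the paper's analysis), the value of redundant routes can be checked in a few lines of Python: if a fraction f of nodes is compromised and each of r independent routes traverses h hops, delivery succeeds as long as at least one route is attacker-free.

```python
# Illustrative model (not the paper's exact analysis): a route of h hops
# is attacker-free with probability (1 - f)^h when a fraction f of nodes
# is compromised; delivery succeeds if any of r independent redundant
# routes is attacker-free.

def delivery_success(f, hops=5, routes=4):
    """Probability that at least one of `routes` independent
    h-hop routes contains no compromised node."""
    p_clean = (1 - f) ** hops
    return 1 - (1 - p_clean) ** routes

for f in (0.1, 0.25, 0.4):
    print(f"f={f:.2f}  success={delivery_success(f):.3f}")
```

Even this crude model shows why reliability degrades sharply past roughly a quarter of the network compromised: the chance of any single route being clean collapses geometrically with hop count.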
When it does work, the proposed protocol is quite expensive, with substantial time and message costs; an attacker can easily force its use every time, which would significantly degrade the network. Another weakness is that the paper offers no suggestions for cheaper protocols when the threat level is less severe.

Ari Rabkin
asr32@cornell.edu
Risley Hall 454 3-2842
The resources of civilization are not yet exhausted. --William Gladstone

From asg46@cornell.edu Tue Feb 21 10:47:56 2006
From: "Abhishek Santosh Gupta" <asg46@cornell.edu>
Date: Tue, 21 Feb 2006 10:47:55 -0500 (EST)
To: egs+summary@cs.cornell.edu
Subject: paper 9 - security

SECURITY

SYBIL ATTACK
Exploiting redundancy in a system requires the ability to determine whether two entities that appear to be different are actually distinct. Forging multiple identities in this way is termed a Sybil attack. The tempting idea of having a chain of identities vouch for other identities fails if the initial bootstrap is compromised.
DIRECT VALIDATION
The only direct means by which two entities can convince a third that they are distinct is by performing some task that a single entity could not. However, the difficulty of this task is limited by the weakest valid entity in the system. Thus, if p is the ratio of the resources of a faulty entity f to those of a minimally capable entity, then f can represent/forge floor(p) entities (assuming the challenges are issued concurrently to all entities). If the challenges are not concurrent across all the identities forged by the malicious entity, it can represent any number of forged entities.

INDIRECT IDENTITY VALIDATION
A group of identities validates another identity. However, a group of faulty entities can vouch for other faulty entities.

DEFENDING AGAINST ECLIPSE ATTACKS ON OVERLAY NETWORKS
If an attacker controls a large fraction of the neighbors of correct nodes, it can "eclipse" the correct nodes and prevent correct overlay operation. Strong structural constraints on the overlay are used to defend against this attack. When an eclipse attack is launched, the average indegree of attacker nodes will be higher than the average indegree of correct nodes (this assumes a defense against the Sybil attack is in place); thus, correct nodes can identify faulty ones by their indegree values. However, faulty nodes can consume the indegree of correct nodes and prevent other nodes from pointing to them, so the outdegree of each node must also be bounded below a certain threshold. This requires an efficient auditing scheme to prevent entities from lying about their indegrees and outdegrees. The auditing scheme requires each node x to maintain a back-pointer list; x forwards packets only from nodes in its back-pointer list. x periodically challenges members of its back-pointer set for their neighbor sets. If x is not included in such a set, or the size of the set is greater than the threshold, x removes that node from its back-pointer list.
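The audit rule just described can be sketched in a few lines of Python (names, the threshold value, and the challenge interface are illustrative, not taken from the paper):

```python
# Sketch of the degree-audit rule described above: node x challenges each
# node in its back-pointer set for that node's claimed neighbor set, and
# prunes any node that omits x or exceeds the degree bound.

OUTDEGREE_BOUND = 16  # assumed threshold; the scheme leaves this a parameter

def audit_back_pointers(x, back_pointers, neighbor_set_of):
    """Return the back-pointer entries x keeps after auditing.

    `neighbor_set_of(n)` models the (anonymously issued) challenge:
    it returns node n's claimed neighbor set."""
    kept = []
    for n in back_pointers:
        claimed = neighbor_set_of(n)
        # Drop n if it does not list x as a neighbor, or if its
        # claimed out-degree exceeds the allowed bound.
        if x in claimed and len(claimed) <= OUTDEGREE_BOUND:
            kept.append(n)
    return kept

# Toy check: node "b" omits "x"; node "c" claims too many neighbors.
tables = {"a": {"x", "y"},
          "b": {"y"},
          "c": {"x"} | {f"n{i}" for i in range(20)}}
print(audit_back_pointers("x", ["a", "b", "c"], tables.__getitem__))
```

Only node "a" survives the audit: "b" fails the inclusion check and "c" fails the degree bound.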
This requires sender anonymity to be effective: the node being challenged must not know the identity of the challenger, or else it could include x in its neighbor set only when answering x. Challenges are therefore always forwarded via an intermediate node; a single anonymizer node does not suffice, as the anonymizer itself could be malicious and reveal the identity of the sender. Randomizing the period between challenges from the same node prevents the attacker from correlating the arrival time of a challenge with the identity of the challenger.

SECURE ROUTING FOR...
This paper presents attacks that prevent correct message delivery in structured peer-to-peer overlays, and defenses against these attacks. Secure routing requires:
1) secure assignment of node identifiers
2) secure routing table maintenance
3) secure message forwarding

SECURE NODEID ASSIGNMENT
If attackers are allowed to choose their own node IDs, they can create many problems, e.g. partition the overlay into two disjoint sets; this is an instance of the Sybil attack. Solution: use a set of trusted CAs to assign nodeIDs to principals and to sign node certificates that bind a random nodeID to the public key and IP address of the node. Multiple nodeID certificates are allowed per IP to prevent denial-of-service attacks based on hijacking IP addresses. However, this scheme does not work in designs where nodeIDs change over time (e.g. CAN). To prevent an attacker from obtaining a large number of certificates, nodes could be charged a certain amount per certificate, limiting Sybil attacks. (Distributed nodeID generation suffers from the weaknesses described earlier in the Sybil attack paper.)

SECURE ROUTING TABLE MAINTENANCE
Attackers can corrupt a routing table over time by supplying bad routing updates to it.
If they are able to intercept probe messages (probes check liveness or improve routing performance), they can reply on behalf of other nodes so that the probing node builds a distorted picture of the overlay; this has a cascading effect over time. Solution: the authors suggest using two routing tables, one used for routing based on network proximity and another whose entries are constrained. (This is based on the observation that systems with strong structural constraints on the set of nodeIDs that can fill a routing table slot are less vulnerable to this attack.) Thus a node updates its constrained routing table using the node closest to each slot's ID, which acts as the structural constraint.

SECURE MESSAGE FORWARDING
Faulty nodes may choose not to forward packets, lowering system performance. Solution: detect faults, then use diverse routes. Fault detection is based on a timeout mechanism. Once a fault is detected, redundant routing is used: multiple copies of the message are routed along diverse paths. If the nodeId space is uniformly distributed, this is sufficient. Otherwise, as in Pastry and Chord, another technique called neighbor anycast sends copies of the message toward the destination key until they reach a node with the key's root in its neighbor set; that node then uses its detailed knowledge of the neighborhood for efficient delivery. Iterative routing was rejected because at any hop an attacker could supply a faulty node as the next hop.
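The resource-bound argument from the Sybil discussion above can be made concrete with a tiny sketch (illustrative only: it models each challenge as consuming one "unit" of a minimally capable entity's resources):

```python
import math

# Illustrative model of the direct-validation bound: answering a challenge
# costs one unit of a minimally capable entity's resources, so a faulty
# entity with `resources` units can answer *concurrent* challenges on
# behalf of at most floor(resources) identities.

def forgeable_identities(resources, concurrent=True):
    if concurrent:
        # All identities are challenged at once: resources must be split.
        return math.floor(resources)
    # Sequential challenges let the attacker reuse the same resources
    # for each identity in turn: no effective bound.
    return math.inf

print(forgeable_identities(3.7))                    # concurrent challenges
print(forgeable_identities(3.7, concurrent=False))  # sequential challenges
```

This is exactly why the challenges must be issued concurrently: without simultaneity, the floor(p) bound evaporates.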
From lackhand@gmail.com Wed Feb 22 17:03:49 2006
From: "Andrew Cunningham" <lackhand@gmail.com>
Date: Wed, 22 Feb 2006 17:03:48 -0500
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Andrew Cunningham
arc39

_A_Measurement_Study_of_Peer-to-Peer_File_Sharing_Systems_
Stefan Saroiu, P. Krishna Gummadi, Steven D.
Gribble

This is an almost psychological study of the profiles of users of Napster and Gnutella, and of their habits and capabilities. It includes readings on bottleneck bandwidth between the various hosts and the internet at large, how often hosts connect and disconnect from the system, how many hosts share and download, how much they cooperate, and the correlations between these various behaviors. The end result of this study is that there is significant heterogeneity and lack of cooperation across peers participating in these systems, (conjecture:) most likely due to the lack of a reward structure in place to induce cooperation. It is humorous that the authors feel that four days of Napster activity, and eight of Gnutella, suffice to draw conclusions about the systems: they claim that the main users of these systems are edge computers on the internet; the problem is that for such uncoordinated beings, time of year can have a vastly important impact. For instance, the period during which they observed the systems falls during Cornell's end-of-year, finals, and graduation period; user activity is likely to be anything but normal for college students!

Another odd result (under their Napster study) is the discovery that users with high-speed connections to the internet constitute both more of the peers and more than three times the upload traffic of low-speed, modem-connected clients. The authors do not seem to draw the same conclusion from this as I do, which is that some users do choose to identify themselves as possessing high speed; they get more downloads and many more uploads, and behave in all ways slightly utopian. On the down side, there are a large number of unknown-speed hosts; it is at these that the authors' desire for a better reward structure is aimed.

They suggest that, to counter clients' deliberate misrepresentation of connection capability, the software automatically configure itself; this is a stopgap measure, and morally reprehensible besides. Not only does it fail to solve the problem (if you rely on these values, it is in the community's interest to hack together a version of the client that underreports) and make users suspicious of what their computers are doing against them, it is in some sense a breach of confidence. Moreover, it's out of the scope of this paper: better phrasing would be to simply issue a call to arms for a solution, without suggesting a software, versus protocol, solution.

_Measurement,_Modeling,_and_Analysis_of_a_Peer-to-Peer_File-Sharing_Workload_
Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble, Henry M. Levy, and John Zahorjan

Using a much larger trace of Kazaa traffic, this paper analyzes various file-sharing behavioral hypotheses. Among these are a debunking of the traditional view of a Zipf-like distribution for content (due to specific filesharing characteristics), and the implications of this datum. Among the implications are optimizations that use the currently untapped locality of operations to improve the Kazaa workload, with quantified bandwidth savings. Fortunately, this paper avoids some of the problems of the previous one, in that the data-gathering period was much longer, albeit for a much more constrained population, and thus took human behavioral patterns into account (as they determined that a large amount of their campus traffic was P2P related, studies during the summer would miss that traffic!).

One thing that was not considered was the kind of data transferred over Kazaa: text, portable document format, and executable. This is relevant because these formats are somewhat mutable, and might tend to combat the non-Zipf nature of Kazaa files; essentially, they form groups and aggregate their downloads for a general popularity metric. Similarly, the odd behavior surrounding new object arrival -- that popular new objects crystallize load and thus lead to better network load behavior -- ignores the actual behavior of users, who will seek odd/archaic files regardless of the general opinion of them. In my experience, of course. The other odd news from this paper is bad news for consistent hashing schemes: to utilize locality-aware request routing, it is beneficial for clients to be aware of files that are within some internet-relevant domain -- perhaps on the same subnet, etc. However, under the ring-based systems we've studied, the nodeIDs of hosts (and of objects) are some sort of hash of, say, the IP address -- and thus likely to be completely disassociated from the ID of the object! This is not an insurmountable obstacle -- with sufficient indirection, nothing is insurmountable -- but it is cause for a pause to think.

Finally, there is little incentive for peers to use some sort of locality-aware caching scheme. It improves performance in a global sense, but requires admitting that one is "officially" hosting a file; a better way would be for the protocol to build some sense of this into itself through some sort of anonymity routine. This is not a well-thought-out concept, but the problem is simple: their specific implementation of the locality heuristic relied on perfect global knowledge in a central location.
From ids3@cornell.edu Wed Feb 22 22:39:41 2006
From: Ivan Stoyanov <ids3@cornell.edu>
Date: Wed, 22 Feb 2006 22:39:41 -0500
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

The motivation behind both papers is similar. They want to develop insights and draw conclusions about peer-to-peer networks by looking directly at real deployments like Napster, Gnutella, and Kazaa.
The authors claim that actual usage patterns, node characteristics, and motivations for participation are very important and should be taken into consideration when making design choices about the system and the related protocols.

Measurement, Modeling and Analysis of a Peer-to-Peer File-Sharing Workload

The paper focuses on the properties of the file-sharing phenomenon as delivered by the popular medium Kazaa. Through the data collected, they make two important observations about the characteristics of users and demand.

- Unlike in the case of web sites or DNS, popularity for large multimedia files does not follow a Zipf distribution. This is due to what they call "fetch-at-most-once" behavior, where multimedia files are requested only once because of their immutability (in contrast to web sites). Similarly, movie rentals and theater visits are actions committed once per object. The popularity graph of such objects has a "flat head" when drawn on a log-log scale.

- The primary forces in Kazaa are the creation of new objects and the addition of new users. This is in contrast to the WWW, where constant mutation of objects generates the traffic. Users in Kazaa are "patient": Kazaa is a batch-mode delivery system, as opposed to the "interactive" web. Also, the number of user requests tends to go down as users "age".

A Measurement Study of Peer-to-Peer File Sharing Systems

This paper focuses more strictly on the profile of the users with respect to available bandwidth, latency, availability, and degree of sharing. Not surprisingly, the authors find great heterogeneity there. Their second observation is that when peers have no incentives, they tend to misreport information. One problem is that the data shown seems like too small a sample -- only four and eight days, respectively, for Napster and Gnutella. Since some of the authors are the same in the other paper, they seem to have learned a lesson.
The main problem the paper grapples with is misrepresentation of available bandwidth. There are no good solutions to that -- users will always report what they have incentives to report. A good solution would be an incentive to report bandwidth correctly, or a way to measure it directly. The paper shows interesting graphs regarding the resilience of Gnutella's connectivity: if a random 30% of the nodes are removed, the network is still connected; however, if the top 4% of high-degree nodes are removed, the network partitions into a large number of small networks.

From sh366@cornell.edu Thu Feb 23 00:09:15 2006
From: Huang Shiang-Jia <sh366@cornell.edu>
Date: Thu, 23 Feb 2006 00:09:13 -0500 (EST)
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

The two papers measure the characteristics of Napster and Gnutella and the workload of Kazaa, respectively. Measurement of the characteristics of nodes is important for delegating responsibilities across them in a peer-to-peer system.
Measurement of the forces that drive workload in peer-to-peer systems, mostly multimedia, helps us understand the future of multimedia workloads.

[Comments] Since the paper observes a large fraction of server-like and client-like behaviors in Napster and Gnutella, a large-scale system might classify peers according to their resources, and peers might be required to provide a certain amount of uploads before they can download others' shared files, to make it a real and fair peer-to-peer system.

* Currently proposed peer-to-peer routing protocols assume that nodes participating in the system behave equally (having the same resources...) and are willing to cooperate. This paper presents a measurement study of the nodes in these systems and shows that there is significant heterogeneity and lack of cooperation across them.

* Measurements are performed on Napster and Gnutella. The authors periodically crawl the systems to gather snapshots of the peer population and measure properties of the peers. Snapshots are gathered by issuing queries to the central servers in Napster and by exploiting the ping/pong messages in Gnutella; the metadata of peers from the responses is kept in a list. Characteristics such as latency, lifetime, and bandwidth are measured for each snapshot.

* Latency is measured via TCP round-trip times; the measured latency, however, depends on the location of the host from which it is measured. For lifetime, they measure IP-level uptime as well as application-level uptime. Finally, they use bottleneck link bandwidth as an approximation of available bandwidth, because available bandwidth may fluctuate significantly; since the bottleneck is generally equal to the capacity of the slowest hop, it is a physical property that remains constant over time.
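One standard way to estimate bottleneck link capacity is the packet-pair technique; the sketch below is a generic illustration, not the authors' actual measurement tool. Two packets sent back-to-back leave the bottleneck link spaced by the time the slow link takes to serialize one packet, so capacity can be inferred from the inter-arrival gap.

```python
# Packet-pair estimate of bottleneck capacity (illustrative sketch):
# two back-to-back packets exit the bottleneck link separated by the
# serialization delay of one packet, so
#     capacity ~= packet_size / inter-arrival gap.

def packet_pair_capacity(packet_size_bytes, gap_seconds):
    """Estimate bottleneck capacity in bits per second."""
    return packet_size_bytes * 8 / gap_seconds

# 1500-byte packets arriving 12 ms apart imply roughly a 1 Mbit/s link.
print(packet_pair_capacity(1500, 0.012))
```

Because the estimate depends only on link serialization delay, not on transient cross-traffic, it tracks the stable capacity of the slowest hop rather than the fluctuating available bandwidth.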
* The results show that (1) peers tend to have higher downstream than upstream bandwidth; (2) there are three classes of latencies a peer sees: peers in the same part of a continent, peers in the opposite part of a continent, and trans-oceanic peers; (3) on average, Napster peers have longer uptimes than Gnutella peers; (4) there are free-riders (nodes sharing little or no data but frequently downloading files) in both systems, and this is more significant in Gnutella, where 7% of peers offer more files than all the other peers combined; (5) Napster peers have an incentive to report a smaller bandwidth than the true value; (6) though it has no central servers, Gnutella is vulnerable to attacks directed at its best-connected, popular, high-degree nodes.

* The heterogeneity in latency, bandwidth, and availability implies that a peer-to-peer system should delegate responsibilities (such as storing replicas of popular data) across nodes based on their characteristics. The dishonest reporting implies that a peer-to-peer system should build in incentives for nodes to be honest, or measure the information directly.

* This paper explores the forces that drive current peer-to-peer file-sharing workloads and the implications for their future.

* Peer-to-peer file sharing differs from web content distribution in its data type: multimedia files. The large size and immutability of these data affect both user behavior and object dynamics: (1) Many of these objects are fetched at most once per client. The paper shows that this behavior causes the Kazaa popularity distribution to deviate substantially from a Zipf curve. They compare Kazaa with other non-Zipf workloads, such as the web with proxy caching or media streaming, and use a model to show that the non-Zipf behavior in Kazaa is best explained by the fetch-at-most-once behavior of clients. (2) The popularity of objects is often short-lived; popular objects tend to be recently born, and many requests are for old objects.
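The fetch-at-most-once effect in point (1) can be illustrated with a tiny simulation (the code and its parameters are my own sketch, not the paper's model): clients sample objects Zipf-style but skip objects they already hold, which flattens the head of the observed popularity curve.

```python
import itertools
import random
from collections import Counter

# Tiny simulation (parameters are illustrative) of fetch-at-most-once:
# each client draws objects from a Zipf distribution but never re-fetches
# an object it already holds.

random.seed(1)
N_OBJECTS, N_CLIENTS, REQS_PER_CLIENT = 1000, 500, 50
weights = [1 / rank for rank in range(1, N_OBJECTS + 1)]  # Zipf, alpha = 1
cum = list(itertools.accumulate(weights))

fetched = Counter()
for _ in range(N_CLIENTS):
    have = set()
    while len(have) < REQS_PER_CLIENT:
        obj = random.choices(range(N_OBJECTS), cum_weights=cum)[0]
        if obj not in have:          # fetch-at-most-once
            have.add(obj)
            fetched[obj] += 1

# Under pure Zipf sampling the rank-1 object would see ~5x the requests
# of the rank-5 object; with fetch-at-most-once the head flattens toward
# at most one request per client.
top = [count for _, count in fetched.most_common(5)]
print(top)
```

The top few objects all end up near one fetch per client, reproducing the "flat head" on the popularity curve that the paper attributes to immutability.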
Combining the analysis in their model with other properties presented in the paper (Kazaa is a batch-mode delivery system where users are patient, while the web is an interactive system where users are sensitive to page-fetch latency; new clients generate most of the load in Kazaa, while older clients consume fewer bytes as they age), they conclude that, unlike the web, whose workload is driven by document changes, the primary forces in Kazaa are the creation of new objects and the addition of new clients.

* This paper finally demonstrates an untapped locality existing in the Kazaa workload and presents a locality-aware mechanism for reducing external downloads, implemented by either centralized or decentralized redirectors that point requests at caches on local peers. Its concern is the availability of the content, since the redirectors have no control over the peers.

From asr32@cornell.edu Thu Feb 23 00:47:04 2006
From: Ari Rabkin
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

[this time for real...sent a few
previous paper responses with bad subject lines...fencepost error on the numbering, coupled with carelessness]

A measurement study of peer-to-peer filesharing systems

It would be nice to know what the 'typical' Napster or Gnutella node looks like, and the authors set out to measure this. They built a small toolbox of programs for measuring bandwidth, latency, and so forth for a large sample of Napster and Gnutella nodes, over a short period of time (4-8 days). The authors show that nodes are very diverse in terms of bandwidth, uptime, latency, and files shared, and they provide detailed numeric data on each. For instance, they show that the bulk of nodes on both systems are users connecting through cable modems and DSL links, but that there are sizeable fractions of LAN users, and even of modem users. Unfortunately, precisely because of the short measurement period, the authors may not capture long-term, or even annual, variation in the properties of peer-to-peer networks. These data are perhaps relevant for a year or two; as the overall demographics of the internet shift, it seems likely that these results will no longer be directly applicable. The paper's Figure 2 shows interesting diurnal variation; sadly, the authors do not indicate whether their data uses local time or GMT, making it much less useful. The authors mention that other Gnutella clouds may exist; it would be interesting to know whether different clouds have different geographic distributions.

Measurement and modelling

Whereas the previous study examined peer-to-peer hosts qua hosts, this study concentrates on the data being shared, and the users qua users. The study looked just at the UW campus, and just at one P2P protocol: Kazaa. The authors present population studies on object popularity, and on how Kazaa users change their behavior over time. There were a number of interesting results.
To give one: because users generally download a given file only once, the distribution of requests is not Zipfian; the most popular items are "clamped" in popularity, being requested less often than Zipf's law would suggest. The authors use these results to propose a statistical model of user behavior and file choices. This study is quite narrow in both breadth and duration: it is limited to a few months, and to a heavily-networked university campus. It is not obvious that the results generalize to other network settings, or even that they are stable over time. The authors assert that "fetch at most once" is a fundamentally different cause of non-Zipfian distributions than web caching. This is not necessarily true: the user's hard disk is essentially a very large cache for media files of interest to the user, who may play the same song or movie many times. The fundamental phenomenon producing the non-Zipf distribution here could plausibly still be caching; whether the cache is shared or per-client is not fundamental to analyzing the distribution.

Ari Rabkin
asr32@cornell.edu
Risley Hall 454
3-2842

The resources of civilization are not yet exhausted.
--William Gladstone

From tc99@cornell.edu Thu Feb 23 01:33:16 2006
From: "Theodore Ming Shiuan Chao"
To: egs@cs.cornell.edu
Subject: paper 9

The two papers discussed both measure certain aspects of a few popular P2P file-sharing applications, as those were the most common use of large-scale P2P networks. The first paper measures the bandwidth (reported and/or actual estimates), session durations, and number of shared files in the Gnutella and Napster networks. They found that the two networks share many similar features. For one, a large fraction of the users in both networks were behind broadband connections with downstream bandwidths of 1000 kbps. However, many of the users were behind asymmetric connections with lower upstream bandwidth, resulting in a significant difference between the total available upstream and downstream bandwidth in the network.
They also noticed that the CDF of users' session times resembles a logarithmic curve, with over half the users having session times under an hour and 80% under 3 hours, and that a small percentage of users account for a large percentage of the files shared. They also note some differences between the two networks. One is that Gnutella has a far smaller percentage of users with <64 kbps bandwidth, and another is that the size of files on Gnutella varies much more than on Napster, where nearly all the files are <10MB audio files. They also note that Gnutella can be vulnerable to organized attacks on nodes with especially high degrees, which can cause large-scale partitioning of the network overlay.

The second paper takes a look at Kazaa, and focuses on the distribution of queries over time. The major point to note is that, because of the "fetch-at-most-once" property of the media files shared in the P2P network, the distribution of requests does not follow the Zipf-like distribution that is commonplace in Web traffic. Even assuming an initially Zipf-like distribution, popular objects are downloaded quickly, but once a user has fetched one, there is no need to request it again, so the number of requests for an object decreases as its age increases. An influx of new popular objects can bring the distribution closer to a Zipf distribution, but it still quickly falls off. New clients that do not yet have the objects provide only a marginal gain, as they represent merely a small percentage of the population, unless the influx of clients is on an exponential scale.

While it is likely true that most users in the studied P2P networks are free-riders, the first paper has a flaw in that it doesn't take into account the POPULARITY of the files each user is sharing. If you consider that the most popular files are the most often downloaded, and that by default the program will share the files a user downloads, there is a chance that this could serve as a crude form of replication.
Thus, a user with a small number of files could be serving some of the more popular ones. Conversely, a user with a large number of files might not necessarily be satisfying more queries if the objects it is serving are less popular. A possible extension would be to combine the two studies: examine the popularity and number of files shared by users as a function of time. As objects age, do users replace the files they are sharing with newer popular objects? Or do they keep sharing the old objects, possibly in addition to newer ones? For the second paper, they also only looked at the most popular requests in the first and last 30 days. Six months is quite a long turn-around period for files; a more informative experiment would be to take snapshots of popular files every 30 days. Then they could look at the lifetimes of different file types/sizes, and also see how the request fall-off for popular objects compares to the fall-off for less popular objects.

From pjk25@cornell.edu Thu Feb 23 01:56:12 2006
From: Philip Kuryloski
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

A MEASUREMENT STUDY OF PEER-TO-PEER FILE SHARING SYSTEMS

The authors produce a survey of the characteristics of peers in both the Napster and Gnutella file-sharing networks. To achieve this, they crawl each network to obtain a list of peers, and then use that list to scan and test peers in the network. The Napster crawl was performed by obtaining a list of popular files from the web, querying the Napster network, and recording hosts with copies of the files. The Gnutella crawler used the Gnutella protocol to flood the network with ping messages to gather hostnames. Once a list of peers had been generated, the number of shared files, ping latency, uptime or lifetime, and bandwidth to the internet were measured for individual peers. The distribution of peers across DNS domains was also measured. They find that very few peers have server-like high-bandwidth (>3 Mbps) connections, and that a significant portion of peers have very low bandwidth, modem-grade connections. Also, upstream bandwidth is typically lower than downstream bandwidth. They note that latencies in the Gnutella network vary greatly due to the unstructured nature of the network. In terms of uptime, Napster peers have significantly higher availability than Gnutella peers, although neither network has a large number of peers with server-like uptime (93%+). There are also a large number of freeloading users in both networks, and a significant number of users who misreport their bandwidth. Based on these findings, the authors derive several suggestions for p2p network designers. The great variation in peers suggests that systems should delegate different levels of responsibility to different peers. Also, the system should do its best to measure peer capability rather than rely on peer-supplied values, as these are often misreported.

MEASUREMENT, MODELING, AND ANALYSIS OF A PEER-TO-PEER FILE-SHARING WORKLOAD

This work focuses on a 200-day trace of Kazaa traffic.
The authors begin by broadly contrasting this traffic with web traffic. They find that web traffic is driven by the dynamic nature of pages, while Kazaa traffic is generated by the introduction of new content and by users entering and leaving the system. Generally this is due to the immutable nature of typical Kazaa content, which users are likely to download once, as opposed to a webpage they may visit repeatedly. They also find that Kazaa users are extremely patient with content requests, unlike web page requests. Another interesting finding is that users request less content as they age in the system. In terms of content on the network, there are three natural classifications: small (<10MB), medium (10-100MB), and large (>100MB). The majority of requests are for small files; however, large files generate the most traffic overall. Also, popularity of objects is short-lived, although less so for large files than for small ones. However, older objects receive more requests overall than new ones. Finally, they note that the popularity of Kazaa objects does not follow the same Zipf distribution as Web objects. The authors also suggest locality-aware request routing, mainly forming distributed local caches, to reduce overall network traffic; the most popular objects are replicated more often. In both papers, the authors find that p2p networks are often a far cry from the homogeneous set of nodes envisioned as the substance of the network. In fact, peers vary greatly in capacity, as does object popularity. Most importantly, the types of objects transferred in p2p networks are quite different from those of the web. In essence, a p2p network is highly dynamic, and this is the most notable design challenge and stress upon the network, due not to its content but to the peers themselves.
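The savings promised by such locality-aware redirection can be sketched with a toy replay (the request model and all parameters below are invented for illustration, not taken from the paper): if a redirector sends each request to a local peer that already holds the object, only the first fetch of each distinct object crosses the organizational border.

```python
import random

def external_downloads(num_objects=500, num_requests=5000, seed=1):
    """Replay a Zipf-ish request stream against an organizational border.
    Without a redirector, every request is an external download; with one,
    only the first request for each object leaves the organization (later
    requests are redirected to a local peer's cached copy)."""
    rng = random.Random(seed)
    objects = range(num_objects)
    weights = [1.0 / (rank + 1) for rank in objects]  # Zipf(1) preference
    seen = set()
    external = 0
    for _ in range(num_requests):
        obj = rng.choices(objects, weights=weights)[0]
        if obj not in seen:  # first local fetch of this object must go outside
            seen.add(obj)
            external += 1
    return external, num_requests

ext, total = external_downloads()
print("external fetches with redirection: %d of %d requests" % (ext, total))
```

Under this (optimistic) model, external traffic is capped at one download per distinct object, no matter how heavy the demand; the paper's concern about availability applies because the "cache" here is other peers, who may leave.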
From ryanp@cs.cornell.edu Thu Feb 23 03:38:24 2006
From: "Ryan S. Peterson"
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Two papers out of the University of Washington measure characteristics of early peer-to-peer file sharing systems to get a sense of traffic patterns and client behavior over time. The first paper, by Saroiu, et al., studies Napster and Gnutella using crawlers to scour the networks for an interval of four to eight days. The paper comes away with two major observations about large multimedia-sharing p2p systems. First, although such systems often assume homogeneity in clients' bandwidth, latency, and number of files shared, clients across the network often differ by three to five orders of magnitude on all of those characteristics. Second, to game the system, many clients seem to misreport information about themselves, such as connection speed.
By doing so, clients take on less responsibility for uploading files and can be "free riders," using the service to download files without reciprocating the favor. Without auditing algorithms in place to prevent misrepresentation, approximately 25% of all Napster and Gnutella users are free riders. The second paper, by Gummadi, et al., focuses on Kazaa, a later p2p multimedia-sharing system. The authors tracked Kazaa traffic in and out of the University of Washington for a 200-day interval. Much of the paper is dedicated to comparing query distributions in Kazaa versus the Web. In particular, the paper shows that Kazaa queries do not follow a Zipf distribution as Web queries do. The most popular files are not as popular as a Zipf distribution would predict, making the popularity function "flatten out" at the popular end. The authors attribute this phenomenon to the "download once" model. Since Kazaa distributes immutable multimedia objects, there is little reason to download the same file multiple times. Contrast this with Web pages, which a client might check several times every day. The authors also point out various other statistics about Kazaa users, such as their patience compared with Web users, who often give up on downloading a Website after a matter of seconds. Kazaa users treat the network as a "batch download" program, waiting hours or days before their requested downloads complete. Both papers note the heterogeneity among clients and nodes in the three studied p2p networks. Because of the vast differences in machine capabilities and network conditions throughout the networks, newer systems have begun taking such conditions into account. Peer-to-peer networks, however, despite their dominance of world-wide network traffic, are still not as well understood as traditional Web services.
Ryan

From ns253@cornell.edu Thu Feb 23 10:27:38 2006
From: "Niranjan Sivakumar"
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Niranjan Sivakumar

A Measurement Study of Peer-to-Peer File Sharing Systems
Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload

Both of these papers deal with probing popular file-sharing systems to try to derive some meaningful characteristics. The first paper set up crawlers for Napster and Gnutella. Various findings were made about bandwidth usage and the contributions of different nodes in the networks. It was found that nodes generally had lower upstream than downstream bandwidth. It was also found that there were many leechers on the network.
Despite the appearance of a system where many nodes each carry out similar responsibilities, these systems looked much more like a modification of the traditional client-server model, with much of the load borne by larger, more resourceful nodes and a significantly smaller load on a large number of other nodes. Gnutella showed some resilience to random failure, but it was shown that a coordinated attack would be able to "shatter" the system because of the uneven load distribution. It was also found that there is quite a bit of churn on these systems, worse on Gnutella than on Napster. The second paper considered measurements of Kazaa. First, the authors compared Kazaa usage to web usage. They found that Kazaa users are generally more patient than web users. Also, Kazaa usage differs from web usage because Kazaa users will often obtain an object only once from the system, as opposed to a website that a user may visit many times due to its changing nature. The distribution of object popularity was also considered. It was shown to be non-Zipf, as is web traffic seen through proxy caches, but the reasons differ: the web's distribution is shaped by the proxy caches themselves, whereas Kazaa has no comparable caching system. The study also showed that file-sharing effectiveness decreases with client age, and in turn that the infusion of new objects rejuvenates the system, while new clients joining the network do not have the same effect. The paper also looked at ways to improve performance, and showed that a locality-aware mechanism may help to reduce bandwidth usage. Neither of these papers may seem too groundbreaking when considered in retrospect, as some of the findings have since become common knowledge. However, given the time when the measurements were made, they were probably an important step toward understanding such file-sharing systems.
However, one drawback of these papers may be their temporal nature: they do not look forward much with respect to changing Internet infrastructure, such as the recent development of significantly higher-speed networks in Europe and Asia, where bandwidths may increase significantly and equalize in terms of upstream and downstream capacity.

From km266@cornell.edu Thu Feb 23 11:53:18 2006
From: "Kevin"
Subject: PAPER 9

The first paper (A Measurement Study of Peer-to-Peer File Sharing Systems) is a focused study of the usage patterns of people on the Gnutella and Napster file-sharing networks. The authors developed a web-crawler-like system to scan the p2p infrastructure of the two networks. They gathered information on the number of files shared, reported bandwidth, actual bandwidth (by measuring it themselves), the current number of uploads and downloads, and other information.
They monitored Napster for four days and the Gnutella network for eight days. Their results seemed to show that people on the Gnutella network were almost "better" than those on the Napster network: on average, their latencies were lower, their bandwidths were higher, and they misreported less. The authors' initial explanation was the tech-savvy nature of the people who used Gnutella (who therefore had higher-bandwidth connections). One of the big takeaways of this paper was the heterogeneity of the users. Many users (especially those on slower connections) acted more like clients, while those on faster connections acted more like servers. The authors repeatedly argue that future p2p system designers should create systems that take this into account and do not assume everyone has a similar-speed connection. They also argue that software should not ask users for their bandwidth and other information, but should measure it itself. It seems almost ironic that they first say that high-speed connections act like servers and are (on average) up more frequently and for longer durations, yet they do not trust the end user; the high-speed connections are already doing the network a favor by being up so much. Other problems (such as small sample size) are an issue, but this is resolved in the second paper. The second paper, Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload, was a detailed look at the Kazaa file-sharing network. This paper had a much longer sampling period (over half a year), which allowed the authors to observe human-related cycles (such as college students going home over summer break and coming back in the fall). One of the first things the paper points out is that people are willing to wait for content.
Unlike web surfing (where people normally don't wait more than a few seconds before moving on to the next, possibly competing, site), people who downloaded content through Kazaa had no problem waiting a day, a week, or even longer to get files. People, they discovered, are patient! The next takeaway from this paper was that files on the Kazaa network did not follow a Zipf-like distribution. Unlike DNS queries and other internet systems, the content, when graphed on a log-log plot, was clearly not Zipf. They explain this by the fetch-at-most-once concept. Basically, once you have a media file you no longer need to download it: you have it stored locally, and the external network's version is immutable, so there is no reason to download it again. The phenomenon of "the new song," which is often immediately popular, might temporarily skew the network to be more Zipf-like, but it quickly dies back down as more and more people obtain the file. The authors also measure something else of interest: requests for small files (those under 10 megabytes) greatly outnumber (by over 10x) requests for large (>100 megabyte) files, while the actual transferred bytes show that 65% of the bytes transferred are for big files and closer to 15% for smaller files. All in all, the papers were useful for determining the behaviors of the systems and their users, and should give insight to future p2p developers. While not all of the schemes are immediately realizable (like the locality in the second paper, which will be difficult with hashing), they give good pointers.
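That request/byte asymmetry follows directly from file sizes. A back-of-envelope check, using invented round-number sizes and request counts (not figures from the paper), shows how a 10x request majority for small files can still leave most of the bytes in large transfers:

```python
# Assumed, illustrative numbers: the sizes and request counts below are
# round figures chosen to mirror the shape of the reported statistics,
# not data from the Kazaa trace.
small_requests, small_mb = 1000, 5    # <10 MB objects (e.g. audio)
large_requests, large_mb = 100, 300   # >100 MB objects (e.g. video)

small_bytes = small_requests * small_mb   # 5,000 MB
large_bytes = large_requests * large_mb   # 30,000 MB
total = small_bytes + large_bytes

print("small files: %.0f%% of bytes" % (100 * small_bytes / total))
print("large files: %.0f%% of bytes" % (100 * large_bytes / total))
```

With these assumptions, small files get ten times the requests yet account for roughly a seventh of the bytes, the same qualitative picture as the trace (the remaining gap in the paper's 65%/15% split is taken up by medium-sized files).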
--Kevin

From victoria@cs.hmc.edu Thu Feb 23 11:57:00 2006
From: Victoria Krafft
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

So far, all the peer-to-peer networks we've looked at have made some general assumptions about the hosts which will be running the systems. We've assumed that network bandwidth and latency were the limiting factors, and not worried too much about CPU time or memory usage. However, we had no idea how much bandwidth the average node had, or how much this varied between nodes. These papers study the characteristics of the nodes running file-sharing software. The paper by Saroiu et al. showed that the nodes in both Gnutella and Napster were heterogeneous, with variations of several orders of magnitude. In addition, their work showed that nodes were likely to misrepresent their characteristics, such as available bandwidth. They also found that the 'free rider' problem from economics carries over into peer-to-peer networks; a portion of the nodes in the system only downloaded files, and never uploaded them. Furthermore, 50% of the sessions lasted less than 60 minutes, which gives us a value to use for network churn. All of these facts suggest that the assumptions made when creating and evaluating several popular peer-to-peer networks were not correct.
The second paper, by Gummadi et al., presents a long-term study of the use of the Kazaa peer-to-peer file-sharing system. They note several important characteristics of file-sharing networks. For one thing, unlike web pages, objects in such a network will generally be downloaded at most once by a single node. Kazaa users were willing to wait several days for a file to download, making Kazaa more like a batch-mode system. There is also potential for significantly reduced bandwidth usage if caches or locality awareness were used. Both of these papers present the characteristics of nodes on file-sharing networks; they do not examine any other types of overlay networks. It has also been several years since these papers were published, so the characteristics of nodes in a file-sharing network may have changed over time. The file-sharing networks have also changed as a result of these studies: some networks require their users to maintain a certain upload/download ratio, and there are currently several projects that attempt to provide load balancing for heterogeneous networks.
-- Victoria Krafft

From nsg7@cornell.edu Thu Feb 23 12:00:45 2006
From: "Nicholas S Gerner"
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Saroiu et al.'s "A Measurement Study..." presents measurement studies of the p2p content-distribution systems Gnutella and Napster. A methodology is presented for collecting information related to IP address/port, upstream and downstream bandwidth (as reported and as directly measured), node lifetimes (and statuses throughout those lifetimes), and workload generation. This methodology is used to collect and present data on the node composition of these networks: bandwidth capabilities as reported vs. measured (which can differ significantly for many peers and vary considerably across peers), the distribution of workload generation, and the distribution of nodes which service that workload. The conclusion drawn is that nodes in these networks can vary considerably in bandwidth and in workload generation and servicing.
Nodes may also often misrepresent themselves or provide no benefit to the system (such as serving content). Gummadi et al.'s "Measurement, Modeling..." presents another study of a p2p content-distribution system: Kazaa on the University of Washington campus. Specifically, the study considered the Kazaa traffic leaving and entering the campus network. Statistics similar to those of "A Measurement Study..." were collected over a 200-day period. Additionally, content classes were identified. The presentation here highlights the significant difference between the workload distribution of a p2p content-distribution network and that of the web. The "fetch-at-most-once" behavior of Kazaa is contrasted with the "fetch-repeatedly" behavior of the web; this difference is identified as driving the divergence from a Zipf workload distribution in p2p networks. A model of this "fetch-at-most-once" behavior is presented, which highlights that object and client births drive the workload in a p2p file-sharing application, versus object change, which drives web workloads. Finally, locality-aware routing is considered as a possible approach to the bandwidth consumption experienced by organizational networks such as universities. 
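The divergence from Zipf under fetch-at-most-once behavior can be reproduced with a toy simulation (a sketch with made-up parameters, not the paper's actual model): clients draw requests from a Zipf popularity distribution, but in one variant they never re-download an object they already hold.

```python
import random

def zipf_weights(n, alpha=1.0):
    """Unnormalized Zipf popularity weights for n objects."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]

def simulate(num_clients, num_objects, requests_per_client, fetch_at_most_once):
    """Count downloads per object under Zipf-driven demand.

    With fetch_at_most_once=True a client never re-downloads an object
    it already has, which caps any object's count at num_clients and
    flattens the head of the popularity curve.
    """
    weights = zipf_weights(num_objects)
    counts = [0] * num_objects
    for _ in range(num_clients):
        have = set()
        for _ in range(requests_per_client):
            obj = random.choices(range(num_objects), weights=weights)[0]
            if fetch_at_most_once and obj in have:
                continue  # already fetched once; no repeat download
            have.add(obj)
            counts[obj] += 1
    return sorted(counts, reverse=True)

random.seed(1)
web_like = simulate(200, 500, 50, fetch_at_most_once=False)
kazaa_like = simulate(200, 500, 50, fetch_at_most_once=True)
print("most popular object, web-like:  ", web_like[0])
print("most popular object, kazaa-like:", kazaa_like[0])
```

In the fetch-at-most-once variant the most popular object's download count is bounded by the client population, so the curve's head is clipped relative to the Zipf baseline, which is the qualitative shape the paper reports.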
From tudorm@cs.cornell.edu Thu Feb 23 12:33:47 2006
From: Tudor Marian <tudorm@cs.cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

This paper starts from the premise that p2p systems should take into consideration the characteristics of the peers that participate in the system. It reports on measurements of bottleneck bandwidth, latency, availability, and file-sharing patterns of peers, performed by crawling the Napster and Gnutella networks, and concludes by providing recommendations to p2p system designers. The study showed that most users are behind modest-bandwidth connections and have asymmetric connections with higher downstream than upstream capacity. Moreover, the study revealed that peers tend to misreport their bandwidth capacity in order to avoid upstream bandwidth consumption; they are not willing to cooperate, and the systems provide no incentive for them to do so, to the point where 25% of the peers are free-riders. 
Another key observation is that even in systems designed with symmetry in mind, there is a large discrepancy between peers. The authors propose that effective delegation of responsibility should take accurate peer characteristics into account for the whole system to perform better. The second paper reports on a measurement study of the Kazaa p2p system conducted over a period of 200 days. The most notable observation is that the Kazaa shared-file distribution, unlike the web's, does not obey a power law. This is attributed to the immutable, fetch-at-most-once nature of the relatively large files (objects) shared. The popularity of an object decreases as it gets downloaded, since there is no reason to fetch the same file multiple times, yet this phenomenon is counterbalanced to some extent by the injection of new files into the system. New nodes joining do not have the same effect, however, since they cannot counteract the penalty incurred by aging -- as Kazaa clients age, they demand less from the system. The paper also shows that even though the popularity distribution is not governed by a power law, there is enough locality in the workload for caching to reduce bandwidth consumption. 
Tudor 

From okennedy@cs.cornell.edu Thu Feb 23 12:53:13 2006
From: Oliver Kennedy <okennedy@cs.cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

This week's papers (both written by the same group) attempt to give direction to people creating P2P network applications through statistics gathered from real-world deployments. The first paper uses Napster and Gnutella as case studies, while the second focuses on Kazaa. The interesting conclusions, in no particular order, are: 1) A large portion of P2P users consume more system resources than they provide. 2) A small portion of P2P users provide a large portion of the resources available on the network. 3) People disable their P2P applications when they are not using them. 4) While Gnutella is resilient in the face of most failures, there are a handful of nodes that act as key routers; a concerted attack on them could cripple the Gnutella network. 
5) Downloads of most files distributed by P2P applications do not follow a Zipf distribution, as clients will request each file at most once; this imposes an effective cap on download counts. 6) Large objects represent the largest portion of P2P traffic despite representing the smallest portion of P2P requests; by comparison, small objects are requested frequently but account for only a small portion of the traffic. Many of these points are facts that we take for granted nowadays. Particularly interesting is the note about Gnutella's weaknesses. Their trace demonstrates a very strong tree structure, though they neglect to show where their own node is located. Depending on how the search was conducted, it is possible that these graphs are not complete. Gnutella's failure points may be artifacts of an incomplete perception of the entire state of the network, combined with a semi-deterministic random-walk algorithm. The second paper could have shown a series of graphs demonstrating object downloads as a function of both time and rank. One would expect to see a near-Zipf distribution over a small window, turning into the graph they showed as the window increased. Given the depth of their analysis, confirmation of this behavior would be a useful addition. -Oliver Kennedy 

Computer science isn't about computers any more than astronomy is about telescopes. 
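Point 6 is just arithmetic on the size/request mix; a toy calculation with invented numbers (not figures from the paper) makes the asymmetry concrete:

```python
# Hypothetical request mix: many small (~5 MB audio) objects,
# few large (~700 MB video) objects. Numbers are illustrative only.
small_requests, small_size_mb = 9000, 5
large_requests, large_size_mb = 300, 700

small_mb = small_requests * small_size_mb    # 45,000 MB
large_mb = large_requests * large_size_mb    # 210,000 MB
total_mb = small_mb + large_mb

req_share = large_requests / (small_requests + large_requests)
byte_share = large_mb / total_mb
print(f"large objects: {req_share:.1%} of requests, {byte_share:.1%} of bytes")
```

Even when large objects are only a few percent of requests, their sheer size lets them dominate the byte count, which is why the paper's user-experience and bandwidth optimizations target different object classes.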
-- Dijkstra 

From kash@cs.cornell.edu Fri Feb 24 18:31:36 2006
From: "Ian Kash" <kash@cs.cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

The measurement study reports the characteristics of peers in the Napster and Gnutella networks based on a pair of short traces. They find there is a significant discrepancy between the upstream bandwidth available to well-provisioned and poorly provisioned peers. 
Another interesting observation is that many peers are uncooperative, both by sharing no files and by misreporting the type of connection they have (underreporting capacity to discourage others from making requests). There are several flaws with the methodology of this study. The Gnutella trace ran only 8 days and the Napster trace only 4 days, which means they have data only from a short time period that may be non-representative or contain misleading trends (for example, weekday/weekend cycles will not be adequately captured). For the Napster trace, they gathered peers by searching for popular files. This biases the data because (for example) people with collections of rare items rather than popular ones seem less likely to be free riders (they may well be the only source for the items they hold, so they will be the target of requests no matter what they report), and people who do not share at all will not be captured (and they may be motivated to free-ride by poor connections). Their technique for bottleneck bandwidth measurement raises ethical issues: even if it is only done for a short time, it appears they are essentially launching a DoS attack on the person's connection for that time. In section 3.3.3, their explanation of the disproportionate percentage of downloads by low-bandwidth peers does not make sense. They claim that low-bandwidth peers tend to be free riders and that this explains why they make more than their share of downloads. However, there does not seem to be any reason why the number of uploads provided should have any influence on these peers' desire for downloads. If anything, it would seem that they should be consuming less, because it takes longer for them to complete a given download. Finally, their numbers for the removal of nodes from Gnutella have an error somewhere: they claim they removed 30% of 1771 nodes and were left with 1300, but 70% of 1771 is 1240, which even rounded is closer to 1200 than to 1300. 
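The arithmetic behind that last objection is easy to verify:

```python
# Checking the reviewer's objection: removing 30% of 1771 nodes
# should leave about 1240, not the ~1300 the paper reports.
total_nodes = 1771
removed_fraction = 0.30
remaining = round(total_nodes * (1 - removed_fraction))
print(remaining)  # 1240
```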
The workload study reports the results of a 200-day trace of requests made by users at the University of Washington on the Kazaa network, and some simulations based on this workload. The most interesting result of the trace is that the distribution of requests for large media files is not Zipf, a fact they explain by noting that files are requested at most once, rather than repeatedly as web pages are. They draw a distinction between two main classes of files on Kazaa: small files < 10 MB (typically music) and large files > 100 MB (typically movies). Most requests are for the former, but most bandwidth is used by the latter. This means that to optimize user experience a system should focus on the former, while to optimize bandwidth use it should focus on the latter. They also propose and simulate a system for decreasing the bandwidth used by a file-sharing system within a community sharing an external connection (for example, a university). They dismiss caching because of potential policy and legal issues. Instead they propose directing requests to internal peers whenever possible, effectively using them as a distributed local cache. Their simulations suggest that this can make a significant difference in external bandwidth consumption. 
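The internal-peer redirection idea can be sketched in a few lines (a toy model with invented sizes and a Zipf request mix, not the paper's simulator): once any internal peer holds an object, later internal requests for it cost no external bandwidth.

```python
import random

def external_mb(trace, sizes, use_internal_peers):
    """Total external transfer (MB) for an organization's request trace.

    With use_internal_peers=True, a request for an object already held
    by some internal peer is served internally, like a distributed
    local cache, and consumes no external bandwidth.
    """
    have = set()   # objects already present somewhere inside the org
    total = 0
    for obj in trace:
        if use_internal_peers and obj in have:
            continue  # redirected to an internal peer
        total += sizes[obj]
        have.add(obj)
    return total

random.seed(2)
num_objects = 200
sizes = [random.choice([5, 700]) for _ in range(num_objects)]  # MB, invented
weights = [1 / r for r in range(1, num_objects + 1)]           # Zipf-ish demand
trace = random.choices(range(num_objects), weights=weights, k=5000)

naive = external_mb(trace, sizes, use_internal_peers=False)
redirected = external_mb(trace, sizes, use_internal_peers=True)
print(f"external traffic reduced to {redirected / naive:.1%} of baseline")
```

With redirection, external cost collapses to one transfer per distinct object requested, which is why locality awareness helps most when the workload has heavy repetition across clients.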

From nsg7@cornell.edu Tue Feb 28 11:09:46 2006
From: "Nicholas S Gerner" <nsg7@cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Samsara, PPay, SHARP, and KARMA each support p2p resource exchange. Symmetric storage exchange allows one node to store data at a second if and only if it stores an equal amount for the second. Samsara supports this through claims, which are incompressible sequences of data created and validated by a storage node and stored by the storing node. Periodically, a storage node B storing data for node A can validate that A still stores a claim for B. This claim may later be replaced by actual data from B, as long as B still stores data for A. Samsara extends this notion to dependency chains, where three or more nodes store data in a chain (A stores data on B, who stores data on C), and there is only one claim (storage overhead) per chain, at the root of the chain (so A stores C's claim for B). 
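A claim in this sense can be sketched as data that is cheap for the issuer to regenerate and spot-check but that the holder must actually store (a minimal sketch; Samsara's real claim construction and challenge protocol differ in detail):

```python
import hashlib
import random

def make_claim(secret: bytes, size: int) -> bytes:
    """Generate an incompressible-looking claim from a small secret.

    The issuer stores only the secret and can regenerate any byte of
    the claim on demand; the holder must keep all `size` bytes to be
    able to answer spot-checks.
    """
    out = b""
    counter = 0
    while len(out) < size:
        out += hashlib.sha256(secret + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:size]

# Node A issues a claim that node B must hold on A's behalf.
secret = b"node-A-secret"  # invented value for illustration
claim_held_by_B = make_claim(secret, 4096)

# Later, A challenges B for a random 32-byte slice and verifies it
# against a freshly regenerated copy -- without ever storing the claim.
offset = random.randrange(4096 - 32)
answer = claim_held_by_B[offset:offset + 32]
assert make_claim(secret, 4096)[offset:offset + 32] == answer
```

A holder that silently discards the claim cannot answer such a challenge, which is what makes the probabilistic punishment described below enforceable.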
If a claim fails a validation query, the node performing the query probabilistically ejects a storage block belonging to the node that failed the query: if A fails a validation query on B's claim, B might respond by ejecting A's storage block. While the system is simple, there are several weaknesses. First, if chains are avoided, the storage overhead of the system equals the amount of data stored in the system. Second, dependency chains are only as strong as their weakest link, and a malicious or failed node in a dependency chain can trigger a cascading failure along the chain (in one direction). While the paper points out that dependency cycles are much more resilient to failures, in fact cycles are only resilient to one failure, after which they become a dependency chain. "PPay..." presents a coin-exchange system in which a set of trusted brokers issues coins to nodes. These coins can then be securely transferred from the owning node to another, and delegated from node to node by layering secure delegations on the original coin (or on previous delegation layers). The coins can later be cashed in at the brokers. The system is aimed at micropayments, where the risk of each transaction is very small, so detection of illegal transactions can be deferred for some reasonable period (and the load at the centralized brokers can be handled offline). Additionally, a scheme using PPay coins to support picopayments (even smaller, less risky transactions) is presented, which assumes that a node can extend a fixed amount of credit (divided amongst the other nodes it transacts with). In this scheme, transactions need not be collected on (by using PPay coins) until the fixed amount of credit is exceeded. The centralized nature of the brokers, and the implicit trust in them that all nodes must simultaneously share, goes against the spirit of completely decentralized systems. 
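The layered-delegation idea can be sketched as signatures stacked on a broker-issued coin. This sketch substitutes HMAC for the real public-key signatures (so the verifier must know the signing secret, a deliberate simplification), and all keys and field names are invented:

```python
import hashlib
import hmac
import json

def sign(key: bytes, payload: dict) -> dict:
    """Attach a keyed MAC to a payload (stand-in for a signature)."""
    blob = json.dumps(payload, sort_keys=True).encode()
    return {"payload": payload,
            "sig": hmac.new(key, blob, hashlib.sha256).hexdigest()}

def verify(key: bytes, signed: dict) -> bool:
    """Check that `signed` was produced under `key`."""
    blob = json.dumps(signed["payload"], sort_keys=True).encode()
    expected = hmac.new(key, blob, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signed["sig"], expected)

# The broker mints a coin owned by A.
broker_key = b"broker-secret"
coin = sign(broker_key, {"coin_id": 42, "owner": "A"})

# A transfers the coin to B by layering an assignment on top of it;
# the broker can later unwind the layers to detect double-spending.
owner_a_key = b"A-secret"
transfer = sign(owner_a_key, {"coin": coin, "new_owner": "B"})

assert verify(broker_key, coin) and verify(owner_a_key, transfer)
```

Because each layer embeds the signed layer beneath it, the broker only needs to examine a cashed-in coin's full layer stack, offline, to reconstruct its ownership history.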
The picopayment scheme presented is well decoupled from the PPay system and might be adapted to use other payment schemes. "SHARP..." presents a system of resource-claim advertising in which nodes issue tickets that are (probabilistic) claims on resources at the node. Other nodes form a network, listen to each other's advertisements, and forward requests for these claims. The claims are validated by site authorities, which are named in the tickets and trusted by the nodes issuing tickets within that site's domain. Resource managers cash in tickets across domains by contacting the site authority, which in turn issues a concrete claim on the resources listed (or rejects the ticket if it conflicts with a previously cashed ticket). Tickets can be subdivided arbitrarily, so one agent might obtain a ticket for a large block of resources advertised by another node; this large block can then be divided amongst the many nodes on whose behalf the original agent acts. To maximize resource utilization, resources can be overbooked (much like airline flights) to handle tickets which are never claimed (due to holder failure, etc.). A study is presented in which SHARP is used to securely allocate PlanetLab computing resources amongst the PlanetLab constituents (including future resource reservations). SHARP has many roles to be played by different agents and includes some semi-centralization within domains: each domain must have a single site authority, which is authorized to validate all tickets for the domain, including keeping state on currently validated tickets and potential future resource reservations. This model may be appropriate for problems exhibiting nodes to which other nodes are willing to delegate this responsibility; but while we have seen that many p2p systems exhibit nodes with greatly different performance characteristics, it is not clear that such trust clusters exist in general. "Karma..." 
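Ticket subdivision is the mechanical heart of the delegation story; a minimal sketch (field names invented for illustration, not SHARP's actual ticket format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ticket:
    """A SHARP-style soft claim: a probabilistic promise of `units`
    resource units at `site`, valid until `expires` (illustrative)."""
    site: str
    units: int
    expires: int

    def subdivide(self, units: int) -> tuple["Ticket", "Ticket"]:
        """Split off `units` for delegation; the total claim is conserved."""
        if not 0 < units < self.units:
            raise ValueError("subdivision must be a proper, positive split")
        return (Ticket(self.site, units, self.expires),
                Ticket(self.site, self.units - units, self.expires))

# An agent holding a large ticket delegates part of it onward.
big = Ticket(site="siteA", units=100, expires=1_700_000_000)
delegated, kept = big.subdivide(30)
assert delegated.units + kept.units == big.units
```

Because tickets are only probabilistic promises, the site authority remains free to overbook: the sum of outstanding ticket units may exceed actual capacity, with conflicts resolved at redemption time.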
presents a system for completely distributed credit. A transaction between two nodes A and B involves A, B, and two additional sets of nodes acting as the bank sets for A and B. The bank set of node A is responsible for keeping (by majority rule, to tolerate failures and disagreements both legitimate and malicious) the "karma" balance of A. When A requests resources from B costing some amount of karma (set by B), the bank sets for A and B transfer that amount of karma from A's account to B's account. The transaction involves seven steps, starting with a request from A to B, leading to communication between all members of A's bank set and B's bank set and to and from A and B, and ending with the transfer of resources and a verifiable receipt. This protocol ensures both that B receives the karma it is due and that A receives its resources and a receipt proving that B was paid. Several potential attacks are examined and ruled out by the protocol. No evaluation is presented, and a mechanism to offset inflation (in a distributed manner) is presented but poorly motivated: it is not clear what problems inflation will cause, or that the additional complexity of the anti-inflation protocol is worth the cost inflation presents. Additionally, nodes are granted a certain amount of karma upon entering the system (in exchange for solving a puzzle, verified by the new node's bank set). It is not clear that this is the best choice; a zero-balance initialization might be more natural (of course, a method for introducing currency into the market would then be needed). No analysis of this initialization problem is presented. 
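The bank-set agreement step can be illustrated with a majority vote over replica reports (a sketch of the idea only; KARMA's real protocol involves signed messages exchanged among the full bank sets):

```python
from collections import Counter

def bank_set_balance(replica_reports: list[int]) -> int:
    """Return a node's karma balance as agreed by its bank set.

    Majority rule lets the system tolerate a minority of crashed or
    lying replicas when reading an account balance.
    """
    value, votes = Counter(replica_reports).most_common(1)[0]
    if votes <= len(replica_reports) // 2:
        raise ValueError("no majority among bank-set replicas")
    return value

# Five replicas track A's balance; one is malicious and inflates it.
reports = [40, 40, 40, 40, 9999]
print(bank_set_balance(reports))  # 40
```

The same quorum logic protects the debit-and-credit step of a transfer: a single corrupt bank-set member cannot unilaterally change either party's balance.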
From gp72@cornell.edu Fri May 12 21:38:12 2006
From: "Gopal Parameswaran" <gp72@cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 9

Measurement Study of Peer-to-Peer File Sharing Systems: In this paper the authors analyze two peer-to-peer networks, Napster and Gnutella, taking detailed measurements of bandwidth, the number of files shared, and the latencies between hosts. 
The number of nodes that report lower upload speeds is also studied. Bottleneck bandwidth depends not only on the upload speed of the node but also on the bandwidth of the routing path, which affects the bandwidth measurements. It was also observed that 25% of the nodes chose not to report their bandwidth, which is attributed to the lack of a mechanism penalizing the provision of low upload speeds. The observations show that the Napster and Gnutella networks have highly similar percentage distributions of host bandwidth, though Gnutella nodes in general have more bandwidth than those of Napster, because 50% of users in Napster and 60% of users in Gnutella use broadband connections. The study also shows how many nodes are required to break down the Gnutella network and how the network responds under attack: it was observed that for a node degree of 20, 60% of the nodes would need to be taken down for the overlay to be fragmented. Another interesting takeaway from this paper is the skew shown in Napster downloads, where low-bandwidth nodes account for a higher percentage of downloads than their actual participation in the network, showing that high-bandwidth nodes provide much of the bandwidth for downloading content. Measurement, Modeling, and Analysis of a Peer-to-Peer File-Sharing Workload: In this paper the authors analyze the Kazaa network for the effect of file-sharing workloads, the effect of system parameters, and the impact of locality awareness. The most important observation concerns the behavior of users, who show patience with long downloads spanning days, due to the slow nature of the network. The authors also show that the popularity of objects has a short lifespan, and that popularity keeps transferring to new objects introduced into the system. Thus users' responses to the Kazaa network are widely different from those on the web, where spontaneous responses are expected. 
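The resilience-versus-attack asymmetry can be reproduced on a toy preferential-attachment graph (a sketch only; Gnutella's real topology was measured, not generated): random failures barely dent the largest connected component, while removing the highest-degree "key router" nodes shrinks it sharply.

```python
import random

def scale_free_graph(n, m=2):
    """Toy preferential-attachment overlay: each new node links to m
    existing nodes chosen proportionally to degree, yielding a few
    highly connected hubs, as observed in Gnutella."""
    edges = {i: set() for i in range(n)}
    edges[0].add(1)
    edges[1].add(0)
    stubs = [0, 1]  # node ids, repeated once per incident link
    for i in range(2, n):
        chosen = set()
        while len(chosen) < min(m, i):
            chosen.add(random.choice(stubs))
        for j in chosen:
            edges[i].add(j)
            edges[j].add(i)
            stubs += [i, j]
    return edges

def largest_component(edges, alive):
    """Size of the largest connected component among `alive` nodes."""
    seen, best = set(), 0
    for start in alive:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for v in edges[u]:
                if v in alive and v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best

random.seed(0)
n = 400
g = scale_free_graph(n)
by_degree = sorted(g, key=lambda u: len(g[u]), reverse=True)

alive_random = set(range(n)) - set(random.sample(range(n), 40))  # 10% fail
alive_attack = set(range(n)) - set(by_degree[:40])               # top hubs die

print("after random failures:", largest_component(g, alive_random))
print("after targeted attack:", largest_component(g, alive_attack))
```

The exact removal thresholds in the paper depend on the measured topology and degree (e.g. the degree-20 figure quoted above), but the qualitative gap between random failure and targeted attack survives in even this crude model.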
It was also observed in the study that larger objects tend to age faster than smaller objects, and that the arrival of new objects plays an important role in peer-to-peer systems, just as new content plays an important role in the dynamics of the web; performance in file-sharing systems decreases over time in the absence of newly introduced objects. Another interesting observation made by the authors is that Kazaa does not follow a Zipf distribution.