From wbell@CS.Cornell.EDU Wed Nov 28 21:22:40 2001
Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fAT2MdT11481
	for <egs@cs.cornell.edu>; Wed, 28 Nov 2001 21:22:39 -0500 (EST)
Received: from [192.168.1.100] (syr-66-24-16-64.twcny.rr.com [66.24.16.64])
	by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id VAA29493
	for <egs@cs.cornell.edu>; Wed, 28 Nov 2001 21:22:38 -0500 (EST)
Subject: 615 PAPER #62
From: Walter Bell <wbell@CS.Cornell.EDU>
To: egs@CS.Cornell.EDU
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Evolution/0.99.1+cvs.2001.11.07.16.47 (Preview Release)
Date: 28 Nov 2001 21:22:16 -0500
Message-Id: <1007000558.2009.23.camel@brute>
Mime-Version: 1.0

62) Pastry: Scalable, decentralized object location and routing for
    large-scale peer-to-peer systems

    Storage management and caching in PAST, a large-scale persistent
    peer-to-peer storage utility

These papers introduce Pastry, a distributed object location system
and one of it's applications, PAST, which implements a distributed
persistent storage system. 

The goal of pastry is to build a self-organizing decentralized
location service. Pastry does this by forming an overlay network of
nodes each with randomly chosen (yet unique) node id's. Objects on a
given node are prefixed with the node's id, and routing is done via
always forwarding to nodes whose node id is closer to the destination
(either has more prefix bits that are the same, or is numerically
closer.) These properties allow Pastry to be loop free and do
reasonable global routing with only local information. Nodes announce
themselves to neighbors, and periodically broadcast hello messages to
neighbors in the node id space. Their assumption is that because node
ids are assigned randomly, nodes in the node id space will be
physically diverse, and therefore more likely to have at least a
single live neighbor node. This has advantages and disadvantages; it
allows PAST to distribute copies of a given file to neighbors in the
node id space with good guarantees of reliability, but also causes the
periodic hello messages to travel great distances on the underlying
network. I like the idea overall, but it seems that by having
neighbors announce themselves in the nodeid space, they open
themselves up to many problems in the underlying network.

Since routing is done via forwarding in the nodeid space, the devote
some attention to making sure that people choose forwarding neighbors
in the nodeid space that are local in the underlying network. I was
not so convinced of their proof, but I do believe this can be made to
work. Aside from some underlying network scalability concerns, I found
Pastry to be very compelling.

PAST was written to take advantage of pastry's nodeid routing, and
provides a service that distributes read only copies of a file to
multiple pastry locations for redundancy. They distribute copies of a
file to the k nearest neighbors in the Pastry nodeid space, which
presumably gives good reliability because the node ids are chosen at
random. They motivate that space allocation and distribution is a
fundamental problem in that naive algorithms to distribute replicated
copies of files quickly reaches capacity of key nods because of their
nearest neighbor scheme. They utilize a form of indirection to only
store pointer information at all the k nearest neighbors in the Pastry
nodeid space, and allow the actual file data to be more dispersed
throughout the node space. They also examine how local caching can
improve speed of access for files.

The PAST system supports conventional peer to peer applications such
as napster and gnutella, but does not provide conventional file system
semantics. It's reasonable to believe that future peer to peer
applications are going to require better primitives than just
distributed read only stores, but less expressive than general
read/write primitives of file systems. It would be interesting work to
evaluate what is the right performance/expressiveness boundary for
primitives to draw that allows for more useful applications than just
sharing.


From eyh5@ee.cornell.edu Wed Nov 28 23:31:55 2001
Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fAT4VtT03543
	for <egs@cs.cornell.edu>; Wed, 28 Nov 2001 23:31:55 -0500 (EST)
Received: from photon.ece.cornell.edu (photon.ece.cornell.edu [128.84.81.138])
	by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fAT4T4M06583
	for <egs@cs.cornell.edu>; Wed, 28 Nov 2001 23:29:04 -0500
Date: Wed, 28 Nov 2001 23:29:07 -0500 (EST)
From: Edward Hua <eyh5@ece.cornell.edu>
X-X-Sender:  <eyh5@photon.ece.cornell.edu>
To: <egs@CS.Cornell.EDU>
Subject: 615 Paper 62
Message-ID: <Pine.LNX.4.33.0111282328290.29056-100000@photon.ece.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Storage Management and Caching in PAST

Antony Rowstron, Peter Druschel

	This paper introduces PAST, a large-scale, persistent peer-to-peer 
storage utility  based on Pastry. A PAST system consists of nodes capable 
of initiating and routing client requests to insert or retrieve files. It 
forms a self-organizing overlay network, in which inserted files are 
replicated across multiple nodes for retrieval availability. PAST exports 
the following operations to the clients:

	1)field Id, which stores a file at a user-specified number k of 
diverse nodes within the PAST network,
	2)data file, which can be retrieved from one of the k nodes that 
stores it,
	3)reclamation of file, which frees up storage occupied by the 
file, but does not necessarily make the filew unavailable.

	Each PAST node has a 128-bit nodeId that indicates the node's 
posision in a circular namespace. When a new file is inserted into the 
network, PAST stores it on the k PAST nodes whose nodeIds are numerically 
closest to the 128 most-significant-bits of the file's ID. This therefore 
establishes an inherent association between the file and the node that 
houses it, and this invariant is maintained over the lifetime of the file. 
However, if a file is very popular, more than k nodes may be needed for 
faster distribution and retrieval. 

	PAST's storage management aims at allowing high global storage 
utilization and graceful degradation as the system approaches its maximal 
utilization. It has a feature, called replica diversion, that is designed 
to balance the remaining free storage space among the nodes. This is 
achieved by letting a node that is not one of the k numerically closest 
nodes to the file fieldId store the file if it is in the leaf set of one 
of the original k nodes. In addition, file diversion is performed when a 
node's entire leaf set is reaching maximal capacity. However, in 
exercising these management techniques, some policies governing their 
invocation are specified. These include:

	1)It is not necessary to balance the remaining free storage space 
among nodes as long as the utilization of all nodes is low.
	2)It is preferable to divert a large file rather than multiple 
small ones.
	3)A replicat should always be diverted from a node whose remaining 
free space is significantly below average to a node whose free space is 
significantly above average.

	The caching mechanism in PAST minimizes client access latency, 
maximizes the query throughput, and balances the query load in the system. 
The files are cached in the unused portion of the advertised disk space of 
a node, and they can be evicted at anytime to make room for a new primary 
or replicated file storage. The policy that governs the insertion of a 
file into the cache is that the file size is less than a fraction of the 
node's currently unused cache size. 

	The successful operation of PAST very much depends on the total 
amount of available storage disk space in the network. Maintaining k or 
more copies of a file in a PAST system with high utilization is only 
possibly if the total amount of disk storage in the system does not 
decrease. To satisfy this pre-requisite, certain quota of disk space is 
assigned to ensure the storage of file is not abused. On the other hand, 
another possible way to deal with this problem is to impose a certain size 
limit on the cached file, as the cached file may take up a large chunk of 
disk space that would have been otherwise used for storing files. Thus, 
there's a trade-off that exists between reducing the file access latency 
and storage capacity. In my opinion, the latter should be favored.

	
From ramasv@CS.Cornell.EDU Thu Nov 29 01:28:22 2001
Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fAT6SMT22854
	for <egs@popsrv.cs.cornell.edu>; Thu, 29 Nov 2001 01:28:22 -0500 (EST)
X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Subject: cs615 PAPER 62
Date: Thu, 29 Nov 2001 01:28:21 -0500
Message-ID: <706871B20764CD449DB0E8E3D81C4D4301E7F29B@opus.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: cs615 PAPER 62
Thread-Index: AcF4nwplfBLdaOdYTq2S6/ZgQwBuNQ==
From: "Venu Ramasubramanian" <ramasv@CS.Cornell.EDU>
To: "Emin Gun Sirer" <egs@CS.Cornell.EDU>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id fAT6SMT22854

Pastry: Scalable Decentralized Object Location and routing for large
scale peer-peer systems

	PAST is a decentralized content storage and dsitribution system
built fot large scale peer-peer systems.  Pastry is the underlying
routing used to locate the content based on key value obtained by
hashing the content id.  Much similar to CFS and chord in its design
there are two significant differences.  CFS stores data in block level
granularity while PAST stores them in file level granularity.  This
gives a better load balancing performance for CFS at the expense of
increased latency.  Pastry attempts to locate physically closer data
dources while Chord does not.

	PAST makes several heuristic efforts towards load balancing.
The most important of which involves replica diversion and file
diversion.  Since the file sizes follow a long tail distribution and
since participating nodes offer wide ranging amounts of storage load
balancing is an extremely difficult task.  Replica diversion takes the
appraoch that if the empty space available for storage is less than a
threshold then the replica will be instead stored in a different node.
It is surprising to see that a threshold as low as 0.1 is required to
make load balancing effective.  As a result bigger files find it more
difficult to find space.  Although experimental results show that PAST
does not do very badly on load balancing, I believe this issue is going
to be a thorn in the flesh.

	The second interesting feature of pastry is the maintanance of
neighborhood sets.  These help pastry to locate nodes actually closer in
the networl based on a metric like round trip time or number of hops.
Although no guarantees are given pastry seems to be quite successful in
finding closer servers for obtaining the file.  However, the effort
needed to maintain the neighborhood sets seems to be enormous (periodic
queries to several other nodes) and might affect the scalability of the
system.  Pastry also does lazy maintanance of routing tables avoiding to
a certain extant the thrashing problem faced by CFS.  

From papadp@ece.cornell.edu Thu Nov 29 05:44:14 2001
Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATAiET06626
	for <egs@CS.Cornell.EDU>; Thu, 29 Nov 2001 05:44:14 -0500 (EST)
Received: from kiki.ece.cornell.edu (kiki.ece.cornell.edu [128.84.83.13])
	by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fATAfLM11786;
	Thu, 29 Nov 2001 05:41:21 -0500
Date: Thu, 29 Nov 2001 05:47:30 -0500 (EST)
From: "Panagiotis (Panos) Papadimitratos" <papadp@ece.cornell.edu>
To: Emin Gun Sirer <egs@CS.Cornell.EDU>
cc: Panagiotis Papadimitratos <papadp@ece.cornell.edu>
Subject: 615 PAPER 62
Message-ID: <Pine.GSO.4.05.10111290503410.6948-100000@kiki.ece.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


Review of:"Storage management and caching in PAST, a large-scale,
persistent peer-to-peer storage utility" & "Pastry: Scalable,
decentralized object location and routing for large-scale peer-to-peer
systems," by A. Rowstron and P. Druschel

Panagiotis Papadimtratos papadp@ece.cornell.edu

Pastry is a look-up and query forwarding service at the level of an
overlaid network comprised of system nodes distributed throughout the
Internet. It acts as the basis for PAST, a storage utility that distibutes
files to peer nodes and primarily aims at high availability. 

The lookup and routing service organizes the nodes into a logical circle
of node ID's chosen out of a 128-bit space, with node Ids generated so
that uniqueness is achieved with high probability. This detaches the node
ID from the node's network location and proximity at the Pastry is
uncorrelated to proximity at the network layer. This is exactly addressed
by the maintenance of a neighbor list and their IP addresses so that the
'closest' node can be selected. 

When a file is inserted into the system, the k nodes with IDs numerically
closest to the inserted file ID are chosen. Similarly, Pasrty will route a
request towards the nodes whose nodeID is closest to the 128 most
significant bits of the fileID. This technique leads to the claim of
an average logarithmic cost under favorable network conditions and correct
state at all nodes. 

Furthermore, replication is used in order to increase fault tolerance. At
the same time the issue of load balancing arises, in conjuction with the
file insertion mechanism. PAST addresses such issues by employing the
replica and file diversion technicques, which are controlled by specific
policies, while maintaining the system invariant that a file has to be
stored to k nodes. Note that in PAST files and not blocks are
replicated/cachced/stored (in contrast with CFS).

The requirement for correct state (routing table, neighbor list) in order
to achieve efficiency could be proven restrictive if malicious nodes were
present. Given the cost to maintain/update this state, the employment of
cryptography in order to safe-guard this traffic would pose a severe
obstacle. It would be interesting to see what might be the solution in the
'forthcoming paper' (sec.2.3.).


From avneesh@csl.cornell.edu Thu Nov 29 10:53:27 2001
Received: from capricorn.ds.csl.cornell.edu (capricorn.csl.cornell.edu [132.236.71.92])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATFrRT01328
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 10:53:27 -0500 (EST)
content-class: urn:content-classes:message
Subject: 615 Paper 62
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Date: Thu, 29 Nov 2001 10:56:16 -0500
X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0
Message-ID: <97C142C1212ED545B0023A177F5349C4056698@capricorn.ds.csl.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: 615 Paper 62
Thread-Index: AcF47mC2uib7mL3RRsyhI1Jn5cMXsQ==
From: "Avneesh Bhatnagar" <avneesh@csl.cornell.edu>
To: <egs@CS.Cornell.EDU>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id fATFrRT01328

Pastry: Scalable, decentralized object location and routing for
large-scale peer to peer systems.

Summary/Critique.

The PAstry protocol provides an object location and routing layer for
peer to peer applications. This protocol is different from Chord in that
first of all, messages are not forwarded on the basis of numerical
difference, but instead attempts to achieve better network locality by
mantaining more data structures  to optimize routing. However the
similarity with chord is that each node is positioned in a circular node
id space, though the node id in this case is 128 bits. The node id can
be generated by computing a cryptographic hash (SHA1 in this case) of
its public key or IP address.

Routing: A message is routed by a node to another node which shares with
a key which contains a prefix that is atleast b bits longer than the
prefix that the key shares with the present node's id. If that does not
work, then the message is forwarded to a node whose nodeId shares a
prefix not only as long s the current node, _but_ also numerically
closer to the present node id. In order to achieve these goals, a node
mantains the following data structures:
a. Routing table: This typically contains log(base 2^b) N rows with
2^b-1 entries each. The nth row of the routing table contains nodes that
share the first n digits of the current node's id. The table entry will
be left empty if no node is known to have the property. There is a
tradeoff involved in the choice of the base b, since smaller b may
increase the number of hops required to get to a node. 
Since routing looks at key prefix information, the set of nodes
decreases by 2^b for each routing step, which implies that optimally the
destination can be reached in log(base 2^b) steps.
b. Neighborhood set: This contains nodeIds and IP addresses of M nodes
that are closest to the local node. This set is used to mantain locality
information, and each node proactively mantains the freshness of this
set.
c. The leaf set is used in routing. This contains |L|/2 numerically
closest larger node IDs and |L|/2 smaller node Ids. The advantage is
that if a node is found within this set, then it can be reached within
one hop.

Join/Leave: When a node joins the network, Pastry tries to route the
join message to the node that is closest (assume Z whose id is
numerically closest) to the joining node. All nodes involved in the path
from joining node to the destination then send their state tables to the
new node, so that it can instantiate its own state. There are two
different things happening here though, Z is numerically closest so it
is used to initialize the leaf set, while the neighborhood set is
initialized using the proximally closest node's neighborhood set. The
newly joined node transmits its state to the other nodes in its neighbor
set, leaf set and routing table. Also each row in the new node's routing
table reflects entries correspoding to nodes that were used to route the
request to Z. So the first node's first row entries correspond to the
first row entry of the new node.
Node departure is handled when a query from a neighbor is not received
in a particular amount of time. In such a case, the leaf set from a node
on the side of the failed node is requested for and queried to create a
new leaf set.
In order to tolerate arbitarary node failures, routing can be randomized
and  or an expanding ring IPmulticast search can be done, to repair a
partitioned network.

The experimental results show that Pastry performs favorably in terms of
number of hops required to route a message as fairly optmial routing
distance. Node state repair facilities improve lookups when arbitrary
failures occur. However some issues are left open:
a. What happens when nodes join concurrently? There seems to be an
explicit mechanism in Chord for this.
b. The relative overheads in terms of usefull data sent and control
traffic to mantain freshness information should have been shown.
However I think that this paper is quite interesting, due to the
gurantees of locality that it provides for locating a node. 

****************

Storage Management and caching in PAST.

Summary/Critique:

This paper presents PAST, a peer to peer storage utility, which
primarily acts as an archival storage utility and content distribution
utility. Thus unlike CFS, it is not intended as a general-purpose file
system. PAST consists of nodes running on the pastry routing mechanism.
File insertion in PAST requires a fieldID to be computed by taking a
SHA1 hash of the file's textual name  (CFS takes content hash), clients
public key and a ransom salt. During insertion, the file is actually
stored on k nodes closest (numerically as in pastry), to the 128 msbs of
the file's id fields. This provides the required scalability and fault
tolerance but also introduces new issues. In order to limit the amount
of data that can be added to the PAST system by a node, a quota
mechanism is used. A file certificate is also used to authenticate the
client. This is verfied before insertion  by the k nodes. The lookup
operation proceeds with requested fileID being used for locating the
closest node. There is also a 'smart card' support which prevents
malicious nodes from arbitrarily directing requests. This feature is
interesting, since this was my main concern with reliance onfinger
tables in CFS.

In order to facilitate efficient storage management, the authors
introduce replica diversion and file diversion to satsify othogonal
constraints of mantaining copies at k adjacent nodes and having
efficient load balancing in the system. Replica diversion occurs when a
node is unable to store a file locally, in that case it chooses a node
in its leaf set, which is not among the k closest nodes and does not
already hold a diverted replica to store a file. A pointer is then set
to point to this new node. Also to improve fault tolerance within this
arrangement a pointer to the storage node is also mantained at a node at
distance k+1 from the field id. Replica diversion should also be done
with care, since it might incur high latencies. PAST employs several
policies, one of which is based upon the size metric Sd/Fn which is the
ratio of the file size vs the remaining disk space. This policy
discriminates against large files being diverted.

During a join operation, a new node might become one of the k nodes of
another node for replica diversion. In this case it might have to
request replicas for all files for which it provides this property. To
optimize this situation, the node installs pointers to a pervious node
that might have ceased to be part of the k set. I am not sure about the
scalability of this, since failure of the pointed to node, leaves
dangling pointers. This means that apart from having pointers, the node
should also query the pointed to node, to verify is its still alive. The
authors also mention file encoding and division of a file into blocks
which can be stored separately as a future research issue, but this
seems to have been well handled in CFS. The final storage management
technique that is used is file caching, with files being cached in the
unused portion of the node's advertised disk space. There is no
consistency/coherency or correctness mechanism for these entries. The
replacment policy is the Greedy-Dual-Size (GDS) policy.

The authors evaluate the performance of PAST under conditions of varying
threshold values of replica and file diversion with four normal
disttribution values. The tradeoffs of maximizing disk utilization vs
availability are highlighted. Increasing the leaf set also maximizes
availability, since it would prevent reverting more often to the fall
back mechanims. Caching improves lookup, but at high utlization for
large files, file cache replacments start dominating. However in the
general case, smaller files are more in number so there is enough cache
space. Finally, it seems that very low thresholds dominate load
balancing performance, which again favors smaller file sizes.

From eyh5@ee.cornell.edu Thu Nov 29 11:21:37 2001
Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATGLbT06781
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 11:21:37 -0500 (EST)
Received: from photon.ece.cornell.edu (photon.ece.cornell.edu [128.84.81.138])
	by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fATGIhM18602
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 11:18:43 -0500
Date: Thu, 29 Nov 2001 11:18:46 -0500 (EST)
From: Edward Hua <eyh5@ece.cornell.edu>
X-X-Sender:  <eyh5@photon.ece.cornell.edu>
To: <egs@CS.Cornell.EDU>
Subject: 615 Paper # 62
Message-ID: <Pine.LNX.4.33.0111291117260.30634-100000@photon.ece.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Pastry: Scalable, Decentralized Object Loacation and Routing for 
Large-scale Peer-to-Peer Systems

Antony Rowstron, Peter Druschel

	This paper introduces Pastry, a scalable, distributed object 
location and routing substrate for wide-area peer-to-peer applications. 
Each node in the Pastry network has a unique numeric nodeId. These nodeIds 
are assigned randomly, using a SHA-1 hash of its IP address, and are 
diverse in geography, ownership, jurisdiction, etc. The node routes a 
client message to the node with a nodeId that is numerically closest to 
the key. Pastry seeks to minimize the distance messages travel, according 
to a scalar proximity metric like the number of IP routing hops. Pastry 
supports a variety of applications such as PAST and SCRIBE. 

	The routing algorithm of Pastry uses a routing table in case the 
key of the received message does not fall within the range of nodeIds 
covered by the node's leaf set. The message, in this case, is forwarded to 
a node listed in the routing table entries that shares a common prefix 
with the key by at least one more digit. In the unlikely event that no 
such entry exists, the message is forwarded to a node that shares a prefix 
with the key at least as long as the local node, and is numerically closer 
to the key than the present node's Id. This procedure guarantees to 
converge, as long as at least half of the leafset nodes maintained in the 
local node have failed simultaneously.

	Several aspects of the Pastry operation are addressed in this 
paper. First, the joining of a new node to the Pastry network is done by 
the new node asking its immediate known neighbor node to send a "join" 
message on its behalf to a node whose nodeId is numerically closest to the 
new node. The new node receives all state tables from all intermediate 
nodes, based on which  it compiles its own initial routing table. Given 
its decentralized nature, Pastry conforms to the notion of network 
proximity based on a scalar proximity metric, such as the routing hops or 
geographic distance. It seeks to ensure that routing table entries are 
chosen to provide good locality properties. In case of node failures or 
network partitions, Pastry addresses this problem by randomizing the the 
choice of a node in the routing process that shares a longer prefix with 
the destination, or shares the same prefix length as the current node but 
numerically closer than the current node. In addition, Pastry IP multicast 
may be used to reach hosts that may not be previously reachable due to 
partitions in the network.

	The Pastry routing algorithm provides a good bound on the number 
of hops a message must traverse in the network before reaching a 
destination node,  improving savings in delay time. Also can be shown in 
the performance evaluation is that Pastry is very adaptive to changes in 
the network, be it node addition or failure. On the other hand, the Pastry 
does not address the load balancing issue, as it does not in itself 
produce file replicas in mulitple nodes of the network. This function is 
in fact taken care of by applications built on top of Pastry. 


From daehyun@csl.cornell.edu Thu Nov 29 11:45:17 2001
Received: from wilkes.csl.cornell.edu (wilkes.csl.cornell.edu [132.236.71.69])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATGjHT10629
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 11:45:17 -0500 (EST)
Received: (from daehyun@localhost)
	by wilkes.csl.cornell.edu (8.9.3/8.9.2) id LAA70235
	for egs@cs.cornell.edu; Thu, 29 Nov 2001 11:45:12 -0500 (EST)
	(envelope-from daehyun)
From: Daehyun Kim <daehyun@csl.cornell.edu>
Message-Id: <200111291645.LAA70235@wilkes.csl.cornell.edu>
Subject: 615 PAPER 62
To: egs@CS.Cornell.EDU
Date: Thu, 29 Nov 2001 11:45:11 -0500 (EST)
X-Mailer: ELM [version 2.4ME+ PL54 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

This paper presents PAST which is a large scale peer-to-peer persistent
storage utility. PAST is based on a self-organizing internet-based
overlay network of storage nodes that cooperatively route file queries,
store multiple replicas of files, and cache additional copies of popular
files.

The PAST system is composed of nodes connected to the internet, where
each node is capable of initiating and routing client requests to insert
or retrieve files. Node may also contribute storage to the system.
PAST exports the following operations to clients.
   1) Insert: Stores a file and return a FileID.
   2) Lookup: Retrieves a copy of the file identified by FileID.
   3) Reclaim: Reclaims the storage occupied by the file.
Inserted files are replicated across multiple nodes for availability.
So, the PAST system can provide services though some nodes might fail.
Pastry is an efficient routing scheme used in PAST, which routes client
requests to a node that is close to the client and stores the requested
file. The complexity of Pastry operations is logarithmic, so it is
scalability. The PAST system uses ID system for security. Files are 
assigned quasi-unique FileID when they are stored into the PAST system.
To retrieve a file, a client must know its field and ,if necessary,
its decryption key.

The PAST system is a internet-based peer-to-peer file storage system.
It provides user interface similar to conventional file systems. But,
in my opinion, there are fundamental differences. Most of all, it
does not guarantee the completeness. In conventional files systems, files
inserted once will exist in the system until user delete them explicitly.
But, in the Past system, files may be deleted anytime. Second, load
balance and security is more important in the PAST system than in
conventional file systems, because it is designed for large scale networks. 
So, a special routing scheme is required, and Pastry is a routing
protocol used in the PAST system.
In my opinion, though the PAST system provides a powerful network storage
system, it is not still as convenient as conventional file systems. It
is because there are more requirements such as scalability for the systems
in large scale networks. However, I think the PAST system need to overcome 
those limitation to provide the same level of continence as conventional
file systems..
(The same comments can be applied to the CFS and Chord system.)

From viran@csl.cornell.edu Thu Nov 29 11:54:30 2001
Received: from moore.csl.cornell.edu (moore.csl.cornell.edu [132.236.71.83])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATGsUT12567
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 11:54:30 -0500 (EST)
Received: from localhost (viran@localhost)
	by moore.csl.cornell.edu (8.11.3/8.9.2) with ESMTP id fATGsKT73884
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 11:54:20 -0500 (EST)
	(envelope-from viran@moore.csl.cornell.edu)
X-Authentication-Warning: moore.csl.cornell.edu: viran owned process doing -bs
Date: Thu, 29 Nov 2001 11:54:20 -0500 (EST)
From: "Virantha N. Ekanayake" <viran@csl.cornell.edu>
To: <egs@CS.Cornell.EDU>
Subject: 615 Paper 62
Message-ID: <Pine.BSF.4.33.0111291147010.73613-100000@moore.csl.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

Pastry is a completely decentralized, fault-resilient peer-to-peer object
location and routing scheme.  It's based on an overlay network of nodes
connected via the Internet.  Each node is assigned a node id that is
randomly generated to fall uniformly between 0 and 2^128-1.  A routing
table is kept that holds node mappings (node id to IP address) that share
common prefixes with the current node id.  Since there are many nodes that
share each prefix, a mapping that is geographically close to the current
node (based on number of hops away) is chosen to be in the routing table.
This ensures some locality, since there is no correspondence between
adjacent node ids and actualy locations, which was a drawback in systems
like Chord.

A leaf set is also kept that keeps mappings of node ids that are
numerically closest (both above and below) to the current node id.  A
neighbourhood set is also kept that holds the mappings of nodes that are
physically closest.  This table is not used for routing, but for
maintaining locality.

Since the routing tables are kept small (for scaling purposes), Pastry
routes are about 30-40% longer than the ideal route.  It routes to any
node in O(log N) steps and the route tables are also O(logN) entries.
Since Pastry takes into account locality, the throughput of the system is
also fairly good.

PAST is a file-storage utility overlayed on the network that leverages
Pastry as the location and routing layer.  Files have a fileID, which is
quasi-unique and generated at the file's creation.  Files cannot be
changed once inserted into the system, and thus the file system lacks the
normal semantics.


From ranveer@CS.Cornell.EDU Thu Nov 29 12:25:24 2001
Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATHPOT17977
	for <egs@popsrv.cs.cornell.edu>; Thu, 29 Nov 2001 12:25:24 -0500 (EST)
Subject: 615 PAPER 62
MIME-Version: 1.0
Content-Type: text/plain;
	charset="utf-8"
Date: Thu, 29 Nov 2001 12:25:24 -0500
content-class: urn:content-classes:message
X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0
Message-ID: <706871B20764CD449DB0E8E3D81C4D430232E6BE@opus.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: 615 PAPER 62
Thread-Index: AcF4+tRnB4APQogyQauEf8ZuJK0A6A==
From: "Ranveer Chandra" <ranveer@CS.Cornell.EDU>
To: "Emin Gun Sirer" <egs@CS.Cornell.EDU>
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id fATHPOT17977

Pastry and PAST
 
A. Rowstrom and P. Druschel
 
PAST is a scalable, peer-to-peer persistent storage utility.  It is
similar to CFS in its objectives, that is to provide a decentralized
storage for large scale colaborative networks.  PAST uses an underlying
routing protocol, called Pastry, which is again similar in functionality
to Chord used by CFS and exceedingly similar to Taperstry developed at
Berkeley.  Pastry maintains a ring like structure similar to Chord.
However, each node maintains different data structures called the leaf
and neighbor table.  The leaf table stores 'l/2' successors and 'l/2'
predecessors along the ring, and the neighbor table stores 'l' closest
nodes according to a proximity metric.  The neighborhood information
helps Pastry to route requests to a node topologically closer to the
requestor, as opposed to Chord. 
 
Another contrasting feature of PAST to CFS is the use of file-level
storage.  Both the schemes, i.e. file-level storage of PAST and
block-level storage of CFS offers its own advantages and disadvantages.
The way PAST handles file-level storage is interesting.  Storing files
rather than blocks requires a large amount of space.  Further, the 'k'
successors in the ring might not have suffcient space to store the file.
PAST proposes Replica and File Diversion to overcome this problem.  This
also requires more maintenance to ensure the utilization of replicas,
but PAST does good work to ensure it.  PAST also does good work in load
balancing.  The proposed scheme has lesser overhead with a weaker load
balancing which might be more appropriate in large scale systems.  
 
A minor problem overlooked by PAST is the overlapping of nodeIds or
FileIds by the hash function.  Although the possibility of this occuring
is remote, if it occurs the result is disasterous.  Additionally, the
design principle of file storage as opposed to block storage of CFS
renders PAST a bigger victim of fragmentation which could be a big
overhead.  Overall,  PAST does a great job of building a scalable
collaborative storage utility.  A great systems work!
 
 
From mh97@cornell.edu Thu Nov 29 13:01:12 2001
Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATI1CT25075
	for <egs@CS.Cornell.EDU>; Thu, 29 Nov 2001 13:01:12 -0500 (EST)
Received: from mars (syr-66-24-28-66.twcny.rr.com [66.24.28.66])
	by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA27908
	for <egs@CS.Cornell.EDU>; Thu, 29 Nov 2001 13:01:10 -0500 (EST)
From: "hao ming" <mh97@cornell.edu>
To: "'Emin Gun Sirer'" <egs@CS.Cornell.EDU>
Subject: 615 PAPER 62
Date: Thu, 29 Nov 2001 13:00:59 -0500
Message-ID: <000901c178ff$cda91b50$6901a8c0@mars>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="US-ASCII"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook, Build 10.0.2627
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
In-Reply-To: <200111281523.fASFN1q19899@zinger.cs.cornell.edu>
Importance: Normal

Pastry: Scalable, decentralized object location and
routing for large-scale peer-to-peer systems 
By Antony Rowstron and Peter Druschel

this paper presents a distributed and scalable system  providing 
services like locating  objects and naming. it is on top of IP 
address scheme. Essentially, it provides a mapping between key and
IP address via node ID. 

the system works in the following way:
1. each node has a unique id which is 128 bits long
2. each node maintains a routing table which is organized in this way:
   a. it has a left set which contains numerically most closest nodes to

      local node. 
   b. a routing table with each row stores entries which has one longer
      similar prefix with local node. i.e entries in row 0 has no
similar 
      prefix to local node. And entries in routing table become farther
away 
      from local node as increase of row number. i.e. entries n row 0 is

      most closest to local
3. when routing, leaf set is checked first, then the row with
appropriate
      prefix. This basically guarantee message forwarded adopts a bigger
and
      bigger stride to the  destination.
     
Some nice properties of this system :
1. Locality is ensured by routing the nodes to closest nodes and new
nodes get Routing table info from appropriate rows at different nodes
along a Routing path.

2. fault tolerance is maintained by storing info no K nodes with most
closet
node Ids.  
3 .scalability is achieved by spreading K nodes with most closest node
Ids physically far away. 

final comments: 
1. is it possible to map key directly to the participating IP nodes
without this middle substrate?

2.it is hard to compare all these big system without
thorough simulation or practical deployment. But this system was
developed by Microsoft and probably will become a standard API even
its performance is not best. 

-ming

From teifel@csl.cornell.edu Thu Nov 29 13:55:39 2001
Received: from disney.csl.cornell.edu (disney.csl.cornell.edu [132.236.71.87])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATItdT05146
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 13:55:39 -0500 (EST)
Received: from localhost (teifel@localhost)
	by disney.csl.cornell.edu (8.11.3/8.9.2) with ESMTP id fATItTe50048
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 13:55:29 -0500 (EST)
	(envelope-from teifel@disney.csl.cornell.edu)
X-Authentication-Warning: disney.csl.cornell.edu: teifel owned process doing -bs
Date: Thu, 29 Nov 2001 13:55:29 -0500 (EST)
From: "John R. Teifel" <teifel@csl.cornell.edu>
To: <egs@CS.Cornell.EDU>
Subject: 615 PAPER 62
Message-ID: <20011129124615.L38847-100000@disney.csl.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII

PAST & Pastry:

Pastry is a peer-to-peer routing substrate that is efficient, scalable,
fault resilient and self-organizing.  It performs application-level
routing and object location in a potentially very large overlay network of
nodes connected by the internet.  It is used to support peer-to-peer
applications such as global data storage, data sharing, group
communication and naming.  The main technical feature of Pastry is each
node has a unique identifier, which the nodes use to set up nodeId spaces
that are used in routing.  Each node tracks its local neighbors to keep
upto date on network topology.  Pastry is completely decentralized and
experimental implementation results on a 100,000 node setup show Pastry's
efficiency and scalability.  Simulation results and figures were
satisfactory in demonstrating the feasibility of Pastry.

PAST is a large-scale peer-to-peer persistent storage utility.  It is
self-organizing and consists of an internet-based overlay network of
storage nodes that perform distributed file queries, store multiple
replicas of files, and cache popular files.  The PAST paper concentrated
on evaluating storage management techniques in this environment, mainly on
how non-uniform nodal storage and file sizes affects load balancing
performance on high capacity systems.  Except for the fact that the code
was written in Java, I felt that the experimental results and evaluation
sections were rather good and demonstrated well that the PAST storage
management scheme is feasible and sound from a technological perspective.

From samar@ece.cornell.edu Thu Nov 29 14:21:59 2001
Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATJLwT11517
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 14:21:59 -0500 (EST)
Received: from photon.ece.cornell.edu (photon.ece.cornell.edu [128.84.81.138])
	by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fATJJ3M24438
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 14:19:03 -0500
Date: Thu, 29 Nov 2001 14:19:06 -0500 (EST)
From: Prince Samar <samar@ece.cornell.edu>
To: <egs@CS.Cornell.EDU>
Subject: 615 PAPER 62
Message-ID: <Pine.LNX.4.33.0111291418070.31705-100000@photon.ece.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


Pastry: Scalable, decentralized object location and routing for
large-scale peer-to-peer systems
				&
Storage management and caching in PAST, a large-scale persistent
peer-to-peer storage utility


PAST is a peer-to-peer global storage utility for the internet. It builds
on top of Pastry, which is a content location and routing system based on
a self-organizing overlay network of nodes.

Each PAST node is assigned a 128-bit node identifier which are arranged in
a circular structure. This nodeid arrangement is quasi-random, implying
that the nodes with proximity in the nodeid circular arrangement are
probably diverse in geographic location, network connectivity, ownership
or jurisdiction. This property makes these nodes a good choice for storing
the replicas of a file, as the node in the set are unlikely to conspire or
be subject to correlated failures or threats. k PAST nodes with
numerically closest nodeid's are chosen during the insert operation for
caching the files. k is chosen to meet the availability needs of the file,
relative to the expected failure rates of individual nodes.

PAST is layered on top of Pastry, which is designed to be efficient,
scalable, fault resilient and self organizing. Given a fileid, Pastry
routes the corresponding messages towards the node whose nodeid is
numerically closest to the 128 msbs of the fileid, among all live nodes.
Pastry maintains a neighborhood set based on a metric like round trip time
or number of hops, which helps in locating a node actually closer to
itself in the network. This is an interesting idea which is not present in
Chord, though it is associated with the cost of making periodic queries to
several nodes. A point to be noted is that in PAST actual files (and not
blocks as in CFS) are cached or replicated. This could lead to unbalanced
loads on the nodes, especially if the file sizes are disproportionate.


From jcb35@cornell.edu Thu Nov 29 14:29:19 2001
Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13])
	by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATJTJT12536
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 14:29:19 -0500 (EST)
Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13])
	by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id OAA29868
	for <egs@cs.cornell.edu>; Thu, 29 Nov 2001 14:29:16 -0500 (EST)
From: jcb35@cornell.edu
Date: Thu, 29 Nov 2001 14:29:15 -0500 (EST)
X-Sender: jcb35@travelers.mail.cornell.edu
To: egs@CS.Cornell.EDU
Subject: 615 PAPER 62
Message-ID: <Pine.SOL.3.91.1011129135504.4214A-100000@travelers.mail.cornell.edu>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII


PAST provides a large-scale peer-to-peer persistent storage utility.  
PAST operates on top of Pastry, which is a peer-to-peer request routing 
and content location scheme.

Pastry strives to design a scalable distributed object location for 
wide-area and peer-to-peer applications.  Each node keeps track of 
certain tables which contain the following:

routing table - entries at row n contain a node whose nodeId shares a n 
digits, but the next digit has one of 2^b - 1 possible values.  This is 
used to ensure that a node is no more than log_b N hops away (for a 
pre-determined b and N nodes).

leaf set - nodes with the closest node ids.

neighborhood set - these nodes are the nodes which are closest to the 
local node - these are used in locality properties.


Pastry provides mechanisms for joining (it replicates information about 
its routing table to other nodes before it begins routing packets) and 
assumes nodes will depart or fail without warning.

Using this, PAST systems allow clients to insert, lookup, and reclaim a 
file.  PAST will distribute the file over k different nodes, determined 
by the client when they insert the file.  The reclaim function basically 
deletes the file, but does not guarantee that future lookups will fail.


I thought they provided another interesting approach to peer-to-peer 
object location, offering another method to achieve location within logN 
steps.  I wonder about some of the specifics like what happens when 
multiple nodes try to join simulateously, and also how the PAST system 
leaves much of the replication up to the application - could this be 
better solved at a system level?