From wbell@CS.Cornell.EDU Wed Nov 28 21:22:40 2001 Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fAT2MdT11481 for ; Wed, 28 Nov 2001 21:22:39 -0500 (EST) Received: from [192.168.1.100] (syr-66-24-16-64.twcny.rr.com [66.24.16.64]) by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id VAA29493 for ; Wed, 28 Nov 2001 21:22:38 -0500 (EST) Subject: 615 PAPER #62 From: Walter Bell To: egs@CS.Cornell.EDU Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/0.99.1+cvs.2001.11.07.16.47 (Preview Release) Date: 28 Nov 2001 21:22:16 -0500 Message-Id: <1007000558.2009.23.camel@brute> Mime-Version: 1.0 62) Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems Storage management and caching in PAST, a large-scale persistent peer-to-peer storage utility These papers introduce Pastry, a distributed object location system and one of it's applications, PAST, which implements a distributed persistent storage system. The goal of pastry is to build a self-organizing decentralized location service. Pastry does this by forming an overlay network of nodes each with randomly chosen (yet unique) node id's. Objects on a given node are prefixed with the node's id, and routing is done via always forwarding to nodes whose node id is closer to the destination (either has more prefix bits that are the same, or is numerically closer.) These properties allow Pastry to be loop free and do reasonable global routing with only local information. Nodes announce themselves to neighbors, and periodically broadcast hello messages to neighbors in the node id space. Their assumption is that because node ids are assigned randomly, nodes in the node id space will be physically diverse, and therefore more likely to have at least a single live neighbor node. This has advantages and disadvantages; it allows PAST to distribute copies of a given file to neighbors in the node id space with good guarantees of reliability, but also causes the periodic hello messages to travel great distances on the underlying network. I like the idea overall, but it seems that by having neighbors announce themselves in the nodeid space, they open themselves up to many problems in the underlying network. Since routing is done via forwarding in the nodeid space, the devote some attention to making sure that people choose forwarding neighbors in the nodeid space that are local in the underlying network. I was not so convinced of their proof, but I do believe this can be made to work. Aside from some underlying network scalability concerns, I found Pastry to be very compelling. PAST was written to take advantage of pastry's nodeid routing, and provides a service that distributes read only copies of a file to multiple pastry locations for redundancy. They distribute copies of a file to the k nearest neighbors in the Pastry nodeid space, which presumably gives good reliability because the node ids are chosen at random. They motivate that space allocation and distribution is a fundamental problem in that naive algorithms to distribute replicated copies of files quickly reaches capacity of key nods because of their nearest neighbor scheme. They utilize a form of indirection to only store pointer information at all the k nearest neighbors in the Pastry nodeid space, and allow the actual file data to be more dispersed throughout the node space. They also examine how local caching can improve speed of access for files. The PAST system supports conventional peer to peer applications such as napster and gnutella, but does not provide conventional file system semantics. It's reasonable to believe that future peer to peer applications are going to require better primitives than just distributed read only stores, but less expressive than general read/write primitives of file systems. It would be interesting work to evaluate what is the right performance/expressiveness boundary for primitives to draw that allows for more useful applications than just sharing. From eyh5@ee.cornell.edu Wed Nov 28 23:31:55 2001 Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fAT4VtT03543 for ; Wed, 28 Nov 2001 23:31:55 -0500 (EST) Received: from photon.ece.cornell.edu (photon.ece.cornell.edu [128.84.81.138]) by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fAT4T4M06583 for ; Wed, 28 Nov 2001 23:29:04 -0500 Date: Wed, 28 Nov 2001 23:29:07 -0500 (EST) From: Edward Hua X-X-Sender: To: Subject: 615 Paper 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Storage Management and Caching in PAST Antony Rowstron, Peter Druschel This paper introduces PAST, a large-scale, persistent peer-to-peer storage utility based on Pastry. A PAST system consists of nodes capable of initiating and routing client requests to insert or retrieve files. It forms a self-organizing overlay network, in which inserted files are replicated across multiple nodes for retrieval availability. PAST exports the following operations to the clients: 1)field Id, which stores a file at a user-specified number k of diverse nodes within the PAST network, 2)data file, which can be retrieved from one of the k nodes that stores it, 3)reclamation of file, which frees up storage occupied by the file, but does not necessarily make the filew unavailable. Each PAST node has a 128-bit nodeId that indicates the node's posision in a circular namespace. When a new file is inserted into the network, PAST stores it on the k PAST nodes whose nodeIds are numerically closest to the 128 most-significant-bits of the file's ID. This therefore establishes an inherent association between the file and the node that houses it, and this invariant is maintained over the lifetime of the file. However, if a file is very popular, more than k nodes may be needed for faster distribution and retrieval. PAST's storage management aims at allowing high global storage utilization and graceful degradation as the system approaches its maximal utilization. It has a feature, called replica diversion, that is designed to balance the remaining free storage space among the nodes. This is achieved by letting a node that is not one of the k numerically closest nodes to the file fieldId store the file if it is in the leaf set of one of the original k nodes. In addition, file diversion is performed when a node's entire leaf set is reaching maximal capacity. However, in exercising these management techniques, some policies governing their invocation are specified. These include: 1)It is not necessary to balance the remaining free storage space among nodes as long as the utilization of all nodes is low. 2)It is preferable to divert a large file rather than multiple small ones. 3)A replicat should always be diverted from a node whose remaining free space is significantly below average to a node whose free space is significantly above average. The caching mechanism in PAST minimizes client access latency, maximizes the query throughput, and balances the query load in the system. The files are cached in the unused portion of the advertised disk space of a node, and they can be evicted at anytime to make room for a new primary or replicated file storage. The policy that governs the insertion of a file into the cache is that the file size is less than a fraction of the node's currently unused cache size. The successful operation of PAST very much depends on the total amount of available storage disk space in the network. Maintaining k or more copies of a file in a PAST system with high utilization is only possibly if the total amount of disk storage in the system does not decrease. To satisfy this pre-requisite, certain quota of disk space is assigned to ensure the storage of file is not abused. On the other hand, another possible way to deal with this problem is to impose a certain size limit on the cached file, as the cached file may take up a large chunk of disk space that would have been otherwise used for storing files. Thus, there's a trade-off that exists between reducing the file access latency and storage capacity. In my opinion, the latter should be favored. From ramasv@CS.Cornell.EDU Thu Nov 29 01:28:22 2001 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fAT6SMT22854 for ; Thu, 29 Nov 2001 01:28:22 -0500 (EST) X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Subject: cs615 PAPER 62 Date: Thu, 29 Nov 2001 01:28:21 -0500 Message-ID: <706871B20764CD449DB0E8E3D81C4D4301E7F29B@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: cs615 PAPER 62 Thread-Index: AcF4nwplfBLdaOdYTq2S6/ZgQwBuNQ== From: "Venu Ramasubramanian" To: "Emin Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id fAT6SMT22854 Pastry: Scalable Decentralized Object Location and routing for large scale peer-peer systems PAST is a decentralized content storage and dsitribution system built fot large scale peer-peer systems. Pastry is the underlying routing used to locate the content based on key value obtained by hashing the content id. Much similar to CFS and chord in its design there are two significant differences. CFS stores data in block level granularity while PAST stores them in file level granularity. This gives a better load balancing performance for CFS at the expense of increased latency. Pastry attempts to locate physically closer data dources while Chord does not. PAST makes several heuristic efforts towards load balancing. The most important of which involves replica diversion and file diversion. Since the file sizes follow a long tail distribution and since participating nodes offer wide ranging amounts of storage load balancing is an extremely difficult task. Replica diversion takes the appraoch that if the empty space available for storage is less than a threshold then the replica will be instead stored in a different node. It is surprising to see that a threshold as low as 0.1 is required to make load balancing effective. As a result bigger files find it more difficult to find space. Although experimental results show that PAST does not do very badly on load balancing, I believe this issue is going to be a thorn in the flesh. The second interesting feature of pastry is the maintanance of neighborhood sets. These help pastry to locate nodes actually closer in the networl based on a metric like round trip time or number of hops. Although no guarantees are given pastry seems to be quite successful in finding closer servers for obtaining the file. However, the effort needed to maintain the neighborhood sets seems to be enormous (periodic queries to several other nodes) and might affect the scalability of the system. Pastry also does lazy maintanance of routing tables avoiding to a certain extant the thrashing problem faced by CFS. From papadp@ece.cornell.edu Thu Nov 29 05:44:14 2001 Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATAiET06626 for ; Thu, 29 Nov 2001 05:44:14 -0500 (EST) Received: from kiki.ece.cornell.edu (kiki.ece.cornell.edu [128.84.83.13]) by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fATAfLM11786; Thu, 29 Nov 2001 05:41:21 -0500 Date: Thu, 29 Nov 2001 05:47:30 -0500 (EST) From: "Panagiotis (Panos) Papadimitratos" To: Emin Gun Sirer cc: Panagiotis Papadimitratos Subject: 615 PAPER 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Review of:"Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility" & "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems," by A. Rowstron and P. Druschel Panagiotis Papadimtratos papadp@ece.cornell.edu Pastry is a look-up and query forwarding service at the level of an overlaid network comprised of system nodes distributed throughout the Internet. It acts as the basis for PAST, a storage utility that distibutes files to peer nodes and primarily aims at high availability. The lookup and routing service organizes the nodes into a logical circle of node ID's chosen out of a 128-bit space, with node Ids generated so that uniqueness is achieved with high probability. This detaches the node ID from the node's network location and proximity at the Pastry is uncorrelated to proximity at the network layer. This is exactly addressed by the maintenance of a neighbor list and their IP addresses so that the 'closest' node can be selected. When a file is inserted into the system, the k nodes with IDs numerically closest to the inserted file ID are chosen. Similarly, Pasrty will route a request towards the nodes whose nodeID is closest to the 128 most significant bits of the fileID. This technique leads to the claim of an average logarithmic cost under favorable network conditions and correct state at all nodes. Furthermore, replication is used in order to increase fault tolerance. At the same time the issue of load balancing arises, in conjuction with the file insertion mechanism. PAST addresses such issues by employing the replica and file diversion technicques, which are controlled by specific policies, while maintaining the system invariant that a file has to be stored to k nodes. Note that in PAST files and not blocks are replicated/cachced/stored (in contrast with CFS). The requirement for correct state (routing table, neighbor list) in order to achieve efficiency could be proven restrictive if malicious nodes were present. Given the cost to maintain/update this state, the employment of cryptography in order to safe-guard this traffic would pose a severe obstacle. It would be interesting to see what might be the solution in the 'forthcoming paper' (sec.2.3.). From avneesh@csl.cornell.edu Thu Nov 29 10:53:27 2001 Received: from capricorn.ds.csl.cornell.edu (capricorn.csl.cornell.edu [132.236.71.92]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATFrRT01328 for ; Thu, 29 Nov 2001 10:53:27 -0500 (EST) content-class: urn:content-classes:message Subject: 615 Paper 62 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Thu, 29 Nov 2001 10:56:16 -0500 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 Message-ID: <97C142C1212ED545B0023A177F5349C4056698@capricorn.ds.csl.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 Paper 62 Thread-Index: AcF47mC2uib7mL3RRsyhI1Jn5cMXsQ== From: "Avneesh Bhatnagar" To: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id fATFrRT01328 Pastry: Scalable, decentralized object location and routing for large-scale peer to peer systems. Summary/Critique. The PAstry protocol provides an object location and routing layer for peer to peer applications. This protocol is different from Chord in that first of all, messages are not forwarded on the basis of numerical difference, but instead attempts to achieve better network locality by mantaining more data structures to optimize routing. However the similarity with chord is that each node is positioned in a circular node id space, though the node id in this case is 128 bits. The node id can be generated by computing a cryptographic hash (SHA1 in this case) of its public key or IP address. Routing: A message is routed by a node to another node which shares with a key which contains a prefix that is atleast b bits longer than the prefix that the key shares with the present node's id. If that does not work, then the message is forwarded to a node whose nodeId shares a prefix not only as long s the current node, _but_ also numerically closer to the present node id. In order to achieve these goals, a node mantains the following data structures: a. Routing table: This typically contains log(base 2^b) N rows with 2^b-1 entries each. The nth row of the routing table contains nodes that share the first n digits of the current node's id. The table entry will be left empty if no node is known to have the property. There is a tradeoff involved in the choice of the base b, since smaller b may increase the number of hops required to get to a node. Since routing looks at key prefix information, the set of nodes decreases by 2^b for each routing step, which implies that optimally the destination can be reached in log(base 2^b) steps. b. Neighborhood set: This contains nodeIds and IP addresses of M nodes that are closest to the local node. This set is used to mantain locality information, and each node proactively mantains the freshness of this set. c. The leaf set is used in routing. This contains |L|/2 numerically closest larger node IDs and |L|/2 smaller node Ids. The advantage is that if a node is found within this set, then it can be reached within one hop. Join/Leave: When a node joins the network, Pastry tries to route the join message to the node that is closest (assume Z whose id is numerically closest) to the joining node. All nodes involved in the path from joining node to the destination then send their state tables to the new node, so that it can instantiate its own state. There are two different things happening here though, Z is numerically closest so it is used to initialize the leaf set, while the neighborhood set is initialized using the proximally closest node's neighborhood set. The newly joined node transmits its state to the other nodes in its neighbor set, leaf set and routing table. Also each row in the new node's routing table reflects entries correspoding to nodes that were used to route the request to Z. So the first node's first row entries correspond to the first row entry of the new node. Node departure is handled when a query from a neighbor is not received in a particular amount of time. In such a case, the leaf set from a node on the side of the failed node is requested for and queried to create a new leaf set. In order to tolerate arbitarary node failures, routing can be randomized and or an expanding ring IPmulticast search can be done, to repair a partitioned network. The experimental results show that Pastry performs favorably in terms of number of hops required to route a message as fairly optmial routing distance. Node state repair facilities improve lookups when arbitrary failures occur. However some issues are left open: a. What happens when nodes join concurrently? There seems to be an explicit mechanism in Chord for this. b. The relative overheads in terms of usefull data sent and control traffic to mantain freshness information should have been shown. However I think that this paper is quite interesting, due to the gurantees of locality that it provides for locating a node. **************** Storage Management and caching in PAST. Summary/Critique: This paper presents PAST, a peer to peer storage utility, which primarily acts as an archival storage utility and content distribution utility. Thus unlike CFS, it is not intended as a general-purpose file system. PAST consists of nodes running on the pastry routing mechanism. File insertion in PAST requires a fieldID to be computed by taking a SHA1 hash of the file's textual name (CFS takes content hash), clients public key and a ransom salt. During insertion, the file is actually stored on k nodes closest (numerically as in pastry), to the 128 msbs of the file's id fields. This provides the required scalability and fault tolerance but also introduces new issues. In order to limit the amount of data that can be added to the PAST system by a node, a quota mechanism is used. A file certificate is also used to authenticate the client. This is verfied before insertion by the k nodes. The lookup operation proceeds with requested fileID being used for locating the closest node. There is also a 'smart card' support which prevents malicious nodes from arbitrarily directing requests. This feature is interesting, since this was my main concern with reliance onfinger tables in CFS. In order to facilitate efficient storage management, the authors introduce replica diversion and file diversion to satsify othogonal constraints of mantaining copies at k adjacent nodes and having efficient load balancing in the system. Replica diversion occurs when a node is unable to store a file locally, in that case it chooses a node in its leaf set, which is not among the k closest nodes and does not already hold a diverted replica to store a file. A pointer is then set to point to this new node. Also to improve fault tolerance within this arrangement a pointer to the storage node is also mantained at a node at distance k+1 from the field id. Replica diversion should also be done with care, since it might incur high latencies. PAST employs several policies, one of which is based upon the size metric Sd/Fn which is the ratio of the file size vs the remaining disk space. This policy discriminates against large files being diverted. During a join operation, a new node might become one of the k nodes of another node for replica diversion. In this case it might have to request replicas for all files for which it provides this property. To optimize this situation, the node installs pointers to a pervious node that might have ceased to be part of the k set. I am not sure about the scalability of this, since failure of the pointed to node, leaves dangling pointers. This means that apart from having pointers, the node should also query the pointed to node, to verify is its still alive. The authors also mention file encoding and division of a file into blocks which can be stored separately as a future research issue, but this seems to have been well handled in CFS. The final storage management technique that is used is file caching, with files being cached in the unused portion of the node's advertised disk space. There is no consistency/coherency or correctness mechanism for these entries. The replacment policy is the Greedy-Dual-Size (GDS) policy. The authors evaluate the performance of PAST under conditions of varying threshold values of replica and file diversion with four normal disttribution values. The tradeoffs of maximizing disk utilization vs availability are highlighted. Increasing the leaf set also maximizes availability, since it would prevent reverting more often to the fall back mechanims. Caching improves lookup, but at high utlization for large files, file cache replacments start dominating. However in the general case, smaller files are more in number so there is enough cache space. Finally, it seems that very low thresholds dominate load balancing performance, which again favors smaller file sizes. From eyh5@ee.cornell.edu Thu Nov 29 11:21:37 2001 Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATGLbT06781 for ; Thu, 29 Nov 2001 11:21:37 -0500 (EST) Received: from photon.ece.cornell.edu (photon.ece.cornell.edu [128.84.81.138]) by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fATGIhM18602 for ; Thu, 29 Nov 2001 11:18:43 -0500 Date: Thu, 29 Nov 2001 11:18:46 -0500 (EST) From: Edward Hua X-X-Sender: To: Subject: 615 Paper # 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Pastry: Scalable, Decentralized Object Loacation and Routing for Large-scale Peer-to-Peer Systems Antony Rowstron, Peter Druschel This paper introduces Pastry, a scalable, distributed object location and routing substrate for wide-area peer-to-peer applications. Each node in the Pastry network has a unique numeric nodeId. These nodeIds are assigned randomly, using a SHA-1 hash of its IP address, and are diverse in geography, ownership, jurisdiction, etc. The node routes a client message to the node with a nodeId that is numerically closest to the key. Pastry seeks to minimize the distance messages travel, according to a scalar proximity metric like the number of IP routing hops. Pastry supports a variety of applications such as PAST and SCRIBE. The routing algorithm of Pastry uses a routing table in case the key of the received message does not fall within the range of nodeIds covered by the node's leaf set. The message, in this case, is forwarded to a node listed in the routing table entries that shares a common prefix with the key by at least one more digit. In the unlikely event that no such entry exists, the message is forwarded to a node that shares a prefix with the key at least as long as the local node, and is numerically closer to the key than the present node's Id. This procedure guarantees to converge, as long as at least half of the leafset nodes maintained in the local node have failed simultaneously. Several aspects of the Pastry operation are addressed in this paper. First, the joining of a new node to the Pastry network is done by the new node asking its immediate known neighbor node to send a "join" message on its behalf to a node whose nodeId is numerically closest to the new node. The new node receives all state tables from all intermediate nodes, based on which it compiles its own initial routing table. Given its decentralized nature, Pastry conforms to the notion of network proximity based on a scalar proximity metric, such as the routing hops or geographic distance. It seeks to ensure that routing table entries are chosen to provide good locality properties. In case of node failures or network partitions, Pastry addresses this problem by randomizing the the choice of a node in the routing process that shares a longer prefix with the destination, or shares the same prefix length as the current node but numerically closer than the current node. In addition, Pastry IP multicast may be used to reach hosts that may not be previously reachable due to partitions in the network. The Pastry routing algorithm provides a good bound on the number of hops a message must traverse in the network before reaching a destination node, improving savings in delay time. Also can be shown in the performance evaluation is that Pastry is very adaptive to changes in the network, be it node addition or failure. On the other hand, the Pastry does not address the load balancing issue, as it does not in itself produce file replicas in mulitple nodes of the network. This function is in fact taken care of by applications built on top of Pastry. From daehyun@csl.cornell.edu Thu Nov 29 11:45:17 2001 Received: from wilkes.csl.cornell.edu (wilkes.csl.cornell.edu [132.236.71.69]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATGjHT10629 for ; Thu, 29 Nov 2001 11:45:17 -0500 (EST) Received: (from daehyun@localhost) by wilkes.csl.cornell.edu (8.9.3/8.9.2) id LAA70235 for egs@cs.cornell.edu; Thu, 29 Nov 2001 11:45:12 -0500 (EST) (envelope-from daehyun) From: Daehyun Kim Message-Id: <200111291645.LAA70235@wilkes.csl.cornell.edu> Subject: 615 PAPER 62 To: egs@CS.Cornell.EDU Date: Thu, 29 Nov 2001 11:45:11 -0500 (EST) X-Mailer: ELM [version 2.4ME+ PL54 (25)] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit This paper presents PAST which is a large scale peer-to-peer persistent storage utility. PAST is based on a self-organizing internet-based overlay network of storage nodes that cooperatively route file queries, store multiple replicas of files, and cache additional copies of popular files. The PAST system is composed of nodes connected to the internet, where each node is capable of initiating and routing client requests to insert or retrieve files. Node may also contribute storage to the system. PAST exports the following operations to clients. 1) Insert: Stores a file and return a FileID. 2) Lookup: Retrieves a copy of the file identified by FileID. 3) Reclaim: Reclaims the storage occupied by the file. Inserted files are replicated across multiple nodes for availability. So, the PAST system can provide services though some nodes might fail. Pastry is an efficient routing scheme used in PAST, which routes client requests to a node that is close to the client and stores the requested file. The complexity of Pastry operations is logarithmic, so it is scalability. The PAST system uses ID system for security. Files are assigned quasi-unique FileID when they are stored into the PAST system. To retrieve a file, a client must know its field and ,if necessary, its decryption key. The PAST system is a internet-based peer-to-peer file storage system. It provides user interface similar to conventional file systems. But, in my opinion, there are fundamental differences. Most of all, it does not guarantee the completeness. In conventional files systems, files inserted once will exist in the system until user delete them explicitly. But, in the Past system, files may be deleted anytime. Second, load balance and security is more important in the PAST system than in conventional file systems, because it is designed for large scale networks. So, a special routing scheme is required, and Pastry is a routing protocol used in the PAST system. In my opinion, though the PAST system provides a powerful network storage system, it is not still as convenient as conventional file systems. It is because there are more requirements such as scalability for the systems in large scale networks. However, I think the PAST system need to overcome those limitation to provide the same level of continence as conventional file systems.. (The same comments can be applied to the CFS and Chord system.) From viran@csl.cornell.edu Thu Nov 29 11:54:30 2001 Received: from moore.csl.cornell.edu (moore.csl.cornell.edu [132.236.71.83]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATGsUT12567 for ; Thu, 29 Nov 2001 11:54:30 -0500 (EST) Received: from localhost (viran@localhost) by moore.csl.cornell.edu (8.11.3/8.9.2) with ESMTP id fATGsKT73884 for ; Thu, 29 Nov 2001 11:54:20 -0500 (EST) (envelope-from viran@moore.csl.cornell.edu) X-Authentication-Warning: moore.csl.cornell.edu: viran owned process doing -bs Date: Thu, 29 Nov 2001 11:54:20 -0500 (EST) From: "Virantha N. Ekanayake" To: Subject: 615 Paper 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Pastry is a completely decentralized, fault-resilient peer-to-peer object location and routing scheme. It's based on an overlay network of nodes connected via the Internet. Each node is assigned a node id that is randomly generated to fall uniformly between 0 and 2^128-1. A routing table is kept that holds node mappings (node id to IP address) that share common prefixes with the current node id. Since there are many nodes that share each prefix, a mapping that is geographically close to the current node (based on number of hops away) is chosen to be in the routing table. This ensures some locality, since there is no correspondence between adjacent node ids and actualy locations, which was a drawback in systems like Chord. A leaf set is also kept that keeps mappings of node ids that are numerically closest (both above and below) to the current node id. A neighbourhood set is also kept that holds the mappings of nodes that are physically closest. This table is not used for routing, but for maintaining locality. Since the routing tables are kept small (for scaling purposes), Pastry routes are about 30-40% longer than the ideal route. It routes to any node in O(log N) steps and the route tables are also O(logN) entries. Since Pastry takes into account locality, the throughput of the system is also fairly good. PAST is a file-storage utility overlayed on the network that leverages Pastry as the location and routing layer. Files have a fileID, which is quasi-unique and generated at the file's creation. Files cannot be changed once inserted into the system, and thus the file system lacks the normal semantics. From ranveer@CS.Cornell.EDU Thu Nov 29 12:25:24 2001 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATHPOT17977 for ; Thu, 29 Nov 2001 12:25:24 -0500 (EST) Subject: 615 PAPER 62 MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Date: Thu, 29 Nov 2001 12:25:24 -0500 content-class: urn:content-classes:message X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 Message-ID: <706871B20764CD449DB0E8E3D81C4D430232E6BE@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 62 Thread-Index: AcF4+tRnB4APQogyQauEf8ZuJK0A6A== From: "Ranveer Chandra" To: "Emin Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id fATHPOT17977 Pastry and PAST A. Rowstrom and P. Druschel PAST is a scalable, peer-to-peer persistent storage utility. It is similar to CFS in its objectives, that is to provide a decentralized storage for large scale colaborative networks. PAST uses an underlying routing protocol, called Pastry, which is again similar in functionality to Chord used by CFS and exceedingly similar to Taperstry developed at Berkeley. Pastry maintains a ring like structure similar to Chord. However, each node maintains different data structures called the leaf and neighbor table. The leaf table stores 'l/2' successors and 'l/2' predecessors along the ring, and the neighbor table stores 'l' closest nodes according to a proximity metric. The neighborhood information helps Pastry to route requests to a node topologically closer to the requestor, as opposed to Chord. Another contrasting feature of PAST to CFS is the use of file-level storage. Both the schemes, i.e. file-level storage of PAST and block-level storage of CFS offers its own advantages and disadvantages. The way PAST handles file-level storage is interesting. Storing files rather than blocks requires a large amount of space. Further, the 'k' successors in the ring might not have suffcient space to store the file. PAST proposes Replica and File Diversion to overcome this problem. This also requires more maintenance to ensure the utilization of replicas, but PAST does good work to ensure it. PAST also does good work in load balancing. The proposed scheme has lesser overhead with a weaker load balancing which might be more appropriate in large scale systems. A minor problem overlooked by PAST is the overlapping of nodeIds or FileIds by the hash function. Although the possibility of this occuring is remote, if it occurs the result is disasterous. Additionally, the design principle of file storage as opposed to block storage of CFS renders PAST a bigger victim of fragmentation which could be a big overhead. Overall, PAST does a great job of building a scalable collaborative storage utility. A great systems work! From mh97@cornell.edu Thu Nov 29 13:01:12 2001 Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATI1CT25075 for ; Thu, 29 Nov 2001 13:01:12 -0500 (EST) Received: from mars (syr-66-24-28-66.twcny.rr.com [66.24.28.66]) by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id NAA27908 for ; Thu, 29 Nov 2001 13:01:10 -0500 (EST) From: "hao ming" To: "'Emin Gun Sirer'" Subject: 615 PAPER 62 Date: Thu, 29 Nov 2001 13:00:59 -0500 Message-ID: <000901c178ff$cda91b50$6901a8c0@mars> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.2627 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 In-Reply-To: <200111281523.fASFN1q19899@zinger.cs.cornell.edu> Importance: Normal Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems By Antony Rowstron and Peter Druschel this paper presents a distributed and scalable system providing services like locating objects and naming. it is on top of IP address scheme. Essentially, it provides a mapping between key and IP address via node ID. the system works in the following way: 1. each node has a unique id which is 128 bits long 2. each node maintains a routing table which is organized in this way: a. it has a left set which contains numerically most closest nodes to local node. b. a routing table with each row stores entries which has one longer similar prefix with local node. i.e entries in row 0 has no similar prefix to local node. And entries in routing table become farther away from local node as increase of row number. i.e. entries n row 0 is most closest to local 3. when routing, leaf set is checked first, then the row with appropriate prefix. This basically guarantee message forwarded adopts a bigger and bigger stride to the destination. Some nice properties of this system : 1. Locality is ensured by routing the nodes to closest nodes and new nodes get Routing table info from appropriate rows at different nodes along a Routing path. 2. fault tolerance is maintained by storing info no K nodes with most closet node Ids. 3 .scalability is achieved by spreading K nodes with most closest node Ids physically far away. final comments: 1. is it possible to map key directly to the participating IP nodes without this middle substrate? 2.it is hard to compare all these big system without thorough simulation or practical deployment. But this system was developed by Microsoft and probably will become a standard API even its performance is not best. -ming From teifel@csl.cornell.edu Thu Nov 29 13:55:39 2001 Received: from disney.csl.cornell.edu (disney.csl.cornell.edu [132.236.71.87]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATItdT05146 for ; Thu, 29 Nov 2001 13:55:39 -0500 (EST) Received: from localhost (teifel@localhost) by disney.csl.cornell.edu (8.11.3/8.9.2) with ESMTP id fATItTe50048 for ; Thu, 29 Nov 2001 13:55:29 -0500 (EST) (envelope-from teifel@disney.csl.cornell.edu) X-Authentication-Warning: disney.csl.cornell.edu: teifel owned process doing -bs Date: Thu, 29 Nov 2001 13:55:29 -0500 (EST) From: "John R. Teifel" To: Subject: 615 PAPER 62 Message-ID: <20011129124615.L38847-100000@disney.csl.cornell.edu> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII PAST & Pastry: Pastry is a peer-to-peer routing substrate that is efficient, scalable, fault resilient and self-organizing. It performs application-level routing and object location in a potentially very large overlay network of nodes connected by the internet. It is used to support peer-to-peer applications such as global data storage, data sharing, group communication and naming. The main technical feature of Pastry is each node has a unique identifier, which the nodes use to set up nodeId spaces that are used in routing. Each node tracks its local neighbors to keep upto date on network topology. Pastry is completely decentralized and experimental implementation results on a 100,000 node setup show Pastry's efficiency and scalability. Simulation results and figures were satisfactory in demonstrating the feasibility of Pastry. PAST is a large-scale peer-to-peer persistent storage utility. It is self-organizing and consists of an internet-based overlay network of storage nodes that perform distributed file queries, store multiple replicas of files, and cache popular files. The PAST paper concentrated on evaluating storage management techniques in this environment, mainly on how non-uniform nodal storage and file sizes affects load balancing performance on high capacity systems. Except for the fact that the code was written in Java, I felt that the experimental results and evaluation sections were rather good and demonstrated well that the PAST storage management scheme is feasible and sound from a technological perspective. From samar@ece.cornell.edu Thu Nov 29 14:21:59 2001 Received: from memphis.ece.cornell.edu (memphis.ece.cornell.edu [128.84.81.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATJLwT11517 for ; Thu, 29 Nov 2001 14:21:59 -0500 (EST) Received: from photon.ece.cornell.edu (photon.ece.cornell.edu [128.84.81.138]) by memphis.ece.cornell.edu (8.11.6/8.11.2) with ESMTP id fATJJ3M24438 for ; Thu, 29 Nov 2001 14:19:03 -0500 Date: Thu, 29 Nov 2001 14:19:06 -0500 (EST) From: Prince Samar To: Subject: 615 PAPER 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems & Storage management and caching in PAST, a large-scale persistent peer-to-peer storage utility PAST is a peer-to-peer global storage utility for the internet. It builds on top of Pastry, which is a content location and routing system based on a self-organizing overlay network of nodes. Each PAST node is assigned a 128-bit node identifier which are arranged in a circular structure. This nodeid arrangement is quasi-random, implying that the nodes with proximity in the nodeid circular arrangement are probably diverse in geographic location, network connectivity, ownership or jurisdiction. This property makes these nodes a good choice for storing the replicas of a file, as the node in the set are unlikely to conspire or be subject to correlated failures or threats. k PAST nodes with numerically closest nodeid's are chosen during the insert operation for caching the files. k is chosen to meet the availability needs of the file, relative to the expected failure rates of individual nodes. PAST is layered on top of Pastry, which is designed to be efficient, scalable, fault resilient and self organizing. Given a fileid, Pastry routes the corresponding messages towards the node whose nodeid is numerically closest to the 128 msbs of the fileid, among all live nodes. Pastry maintains a neighborhood set based on a metric like round trip time or number of hops, which helps in locating a node actually closer to itself in the network. This is an interesting idea which is not present in Chord, though it is associated with the cost of making periodic queries to several nodes. A point to be noted is that in PAST actual files (and not blocks as in CFS) are cached or replicated. This could lead to unbalanced loads on the nodes, especially if the file sizes are disproportionate. From jcb35@cornell.edu Thu Nov 29 14:29:19 2001 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.7) with ESMTP id fATJTJT12536 for ; Thu, 29 Nov 2001 14:29:19 -0500 (EST) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id OAA29868 for ; Thu, 29 Nov 2001 14:29:16 -0500 (EST) From: jcb35@cornell.edu Date: Thu, 29 Nov 2001 14:29:15 -0500 (EST) X-Sender: jcb35@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII PAST provides a large-scale peer-to-peer persistent storage utility. PAST operates on top of Pastry, which is a peer-to-peer request routing and content location scheme. Pastry strives to design a scalable distributed object location for wide-area and peer-to-peer applications. Each node keeps track of certain tables which contain the following: routing table - entries at row n contain a node whose nodeId shares a n digits, but the next digit has one of 2^b - 1 possible values. This is used to ensure that a node is no more than log_b N hops away (for a pre-determined b and N nodes). leaf set - nodes with the closest node ids. neighborhood set - these nodes are the nodes which are closest to the local node - these are used in locality properties. Pastry provides mechanisms for joining (it replicates information about its routing table to other nodes before it begins routing packets) and assumes nodes will depart or fail without warning. Using this, PAST systems allow clients to insert, lookup, and reclaim a file. PAST will distribute the file over k different nodes, determined by the client when they insert the file. The reclaim function basically deletes the file, but does not guarantee that future lookups will fail. I thought they provided another interesting approach to peer-to-peer object location, offering another method to achieve location within logN steps. I wonder about some of the specifics like what happens when multiple nodes try to join simulateously, and also how the PAST system leaves much of the replication up to the application - could this be better solved at a system level?