From ag75@cornell.edu Tue Nov 5 00:33:48 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA55Xmc06487 for ; Tue, 5 Nov 2002 00:33:48 -0500 (EST) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id AAA09596 for ; Tue, 5 Nov 2002 00:33:44 -0500 (EST) Date: Tue, 5 Nov 2002 00:33:44 -0500 (EST) From: ag75@cornell.edu X-Sender: ag75@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII This week we were presented with 2 protocols for large-scale distributed storage. Both of them are based on the ideas provided by Napster and Freenet and they are quite similar to each other. The protocols differ in how they store files and thus, the kind of load balancing and caching they have to do. PAST is a large-scale peer-to-peer persistent storage utility. Its goals are to have strong persistence, high availability, scalability and security. Unlike Freenet, if you put a file in PAST it is guaranteed to be there; however, PAST does not provide anonymity. PAST is intended to be used for archival storage and content distribution; it is not a general purpose operating system. PAST does not allow file to be modified, it does not have an explicit delete operation, and it does not provide facilities for searching, directory lookup or key distribution. PAST manges to achieve high storage utilization while rejecting few file insert requests. It also provides replication to ensure high availability. However, PAST would not work well in an ad-hoc environment where nodes come and go frequently. CFS is similar to PAST in many ways, but there are differences. While PAST uses Pastry for routing, CFS uses Chord, so there are differences in how things are routed. Unlike PAST, CFS stores data for an agreed-upon finite interval. While this does not guarantee file persistence indefinately, publishers can achieve the same effect by asking for extension periodically. This also helps CFS recover automatically from malicious insertions of large quantities of data. Load balancing is also different in the two protocols. CFS uses virtual servers, leaving it to the administrator to figure out the number of virtual servers to be used, while PAST uses a more complex load balancing scheme involving file and replica diversions. The biggest difference between CFS and PAST is in how they store files. CFS splits the files into blocks, while PAST keeps the whole file together. This simplifies load balancing in CFS, but increases the number of messages to fetch the whole file. Like PAST, CFS does not seem suitable in an environment with sensors either because it assumes that storage is not a constrained resource. This is not likely to be the case in a sensor network. From bd39@cornell.edu Tue Nov 5 01:57:02 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA56v2c21657 for ; Tue, 5 Nov 2002 01:57:02 -0500 (EST) Received: from boweilaptop.cornell.edu (r102439.resnet.cornell.edu [128.253.163.42]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id BAA26866 for ; Tue, 5 Nov 2002 01:57:02 -0500 (EST) Message-Id: <5.1.0.14.2.20021105000844.00ba6b88@postoffice2.mail.cornell.edu> X-Sender: bd39@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 05 Nov 2002 00:19:23 -0500 To: egs@CS.Cornell.EDU From: Bowei Du Subject: 615 PAPER 62 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed PAPER 62 - PAST - Past is a peer-to-peer persistent storage utility, which performs adaptive services such as replication and caching of files for load balancing and performance. The goals of PAST are similar to the scalability portion of Freenet. PAST uses PASTRY as its overlay network. Files in PAST are identified with a hash key unique to particular insertion of a file into PAST. Files are replicated across the overlay network by storing copies of the inserted file in the k nodes in the PASTRY network with the closest hash values. File are signed by the client using them, and the replication among the overlay network nodes is ack'ed with a storage receipt. PAST load balances the files stored in the system through two mechanisms, replica diversion and file diversion. Both of these try to address the fact that even though files may be distributed uniformly by the random hash function, their sizes maybe very different, inducing load imbalance. Replica diversion involves storing the file not at the node itself, but at the a node in the leaf table of the node. File diversion is simply the process of generating different id's for the same file whenever a file insert fails, in an attempt to fit the file somewhere in the file space. In a replica diversion, what if both the node and its k+1 neighbor fail at the same time. Then the datum will be stranded. Also, it is rather unspecified how forwarding links (when new nodes appear on the network) will eventually have the matching file migrated, paper notes this should be a "background" process. This seems important because it affects the invariant maintainence of k replicas of a file. Also, how does one remove a file from the network? Failure scenarios in general are not well discussed. PAST also performs some manner of caching along nodes that receive high number of requests. The unused portion of their PAST storage can be used to cache file. The metric used for cache eviction is a ratio of cost of the file vs. the size of the file. One question is what happens to performance as the system becomes full, to the point where the caches are no longer useful? Evaluations of PAST show that load balancing is definitely needed to achieve a good utilization of the resources. One notable element that is missing from the evaluation is the affect of node joins/failures. - CFS - CFS is a peer-to-peer file system running on top of Chord. The CFS files system consists of a set of blocks of data which are distributed over the nodes in the overlay network. CFS distinguishes itself from PAST by breaking files into blocks which are more more easily distributed in the network. It seems that block grainularity will have a large effect on the performance of this system, because continguous blocks maybe located on servers that are (geographically) distant. Latency to access the file will the maximum of all of the accesses to the blocks combined. Block in CFS are organized similar to UNIX filesystem blocks. Replication of a block for failure tolerance is done by duplicating the block in the successor nodes. Blocks are cached along the request path, with the intuition that request paths to the same block will intersect towards the end of the search path. For the cache evaluation, it would be nice to know how large the caches were for each of the nodes. The evaluation shows a pretty good resilience to failures, chord averages around 6 lookups overall in the face of up to 50% node failure. It would be interesting to see the graph of lookup latency vs node failure fraction as well. -- What properties differentiate file sharing based on PAST and CFS from Freenet? Can these systems be used to share files securely in an environment with censors? What properties are missing in CFS/PAST that are traditionally found in filesystems? What additional mechanisms are necessary? One important distinction between these filesystems and Freenet is the lack of anonymity. Freenet took great pains to probabilistically route queries and replies so the originator of the query could not be traced. Freenet's design led to properties which are undesirable, such as the potential for existence of a file, yet the query will fail. Also, Freenet offered no guarantees for anything stored in the network, being a giant collective LRU cache. Freenet is sensitive to a DoS attack in which garbage is flooded into the system to eliminate censored files. The timed deletion of CFS file entries maybe a problematic for publishers with no good way of maintaining files periodically in the system. CFS/PAST present a flat namespace filesystem, which like Freenet, is unsearchable. From xz56@cornell.edu Tue Nov 5 10:45:03 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA5Fj3c13744 for ; Tue, 5 Nov 2002 10:45:03 -0500 (EST) Received: from Xin (dhcp-ece-171.ece.cornell.edu [132.236.232.171]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with SMTP id KAA29976 for ; Tue, 5 Nov 2002 10:45:02 -0500 (EST) Message-ID: <016901c284e2$45092de0$abe8ec84@Xin> From: "Xin Zhang" To: "Emin Gun Sirer" Subject: 615 PAPER 62 Date: Tue, 5 Nov 2002 10:44:45 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="gb2312" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1106 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1106 PAST is an Internet-based global peer-to-peer storage utility. Each node is an access point for users, as well as contributes to file storage and routing. In PAST the nodes and files are assigned uniformly distributed identifiers, and replicas of a file are stored at nodes whose identifier matches most closely the nodes. This approach approximately balances the number of files stored on each node. So the main contributions of PAST are its storage management and caching which rely only on local coordination. To achieve the goal of storage management, namely ensure the availability of files while balancing the load. Besides using the uniform hashing for assignment of identifiers, there are two other additional operations: replica diversion and file diversion. Replica diversion is performed when a node is not the k closest to fileID but is in leaf set of one of those. File diversion is performed when less than k nodes can be found available (both those whose nodeID's are close to fileID and nodes in these nodes' leafset). These two diversions help to achieve global balance. Caching helps to minimize client access latencies and maximize the query throughput. Nodes use the "unused" portion of their advertised disk space to cache files and cached copies can be evicted and discarded at any time. One thing degrades the load balance in PAST storage management is that files sizes may differ greatly among one another. CFS is a read-only storage system. It consists two layers: DHash and Chord. It does not attempt to provide anonymity, but focuses on efficiency and robustness. Also it's vulnerable to malicious participants. The main difference of CFS from PAST is that PAST servers store the whole files while CFS stores blocks and spreads blocks evenly among the available servers. The system is read-only in the sense that only publishers can update the file system. CFS assigns quota to each IP to fight the malicious injection. Compared to Freenet, Freenet focuses more on strong anonymity and anti-censorship and doesn't guarantee permanent file storage, while PAST and CFS are persistent storage utilities and don't offer anonymity. From liuhz@CS.Cornell.EDU Tue Nov 5 11:29:25 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA5GTPc25722 for ; Tue, 5 Nov 2002 11:29:25 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615 PAPER 62 Date: Tue, 5 Nov 2002 11:29:25 -0500 Message-ID: <706871B20764CD449DB0E8E3D81C4D4302CEE6C3@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 62 Thread-Index: AcKE6ID7+PfProP6Qd2lxZhEmp58kw== From: "Hongzhou Liu" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id gA5GTPc25722 These two papers present PAST and CFS, two peer-to-peer storage systems. Unlike Freenet, both systems are built on top of some peer-to-peer location and routing subtrates, which guarantees that lookups in both system can be more efficient and likely to find the requested data than that in Freenet. Both systems are more reliable than Freenet and they provides different levels of persistence while Freenet doesn't. A important goal of Freenet is to provide anonimity, while CFS focuses on efficiency and robustness and FAST aims to provide strong persistence, high availability and scability. A big difference between PAST and CFS is the approach to load balance. a PAST server stores whole files. As a result, a server might not have enough disk space to store a large even though the system as a whole has sufficient free space. A PAST server solves this by offloading files it is responsible for to servers that do have spare disk space. PAST handles the load of serving popular files by caching them along the lookup path. CFS stores blocks, rather than whole files, and spreads blocks evenly over the available servers; this prevents large files from causing unbalanced use of storages. CFS's block storage granularity helps it handle the load of serving popular large files, since the serving load is spead over many servers along with the blocks. This is more space -efficient, for large files, than whole-file granularity. However, finer granularity of storage increases the number of messages required to fetch a whole file, since a client must look up each block separately. CFS hides the block lookup latency by prefetching blocks. More specifically, PAST achieves load balance by two different schemes: replica diversion and file diversion. Replica diversion balances load among the nodes in the leaf set of a node. k replicas of a file are supposed to be stored at the k numerially closest nodes to the fileID. However, if one of those k nodes has not enough free space for a large replica, it can choose one node from its leaf set to acomondate the replica. File diversion is performed when a node's entire leaf set is reaching capacity. In such case, a file is diverted to a different part of the nodeID space by choosing a different salt in the generation of its fileID. CFS uses "virtual servers" to banlance load among servers with different capacities. In CFS, a real server acts as multiple virtual servers. A virtual server uses a Chord ID tht is derived from hashing both the real server's IP address and the index of the virtual server within the real server. The number of virtual servers in a real server is proportional to its storage and network capacity. By chaning the number of virtual servers on a real server, we can control the load on the real server flexibly. Both systems provide weak quota schemes, which prevent a malicious node from injecting a large amount of trash data that will exhaut the systems' storage. Both systems guarantee persistence, although at different levels. CFS provides only weak persistence. CFS guarantees that once it commits to storing data, it keeps it available at least an agreed-on interval. After this interval, the publisher of the data has to re-publish the data, otherwise they may be lost. PAST provides very strong persistence. It guarantees that once data are summited to the system, they will be stored in the system forever unless you reclaim them(The semantics of reclaiming is much weaker though). PAST maintains an invariant that copies of each file are stored by the k nodes whose nodeIDs are closest to the fileID, which guarantees the availability of the file at any time. Strictly, we can not say CFS is a file system. It's just a file sharing system because it doesn't guarantee persistence, which is a fundermental goal of conventional file system. While PAST does guarantee strong persistence, it is less flexible compared to tranditional file system. We can not modify and write data to files in PAST system as freely as in a traditional file system. From smw17@cornell.edu Tue Nov 5 11:30:15 2002 Received: from cornell.edu (cornell.edu [132.236.56.6]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA5GUFc25875 for ; Tue, 5 Nov 2002 11:30:15 -0500 (EST) Received: from cornell.edu (syr-24-161-107-202.twcny.rr.com [24.161.107.202]) by cornell.edu (8.9.3/8.9.3) with ESMTP id LAA26379 for ; Tue, 5 Nov 2002 11:30:15 -0500 (EST) Message-ID: <3DC7F0F9.2010208@cornell.edu> Date: Tue, 05 Nov 2002 11:25:29 -0500 From: Sean Welch User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Emin Gun Sirer Subject: 615 PAPER 62 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit PAST/CFS CFS is a storage system built over the Chord routing layer designed to provide a decentralized file storage system. A read-only file system interface is provided to the system (publishers can modify data through a seperate procedure). CFS breaks individual files into sub-blocks before submitting them to the network to improve load balancing, and distributes the files to k different consecutive hosts such that these hosts are successors of the original successor. Given that the ring ID's are determined from a hash of the IP address, successive node ID's are likely to be as well distributed as an average random collection of nodes. The presented implementation cached data throughout the lookup path with an LRU eviction policy. Cached copies are logically distinct from replicated copies, so that unpopular files are not removed from the network (as in Freenet). CFS makes use of multiple virtual servers on each node to improve load balancing, and to help ameliorate the inequities possible in random key distribution schemes. In addition, the use of virtual servers provides users the ability to tune their level of participation based on the capabilities of their node. CFS provides for a basic per-machine quota, based on the lack of reliable authentication of individuals. CFS allows for updates, but provides no mechanism for explicit deletion, instead relying on eventual cache eviction to remove data. PAST is fairly similar to CFS (as Pastry is similar to Chord), also seeking to create a distributed file storage network. Rather than re-describing the details, I will explain the major differences between CFS and PAST. PAST achieves its file redundancy distributing the file to k different nodes, but unlike CFS, PAST stores whole files and allows much more flexibility in the placement of the data items. If node capacities are nearing capacity, PAST provides for diversion of the replica data, either to the leaf set of the preferred replicas, or to a completely different location (new salt value) when there is no space available throughout the leaf set. Nodes may also retain pointers to the actual data, allowing for the addition of new nodes while relagating the associated data transfer to the background. Nodes are responsible for detecting failures and re-replicating data. PAST caches data based on the Greedy Dual-Size policy, and places caches in currently unused storage space. PAST assumes that each client has a specified store quota and a smart card for user authentication. While not unreasonable, this does involve some form of central trusted authority in the system capable of maintaining this information. As a result of storing whole files, PAST exhibits a preference to small files, especially as the total storage in the network approaches capacity. Neither CFS nor PAST are explicitly designed for security, leaving secure and anonymous transfer to higher layer implementation. Both are capable of storing encrypted data, and obscuring the real content from the network. CFS should be capable of providing anonymous communications, subject (of course) to the caveats of well-chosen compromised nodes. PAST is a much harder problem, as the centralized authentication mechanism used provides a concentration point, allowing a censor to restrict information from individual users by denying authentication. Indeed, the existence of the storage quotas provides for additional attacks and traffic analysis of stored data. The PAST system is not well suited to providing strong user anonimity. PAST and CFS provide a very basic read-only storage mechanism, one which provides for the retrieval of a single file from a known fileID. Basic file system structures such as directories are not present, and would need to be emulated through means similar to those presented in Freenet. FileID distribution is also a sore point - unlike a unix filesystem, there are no convienient names associated with a data item stored at a node, and no simple way to perform searches of the filesystem. The requirement that a node know the fileID suggests that another out-of-band mechanism is required to efficiently distribute (filename, key). Going back to the censorship issue, this fileID distribution becomes a major complication, as one need not remove the data itself to maintain a largely effective censorship policy, but merely prevent people from obtaining legitimate keys to find the data pairs. This is a weakness in both CFS and PAST. From piyooshj@yahoo.com Tue Nov 5 12:32:06 2002 Received: from web10904.mail.yahoo.com (web10904.mail.yahoo.com [216.136.131.40]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with SMTP id gA5HW5c12883 for ; Tue, 5 Nov 2002 12:32:05 -0500 (EST) Message-ID: <20021105173204.21263.qmail@web10904.mail.yahoo.com> Received: from [128.84.223.189] by web10904.mail.yahoo.com via HTTP; Tue, 05 Nov 2002 09:32:04 PST Date: Tue, 5 Nov 2002 09:32:04 -0800 (PST) From: Piyoosh Jalan Subject: 615 PAPER 62 To: egs@CS.Cornell.EDU Cc: pj39@cornell.edu MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility Wide-area cooperative storage with CFS Properties that differentiate file sharing based on PAST and CFS from Freenet. The main difference in the file sharing in Freenet that from PAST and CFS are - Anomymity: Freenet focusses on providing location-independent file system and hence provides anonymity for both the producers and the consumers. PAST and CFS files are location dependent. - Reliability: More over files in the CFS/PAST once inserted are guaranteed to be available but in Freenet does not provide guarantee of file lifetime since files in a Freenet node may be evicted due to its use of LRU policy and limited storage at each node. CFS and PAST would gives a faster lookup than that of Freenet. - Security: CFS can subvert DOS attacks by a malicious user who tries to insert a large amount of junk data into the sytem as it limits the amount fo data that can be inserted by a particular ip address. No such mechanism is present in Freenet. Some differences between PAST and CFS being - Fragmentation: CFS fragments the files into blocks and distributed and replicated in the overlay network of CFS nodes. This provides fault tolerance. PAST too provides fault tolerance by replicating the file among k nodes with closes matching nodeId and fielId. - if the space in one of the nodes in not enough to store a file (like in PAST) still the files could be stored after it is fragemented into blocks and distributed among nodes. PAST solves this probelm by file diversion and replica diversion. If the storage at particular node is full than it stores the file in one of its leaf set. In case if all the leaf sets of a node are full than it uses file diversion by generating a different fildId using a different salt and hence diverting the file to a different part of the nodeId space. - CFS permits parallel block retrievals which might provide substantailly less time for retrieval of large files as compared to PAST. Can these systems be used to share files securely in an environment with censors ? PAST security model employs the deployment of smarcards for each node and user of the system. Each smart card has a public/private key pair and the public key is signed by the smart card issuers private key. Thus a malicious user cannot take control of the system and hence maintaing the integrity of the files stored. Use of File certificates would allow storage nodes and clients to verify the integrity and authenticity of stored content thus subverting man in the middle attack who changes the content of data being transmitted. CFS too uses public keys to authenticate data. CFS thus implements read only file systems where files can be modified by only the owner by completely replacing the old file with the new man. Thus CFS too subverts man in the middle attack. What properties are missing in CFS/PAST that are traditionally found in filesystems ? The properties in CFS/PAST that are missing as compared to traditional file systems are - Access Control: CFS/PAST have no mechanism of access control based on different type of users or their groups. Though PAST and CFS have some security mechanism for integrity of client and files. PAST proposes use of smart cards for each client which has a public/private key pair. PAST also uses reclaim requests/receipt to verify the files legitimate owner. - Delete/Rename: Files once stored in the system cannot be renamed. If the file is updated the new file replaces the old file. CFS does not support explicit delete. Publisher who wants the files to be persistent must periodically refresh the blocks. - Persistance: Files in CFS are not persistant as it guarantees availability of data for an agreed interval of time only. PAST offers persistant storage system and files stored in PAST are immutable. PAST offers Recalaim operation which reclaims the storage occupied by the k copies of file, but it does not guarantee that file is no longer available. On the other hand traditional file systems offer persistent storage of data. - Searching: Content based searching is not possible in CFS/PAST as found in tradional file systems such as Unix grep utility. What additional mechanisms are necessary ? Some additonal mechanisms necessary or possible are - A key word search could be added to CFS/PAST with deployment of a central search engine or some other mechanism. - More security features to subvert - Both the file systems are distributed and has no central node like napster. It may be useful to add a centralized node, as different nodes have different storage capacity and for efficient load distribution this has to be taken into account. CFS subverts this issue by the use of multiple logical nodes associated with each physical node. PAST too uess this technique if the storage size exceeds the minimum by two order of magnitude. __________________________________________________ Do you Yahoo!? HotJobs - Search new jobs daily now http://hotjobs.yahoo.com/ From nbs24@cornell.edu Wed Nov 6 23:02:53 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA742qc29273 for ; Wed, 6 Nov 2002 23:02:52 -0500 (EST) Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id XAA15321; Wed, 6 Nov 2002 23:02:51 -0500 (EST) Date: Wed, 6 Nov 2002 23:02:51 -0500 (EST) From: nbs24@cornell.edu Message-Id: <200211070402.XAA15321@travelers.mail.cornell.edu> To: egs@CS.Cornell.EDU Errors-To: nbs24@cornell.edu Reply-To: nbs24@cornell.edu MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9 Sender: nbs24@cornell.edu X-Originating-IP: 64.185.145.94 Subject: 615 PAPER 62 PAST is a large-scale peer-to-peer storage utility based on a self-organizing overlay of network of storage nodes that cooperatively route file queries, store files and cache popular files. It is intended as an archival storage and content distribution utility and does not provide facilities for searching, directory lookup or key distribution. It is not intended as a general-purpose filesystem. Each node is assigned a 128-bit id and each file is assigned a 160-bit fileId. Its goals are strong persistence, high availability, scalability, security. Popular files are cached along the request path. CFS is a also decentralized peer-to-peer storage utility. Its goals are fast lookup, load balance, and the ability of servers to join and leave the system frequently without affecting its robustness or efficiency. CFS uses the Chord lookup algorithm which does not tolerate malicious participants. Blocks of files are replicated on successor nodes and cached along the request path. Differences between PAST/CFS and FreeNet: FreeNet provides anonymity but this is not addressed by PAST/CFS. Differences between PAST and CFS: PAST uses the Pastry routing protocol while CFS uses Chord. PAST stores files in whole entities while CFS breaks them up into blocks for better load balancing. The downside is that CFS takes longer to retrieve chopped- up files. PAST uses replica diversions (storing a file at a node in the leaf set) and file diversions (generating different fields on a ‘file insert’ failure) in an attempt to balance loads. Difference from traditional filesystems: PAST/CFS do not provide persistence and facilities for searching and directory lookup. PAST/CFS use ids for files instead of explicit names. Nana B. Sam From shafat@CS.Cornell.EDU Thu Nov 7 00:22:14 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA75MEc15963 for ; Thu, 7 Nov 2002 00:22:14 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615 PAPER 62 Date: Thu, 7 Nov 2002 00:22:14 -0500 Message-ID: <47BCBC2A65D1D5478176F5615EA7976D134FCF@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 62 Thread-Index: AcKGHaF0wE1n5G5gTxqYvojLdk+Ydg== From: "Syed Shafat Zaman" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id gA75MEc15963 The main difference between the file sharing based on PAST and CFS, and that of Freenet is the anonymous information storage and retrieval system that Freenet primarily addresses. It takes an anti-censorship approach that makes it difficult for a node to determine even its own contents. In contrast, PAST and CFS mainly focus on maintaining persistence and strong reliability expected of an archival storage system. They do not attempt to provide any kind of anonymity in the system. Freenet also does not guarantee the availability of unpopular files which are discarded if not looked upon frequently. This is in direct opposition to PAST or CFS' reliability goals. In fact, they employ mechanisms to replicate files to make the system more fault tolerant. It would be challenging to introduce anonymous features into both PAST and CFS given their present architecture. In PAST, for example, all the routing tables maintain a key to IP mapping which reveals the exact location of a particular object, making PAST a very non-anonymous system. Similar is the case in CFS as well. One way of implementing a secure environment to avoid censorship would be to take up Freenet's approach of continually moving files around the network. As mentioned earlier, this would however require large modifications in the PAST or CFS architecture which tries to store an object with a key closest to the key of a node. When compared to traditional filesystems, we obeserve that the ability to control access to files is missing from both these P2P systems. Also, since fileIds are unique, a file cannot be insterted into the network multiple times. In PAST, it is not possible to fully guarantee that a file would be deleted from the network if its original owner places such a request. From hs247@cornell.edu Thu Nov 7 00:34:01 2002 Received: from mailout5-0.nyroc.rr.com (mailout5-0.nyroc.rr.com [24.92.226.122]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA75Y0c18246 for ; Thu, 7 Nov 2002 00:34:00 -0500 (EST) Received: from hubby.cornell.edu (syr-24-58-42-130.twcny.rr.com [24.58.42.130]) by mailout5-0.nyroc.rr.com (8.11.6/RoadRunner 1.20) with ESMTP id gA75XwF14151 for ; Thu, 7 Nov 2002 00:33:58 -0500 (EST) Message-Id: <5.1.0.14.2.20021107003343.00b21250@postoffice2.mail.cornell.edu> X-Sender: hs247@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Thu, 07 Nov 2002 00:34:01 -0500 To: egs@CS.Cornell.EDU From: Hubert Sun Subject: 615 Paper 62 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed The contributions of these two papers are the development of two distributed file systems: PAST and CFS. As we saw last week, the two distributed hash table schemes, Chord and Pastry are very similar and because CFS and PAST are respectively build on top of Chord and Pastry, they too are very similar. Both provide the basic operations: Insert, Lookup and Update. How they differ is that CFS distributes data to nodes at a block level granularity where as PAST is at the granularity of files. The advantage of block level distribution is that (with prefetching), CFS is optimized for large files that are accessed regularity. Both file systems also perform replication: CFS replicates blocks to it's k successors, while PAST replicates to k "nearest" neighbours. Both systems use their replicas as cache and also cache data near the owner node and along a look up path. These two systems differ from Freenet in that they are built on structured networks. Freenet has no guarantees that if a file exists, that it will find it. Because of the structures of Chord and Pastry, CFS and Past can guarantee a file to be found in logarithmic time given that there aren't k failures at the same time. In my opinion, these systems can not be used in censor networks. In general, both schemes are based on non-geographical closeness schemes. Nodes numerically close do not mean they are geographically close. In fact, both of them preach this property as being good as two numerically close together are highly unlikely to fail at the same time. However, in a censor network, locality is key because communication is expensive. Therefore these two systems would not be fit for a censor network. Also, both these systems use signatures and public/private key cryptography to ensure data integrity. As shown in the security papers, censor networks do not have this sort of power to perform these expensive security measures. Modifications to the security and underlying hashing scheme need to be done in order for these file systems to be adapted to censor networks. In CFS, deletions are not explicit. Files that are not used simply are removed by servers. In PAST, files can be reclaimed, but that does not guarantee the file cannot be found after that. Therefore, deletion is a major property that differentiates these two systems from regular file systems. In order to implement delete, much more communication and synchronization has to be done, but this is not what Pastry and Chord were built upon. Perhaps a searching mechanism (which is absent in both systems) would help in deletion/updates of files. From mp98@cornell.edu Thu Nov 7 03:16:37 2002 Received: from postoffice.mail.cornell.edu (postoffice.mail.cornell.edu [132.236.56.7]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA78Gbc18982 for ; Thu, 7 Nov 2002 03:16:37 -0500 (EST) Received: from cornell.edu (r109493.resnet.cornell.edu [128.253.240.252]) by postoffice.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id DAA09733 for ; Thu, 7 Nov 2002 03:16:36 -0500 (EST) From: mp98@cornell.edu Date: Thu, 7 Nov 2002 03:16:27 -0500 Mime-Version: 1.0 (Apple Message framework v546) Content-Type: text/plain; charset=US-ASCII; format=flowed Subject: 615 Paper 62 To: egs@CS.Cornell.EDU Content-Transfer-Encoding: 7bit Message-Id: <36AEB54A-F229-11D6-ABC6-003065EE5F0A@cornell.edu> X-Mailer: Apple Mail (2.546) PAST and CFS obviously are differentiated from Freenet in that their substrates are structured hash tables with the motivation of achieving more efficient routing. In fact, if a search for a key does not terminate in PAST or CFS, there is reason to believe the file does not exist in the system at all. Replication in PAST and CFS is determined at initial storage time and hopefully maintained throughout the life of the file. Freenet, on the other hand, solely uses caching to achieve replication of its contents. The ability of censors to destroy a network relies either on their ability to 1) attack central, important nodes or 2) enforce a file black list. Freenet resists censors by having no centralized structure and a complicated key generation scheme. PAST and CFS are decentralized, but in order to fully resist censors, they must change their key generation schemes from the ones specified. The RIAA must not be able to determine the key for Madonna's LikeAPrayer.mp3. If they are able to determine such a key, then they can 1) attack the servers responsible for maintaining it [effectively neutering Pastry and Chord's decentralized nature] or 2) legally requiring nodes not to carry that key. Since CFS as specified uses content hashes, it is vulnerable. PAST is slightly more secure in this regard, since it uses both salts and the publisher's key, but given well-known publishers, one could do a search for a particular file. There is no real reason, however, that keys have to be generated in this manner. Another CFS-particular vulnerability to censors is that it replicates entire file systems. If a censor somehow determines that a file in one file system is black listed, it could black list the entire file system. Obviously, in order to keep censors from black listing a particular publisher, some aspects of the publication protocols must be altered (i.e. PAST wants all publication to be signed. This is unacceptable for anonymity). PAST is not really a file system protocol. It is more similar to FTP. So most file system operations (listing directories, moving files, etc.) are meaningless in its context. CFS does simulate a full file system, and with proper finagling one can imagine implementing complicated file system operations on top of it. Even hard and soft links could be implemented since CFS stores not only files, but file system structures (such as directories) as well. One could imagine even grep running (albet slowly) over CFS. If there is one feature missing in CFS, it is file permissions. It is awkward to rely on CFS itself to impose them, and there is no metaphor for an operating system service to authenticate users. If CFS wants to achieve this property, it must using a more sophisticated access model. From ks238@cornell.edu Thu Nov 7 09:37:12 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7EbCc15815 for ; Thu, 7 Nov 2002 09:37:12 -0500 (EST) Received: from ks238.cornell.edu (syr-24-24-18-11.twcny.rr.com [24.24.18.11]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id JAA18236 for ; Thu, 7 Nov 2002 09:37:10 -0500 (EST) Message-Id: <5.1.0.14.2.20021107093613.01b55380@postoffice2.mail.cornell.edu> X-Sender: ks238@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Thu, 07 Nov 2002 09:37:02 -0500 To: egs@CS.Cornell.EDU From: Karan Suri Subject: 615 Paper #62 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed This paper describes PAST, a storage utility system which is based on an Internet based overlay network of storage nodes. The primary contribution of PAST is the idea of using route caching effectively as a method to decrease the bandwidth and memory overhead associated with searching in P2P network. The primary issue is that currently, each request in a decentralized P2P network will query those nodes that contain some kind of file (this will be conducted by PASTRY). The problem arises, when some files, which are frequently queried are geographically distant from the starting node. The problem is that it is inefficient to continuously query the distant node. Hence, the paper proposes using a caching system whereby replicas of files are stored at those nodes that have a nodeID which is similar to the fileID being requested. Since, there may be some very popular nodes that are queried more often, the paper advocates another mechanism by which additional caching of nodes is done to reduce the number of redundant requests. With such caching systems, the amount of overhead required to fetch certain files is reduced. So, the end of goal of trying to reduce fetch distances and trying to implement a load balancing mechanism for popular files that are frequently queried. The paper talks about using storing replicas of the files on those nodes which have nodeIDs that correspond closely to the fieIDs and in addition using mechanisms by which nodes that are traversed each time to serve certain clusters can store popular files. Clearly, the interesting or difficult thing to assess is at what point each file will be cached. Frequency with which data is requested can obviously be in constant flux as well as the "cluster" which is frequently requesting it. These all make the node(s) that is caching a certain file constantly changing. I they should address how their algorithms adapt to these fluctuations. The next paper presents CFS, a read only P2P file system, which is used primarily in a decentralized architecture. One of the primary contributions of the paper is the notion of using hash blocks as a method of distributing anonymous storage throughout the network. Each client in the network has three layers: a file system client, a Dhash storage layer (used to store and retrieve identified blocks), and a Chord lookup layer (used to locate blocks). The Chord layer and the Dhash layer interact in order to locate and retrieve the necessary blocks. The Chord layer employs consistent hashing in order to help identify these blocks. The Dhash layer essentially manages these blocks and is integral to the distribution of the blocks over the respective servers. In addition, for popular data, the Dhash layer will also cache certain frequently queried blocks (closer in proximity) in order to reduce the amount of overhead associated with querying the block form a more distant node. Also, another benefit of the block storage design is the load balancing achieved in the network since the ids are uniformly distributed. From tmroeder@CS.Cornell.EDU Thu Nov 7 09:54:41 2002 Received: from dhcp98-88.cs.cornell.edu (dhcp98-88.cs.cornell.edu [128.84.98.88]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7Esfc19872 for ; Thu, 7 Nov 2002 09:54:41 -0500 (EST) Received: (from tmroeder@localhost) by dhcp98-88.cs.cornell.edu (8.11.6/8.11.6) id gA7Eqq508845; Thu, 7 Nov 2002 09:52:52 -0500 From: Thomas Roeder MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15818.32324.575735.434595@dhcp98-88.cs.cornell.edu> Date: Thu, 7 Nov 2002 09:52:52 -0500 To: Emin Gun Sirer Subject: 615 PAPER #62 X-Mailer: VM 7.07 under Emacs 21.2.1 Today's papers covered the PAST and CFS systems, both of which were built on top of systems that we have already studied: PAST on Pastry, and CFS on Chord. Both systems are distinguished from Freenet by the fact that lookup of a file is guaranteed to succeed if the file actually exists in the system, unlike in Freenet where the anonymity constraints require us to take our chances and just search. CFS is much closer to being a true filesystem than is PAST, since it supports functions such as 'ls', whereas PAST only knows about file identifiers (this could be rectified in PAST without too much difficulty), and not the block structures of directories. One of the most valuable mechanisms which could be added to either PAST or CFS would be a search facility, either centralized or distributed. CFS tries to be a block filesystem distributed across peers on the Internet. It adds two layers on top of the already existing Chord layer (which it uses, once it has the name of a block, to find the block on the 'Net): the DHash layer, which performs replication and distributes blocks among the servers, and the FS layer, which makes all of this appear to the client to be NFS. They claim to get approximately the same latencies as FTP does in getting the same files, which suggests that their lookups are relatively efficient (as we noted in dealing with Chord itself). The CFS authors themselves note that CFS cannot deal with censors, which could make nodes conspire to give a different, internally consistent, view of the filesystem in which the offending file does not exist. Indeed, the authors state that they are explicitly making the assumption at the Chord level that there are no malicious nodes, which also means no Byzantine failures are allowed either. These are not useful assumptions for applications meant to run over the Internet, and particularly those which purport to share files. PAST is a system much closer to Freenet in general tenor: it associates files with a quasi-unique key and replicates in the system. It makes them available to queries by clients by routing through the Pastry layer, which we have discussed before. As for the security of the system, the authors make the assumption that an attacker cannot control the behavior of the smartcards used to authenticate with the system. The entire ability of PAST to avoid censorship revolves around the inability of an attacker to choose its nodeId. In any case where this is not true, attackers can use up all of the space around a given key and censor that material. The claim by the authors that entities which control large blocks of IP space are generally benevolent is naive at best given attempts by large corporations to stifle file sharing using all the resources at their disposal. From kwalsh@CS.Cornell.EDU Thu Nov 7 10:56:23 2002 Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7FuMc03854 for ; Thu, 7 Nov 2002 10:56:22 -0500 (EST) Received: from localhost (moe.cs.duke.edu [152.3.140.74]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id KAA24204 for ; Thu, 7 Nov 2002 10:56:22 -0500 (EST) From: kwalsh@CS.Cornell.EDU Received: from 132.236.225.147 ( [132.236.225.147]) as user walsh@imap.cs.duke.edu by login.cs.duke.edu with HTTP; Thu, 7 Nov 2002 10:56:22 -0500 Message-ID: <1036684582.3dca8d2619e6f@login.cs.duke.edu> Date: Thu, 7 Nov 2002 10:56:22 -0500 To: egs@CS.Cornell.EDU Subject: 615 PAPER 62 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit User-Agent: Internet Messaging Program (IMP) 3.0 X-Originating-IP: 132.236.225.147 What properties differentiate file sharing based on PAST and CFS from Freenet ? Can these systems be used to share files securely in an environment with censors ? What properties are missing in CFS/PAST that are traditionally found in filesystems ? What additional mechanisms are necessary ? Storage management and caching in PAST, etc. Wide-area cooperative storage with CFS. These systems differ from freenet primarily in their motivation. Freenet was designed from the ground up to privide anonymity and robustness properties (both for file insertion and file retrieval). This constrained the routing mechanism to less than optimal strategies, and also eliminated the possibilites of storage quotas, persistence, etc. (or, at least made these features quite difficult). Both PAST and CFS attempt to provide persistent storage: PAST in the style of an archival backup system, and CFS as a semi-persistent read/write cache. Both are built on top of distributed hash tables (Pastry and Chord, resp.). PAST has some interesting features that are not as much emphasized in other systems. Like CFS, PAST aims to provide load-balancing (both for lookups and in terms of storage space. This is important when one considers the diversity of nodes, and the distribution of file sizes and search patterns. Further, PAST attempts to achieve >95% storage utilization. This work seems misguided, since storage is cheap compared to almost all other constraints (bandwidth, ram, cpu, etc.). Also, PAST provides a quota system based on centralized authority (smart cards). This is a nice feature, but has its obvious drawbacks. Additionally, there seems to be very little incentive for individuals to contribute storage to PAST, since, in return for X bytes of contributed storage the user is promised only X/k bytes of storage in the PAST system. This seems like a bad deal for individual users. Worse, this seems to violate the exact property that drives peer-to-peer sharing. Many users do not use their full share of storage (there is lots of empty disk space on the Internet). The peer- to-peer ideal is that users who have spare storage can contribute this for the benefit of everyone. PAST is also missing most traditional features of file systems: mutable files (write, update, etc.), locking, etc. Another big pitfall is requiring a new participant to download (albeit in the background) a large amount of already- replicated files. Consider a PAST system at 50% capacity on average. A new 56k modem user with 1GB shared space would be immediately asked to download 500MB of storage. CFS differs from PAST in two main areas: Updates are allowed, and files are split into blocks for storage. The first feature allows a directory structure to be built (directory nodes, inodes, etc.). The second achieves some level of load-balancing automatically, and allows parallel downloads. One significant problem is the neglect of quotas or fairness. The issue is mentioned, but the approaches suggested seem overly simplistic and unrealistic. From vrg3@cornell.edu Thu Nov 7 11:36:52 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7Gaqc13550 for ; Thu, 7 Nov 2002 11:36:52 -0500 (EST) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id LAA09734 for ; Thu, 7 Nov 2002 11:36:50 -0500 (EST) Date: Thu, 7 Nov 2002 11:36:49 -0500 (EST) From: vrg3@cornell.edu X-Sender: vrg3@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII PAST is a system meant to provide persistent storage using only a distributed peer-to-peer network of cooperating nodes. PAST uses Pastry to perform the routing, and stores files in their entirety on storage nodes. These files are duplicated to maintain robustness in the face of node failure. Files in PAST can be inserted into the file system, where they remain indefinitely until a reclaim operation is performed, after which their presence is no longer guaranteed. Since different files can be of different size, PAST uses diversion methods to attempt to balance load. CFS is similar to PAST; it is a peer-to-peer data storage system. CFS uses Chord to perform the underlying routing, and presents itself to the client as a filesystem. Unlike PAST, CFS stores files by splitting them up into blocks and spreading those blocks over many nodes, to provide a more natural load balancing. Both CFS and PAST differ from Freenet very obviously in the fact that they do not attempt to provide anonymity. CFS does offer some resilience against tampering that Freenet does not, in its regulation of how much data a publisher may insert. They also both try to guarantee that, if a file is present in the system, a client will be able to access it, while Freenet makes no such guarantee. PAST and CFS both provide a measure of security against censorship. While it is possible for a someone to modify an existing file, it is not possible for that person to replace the existing copy with it, since files are read-only once they are published. It is impractical for an attacker to forge a reclaim certificate in PAST, and in CFS it is impossible to ask that a file be deleted (you can only ask that it *not* be deleted). Moreover, file publishers sign their content. Features that would be needed before a system like CFS or PAST could actually replace a traditional filesystem include searching or directory listing, greater support for metadata, ownership and access control with some kind of globally semi-consistent user table, and file deletion or moving. From jsy6@postoffice2.mail.cornell.edu Thu Nov 7 11:43:28 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7GhSc14950 for ; Thu, 7 Nov 2002 11:43:28 -0500 (EST) Received: from Janet.cornell.edu (syr-24-58-41-193.twcny.rr.com [24.58.41.193]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id LAA11678 for ; Thu, 7 Nov 2002 11:43:25 -0500 (EST) Message-Id: <5.1.0.14.2.20021107114234.00b93468@postoffice2.mail.cornell.edu> X-Sender: jsy6@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Thu, 07 Nov 2002 11:43:16 -0500 To: egs@CS.Cornell.EDU From: Janet Suzie Yoon Subject: 615 PAPER 62 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Past is a storage system that is built on top of the Pastry routing protocol. Past supports the following three operations: Insert, Lookup, and Reclaim. The insert operation, given a fileID, stores the corresponding file on the k nodes whose nodeID is most similar to the fileID. The insert operation is integrated with Past's storage management. When a node receives an insert request, it checks to see if it has the storage capability to store the file. If it does, it forwards the request to the other k-1 closest nodes. If it cannot store the file, then it performs replica diversion. The node looks in its leaf set for some node n that is not one of the k-1 nodes and does not already contain a replica of the file. If the issued request is successful, then the node keeps a pointer to n in its table. Else, a negative acknowledgement request is sent to the client node and file diversion is performed. In file diversion, the client node created a new file ID by changing the salt and retries the insert operation. This storage management scheme increases availability of a file and provides storage load balancing. Caching is utilized in Past for minimizing access latency, maximizing query throughput, and for query load balancing. The caching protocol is similar to that of Freenet in that files are cached along the path of a lookup request. Unlike Freenet, the caching protocol used is the Greedy Dual-Size eviction policy which assigns a weight to each cached file and not LRU. CFS is a read-only file storage system that is implemented on three levels. CFS performs load balancing by breaking up a file into blocks. The location protocol, Chord, is used to map blocks onto servers. Dhash (distributed hashing) performs block fetches, block distribution, and maintains the cached and replicated copies of the blocks. The third level, FS, interprets the blocks as files. CFS is better suited than Past for handling large files. Similar to Past, CFS stores each block on k different servers. Cache is also performed along the path of lookup requests. The eviction policy used is LRU. In the paper, the idea of virtual servers is introduced as a means of aiding load balancing. A server can be represented a set of virtual servers. The number of virtual servers is dependent on the server storage capacity and the overall network capacity. Virtual servers that map to the same physical server has access to each others tables. Past and CFS differs from Freenet in that they do not handle the issue of anonymity. Unlike Freenet, both schemes gaurentees persistance of files by differentiating between replicas of files and caches of files. A file will only disapear in the cache. Also, Freenet does not handle load balancing. Unlike Freenet, Past and CFS gaurentess the retrieval of an existing file since they are built ontop of Pastry and Chord respectively. Past and CFS both are missing a way to efficiently search for files based on a given term, a mechanism traditionally found in file-sharing systems. One way would be enforcing a rigid way of describing a file and using this description to produce the file hash key. Another would be to have a vector of terms associated with each stored file. Both file-sharing systems did not focus on security. Past briefly explained a way to maintain the integrity of messages due to smartcards. This protects from a malicious node changing a message, but does not protect against a node dropping the message. From mvp9@cornell.edu Thu Nov 7 11:55:03 2002 Received: from cornell.edu (cornell.edu [132.236.56.6]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7Gt3c18074 for ; Thu, 7 Nov 2002 11:55:03 -0500 (EST) Received: from zoopark.cornell.edu (syr-24-58-46-186.twcny.rr.com [24.58.46.186]) by cornell.edu (8.9.3/8.9.3) with ESMTP id LAB18808 for ; Thu, 7 Nov 2002 11:55:03 -0500 (EST) Message-Id: <5.1.0.14.2.20021107115222.00ad0240@postoffice.mail.cornell.edu> X-Sender: mvp9@postoffice.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Thu, 07 Nov 2002 11:55:01 -0500 To: egs@CS.Cornell.EDU From: mike polyakov Subject: 615 PAPER 62 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed These papers present 2 peer2peer distributed filesystems PAST and CFS that offer persistent storage, high availability, scalability, efficiency and security. Both have similar key to server id mapping schemes that give them access times and storage on the order of O(log N). Both, also place high priority on load balancing. PAST is built on Pastry and thus inherits its routing properties. It's contribution is in the managing the insertions, replicas, caching and requests. In routing, there are also some nice properties such as the non-deterministic choice of routes, which can circumvent malicious nodes. One of its main interesting mechanisms is the replications of a file to k nearby nodes upon insertions and the diversion of replicas and files. To improve better behavior, nodes offer some of their space for caching copies that come through them. Load balancing is achieved by the way node and file Id's are generated and by the diversion mechanism. Their experiments show that even at high utilization, performance remains stable. There are some issues with PAST. First, who performs node assignment? If it is the nodes themselves, what prevents them from crowding around particular fileId's, unless there is a specific scheme not described here? Also, it's not clear that the invariant of file location would be preserved in the network their experiment was performed on a single computer and latency in the internet may break it. CFS the cooperative file system is based on Chord and is much closer to an actual file system. To come closer to the goal of full load balancing, files are split into blocks which can be stored on different nodes. On the other side, to accommodate large servers, CFS fully integrates use of virtual servers separate nodes that run on the same computer and can use each other's routing tables. Unlike PAST, CFS allows updating of files (or individual blocks) at least by its publisher. File system semantics are to be layered on top of CFS, since it only provides insertion and retrieval of files and management of the distributed file system itself through Chord (for maintaining routing information) and "Dhash" for fetching and distributing blocks, their replicas and cache copies. Because files are split among nodes it would seem likely that retrieval could be significantly [slowed down], but their experiment with internationally placed nodes show the speeds are similar to ftp retrieval. However, it is still hard to believe that a large file with several or more blocks on different nodes (which can be geographically very distant) would have such speeds, even with prefetching. One of the benefits of the block approach over PAST is that arbitrarily large files can be accommodated, while PAST would just pass them back to the application. The main difference between these systems and freenet is the speed and the guaranteed success of retrieval and insertion -- Log N instead of virtually unbounded; similar observation applies to routing table size. The relationship between file Id's and node Id's are enforced in both of these systems, while in freenet they are expected to come about lazily. Another difference is that both PAST and CFS guarantee file persistence (at least for some time) while freenet only keeps files while they are used. There is also an effort to enforce load balancing, especially in CFS, whereas freenet only depends on the randomness of keys (and the PAST paper shows this is not enough). At the same time these systems are not as secure or anonymous as freenet while the content itself may be encrypted and the mapping of files to keys is fairly obscure, there is no effort to mask the server or source of a request. Still, because file keys are hashes of file names with public keys and salt, finding a particular file by name is not easy, but once its key is known, it is no difficult matter to find who has it and force them to remove it. In terms of file systems, CFS and PAST are quite disparate. In fact, PAST is more like file sharing there are no updates or forced deletions, access is granted to every one, files are atomic. CFS is closer to a file system: files are split into blocks, updates are allowed to owners of files. However, both are missing the standard delete operation, even basic access controls (read access is public in both systems), and any directory structure (if one was layered on top, even something like listing contents could be quite slow). While CFS invites further semantics layered on top, it provides no outright mechanisms for them. Thus to bring these systems closer to the standard file system format, one would need to add much more complex key(user) management, synchronization mechanisms, additional linking structure, etc. From adam@graphics.cornell.edu Thu Nov 7 12:12:19 2002 Received: from bach.graphics.cornell.edu (bach.graphics.cornell.edu [128.84.247.50]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7HCIc22142 for ; Thu, 7 Nov 2002 12:12:19 -0500 (EST) Received: from envy.graphics.cornell.edu (envy.graphics.cornell.edu [128.84.247.206]) by bach.graphics.cornell.edu (8.12.1/8.12.1) with ESMTP id gA7HCD0k019539 for ; Thu, 7 Nov 2002 12:12:13 -0500 (EST) Date: Thu, 7 Nov 2002 12:11:44 -0500 (EST) From: Adam Kravetz To: egs@CS.Cornell.EDU Subject: 615 paper 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII PAST: Another p2p system, though it is based on top of PASTRY, so that PASTRY does the routing and hashing to the files. PAST uses a good replication and diversion schemes to ensure that files are always available to a user and would be hard to censor. First a file is passed to k nodes as a replica of a file received (these all lie in the leaf set). As long as the file goes unaltered it would be hard to censor this type of storage since it would be on multiple servers. This replication style involve receipts, so if the receipts weren't verified by the sender, then it could simply try and place the file someplace else, using a different salt. The file diversion technique helps to load balance and ensure that overloaded servers don't go down completely, but in fact get to pass off the work to someplace else. This however could result in the file getting passed continously (if no one is willing to give the file a home). PAST is unlike Freenet in that it has set paths and isnt' prone to fads as Freenet is. PAST using PASTRY has a limited hop distance which would be better than the could be anywhere Freenet system. CFS: This is a wide-area (single file split over multiple computers) storage system which sits on top of chord (determines positions of the blocks) to build a large scale distributed file system. CFS works best in low-latency high-speed networks in which the servers can respond swiftly to RPCs from requesting machines. This system is read-only so decentralized control can be accomplished and owners of the files and owners of the server need not share any administrative responsibities. CFS scales logarithmically wrt the number of servers that there are in the network, (messages passed grows logarithmically). CFS uses replication to allow the fiels to always be available (hopefully). CFS balances load by using small block sizes and "virtual servers" so that if one computer has many resources to give it can have a large number of virtual servers, but doesn't hurt computers which have extremely ppopluar files stored on them (they can have less virtual servers, things can be cached at other places). CFS keeps files for a minimum time that is agreed-on. CFS has a quota system, keeping file insertion "fair". CFS is fairly different from freenet in that it has no anonymity, it has better load balancing (there it isn't prone to the fad problem). It uses Chord for routing and therefore has a distinctive and small (at least than Freenet) path to files. Enivironment censors could easily have many "virtual servers" on a high power machine and affect blocks of files, this is bad. CFS has block replication however so if an adversary only held a few blocks CFS might be able to recover from this sort of censoring. CFS wouldn't be particularly good at working against censors. From ashieh@CS.Cornell.EDU Thu Nov 7 12:20:54 2002 Received: from zinger.cs.cornell.edu (zinger.cs.cornell.edu [128.84.96.55]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7HKsc24067 for ; Thu, 7 Nov 2002 12:20:54 -0500 (EST) Received: from localhost (ashieh@localhost) by zinger.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id gA7HKsh14528 for ; Thu, 7 Nov 2002 12:20:54 -0500 (EST) Date: Thu, 7 Nov 2002 12:20:54 -0500 (EST) From: Alan Shieh To: Subject: 615 PAPER 62 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Differences between CFS, PAST and Freenet: CFS and PAST do not provide any anonymity guarantees. Their caching scheme relies on the locality of late hops in their underlying P2P overlay topologies, instead of migrating data along the query path as in Freenet. PAST provides lifetime guarantees that the other two systems do not. Security in the presence of censors: PAST: PAST provides some resilience against censorship in rejecting data updates that collide with a previous fileID insertion. Also, insertion operations are protected with private keys. Hence, attacks would have to focus on capturing portions of the underlying network topology. CFS: If a censor is provided with access to the higher level metadata, then it is possible to cause large scale file system inconsistencies if portions of the keyspace are captured. CFS is also vulnerable to attacks on the underlying topology. Lack of anonymity in both systems allows censors to apply out-of-band and social pressures to clamp down on unauthorized data sharing and viewing. Missing properties of PAST and CFS (* denotes truly desirable properties): Both: - Read/write ACL (*) - In-place updates (*) and traditional Unix file system open/close semantics - Indexing (*) PAST: - Hierarchical namespace CFS: - Journalling and other forms of hardening (*) Summaries: PAST PAST contributes a distributed file sharing system layered on top of Pastry. Each file is stored in the system as a complete unit and replicated at the k nodes closest to the fileID, derived from the name, user ID, and a salt; the salt value is used to allow a client to attempt insertion at different parts of the Pastry key space. PAST guarantees that files will remain in the system until deleted; files are not directly writeable. PAST requires many mechanisms for achieving load balancing. Replicas may be diverted away from the k closest node to a key, using forwarding pointers. This diversion potentially increases the latency of lookup, and so large files are preferred for diversion over smaller files. Diversion is performed in advance of storage pressure to smooth out degradations in performances over different system load levels. Asymmetry between host storage capacity is handled by adapting the diversion threshold. PAST was shown to be reliable in terms of the accessibility of files in the presence of failures. However, the system has difficulty coping with large files, and any size of file when load increases beyond a certain point. Shortcomings The relocation of complete files causes external fragmentation, which complicates storage management. The smartcard authentication scheme for maintaining quotas and identity requires considerable out of band resources for management. Also, relying on the presence of the leaf set in PASTRY for replica placement breaks an abstraction layer; it would thus be difficult to deploy PAST on top of a different DHT. Unpublishing a file is tricky in PAST, since not all replicas are purged. Hence, it is not necessarily possible to perform in-place updates, and so the publisher may need to distribute a new hash for the updated data. Extensions - Add support for different forms of quota management. In a corporate or other organized environment, centralized management such as smartcards are feasible. However, in a less organized environment, like a group of enthusiasts on the web, a distributed system is preferable, possibly one that allocates quotas based on resource contribution. CFS CFS contributes a distributed file system layered on top of CHORD. CFS manages sharing on the block level, and hence there are few of the issues with external fragmentation in PAST. Extra latency is introduced with this extra layer of indirection; thie is alleviated with prefetch and parallel transfers. Unlike PAST, blocks are treated as soft state, and must be refreshed constantly to prevent the system from purging blocks. The file system stored in CFS is almost identical to the traditional UNIX block layout, except without write support. Each file system is writeable only for a particular user. Asymmetric host abilities are managed by presenting more powerful hosts as virtual nodes; this allows storage capacity to be presented at different parts of the keyspace. The CFS paper also introduces some modifications to CHORD for optimizing latency in the underlying network. Shortcomings The layered file system approach may be more fragile than the whole file approach used in PAST, especially if many files are stored in the same file system. In this case, if a higher level block were to disappear, then all the lower level files would become inaccessible. This implies that the publisher of the data must retain a local copy of all data and metadata stored in the network. Treating blocks as soft-state adds an extra point of failure for data loss. Perhaps adding a mechanism for sharing the burden of sending out refreshes would alleviate this problem. Extensions - The higher level metadata blocks have a high degree of indirection, with little parallelism until lower level blocks are available. Thus, it may be beneficial to use a different replication mechanism for the metadata (better spatial locality), and perhaps allow the remote hosts to perform interpretation of the metadata to reduce the latency of having to perform repeated requests to follow pointers. - Increase the level of replication for metadata to improve availbility. From sangeeth_c@hotmail.com Thu Nov 7 13:33:56 2002 Received: from hotmail.com (f33.law7.hotmail.com [216.33.237.33]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id gA7IXtc10536 for ; Thu, 7 Nov 2002 13:33:56 -0500 (EST) Received: from mail pickup service by hotmail.com with Microsoft SMTPSVC; Thu, 7 Nov 2002 10:33:49 -0800 Received: from 192.76.82.90 by lw7fd.law7.hotmail.msn.com with HTTP; Thu, 07 Nov 2002 18:33:49 GMT X-Originating-IP: [192.76.82.90] From: "sangeeth kumar" To: egs@CS.Cornell.EDU Subject: 615 PAPER 62 Date: Thu, 07 Nov 2002 18:33:49 +0000 Mime-Version: 1.0 Content-Type: text/plain; format=flowed Message-ID: X-OriginalArrivalTime: 07 Nov 2002 18:33:49.0857 (UTC) FILETIME=[3725E910:01C2868C] Submitted by Sangeeth Chandrakumar (sc329) PAST is an internet based peer-to-peer global storage utility, whcih aims to provide strong persistence, high availability, scalability and security. In past, storage nodes and files are each assigned iniformly distributed identifiers , and replicas of a file are stored at nodes whose identifier matches closely with the file's identifier. Past nodes uses pastry as the routing protocl underneath. Past is intended as a archival storage and content distribution utility and not as a general purpose filesystem. In past, every node is assigned a 128 bit identifier. A set of adjacent nodeIds are highly likely to be diverse in all aspects, thus forming excellent candidates for storing replicas. Since layered on top of pastry, look up is of the order of log N. In past, whenever a file is to be inserted, the fileId is computed using a secure hash function and is routed to the node which is responsible for that key. Once the node verifies the authenticity of the file, it forwards the replica to k-1 other nodes whith nodeIds closest to the fileId. On lookup, the first node which has the copy of the file replies to the request. Due to the locality properties of pastry routing scheme and the fact that replicas are stored in k-1 nodes, a lookup is likely to find a replica whihc is near the client, according to the proximity metric. Files are deleted from the system using a reclaim message. The paper also discuss about the load imbalance which coud occur in the system due to insuffucient storage space, in whihc case it employs either a replica diversion shceme or file diversion. If a node cannot accomodate a copy locally, it chooses another node in its leaf set that is not among the k closest nodes and requests it to store the copy on its behalf and enters a pointer to the second node in its table. Care is taken to ensure that if either of the nodes go down, appropriate action is taken to maintain consistency. When a file insertion fails becauese the k closest nodes cannot accomdate a file not divert the replicas locally within their leaf set, the client generates a new fileId and tries to perform a new insertion. Three such attempts would be made before giving up the operation. For better perfomance, Caching is also employed, where a file that is routed through a node as part of a lookup or insert is inserted into the local cache if it has suffiecnt storage left. A greedyDual-size replacement policy is shown to have the best performance for replacing chaceh entries. The simulation results show that PAST with replicas and file diversion has much better storage utilization. CFS is a peer-to-peer read only storage system that provides provable guarantees for the efficiencey, robustness and load balance of file storage and retrieval. A CFS system exists as a set of blocks distributed over the available CFS servers. The core of the CFS software consists of two layers, Dhash and Chord. The Dhash layer performs block fetches for the client, distributes the blocks among the servers and maintains cached and replicated copies. DHash uses the Chord distributed loopup system to locate the servers responsible for a block. DHash provides load balance for popular files by arranging to spread the blocks of each file over many servers. To balance the load imposed by small files, DHash cacheseach block at servers likely to be looked up in future. CFS uses a unix style file organisations structure. Every parent block contains the identity of its children. The publisher of a file signs the root block with his private key and insert the block into CFS using the corresponding public key as the root block's identifier. this approach guarantees that a client see a authentic and internally consistent view of each file system. CFS uses a per-publisher quota scheme to thwart malicious injection of large quantities of data. It also caches a block along the lookup path starting to the block's server. _________________________________________________________________ Add photos to your e-mail with MSN 8. Get 2 months FREE*. http://join.msn.com/?page=features/featuredemail