From: Walter Bell <wbell@CS.Cornell.EDU>
Date: 26 Nov 2001 14:46:29 -0500
Subject: 615 PAPER #63

63) Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
    Wide-area Cooperative Storage with CFS

These papers introduce Chord, a scalable key-value lookup service, and CFS, a distributed read-only filesystem implemented on top of Chord. Chord aims for simplicity and does not provide many of the features of the other systems we've looked at. Chord strictly provides lookup for m-bit keys, resolving them to hosts. It does not handle storage or any other facilities; the authors' view is that these should be left to application designers. This small set of primitives raises the question of whether it really gives the application writer a large enough gain to justify using such a system, since redundancy and other facilities have to be built by the application.

Chord views all hosts as lying on a ring: hosts get a unique n-bit identifier (randomly assigned), which places them in a total ordering on the ring. Lookup of an n-bit key simply finds the place on the ring where the value should be, and whatever node immediately follows that point on the ring is the one that should be consulted to find the value. This provides efficient lookup, as only log(N) hosts need to be consulted for a key lookup with some optimizations. But this comes at a price, as keys need to be migrated to preserve this property when new hosts join. It's not clear to me that this was a good tradeoff. Since there is no replication facility, data is lost when hosts leave -- CFS implements replication by clustering replicas around the position on the ring where the single copy would have been kept, which reduces the movement of values when nodes leave. Chord provides a simple lookup mechanism and good guarantees on the usability of the system during periods of rapid host joining and leaving, which makes it attractive for wide-scale use.

CFS is a read-only filesystem on top of Chord in which blocks are distributed across nodes and found via Chord's lookup mechanism. This means that when CFS nodes join or leave, many filesystem blocks are moved between nodes, which can be very costly. Filesystem metadata is stored as regular blocks in the system, which means that to retrieve a file, the appropriate metadata blocks must first be retrieved and assembled, and then the file blocks must be retrieved and assembled. This suggests that Chord is the wrong facility for this, as a double indirection seems quite costly in a rapidly changing environment. The most surprising thing in all this to me was that, in the end, CFS was implemented without using Chord's lookup mechanism because it didn't provide the appropriate level of usability, reducing Chord's role to nothing more than routing-state propagation.
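As a rough illustration of the ring/successor mapping described above, here is a minimal sketch (hypothetical helper names and a toy identifier size; not code from either paper):

# Sketch of Chord-style consistent hashing: keys and nodes share one
# m-bit identifier space; a key is stored at its "successor", the first
# node whose ID is >= the key ID, wrapping around the ring.
import hashlib
from bisect import bisect_left

M = 16  # identifier bits (the papers use a much larger m)

def chord_id(name: str) -> int:
    """Hash a node address or key name into the m-bit identifier space."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** M)

def successor(node_ids: list[int], key_id: int) -> int:
    """Return the ID of the node responsible for key_id on the ring."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]   # wrap around past the largest ID

# Example: three nodes and one key.
nodes = [chord_id(a) for a in ("10.0.0.1:4000", "10.0.0.2:4000", "10.0.0.3:4000")]
print(successor(nodes, chord_id("some-file-block")))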
I thought this really pointed out that the primitives provided by Chord weren't at the right level for applications, and that even the lookup mechanism didn't provide the right facilities for application designers. I see these as the key factors that are traded off against efficiency, and any system that doesn't provide adequate primitives for applications is not worth using, no matter the efficiency gains.

From: Edward Hua <eyh5@ee.cornell.edu>
Date: Mon, 26 Nov 2001 19:40:27 -0500 (EST)
Subject: 615 Paper 63

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan

This paper presents the Chord protocol, which maps a key onto a node in a distributed, peer-to-peer network whose size and composition change intermittently. Chord is proposed to meet the challenges facing any large-scale peer-to-peer network, namely load balance, decentralization, scalability, availability, and flexible naming. A higher-layer application interacts with Chord in two ways: Chord yields the IP address of the node responsible for a key, and it notifies the application on a node of changes in the set of keys that node is responsible for. Because of these characteristics, Chord may be used in applications such as distributed indexes, large-scale combinatorial search, and cooperative mirroring.

The Chord protocol employs consistent hashing to obtain two m-bit identifiers, one for the node and one for the key. The identifiers are ordered on an identifier circle modulo 2^m, on which keys are assigned to nodes using the successor-node technique. Consistent hashing is designed to let nodes enter and leave the network with minimal disruption. Furthermore, the paper proves that each node is responsible for at most (1+epsilon)K/N keys, where K is the total number of keys and N the number of nodes in the network. Each node maintains routing information for only O(log N) other nodes, and resolves all key lookups via O(log N) messages to other nodes.

To make the mapping of keys to nodes more scalable, each node directed by the Chord protocol need only be aware of its successor node on the identifier circle. Chord maintains a finger table whose entries include the Chord identifiers and IP addresses of the relevant nodes. This scheme allows each node on the circle to store information about only a small number of nodes, and to know more about nodes closely following it on the identifier circle. Chord is designed to make node additions and withdrawals easy; the core idea is to find the new predecessor and successor nodes on the identifier circle after a node joins or leaves the network. When multiple nodes join or fail, a stabilization routine is executed so that keys are re-mapped to nodes and the integrity of the identifier circle is maintained. Chord also addresses the issues of node failures and replication of data.
This is done via a successor list that each node maintains. The entries in the list act as backup successors in case the current successor node fails. Also, the fact that a Chord node keeps track of its successors means that it can inform the higher-layer software when successors come and go, and thus when the software should propagate new replicas.

***************************************************************************

Wide-area Cooperative Storage with CFS
Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica

In this paper, the authors propose a cooperative file system, CFS, that provides fault tolerance, load balance, and the ability to harness idle storage and network resources in a peer-to-peer read-only storage system. A CFS file system exists as a set of blocks distributed over a number of CFS servers. It comprises two layers: Chord, which specializes in mapping file-block identifiers to servers, and DHash, which provides load balance by spreading the large number of file blocks over diverse servers. According to the researchers, CFS is capable of achieving impressive download speeds, recovering from server or link failures expediently, and maintaining a good load balance across the network.

The Chord layer in CFS uses consistent hashing to discover the server responsible for a given key. This is done by associating every CFS node with a unique m-bit node identifier, which lives on an identifier circle. Consistent hashing allows nodes to enter and leave the network with minimal movement of keys, and therefore makes the network very adaptive to such changes.

The DHash layer performs multiple functions, including data-block replication, block caching, load balancing, quotas, and file updates and deletion. These functions, built into the DHash layer, underscore the key CFS design philosophy: split each file system and file into blocks and distribute these blocks over a number of servers. Of these functions, load balancing is achieved by introducing virtual servers. Although this technique helps balance the load of serving popular files, distributing the data blocks of a single file means the client has to look all of them up to reconstruct the original data, which imposes a delay challenge on the overall performance of the network.

It is clear why the researchers favor deploying CFS in a wide-area network. The decentralized nature of CFS inherently allows the network to provide redundancy in both the servers and the data, and this works best when a large number of nodes are involved in the coordinated effort. However, it is my impression that CFS does not address the security issue in a very concrete manner. It takes a rather passive approach to guarding against malicious alteration of published data: it deletes data that has not been updated for an extended period of time. To preserve published data, this approach requires the operator to periodically "touch" the data so that its timestamp does not expire.

The performance evaluation in this paper gives some additional insight into the capabilities of CFS. First, it can be seen that server selection in the CFS system greatly improves the download speed of files, and a live experiment shows that CFS speeds are on par with those of FTP servers. Second, the use of virtual servers has a significant impact in alleviating the load on the physical server.
Third, a large fraction of CFS server failures does not significantly affect data availability or network performance. This is because of the replication of data blocks done by the DHash layer in CFS.

From: c.tavoularis@utoronto.ca
Date: Mon, 26 Nov 2001 19:52:14 -0500 (EST)
Subject: 615 PAPER 63

These two related papers discuss Chord, a peer-to-peer lookup protocol, and the Cooperative File System (CFS), a storage system that provides distributed hash tables (DHash) for block storage and employs Chord to locate blocks on a distributed set of servers. They address: load balancing among server nodes storing blocks; full distribution with decentralization; scalability; availability, fast retrieval and failure recovery via replication and caching; and flexible naming.

Chord efficiently maps keys to data on corresponding nodes and allows nodes to maintain a minimal amount of information about other nodes. Consistent hashing is employed to assign each node an identifier by hashing its IP address. An identifier is also assigned to each key (or data item) by hashing the key. The unified set of identifiers can be totally ordered, and a key is assigned to its successor node in identifier order. Nodes keep track of their successor nodes with pointers, forming a circle structure. Queries are passed around the circle until a node has the required identifier and thus stores the target key and data. To speed up this process, each node maintains a finger table of a number of its successors and forwards a query to the node in its table whose identifier most closely precedes the key identifier. Finger pointers at doubling distances around the circle bring average lookup complexity to O(log N) and make scalability possible.

When a node joins Chord, it takes responsibility for some of the keys of its successor (data must be transferred) and affected nodes must update their finger tables. A new node becomes the ith finger of a node if it can be shown to be the predecessor of the existing ith finger. Similarly, when a node leaves, its keys are assigned to its successor. To accommodate these events, each node maintains a pointer to its predecessor, which is initialized when the node joins the system. Since simultaneous joins or node failures complicate the situation, periodic stabilization updates refresh pointers and verify finger-table contents. Also, successor lists are maintained, with replicated copies of keys, to handle node failures. Thus, any problems caused by node failures or concurrent joins are transient.

CFS consists of a layered structure at the clients requesting blocks of data and at the servers storing them. The DHash layer fetches blocks for clients, distributes blocks over servers, and maintains duplicate and cached copies at the servers.
DHash uses the Chord layer to locate the servers holding blocks. Chord acts as a library with two main functions accessible to DHash: a lookup(key) function, which returns the IP address of the node responsible for the given key, and a notify function, which tells the application about changes in the set of keys. Chord has been refined to use a latency estimate to contact servers that are closer in the underlying network, reducing lookup latency. CFS is a read-only file system, but publishers can insert blocks of content using hashing and a public/private key pair. It is found that load balancing can only be achieved when virtual nodes are used, particularly with heterogeneous servers. CFS, and likewise Chord, lack location anonymity, but trade this for predictable retrieval time. Another tradeoff is load balancing versus lookup cost when files are broken up and distributed as blocks. It is found that CFS outperforms systems without blocks (e.g. Freenet) for large popular files, but performs worse for large unpopular files. I think CFS and Chord need to address network partitioning, as it could severely degrade performance; it is mentioned as future work.

From: André Allavena <andre@CS.Cornell.EDU>
Date: Tue, 27 Nov 2001 00:53:42 +0100
Subject: 615 PAPER 63

Chord, a scalable peer-to-peer lookup service

Chord is a distributed lookup protocol. It maps keys to nodes (keys can be thought of as file names, with the files being kept with the key). The way it works: nodes are organized on a circle, and the circle is sorted by increasing node ID. Each node keeps a pointer to its successor. That alone would be really slow, so if there are N nodes, each node also keeps pointers to its 2^i-th successors for i in [0..log N], as well as backward pointers to the nodes pointing to it (I'm not that sure about the backward ones). The path to any node is then O(log N). In effect, the key space is divided into N parts (i, i+1, i+2, up to i+k), nodes take care of one or more consecutive parts of the key space, and the pointers point to the closest node before the node supposed to handle a given key.

A single join is handled perfectly well; the pointers stay quite close to the correct ones. There will be updates -- O(log^2 N) messages -- and the network will converge. They assume it will also converge in the case of many leaves or joins, although there isn't any insight toward a proof. They are not very clean about loop issues. They leave the other operations (redundancy, load balancing) to higher-level protocols. Copying files to new nodes as soon as they join might not be a good idea (it could generate a lot of traffic for nothing). What about a lazy scheme? Or trying to map new files to new nodes, which this scheme cannot do?
What could the external mechanism be that they leave out for learning the identity of a Chord node? It would be nice if this external mechanism spread requests out over more than a very few nodes -- a broadcast sort of thing?

-- André Allavena, École Centrale Paris / Cornell University

From: Venu Ramasubramanian <ramasv@CS.Cornell.EDU>
Date: Tue, 27 Nov 2001 10:54:05 -0500
Subject: cs615 PAPER 63

Cooperative File Storage, Chord

These papers present an efficient and scalable data storage and retrieval system designed for large peer-to-peer networks. There are two basic components: Chord, a data retrieval protocol through which data values can be obtained efficiently from a large peer-to-peer network of nodes, and DHash, a hash-based mechanism that distributes data among the set of nodes in the network based on hash values. The important features of this system are that it is logarithmically scalable, evenly distributes load with high probability, and is resilient to several independent failures.

The Chord protocol performs a lookup service for data values given a key or identifier. Node IDs are hashed to belong to the same domain as the keys. Data is placed at the node with the smallest ID larger than the data key. The nodes are arranged in a circular list based on the hashed node IDs. Routing is performed using a data structure called the finger list, which ensures that data lookups, inserts, and deletions take place in O(log n) steps. Further, a node insertion or deletion involves at most O(log^2 n) steps, where each step corresponds to an RPC. Fault tolerance is achieved by replicating data on successive nodes. The important observation here is that since node IDs are hashed, a successor in the circular list does not imply any geographical proximity. This has two consequences: it helps fault tolerance, because failures of successive nodes are independent, and it hurts routing performance, because contacting a successor is no cheaper than contacting an arbitrary node.

CFS is a distributed file system using Chord as the underlying data retrieval protocol. Each file block is a data value, and a hash of the block's contents is its key. Block-level hashing helps CFS distribute load evenly across the network. CFS does not guarantee anonymity, unlike Freenet. Block-level hashing has the downside that data retrieval involves several lookups, which increases latency; CFS proposes prefetching and caching to improve performance. Since CFS balances load evenly, the server in the peer-to-peer network with the least storage forms a bottleneck to the system.
CFS addresses this by splitting each server's storage space into virtual nodes. But this might adversely affect fault tolerance, as the failure of a single server would induce the failure of several virtual nodes.

From: Avneesh Bhatnagar <avneesh@csl.cornell.edu>
Date: Tue, 27 Nov 2001 10:57:44 -0500
Subject: 615 Paper 63

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

This paper discusses the implementation of Chord, a distributed lookup protocol that approaches the problem of efficiently locating data within a peer-to-peer system. The main aim is to decentralize the storage mechanism, alleviating the scalability and maintenance problems faced by Gnutella and Napster. The protocol does not guarantee anonymity as Freenet does, but it provides a flat address space like Freenet's. Furthermore, the protocol arranges nodes on a circle, which allows for several optimizations regarding lookup latency guarantees, node joining/leaving, and handling failures.

Chord maps keys onto nodes, providing a lookup(key) algorithm that yields the IP address of the responsible node. To improve performance, consistent hashing is used, which not only balances keys across nodes but also keeps lookups cheap: a lookup requires O(log N) messages. To make key location more scalable, Chord uses successor and predecessor information in the form of a finger table, where the ith entry contains the node that succeeds the node by at least 2^(i-1) on the identifier circle; that is, the ith finger is the successor of (n + 2^(i-1)) mod 2^m, where m is the number of entries in the finger table. This helps because, instead of searching around the circle and taking a performance hit of N lookups, a node stores information about a small part of the network and forwards to a node whose ID is closer than its own to a particular key K in order to reach the node responsible for K. The paper evaluates the worst-case join cost as O(log^2 N).

An important part of Chord is a stabilization algorithm that runs periodically to maintain the correctness of finger-table information; bad table information would lead to more lookups than necessary. It also keeps information fresh when nodes leave or join. Simulation results show Chord to be failure resistant, with an average lookup path length of (1/2) log2 N. The protocol also scales well as the number of keys increases. Future work attempts to improve behavior under network partition.

I think the Chord protocol is interesting considering its scalability properties. The circular approach has also been used in the design of the PAST system.
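A small sketch of the finger-table rule described above ((n + 2^(i-1)) mod 2^m), with made-up node IDs and a toy identifier size; not the authors' code:

# Building a Chord finger table: the ith entry of node n points to the
# first node that succeeds (n + 2^(i-1)) mod 2^m on the identifier ring.
from bisect import bisect_left

M = 8  # identifier bits for this example

def successor(ring: list[int], point: int) -> int:
    """First node ID on the sorted ring that is >= point, with wraparound."""
    i = bisect_left(ring, point)
    return ring[i % len(ring)]

def finger_table(n: int, ring: list[int]) -> list[int]:
    """Finger i (1-based) of node n is successor((n + 2^(i-1)) mod 2^m)."""
    return [successor(ring, (n + 2 ** (i - 1)) % (2 ** M)) for i in range(1, M + 1)]

ring = sorted([5, 50, 110, 180, 220])   # toy node IDs in a 2^8 space
print(finger_table(50, ring))           # nearby fingers repeat; later ones jump far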
I am not sure about the consistency of the finger tables, though, since these might be susceptible to attack, where successor values could be made inconsistent. Synchronized with the stabilization algorithm, this could yield a DoS-style attack.

*************************************

Wide-area Cooperative Storage with CFS

Summary/Critique: The Cooperative File System (CFS) establishes a decentralized file system in which files are viewed as blocks of data, with the blocks distributed across multiple servers. CFS uses a distributed hash table (DHash), which is built upon the Chord peer-to-peer system. CFS differs from Napster and Gnutella in that it does not guarantee anonymity and does not require servers to hold whole files. File blocks are distributed by keeping different blocks on different servers depending upon their relative load. The main aim is to provide efficiency and robustness.

Chord provides a structured address space by placing nodes on an identifier circle. The DHash layer provides a simplified API for accessing the file system. File blocks are inserted into CFS using a content hash of each block as the block's identifier. The root block is signed by the publisher's private key, which prevents a malicious user from overwriting a file's contents. CFS uses quotas and finite storage periods for data maintenance and to prevent a user from uploading large amounts of data from a single machine.

Fetching blocks while reading a file can be expensive, so the DHash layer uses prefetching to hide the latency. Caching is also used, but this may lead to a consistency problem, since there is no update/invalidate mechanism, and a root block may be updated without the update being reflected in the caches. Hence, a client needs to be able to check the freshness of a block. Another interesting feature of CFS is its ability to achieve load balance through the use of virtual servers on a lightly loaded server. This abstracts the machine characteristics from the file system, and a server can be configured with a number of virtual servers proportional to the machine's storage and network capacity.

The authors experiment with real-life and simulated environments (a large number of virtual servers on a single machine) to establish an idea of the efficiency and scalability of the system. The effect of prefetching is similar to that seen in a distributed shared memory system, where overly large prefetches can lead to bad caching and network behavior. Server selection might not always yield the best download speed, especially since a nearby cached copy might be ignored. The experimental results show that lookup costs scale well and that virtual servers provide good load balancing. I think this system is quite well designed, though some inherent problems with the Chord layer need to be addressed. There is also a cache consistency issue.
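A minimal sketch of the content-hash idea described above, assuming a toy in-memory block store in place of DHash (root-block signature checking is elided):

# A data block's key is the hash of its contents, so a client can verify
# any block it fetches. The root block is instead named by a hash of the
# publisher's public key and carries a signature (not shown here).
import hashlib

store: dict[bytes, bytes] = {}          # stands in for blocks spread over servers

def put_block(data: bytes) -> bytes:
    """Insert a block under its content hash and return the key."""
    key = hashlib.sha1(data).digest()
    store[key] = data
    return key

def get_block(key: bytes) -> bytes:
    """Fetch a block and verify it against its key before trusting it."""
    data = store[key]
    if hashlib.sha1(data).digest() != key:
        raise ValueError("block failed content-hash check")
    return data

k = put_block(b"a piece of some file")
assert get_block(k) == b"a piece of some file"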
From: Ranveer Chandra <ranveer@CS.Cornell.EDU>
Date: Tue, 27 Nov 2001 11:20:49 -0500
Subject: 615 PAPER 63

Chord and CFS
Frank Dabek, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica

CFS is a wide-area, read-only cooperative file system. It has two distinct layers: DHash and Chord. DHash distributes replicas, performs queries, and maintains cached copies. Chord is used to perform lookups on blocks to find the destination servers. The uniqueness of CFS is its block-level replication and caching, in stark contrast to previous systems that use whole-file storage.

Chord ensures the efficiency of lookups, joins, and leaves by maintaining a ring of nodes. The ring is formed by the consistent hashing that Chord uses. To speed up lookups, each node stores a list of k of its successors. Lookups take O(log N) steps, where each step is an RPC. Chord also ensures a worst-case complexity of O(log^2 N) for a join or leave operation.

CFS uses the location service of Chord to provide a cooperative file storage system. Replicas of blocks are stored on the k successors of the owner along the ring. Since ring successors need not be physically close, this scheme adds fault tolerance. Caching is used to improve lookup time, and virtual servers are used to distribute the storage load evenly. Virtual servers allow a server with a large storage capacity to act as multiple servers, so the load is higher on nodes with more storage.

Overall, CFS provides a highly efficient, distributed, and scalable system for cooperative file storage. However, the efficiency of block-level storage and caching for files shared on CFS is arguable. For example, multimedia files would be requested as whole files and not as blocks; how much overhead does this scheme incur? Second, security is ignored; a monitoring scheme would be of prime importance in any cooperative storage system. Finally, I thought that the number of steps as a measure of Chord's efficiency was misleading, because it involves log(N) different routes, which is different from the O(1) used by other systems.
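A small sketch of the successor-list replica placement described in the review above, with toy node IDs; hypothetical helper names, not the authors' code:

# A block lives at its successor node, and copies are kept on the next k
# nodes along the ring, so an independent failure of the owner still
# leaves live replicas reachable via the successor list.
from bisect import bisect_left

def replica_nodes(ring: list[int], block_id: int, k: int) -> list[int]:
    """Owner plus the next k successors on the sorted identifier ring."""
    start = bisect_left(ring, block_id) % len(ring)
    return [ring[(start + j) % len(ring)] for j in range(k + 1)]

ring = sorted([12, 80, 133, 190, 231])
print(replica_nodes(ring, block_id=150, k=2))   # owner 190 plus two successors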
From: Daehyun Kim <daehyun@csl.cornell.edu>
Date: Tue, 27 Nov 2001 11:56:14 -0500 (EST)
Subject: 615 PAPER 63

I'm a presenter today. My slides are at
www.csl.cornell.edu/~daehyun/Present1.pdf
www.csl.cornell.edu/~daehyun/Present2.pdf

From: "Panagiotis (Panos) Papadimitratos" <papadp@ece.cornell.edu>
Date: Tue, 27 Nov 2001 12:03:30 -0500 (EST)
Subject: 615 PAPER 63

Review of: "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications" & "Wide-area Cooperative Storage with CFS," by I. Stoica, R. Morris, F. Kaashoek, D. Karger, F. Dabek, H. Balakrishnan

The protocol described in the former paper provides a lookup service for retrieving data distributed over a (large) number of peers. The latter paper presents a distributed file system that utilizes Chord as a lookup/routing method underneath a layer responsible for data caching and replication. Both systems target simplicity, efficiency, load balancing, scalability, flexible decentralized control, and availability.

Chord organizes nodes as a logical ring according to node identifiers derived from a hash function, and maintains successor indices. Each node maintains information on a subset of nodes for fault recovery; the successor list is large enough (log N entries) for lookups to succeed with high probability. The protocol also addresses nodes joining or leaving the system (i.e., the ring) by reallocating their keys (identifiers in the same space as the node identifiers, extracted with a hash function; for example, hashes of data names) to other nodes. The stabilization algorithm provides probabilistic guarantees in the case of multiple concurrent changes to the ring. Queries are resolved by propagating them along the ring toward the "closest" successor.

CFS relies on DHash, a layer responsible for splitting each file into a number of blocks distributed over a set of nodes (servers); the degree of replication depends on the popularity of the file, in order to provide load balancing. Moreover, caching is adaptive, but it targets only smaller files. Lookups are performed by Chord, and data are fetched by one or more successive lookups, with prefetching and adaptive caching trying to make this process more efficient.
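A rough sketch of the greedy query forwarding described earlier in this review, assuming global dictionaries in place of real finger-table RPCs; hypothetical names, not the paper's pseudocode:

# At each hop the query jumps to the known node that most closely precedes
# the target key, roughly halving the remaining ring distance, until the
# key falls between a node and its successor (which then owns the key).
M = 8
RING = 2 ** M

def dist(a: int, b: int) -> int:
    """Clockwise distance from a to b on the identifier ring."""
    return (b - a) % RING

def lookup(key: int, start: int, fingers: dict, succ: dict) -> int:
    n = start
    while True:
        if dist(n, key) == 0:
            return n                        # key sits exactly on this node
        if dist(n, key) <= dist(n, succ[n]):
            return succ[n]                  # key falls between n and its successor
        better = [f for f in fingers[n] if 0 < dist(n, f) < dist(n, key)]
        if not better:
            return succ[n]                  # nothing closer known; hand off
        n = max(better, key=lambda f: dist(n, f))   # closest preceding node

ring = [5, 50, 110, 180, 220]
succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
fingers = {n: ring for n in ring}           # toy: every node knows every other
print(lookup(90, start=5, fingers=fingers, succ=succ))   # -> 110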
Although both schemes address highly decentralized, open, distributed environments, they do not discuss the possible implications of the presence of malicious nodes. Availability and fault tolerance are along these lines, and an overwhelming population of nodes may indeed mask the effects of attacks, but there is no clear statement of the achieved properties in terms of the security and survivability of these systems.

From: "Virantha N. Ekanayake" <viran@csl.cornell.edu>
Date: Tue, 27 Nov 2001 12:17:28 -0500 (EST)
Subject: 615 Paper 63

Chord is a distributed lookup protocol that provides one service: it maps a key onto a node. This can be a mapping of hostnames to IP addresses, or of filenames to the associated server. The latter functionality is used as the basis for block-level lookups (a large file is split into several blocks on multiple servers) in the CFS distributed file system. Chord uses a form of consistent hashing to provide minimal movement of keys on node entry and departure, with modifications to support scaling. Normally, the circular mapping of keys to successor nodes results in O(N) lookup time. However, a "finger table" is kept to reduce the lookup time to O(log N): essentially, every node knows the identities of nodes at power-of-two intervals around the circle. It's important to note that this lookup layer and file system do not provide anonymity; naturally, this allows tighter bounds on lookup latency.

The CFS file system is built on top of this Chord lookup layer, along with a DHash layer that provides support for handling file blocks. The maximum block size is on the order of tens of kilobytes, which could lead to large amounts of fragmentation for larger files, and thus larger latency in retrieving a file (which can be mitigated by prefetching). One service that is difficult to provide in this system is collaboration: only the publisher can update a file, making it difficult for multiple parties to modify data in a seamless fashion.

From: Prince Samar <samar@ece.cornell.edu>
Date: Tue, 27 Nov 2001 13:36:22 -0500 (EST)
Subject: 615 PAPER 63

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications &
Wide-area Cooperative Storage with CFS

Chord is a distributed lookup protocol that maps a given key onto a node in a peer-to-peer environment. The Cooperative File System (CFS) is a peer-to-peer read-only file storage system that utilizes Chord to do the lookup and routing.
The design goals of both protocols are load balancing, decentralization, scalability, availability, and efficiency. Chord utilizes consistent hashing, which assigns each node and block an m-bit identifier. These identifiers can be arranged as points on a ring. Given a block's ID, Chord returns the block's successor: the server whose ID most closely follows the block's ID on the identifier circle. Each node maintains O(log N) entries in a routing table called the finger table. The protocol gives a bound of O(log^2 N) on the number of messages required to handle a node joining or leaving an N-node Chord network. Each node of the network runs a stabilization algorithm periodically to maintain the correctness of the finger table in the event of nodes joining the system concurrently or nodes failing or leaving the system voluntarily.

CFS consists of two layers, Chord and DHash. The DHash (Distributed Hash) layer distributes blocks among the servers, maintains cached copies, and fetches blocks for the client. DHash provides load balancing via file replication, caching, and prefetching, and limits the amount of data a server must store on behalf of others. The Chord layer maintains the routing tables used to find blocks. CFS presents stored data to applications through an ordinary file system interface.

The attractive features of both Chord and CFS are simplicity, provable correctness, and provable efficiency. However, the security provided by the system is not clear: how would the presence of some malicious nodes affect the performance of the overall system? Also, partitioning is a big problem with Chord, and the approach mentioned by the authors does not look convincing to me.

From: Indranil Gupta <gupta@CS.Cornell.EDU>
Date: Tue, 27 Nov 2001 14:08:00 -0500 (EST)
Subject: 615 PAPER 63

Chord and CFS. Stoica, Morris, Karger, Kaashoek, Balakrishnan.

This project has designed a p2p infrastructure for file sharing, Internet applications, etc., that works on large-scale groups of processes. The basic scheme assigns each node a random identifier in a large name space, obtained by hashing the node's IP address and port number. Routing from any node to another is done by hypercube-like routing, which guarantees O(log N) average-case routing time. Dynamic groups are supported by allowing nodes to join, leave, and fail.

Comments:
- Chord/CFS is similar in design to several other large-scale p2p systems such as PAST/Pastry, Tapestry, etc. Several researchers have raised the issue of whether such systems might actually be inherently unscalable, for several reasons. One reason is that the rate of failures rises linearly with system size. Each such failure leads to massive amounts of copying across replicas, and as the system size is scaled up, this might eventually lead to a phenomenon where the system is effectively 'thrashing': copying data everywhere and unable to cope with failures.
Such a phenomenon has been observed in group communication systems (Isis, Horus, etc.), limiting their scalability. Randomized algorithms and probabilistic guarantees at the lower layers of the stack are one way to avoid this problem.
- Each node needs to maintain the invariant that it knows about its preceding and succeeding k neighbors in the name space. This will potentially require heartbeating across these neighbors. As IDs are assigned randomly, each heartbeat will typically travel across several hops in the network. When such a p2p system with thousands of nodes is deployed on a WAN-wide scale, this heartbeating, if done at a high rate, will overload core routers in the network. If the heartbeating rate is not high enough, the system will be in an inconsistent state, given the increasing rate of failures with the number of nodes.

From: jcb35@cornell.edu
Date: Tue, 27 Nov 2001 14:54:28 -0500 (EST)
Subject: 615 PAPER 63

Chord

Chord provides a distributed lookup mechanism for peer-to-peer applications. Chord aims to create a system that scales better than other peer-to-peer systems (such as Gnutella), is distributed, and balances load well. Chord provides a lookup(key) algorithm, which yields the IP address of the node responsible for the given key. Because of this, Chord does not provide anonymity, but unlike Freenet, its lookup operation runs in predictable time and always results in success or definitive failure.

The lookup algorithm is executed by each node: a node in Chord joins a circle where each node has a predecessor and a successor. On top of this, each node has a finger table that contains pointers to a small number of nodes around the circle. Chord can execute a lookup in log N messages across the network. Chord also provides reasonable join and leave procedures.

With regard to failures and replication, Chord lets the application take care of storing information at nodes surrounding the node where a key is located. It also uses the successor list to figure out the new successor and predecessor information after another node has failed. The authors mention in future work that malicious Chord nodes could cause problems; it would be interesting to look at what would need to be authenticated and verified. I am also curious about network partitioning and how Chord could efficiently handle splits and rejoins.

CFS uses Chord as the bottom layer of a file system. It uses DHash to perform block fetches for clients, distribute blocks among the servers, and maintain cached and replicated copies. DHash uses Chord to locate the servers responsible for blocks. As with other systems such as Freenet, it does not support any kind of keyword search mechanism, and it would be interesting to make it secure.
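A rough sketch of the periodic stabilization step that several of the reviews above describe, assuming in-memory Node objects in place of RPCs; hypothetical names, not the paper's pseudocode:

# Each node periodically asks its successor for that successor's
# predecessor; if a newer node has slipped in between, it becomes the new
# successor, and the node then notifies the successor so the successor
# can update its own predecessor pointer.
RING = 2 ** 8

def between(x: int, a: int, b: int) -> bool:
    """True if x lies strictly inside the clockwise interval (a, b)."""
    return 0 < (x - a) % RING < (b - a) % RING

class Node:
    def __init__(self, ident: int):
        self.id = ident
        self.successor = self
        self.predecessor = None

    def stabilize(self) -> None:
        x = self.successor.predecessor
        if x is not None and between(x.id, self.id, self.successor.id):
            self.successor = x              # a node joined just ahead of us
        self.successor.notify(self)

    def notify(self, candidate: "Node") -> None:
        if self.predecessor is None or between(candidate.id, self.predecessor.id, self.id):
            self.predecessor = candidate    # candidate is our closest known predecessor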