From tmroeder@CS.Cornell.EDU Tue Oct 29 00:24:20 2002 Received: from dhcp98-88.cs.cornell.edu (dhcp98-88.cs.cornell.edu [128.84.98.88]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9T5OKh11464 for ; Tue, 29 Oct 2002 00:24:20 -0500 (EST) Received: (from tmroeder@localhost) by dhcp98-88.cs.cornell.edu (8.11.6/8.11.6) id g9T5MWH15972; Tue, 29 Oct 2002 00:22:32 -0500 From: Thomas Roeder MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15806.6936.928259.266338@dhcp98-88.cs.cornell.edu> Date: Tue, 29 Oct 2002 00:22:32 -0500 To: Emin Gun Sirer Subject: 615 PAPER #60 X-Mailer: VM 7.07 under Emacs 21.2.1 In the papers for today, we were asked to do an evaluation on lookup time, failure resilience, simplicity and tamper-resistance. I was reminded of the common saying: "cheap, quick, and good: pick two". It seems that none of these systems performs well in all the areas. Freenet is a collection of peered nodes which share space on their hard drives in a manner designed to maximize the anonymity of the persons involved. It does fairly well against tampering, although only in certain modes; the KSK mode allows attackers to forge content quite easily, but the SSK mode does not. It gains the ability to block these attacks at the expense of providing a little bit more data about the producer of the information (namely, a key for a namespace; this allows an attacker to discover that two documents were likely authored by the same person, although it creates an identity without attaching it to an individual). Its failure resilience is good on average, but does not necessarily hold up well under extreme circumstances. For example, if data is on a node, and that data has not yet been requested by anyone, it can be removed by the failure of that one node. This is one of the problems with its lazy replication. The lookup time is entirely dependent on the user, and lookups may or may not succeed independently of whether the data is in the system. The hops-to-live value here is an attempt to push the lookup time onto the user of the system; if the user guesses well, it's small, and otherwise it's not. The simplicity of Freenet is reasonable, given the metric of anonymity. The Freenet paper made me think about creating a peer-to-peer HTTP proxy network for those users whose government blocks them from reading certain sources of information. The service could be run on port 80 and look just like an HTTP request, but the peer would proxy it to either another peer or the requested server. There would have to be some quirks in the system that I won't go into here, but I think it is an interesting idea. The next paper was on the characteristics of Gnutella, which is the P2P replacement for the Napster system. Basically, it involves broadcasting a search instead of passing through a central server. As the paper points out, this is a Bad Thing. It has low resistance to tampering, since anyone can claim to have the file you want. It has slow lookup, since broadcast inherently causes storms and you must flood the network to find any data. The failure resilience is both low and high: since there are potentially many servers which have the data you want, you can easily find another server which has it. Unfortunately, the ratio of good to bad data found is not high, due to the tampering problem (both innocent (posting junk data) and malicious (seeking to overwhelm the network with bad copies)).
The simplicity is very high, but the points above show that this is not very valuable, given the other problems in the system. CAN describes a general system to address content in a network. It is not particularly simple at all, having an underlying model of a d-dimensional torus, where the data maps to points in the space. It looks like it should have, on average, better lookup time than the other protocols studied, since it tries to take a straight path through the space to the data. Unfortunately, in the worst case, this "straight path" might pass through many more servers than a more roundabout one, since there do not seem to be guarantees (at least in the constructive phase of CAN) about balancing the subdivision of spaces. The fault tolerance can be made to be quite good with the various multiple-reality/-dimension models proposed in the paper (although each of the proposals hurts the lookup time of the protocol), but in the base version it is not particularly good, since the failure of a peer, in the worst case, can cause unrecoverable data loss. CAN has basically no tamper-resistance, and a node can claim almost anything about the data it is asked to retrieve (or requests that it is asked to forward). This is a significant issue that really ought to be solved by some sort of keying on the content of the data, as in Freenet. From shafat@CS.Cornell.EDU Tue Oct 29 03:21:43 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9T8Lhh11660 for ; Tue, 29 Oct 2002 03:21:43 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615 PAPER 60 Date: Tue, 29 Oct 2002 03:21:42 -0500 Message-ID: <47BCBC2A65D1D5478176F5615EA7976D134FB7@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 60 Thread-Index: AcJ9QB0zH1PJBy9LQhmJL9SG18+Ylg== From: "Syed Shafat Zaman" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g9T8Lhh11660 Freenet: -------- When a request for a file is placed in Freenet, the user must first obtain or calculate the binary file key. If the file is not found at the local node, a lookup for the nearest key is done on its hash table and a request is sent to the corresponding node. In case of another lookup failure there, the request is further propagated to the next preferred node. This process continues until the appropriate file is located, all the nodes have been searched, or the request's hops-to-live limit is exceeded. However, Freenet also caches any successfully retrieved files, which can improve performance on subsequent lookups. This is reflected in the simulations, in which the request path lengths decreased over time. The simulations show that Freenet's fault tolerance is quite satisfactory (the median path length remains below 20 at a 30% node failure rate) despite the absence of any concrete mechanism to guard against failures. This result can perhaps be attributed to the "similar key" approach Freenet takes while retrieving data in the network. Security features in Freenet aim to protect the anonymity of users in the network. Sender, and thus requestor, anonymity is maintained by the fact that a node cannot tell whether its predecessor is sending a request or merely forwarding it.
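A minimal sketch of the lookup behavior described above: forward to the neighbor whose known key is closest to the requested key, give up when hops-to-live runs out, and cache results on the way back so later requests are cheaper. This is illustrative only; the class and function names are invented, and the real protocol also tracks transaction IDs to avoid forwarding loops.

```python
# Illustrative sketch of Freenet-style closest-key routing (not the real implementation).
def distance(a: int, b: int) -> int:
    """Closeness metric over binary file keys; plain numeric difference stands in here."""
    return abs(a - b)

class Node:
    def __init__(self):
        self.datastore = {}   # key -> file contents held locally
        self.routing = {}     # key -> neighboring Node believed to hold nearby keys

    def lookup(self, key: int, hops_to_live: int):
        if key in self.datastore:                       # found locally
            return self.datastore[key]
        if hops_to_live == 0 or not self.routing:       # hops-to-live exceeded: give up
            return None
        # Forward to the neighbor whose routing entry is closest to the requested key.
        nearest = min(self.routing, key=lambda k: distance(k, key))
        result = self.routing[nearest].lookup(key, hops_to_live - 1)
        if result is not None:
            self.datastore[key] = result                # cache on the return path
            self.routing[key] = self.routing[nearest]   # later requests route better
        return result
```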
As for guarding actual data files against malicious attacks, if they are stored under content-hash keys or signed subspace keys, inauthentic data can easily be detected because the hash will not match. Data stored under other types of file keys is, however, vulnerable to attacks. The paper also proposes two possible ways of thwarting a number of DoS attacks on Freenet. Gnutella: --------- Gnutella is a decentralized P2P file-sharing model. When a query is placed from a client node, the announcement is broadcast to neighbors up to n hops away, where n is the TTL value of the query. Lookup time can therefore vary widely. No evaluation or quantitative information is presented in the paper that could be used in a direct comparison of Freenet with Gnutella. However, given that there is no caching and the somewhat unfinished state of Gnutella, its performance likely falls well short of Freenet's. Gnutella does not appear to be failure resilient at all. The paper reports frequent download failures, which could very well be due to nodes crashing or being taken off the network. Much needs to be done to make Gnutella more robust. The case is also very similar when it comes to security. Currently, there is nothing in the model to stop a malicious user from serving junk or malicious files in the network. The paper proposes a not-so-unique user registration and user rating model to address this problem indirectly, although even at first glance there seem to be a number of ways to get around this barrier. CAN: ---- CAN, or Content-Addressable Network, primarily targets reducing average path lengths between nodes and hence improving lookup times. Its multi-dimensional structure serves as the first step towards this goal. This provides each node with a number of neighbors that is directly proportional to the dimension d. The paper suggests further improvements that have all been shown to be feasible and effective in the simulations. Maintaining multiple realities, RTT-weighted routing, overloading coordinate zones, using multiple hash functions, and constructing topologically-sensitive CAN overlay networks can all contribute towards improved lookup times. In this respect, CAN seems to be a much superior protocol when compared to Freenet or Gnutella. All the design improvements mentioned in the paper, and repeated above, also incidentally make CAN a highly fault-tolerant system. Since each node is capable of having a large number of neighbors due to CAN's "spatial" structure, a node only becomes unreachable in the unlikely circumstance of all its neighbors dying simultaneously. Having multiple realities also creates replicas of the same key spread across the network, thus improving failure resilience. In addition, by overloading coordinate zones and using multiple hash functions, CAN further ensures that the system remains robust and tolerant against failures. The paper does not address the issue of security in CAN at all, which could be interpreted as a major weakness in the system. There is currently no mechanism in place that prevents nodes from falsifying file pointer information or pointing to malicious files in the network. The paper also does not touch upon file maintenance on the nodes. However, my assumption is that it should not be too difficult a task to implement a number of security features in CAN to guard against attackers.
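One concrete form such a security feature could take is the check that content keying enables: whoever retrieves a file recomputes the hash of the bytes received and rejects anything that does not match the key requested, which is essentially what Freenet's content-hash keys provide. A minimal, hypothetical sketch (SHA-1, as Freenet uses; the function names are invented):

```python
import hashlib

def content_hash_key(data: bytes) -> str:
    """Derive a file key from the file contents (Freenet's CHK is a SHA-1 hash)."""
    return hashlib.sha1(data).hexdigest()

def verify(expected_key: str, data: bytes) -> bool:
    """A retrieving node recomputes the hash; forged or corrupted data will not match."""
    return content_hash_key(data) == expected_key

# An intermediate node that returns junk for a content-keyed request is caught immediately.
original = b"some published document"
key = content_hash_key(original)
assert verify(key, original)
assert not verify(key, b"junk substituted by a malicious node")
```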
From jsy6@postoffice2.mail.cornell.edu Tue Oct 29 07:59:43 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TCxhh04990 for ; Tue, 29 Oct 2002 07:59:43 -0500 (EST) Received: from Janet.cornell.edu (syr-24-58-41-193.twcny.rr.com [24.58.41.193]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id HAA03601 for ; Tue, 29 Oct 2002 07:59:41 -0500 (EST) Message-Id: <5.1.0.14.2.20021029075842.00b87100@postoffice2.mail.cornell.edu> X-Sender: jsy6@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 29 Oct 2002 07:59:07 -0500 To: egs@CS.Cornell.EDU From: Janet Suzie Yoon Subject: 615 PAPER 60 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Each node in Freenet maintains a datastore of keys that are associated with files and a table of references (other nodes and the keys they contain). Freenet queries are routed hop by hop based solely on local information about which node would make the most progress towards a node containing the target file in its datastore. The paper claims that this kind of decentralized algorithm eliminates centers of attack. Unfortunately, centers of attack still exist, since Freenet is based on a small-world model. The performance of Freenet is dependent on a few nodes that have high connectivity. Although Freenet performs well in the face of random attacks (and failures), it is highly vulnerable to attacks targeted at these few highly connected nodes. The scalability of Freenet can also be attributed to the small-world network model. The average lookup time seems to be logarithmic in the number of nodes, but the worst case time is bad (O(n)). The performance of Gnutella is not dependent on the small-world effect. Gnutella uses broadcasting to perform queries. A locally maintained host catcher list, containing addresses of other nodes and the nodes they know about, is used to select a set of active connections to other peers. Due to its broadcast nature, Gnutella is not very scalable and its performance is not as good as Freenet's. Gnutella is also not very fault tolerant; the failure of a node requires reshuffling of the remaining peers to replace the lost connection. One of the major problems of Gnutella is download failures. Content-Addressable Network (CAN) is a distributed hash table meant for use at Internet proportions. The basic functions in CAN are insertion, lookup, and deletion, which are centered around a d-dimensional Cartesian coordinate space on a d-torus. Each node in CAN contains a piece (zone) of the entire hash table and information about nodes in adjacent zones. Routing is done using a greedy-forwarding method: a node forwards a query to the neighbor with coordinates closest to those of the destination. CAN is fault-tolerant since many different routes exist between any two nodes. If a node becomes unreachable (loses all its neighbors), a take-over algorithm is employed to have another node take over that zone. CAN offers scalability, fault tolerance, and good performance, but achieves these by sacrificing simplicity.
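A minimal sketch of the greedy forwarding described above, assuming zones are axis-aligned boxes in the unit d-torus and each hop picks the neighbor whose zone center is closest (with wrap-around) to the target point. The class and function names are invented for illustration; real CAN also handles dead ends and node failures rather than forwarding greedily forever.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    lo: tuple                          # lower corner of the zone in [0, 1)^d
    hi: tuple                          # upper corner
    neighbors: list = field(default_factory=list)

    def owns(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))

    def center(self):
        return tuple((l + h) / 2 for l, h in zip(self.lo, self.hi))

def torus_distance(p, q, size=1.0):
    """Euclidean distance with wrap-around in every dimension (the space is a d-torus)."""
    total = 0.0
    for a, b in zip(p, q):
        d = abs(a - b)
        d = min(d, size - d)           # edges wrap, so take the shorter way around
        total += d * d
    return total ** 0.5

def route(zone, target):
    """Greedy forwarding: each hop goes to the neighbor closest to the target point."""
    path = [zone]
    while not zone.owns(target):
        zone = min(zone.neighbors, key=lambda z: torus_distance(z.center(), target))
        path.append(zone)
    return path
```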
From bd39@cornell.edu Tue Oct 29 09:41:23 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TEfNh24062 for ; Tue, 29 Oct 2002 09:41:23 -0500 (EST) Received: from boweilaptop.cornell.edu (r102439.resnet.cornell.edu [128.253.163.42]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id JAA15134 for ; Tue, 29 Oct 2002 09:41:19 -0500 (EST) Message-Id: <5.1.0.14.2.20021029093854.00bd3ee0@postoffice2.mail.cornell.edu> X-Sender: bd39@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 29 Oct 2002 09:39:30 -0500 To: egs@CS.Cornell.EDU From: Bowei Du Subject: 615 PAPER 60 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Paper 60 - Freenet - Freenet is a system which aims to provide anonymity for its users, deniability for the storers of information, and a distributed, decentralized organization. Also, there are provisions to maintain the scalability of the system to large numbers of nodes. Files in Freenet are maintained using a two-level namespace that is global throughout the system. The file identifiers consist of a signed subspace key, which is a user namespace key (the public portion of a public/private key pair) hashed and XOR'ed with the hash of a descriptive name for the file. The private key is then used to sign the file, and the content of the file is encrypted as well. - the encryption of the files stored on the file servers and the use of SSK enable the operators of the file servers to deny knowledge of the content of their servers. - anonymity of requests is served by the hash id as well. Also, only the end user and the one who posted the file can decrypt the content of the file. - SSK ensures that only the creator of the subspace will be able to change information regarding their subspace. - file updates can be achieved by using a level of indirection. Files are identified by their content hash, and the user posts redirections to newer content. This ensures that old content can still be accessed. Searching for files in the network is achieved by an n-hop-deep query into neighboring nodes. Nodes can drop a request at any time, and requests are routed preferentially to the routing table entries whose hash keys are closest to the key the request is for. When the file is found, it is cached in all nodes on the path back to the requestor. Inserts are done in a similar manner, inserting n hops deep along a path of nodes close to the hash of the inserted item. Improving the Gnutella Protocol - Gnutella is a peer-to-peer network with the goal of sharing files in a decentralized manner. Gnutella clients announce their presence by a TTL-limited flood into the network. Queries are propagated to every node known to a Gnutella node. After this, Gnutella protocol packets are exchanged between nodes. The paper does not go into detail about how the protocol is run, assuming that each implementation is free to do whatever it wants. The paper lists many problems with the present Gnutella network, among them: ping flooding between nodes, load balancing, and encouragement of content sharing versus freeloading. An interesting aspect of this paper is the proposal of a user rating system, in which the identity of the user operating a node can be obtained. This common infrastructure can be used to reduce freeloading by users on the network, i.e. taking without giving back to the community.
The user registration is unfortunately not as distributed as the network, which may be a bottleneck in implementation. Scalable Content-Addressable Network - CAN presents a way in which a hash table naming scheme can be implemented in a distributed manner across the Internet. The hash table stores (key, value) pairs and supports insert, lookup and delete operations. The goals for CAN are to be scalable, completely distributed and fault tolerant. - Every node owns a portion of the hash table key space (a zone) - The zones are regions of a d-dimensional Cartesian coordinate space arranged as a d-torus (i.e. edges wrap around.) A uniform hash function is applied to a key, which maps the key to a particular zone. - routing follows a path formed in the Cartesian space from the source to the destination node. Every node keeps track of its nearest neighbors. - Claim: the average routing length grows as O(n^(1/d)) and each node needs to keep track of 2d neighbors. - New nodes claim zones by randomly picking a point in the whole space and splitting the zone containing that point in half, sharing it with the node that had claimed the zone (a sketch of this step follows below). - Replication can be implemented by having different instances of CAN running at the same time on separate nodes. Also, multiple nodes can occupy the same zone. - Various methods to improve the performance of CANs: knowledge of the underlying Internet topology to pre-segregate the nodes into zones. A local volume balancing scheme can improve the distribution of nodes to the volume of a zone. Also, key/value pairs may be cached at nodes that see frequent requests, and replicated to nearby nodes in times of high load. Evaluation of the Hash Table schemes - Gnutella does not seem to propose any such scheme. (Naive naming scheme) Freenet - In Freenet the hash space is randomly distributed among the nodes according to key distance from the node's preferred key. Load imbalance can be caused by a poor distribution of keys, or by insertions not deep enough into the network. It can be seen that the third quintile of path lengths in the evaluation stays around 80 hops over time. Newer nodes with closer preferred hash table entries will aggregate files only after other nodes access the data from that node. The process of node discovery and transfer of (key, value) pairs to the appropriate node is important to the overall load balancing scheme. CAN - explicitly segments the entire hash space into zones controlled by nodes. The advantage of this scheme is that there is no question as to whether the node designated to accumulate particular keys is actually going to do so. CAN is also nice because it does this in a local fashion - there is no need to broadcast a node's presence in the network, except to nodes that are local to it. (Contrast with Gnutella, which had problems with the scalability of ping flooding in announcing node presence. Also contrast with Freenet, which has no explicit distribution of (key, value) pairs within the network). CAN is also nice because it explicitly addresses the issue of replication and load balancing in the algorithm, rather than relying on mechanisms (as in Freenet) which depend very much on access patterns.
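The join step referenced above, sketched minimally: a new node picks a random point, and whichever node owns that point splits its zone in half and hands one half over. Purely illustrative; the real protocol also transfers the (key, value) pairs that fall in the new half and updates both nodes' neighbor sets.

```python
import random

def split_zone(lo, hi):
    """Split a rectangular zone in half along its longest side; the current owner keeps
    the lower half and the joining node takes the upper half."""
    d = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
    mid = (lo[d] + hi[d]) / 2
    kept_hi = hi[:d] + (mid,) + hi[d + 1:]
    new_lo = lo[:d] + (mid,) + lo[d + 1:]
    return (lo, kept_hi), (new_lo, hi)

def join(zones):
    """A joining node picks a random point; whoever owns that point splits its zone."""
    dims = len(zones[0][0])
    point = tuple(random.random() for _ in range(dims))
    for i, (lo, hi) in enumerate(zones):
        if all(l <= x < h for l, x, h in zip(lo, point, hi)):
            zones[i], new = split_zone(lo, hi)
            zones.append(new)
            return new

# Start with one node owning the whole 2-d space and let three more nodes join.
zones = [((0.0, 0.0), (1.0, 1.0))]
for _ in range(3):
    join(zones)
print(zones)   # four rectangular zones that exactly tile the unit square
```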
From kwalsh@CS.Cornell.EDU Tue Oct 29 10:34:02 2002 Received: from duke.cs.duke.edu (duke.cs.duke.edu [152.3.140.1]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TFY1h04603 for ; Tue, 29 Oct 2002 10:34:01 -0500 (EST) Received: from localhost (curly.cs.duke.edu [152.3.140.76]) by duke.cs.duke.edu (8.9.3/8.9.3) with ESMTP id KAA22446 for ; Tue, 29 Oct 2002 10:34:00 -0500 (EST) From: kwalsh@CS.Cornell.EDU Received: from 132.236.29.71 ( [132.236.29.71]) as user walsh@imap.cs.duke.edu by login.cs.duke.edu with HTTP; Tue, 29 Oct 2002 10:34:00 -0500 Message-ID: <1035905640.3dbeaa68a181f@login.cs.duke.edu> Date: Tue, 29 Oct 2002 10:34:00 -0500 To: egs@CS.Cornell.EDU Subject: 615 PAPER 60 MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit User-Agent: Internet Messaging Program (IMP) 3.0 X-Originating-IP: 132.236.29.71 Freenet: A Distributed Anonymous Information Storage and Retrieval System. Improving Gnutella Protocol: Protocol Analysis And Research Proposals A Scalable Content-Addressable Network (aka: CAN) These three systems provide similar services but have very different motivations. Each provides hash-table primitives: LOOKUP(key), INSERT(key). Only Gnutella is able to directly support a QUERY primitive, although the others claim (dubiously) that this can be built at a higher level. In brief, the motivations: Freenet aims to provide anonymity during both inserts and lookups, and is intended to provide robustness in the face of hostile agents (legalistic and technical). Gnutella, like Freenet, is intended to be a complete solution to file-sharing in the post-Napster era. Both are designed for immediate deployment. CAN, however, is a building block for higher level services, such as file-sharing, DNS, distributed file systems, etc., and is largely a research rather than production system. The differences in motivation translate into differences in design. Freenet and Gnutella can be classified as unstructured overlay networks. Construction of the overlay (i.e., choice of neighbors) is essentially random. In order to route requests, Gnutella falls back on full-scale flooding of lookups, queries, and join announcements. This imposes great demands on the network, since there is no mechanism to intelligently route queries, prune search trees, choose neighbors, or reorganize the overlay topology for better performance. The simplicity of the system, however, has led to rapid deployment (and rapid congestion collapse, alas). Failures in this system are trivial to handle: since no node plays any particular role in the overlay, its departure will hardly be noticed and impacts only the aggregate quality of service and the availability of the particular documents uniquely owned by that node. Freenet, also unstructured, takes a very different approach. First, a logical structure is imparted on the key space, abandoning the QUERY primitive in favor of more intelligent routing. Now, in order to route requests (lookups, inserts, joins) for a particular key, a node passes the request to a neighbor closer in the key space. A successful lookup for a document also replicates the document along the entire path taken by the search. The designers claim that these features (among others) will lead to nodes specializing in particular areas of the key space, improving common case requests. Still, it is difficult to analytically understand lookup time, since the topology is only heuristically constructed, and depends very much on usage patterns.
Overall complexity of the system is fairly low, however, making it simple to implement and deploy, and providing some level of resistance to failure. Since tamper-resistance was built in from the start, it achieves this goal more so than any of the other systems. An attempt is made to provide availability of documents in the face of a hostile agent by replication, encryption, and anonymity. The structured CAN overlay is the most complex of the three, making it more susceptible to failures but also more efficient in terms of route lengths. Lookup time is O(n^1/d) where d is a parameter that increases demands on each node (route table size, storage size, network control overhead). Although the concepts are quite simple, the many optimizations and failure modes make the resulting implementation somewhat complex. Some amount of resistance to failure is achieved by replication, although this effectively multiplies demands on individual nodes. The protocol also has the ability to route around failures in the same manner as freenet. Finally, I do, as always, have some criticisms of the papers. The most glaring is the general confusion between closeness in a logical sense and closeness in a physical sense. In all three base systems (excluding the speculative geographic enhancement for CAN), closeness in the overlay network has nothing to do with closeness in the physical network. Also, there is very little acknowledgement or leveraging of small-world phenomena (although such phenomena are only recently being studied). Freenet does make some claim along these lines, but the argument and intuition behind what makes a small world network is just plain wrong. From mr228@cornell.edu Tue Oct 29 11:37:13 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TGbDh17064 for ; Tue, 29 Oct 2002 11:37:13 -0500 (EST) Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id LAA19461; Tue, 29 Oct 2002 11:37:11 -0500 (EST) Date: Tue, 29 Oct 2002 11:37:11 -0500 (EST) From: mr228@cornell.edu Message-Id: <200210291637.LAA19461@travelers.mail.cornell.edu> To: egs@CS.Cornell.EDU Errors-To: mr228@cornell.edu Reply-To: mr228@cornell.edu Cc: mr228@cornell.edu MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9 Sender: mr228@cornell.edu X-Originating-IP: 128.84.223.176 Subject: 615 PAPER 60 The three papers we read present mechanisms for content distribution and indexing in peer-to-peer networks. Each has slightly different goals and therefore different strengths and weaknesses. Freenet is a system for anonymous file sharing. They seek anonymity not only for consumers but also the publishers of content. The system is a fairly complex one and is based on public-key cryptographic primitives. The main problem with Freenet is its inability to be searched. Since Freenet users publish within their own "namespace" (based on a key pair they generate) the problem becomes an inability to search. You must 'meet' the person who placed the file into the network in order to retrieve it. Freenet's primary contribution is anonymity and this reduces the anonymity somewhat. While the argument can be made that the person handing you the key is not necessarily also the publisher, I bet this is rare in practice. Future work might consider a system in which participants can publish a key in an anonymous fashion. The Gnutella paper was lacking in real content. 
Gnutella was started as a truly decentralized successor to Napster. Napster's fatal problem was its centralized server-based model. Not only did this create a single point of failure and bottleneck (in terms of performance), but it also made them the subject of many lawsuits, which ultimately caused their downfall. Gnutella is a completely decentralized system for file sharing. This paper describes some of the underlying problems in a decentralized P2P system such as this, but stops short of clear descriptions of how Gnutella actually solves many of them. Gnutella's main problem is scalability, specifically with respect to the massive flooding that occurs. CAN is a distributed hashtable system much like Pastry or Chord. It is a fairly complex system in which data is mapped into d-dimensional space, effectively producing its signature within the network. Unlike Chord and Pastry, this system seems not to have strong guarantees concerning lookup times. The protocol tries to cut this virtual space in the most efficient way and thus take the shortest path across the network to the data. Like Chord and Pastry, each node has a portion of the key space where content with that key (signature) can be found. CAN seems to be fault tolerant, but is lacking in simplicity. From nbs24@cornell.edu Tue Oct 29 11:42:14 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TGgDh18133 for ; Tue, 29 Oct 2002 11:42:13 -0500 (EST) Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id LAA23581; Tue, 29 Oct 2002 11:42:09 -0500 (EST) Date: Tue, 29 Oct 2002 11:42:09 -0500 (EST) From: nbs24@cornell.edu Message-Id: <200210291642.LAA23581@travelers.mail.cornell.edu> To: egs@CS.Cornell.EDU Errors-To: nbs24@cornell.edu Reply-To: nbs24@cornell.edu MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9 Sender: nbs24@cornell.edu X-Originating-IP: 132.236.71.82 Subject: 615 PAPER 60 Freenet Freenet is implemented as an adaptive peer-to-peer network of nodes with the ability to store and retrieve data files in a cooperative distributed manner. It is designed to address concerns of privacy and availability and enables users to share free disk space on their hard drives. Files can be identified by three different binary file keys (KSK, SSK, CHK), obtained by applying the 160-bit SHA-1 hash function. The KSK uses a string chosen by the user to generate a public/private key pair. The public key is used to obtain the file key, while the private key provides minimal integrity checking for files. However, this can suffer from a dictionary attack. Also, nothing prevents two users from choosing the same string for different files or from inserting junk files under popular descriptions. The SSK enables personal namespaces by allowing a user to create a randomly generated public/private key pair. This is more secure than the KSK. The CHK is derived by hashing the contents of the corresponding file. The lookup time depends on the hops-to-live value, but given a reasonable value the lookup may take polynomial time. Failure resilience is not high because if a node fails and data on that node has not been replicated, the data is completely lost. Freenet is fairly simple. KSK and SSK are obviously not tamper resistant. Gnutella Gnutella is a decentralized peer-to-peer file-sharing model, which sought to improve on the centralized Napster model.
Queries are broadcast to neighbors based on a time-to-live value. Packets are then passed on between nodes. The authors identify numerous problems existing in the current network model. The lookup time is high, and flooded queries may actually fail to find files that exist in the system because of the limited time-to-live value. Failure resilience may be considered reasonable since the network is decentralized and several servers have copies of the same data. Gnutella uses a very simple model. However, it is not tamper-resistant: it is possible for an attacker to insert junk files on the servers. CAN CAN is a distributed infrastructure that provides hash table-like functionality on Internet-like scales. The design goals are to provide scalability, fault-tolerance and complete self-organization for peer-to-peer systems like Napster and Gnutella, for large-scale storage management systems, and for the construction of wide-area name resolution services that decouple the naming scheme from the name resolution process. The design centers around a d-dimensional coordinate space. Each node maintains a zone of the entire hash table and information about nodes in neighboring zones. The lookup time could be polynomial. The failure resilience is high since different routes are provided between nodes and provision is made for a node to take over the zone of an unreachable node. CAN is fairly complex. It's hard to comment on its tamper-resistance since security is not addressed in the paper. It probably has little or none. Nana B. Sam From hs247@cornell.edu Tue Oct 29 11:47:45 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TGljh19296 for ; Tue, 29 Oct 2002 11:47:45 -0500 (EST) Received: by travelers.mail.cornell.edu (8.9.3/8.9.3) id LAA27820; Tue, 29 Oct 2002 11:47:41 -0500 (EST) Date: Tue, 29 Oct 2002 11:47:41 -0500 (EST) From: hs247@cornell.edu Message-Id: <200210291647.LAA27820@travelers.mail.cornell.edu> To: egs@CS.Cornell.EDU Errors-To: hs247@cornell.edu Reply-To: hs247@cornell.edu MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: IMP/PHP3 Imap webMail Program 2.0.9 Sender: hs247@cornell.edu X-Originating-IP: 128.84.99.155 Subject: 615 Paper 60 Freenet is a distributed file system based on identifying a file by a hash. Each file, when created, is associated with a short descriptive string. The string is used to determine a public key and private key for the file. The public key is then hashed to become the key for the file. This kind of key is called a keyword-signed key. There are two other key types: the signed-subspace key, which offers directory-like control for users, and the content-hash key, which is good for updating files. In this system each node is responsible for a set of hash keys. This is achieved by key-to-location lookup tables. In the paper A Scalable Content-Addressable Network (CAN), each node in the network is responsible for a region in the hash space. If you imagine the hash space to be the Cartesian plane, the plane is divided into N regions where each region is managed by a node. In addition to its own region, each node knows which nodes manage its neighbouring regions. In both networks, "closeness" does not have geographical meaning: when two keys are close, the nodes holding them are not necessarily close in geographic location.
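A minimal sketch of the key derivations described in the two reviews above (keyword-signed and signed-subspace keys), assuming SHA-1 as the 160-bit hash. Real Freenet derives an actual key pair from the descriptive string; here that step is only simulated with a hash, and all function names are invented.

```python
import hashlib

def sha1(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()        # Freenet file keys are 160-bit SHA-1 hashes

def keyword_signed_key(description: str) -> bytes:
    """KSK: a key pair is derived deterministically from the descriptive string, and the
    public half is hashed to give the file key. Key generation is stubbed out here."""
    simulated_public_key = sha1(description.encode())   # stand-in for the derived public key
    return sha1(simulated_public_key)

def signed_subspace_key(namespace_public_key: bytes, description: str) -> bytes:
    """SSK: hash the namespace's public key and the description independently,
    XOR the two hashes, and hash the result to obtain the file key."""
    a, b = sha1(namespace_public_key), sha1(description.encode())
    return sha1(bytes(x ^ y for x, y in zip(a, b)))

# Anyone who knows the namespace key and the description can recompute the file key,
# but only the namespace owner holds the private key needed to sign an insert.
print(signed_subspace_key(b"alice-public-key", "politics/my-essay").hex())
```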
In CAN, the initial lookup for a key can be seen as a straight line from the requesting node to the destination, as each node knows who its neighbours are and therefore the request can always be forwarded in the right direction. For Freenet, the lookup is basically a best-guess lookup. It can potentially try all of its neighbours and therefore up to N nodes. However, because requests are cached, subsequent lookups can be cheaper. The claim is that the more popular a lookup is, the more it will be optimised. As far as simplicity goes, Freenet is simpler in that it assumes a lot less. Lookups are not guaranteed, and if a node dies, all of the files on that node die as well. So file storage is not guaranteed. Therefore managing the system is simpler. This also carries over into the fault tolerance schemes. Freenet just assumes that if a file is needed and popular, it will be replicated on many nodes; however, if a file happens to die with a node, it is lost. Therefore Freenet would only be good for certain files, as there are files which are important yet not accessed a lot (e.g. the source files to compile a kernel). CAN takes into account nodes leaving the network: a node monitors its neighbours and takes over a neighbour's zone if that neighbour leaves. Both Freenet and CAN have potential security issues. The main one is message routing. A malicious node (along the lookup path) can direct the request along the wrong path or, worse, forge data. Freenet addresses this issue somewhat with digital signatures; however, the paper still identifies problems with dictionary attacks against keyword-signed keys. Security is not really discussed in CAN, and one could imagine making sure a node is valid before it is allowed to become a member. Gnutella is a protocol for decentralized peer-to-peer file sharing. To find a file, a node basically has to flood the network, which is worse than both Freenet and CAN. Like Freenet, Gnutella has a loose fault tolerance model. If a node leaves the network, its files are gone. Because of this, Gnutella can be very unreliable. To improve the protocol, the authors introduce the idea of rating users. From ks238@cornell.edu Tue Oct 29 11:56:56 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TGuth21251 for ; Tue, 29 Oct 2002 11:56:55 -0500 (EST) Received: from ks238.cornell.edu (syr-24-24-18-11.twcny.rr.com [24.24.18.11]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id LAA07320 for ; Tue, 29 Oct 2002 11:56:53 -0500 (EST) Message-Id: <5.1.0.14.2.20021029115603.0209c9e0@postoffice2.mail.cornell.edu> X-Sender: ks238@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 29 Oct 2002 11:56:42 -0500 To: egs@CS.Cornell.EDU From: Karan Suri Subject: 615 Paper #60 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed Freenet is a distributed information storage system which aims at obviating the need for centralized indexes, which make networks vulnerable to a single point of failure. In Freenet, each node is described by a location-independent key, and for queries generated by a start node, each hop on the path to the goal node conducts a local search to find the optimal next hop, forming a chain of proxy requests. Each file in Freenet is described by a binary file key (derived from an asymmetric public/private key pair), which is generated from the description of the file stored by the user. Three kinds of keys can be used.
The first, known as the keyword-signed key, uses a description of the file to generate an asymmetric public/private key pair. The public half is hashed to create the file key, and the private key is used for integrity verification. Next, a signed-subspace key (SSK) can be used if the user generates the public/private key pair and hashes the public key together with a short description of the file. This solves the problem of redundant file descriptions across different files. This key is useful because a user who has the public key and the description of the file can retrieve data but, without the private key, cannot store data. The final type of key used is the content-hash key, which uses the content of the file to generate a file key that is effectively a unique hash value for the file. Files can be retrieved once a node identifies the desired binary file key and requests it from its local node. The local node checks to see whether the requested data has been cached; otherwise it forwards the file request to the nodes in its routing table, and this continues until the corresponding file is found. Files are stored using the hashed file keys generated as above, and the hops-to-live value decides at which nodes the file will be stored. Data management techniques and adding nodes to the network are also discussed in detail. The simulation concludes that over time the median path length decreases substantially, that the network is able to scale to 40,000 nodes and still show promising signs of convergence, and that the protocol is robust and adapts quickly to failures. A final note on security is added. First, the sender of data is anonymous, since a node cannot discern whether data is being originated or merely forwarded by a certain node. In addition, the paper proposes a buffer against DoS attacks through the use of Hash Cash, an expensive "payment" which must be computed prior to any kind of insertion. This would slow down the impact of an attack. The next paper focuses on Gnutella, a peer-to-peer file exchange system which employs a decentralized approach where nodes serve as both servers and clients in the network. Such a network design facilitates two key features: one, distributed processing, and two, file sharing. The paper discusses the pros and cons of having a central index for the network (i.e. like Napster). Gnutella's communication happens on top of TCP/IP and is facilitated by the use of the descriptors described in the paper (e.g. Ping and Pong). When a node wants to share a file, it will broadcast its existence to the network and will begin querying once it has been seen by all nodes. The TTL (time-to-live) value places an upper bound on how far a node that has just entered the network can broadcast its presence to the rest of the network. Furthermore, nodes are given user registration and user ratings in order to allow other nodes to authenticate them and assess how well certain nodes will be able to participate in the network. The paper clearly points out the weakest areas of Gnutella. The network is not robust at all and its success is impeded by its inability to recover from random failures. This is one weakness that needs to be addressed. In addition, the user registration and user rating systems create issues with respect to load balancing, with a highly rated node getting excessive pings in the network. Security is not addressed as much as it was in the Freenet paper.
Clearly, Gnutella is still in its beginning stages and much development must continue, as noted in the paper. The paper presents CAN, a distributed infrastructure which provides hash-table-like functionality on large networks. This hash table functionality is ideal for P2P file sharing systems. A key contribution of CAN is dividing the network into zones, such that each node in the network is responsible for a zone and stores information about adjacent nodes. Since the network is based on hash-table operations (i.e. insertion, deletion, lookup), any time a certain operation is requested, zones forward the request towards the zone that contains the specific key of choice. For node insertion, an existing node will need to split its current zone in half and allocate half of it to the new node. For lookup, a node will forward the request towards the zone that contains the appropriate key being searched for. Finally, when a node departs, it will have to hand over the zone it is responsible for to an adjacent neighboring node. Nodes that are absent for long periods of time due to failure trigger the network's takeover mechanism, in which adjacent nodes take over the abandoned zone. The paper also discusses the benefits of increasing the number of coordinate dimensions, thereby reducing the path length and path latency, since neighboring nodes will be closer and their respective zones will be as well. Another optimization is the use of multiple realities, in which each node is assigned a zone in each of several distinct coordinate spaces. In addition, RTT-weighted routing aims at reducing the latency at each hop, whereas originally the design looked only at reducing path length. A final optimization they mention is the use of multiple hash functions, which improves data availability. In their design, they look at various design parameters in order to assess the viability of the protocol. Their results indicate that their protocol is extremely scalable (i.e. 260,000 nodes) and that its performance as far as latency reduction is concerned is impressive as well. The primary weakness, which the authors themselves mention, pertains to security and specifically DoS attacks launched by malicious nodes. Overall, this seems to be a better protocol than those used in the previous two papers. From mtp22@cornell.edu Tue Oct 29 11:57:45 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TGvih21325 for ; Tue, 29 Oct 2002 11:57:44 -0500 (EST) Received: from narnia (syr-24-58-57-15.twcny.rr.com [24.58.57.15]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with SMTP id LAA10486 for ; Tue, 29 Oct 2002 11:57:44 -0500 (EST) Content-Type: text/plain; charset="iso-8859-1" From: Matt Piotrowski Reply-To: mtp22@cornell.edu To: egs@CS.Cornell.EDU Subject: 615 Paper 60 Date: Tue, 29 Oct 2002 12:58:28 -0400 X-Mailer: KMail [version 1.2] MIME-Version: 1.0 Message-Id: <02102911582806.00149@narnia> Content-Transfer-Encoding: 8bit Freenet and Gnutella differ from CAN in that CAN guarantees an answer to a query in a bounded number of hops while Freenet and Gnutella do not. Freenet searches for cached copies of documents instead of having a particular server responsible for the data. It uses a forwarding mechanism when data is not found, and although this mechanism is thought to improve over time as the source moves closer to the requests, the scalability has not been proven. Gnutella uses flooding to find data.
This flooding does not scale and may not find data in a network even when that data exists. Freenet's search for cached copies hurts lookup time guarantees--as mentioned above--but is a nice property for failure resistence; avoiding a single point of failure. CAN assumes that data is immutable. This increases the simplicity of CAN, but has the tradeoff that CAN must reinsert data if it changes. Another simplification of CAN is the way it views the network. It only considers logical distance in making decisions. However, the real network distance may be much more expensive. So this simplifying assumption again has a performance tradeoff. From vrg3@cornell.edu Tue Oct 29 11:58:59 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TGwwh22088 for ; Tue, 29 Oct 2002 11:58:58 -0500 (EST) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id LAA07217 for ; Tue, 29 Oct 2002 11:58:55 -0500 (EST) Date: Tue, 29 Oct 2002 11:58:55 -0500 (EST) From: vrg3@cornell.edu X-Sender: vrg3@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 60 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Freenet is a distributed file-sharing system designed with decentralization, anonymity, and DoS resistance in mind. Requests for files propagate as unicasts, with prescribed maximum hop counts. Each node which receives a request can either answer the request itself (if it has the file requested) or forward it on to the neighbor who seems most likely to have that data (with the assumption that a node will tend to have many files with similar descriptions). The results of searches are then cached to improve subsequent lookup times. The number of hops specified by the querier determines how long the lookup takes, and whether or not the results can be found. Freenet's failure resilience emerges from the caching of query responses. After the first request done for a file, that file exists on multiple nodes. The only issue with this is that nodes with new data to insert into the distributed filesystem cannot spread it to other nodes unsolicitedly. Files are represented using descriptions as filenames. The signed-subspace key (SSK) file key type allows each user to generate a public/private key pair for his or her own namespace. That user is then responsible for the files offered in that namespace, and files are signed using both the filename and the namespace key. This provides a good degree of tamper-resistance; as long as a namespace is trusted, the data can be trusted. Overall, Freenet is a fairly simple system. The asymmetric encryption techniques used are the most complex part of the system. Given the anonymity, failure resilience, and tamper resistance, the specifications are not overly complex. Gnutella is a decentralized file sharing system meant to only use peer-to-peer communication. Searches are flooded through the network with a certain TTL. This makes the lookup time unpredictably worse than Freenet's. There is no inbuilt scheme to achieve failure resilience; the only chance is that more than one node might have the data you're searching for, so if one dies you can try another. Similarly, there is nothing to prevent tampering in a Gnutella network. Anybody could mislabel any file, intentionally or by mistake, and respond to queries with it. 
All these problems are the result of a very simple design. Gnutella could not practically be any simpler than it is. The tradeoff probably isn't quite worth it. CAN is not a distributed file-sharing system like Gnutella or Freenet; rather, it is a generalized system for naming content in a distributed network, meant to be used to improve existing peer-to-peer file-sharing systems. CAN is much more complex than Freenet or Gnutella. The layout of the network is that of a d-dimensional torus. The surface of the torus represents the space of data in the network. Each node has responsibility for a certain zone on that surface, and has links to its 2d neighbors. A query is then sent on as direct a path as possible to the location where the data would be. Lookup time should be quite good using this technique; it is an improvement upon the Freenet system because there is a clear "correct" direction for the query to propagate. Failure resilience is achieved using a "multiple reality" approach; there are a certain number of different mappings from node space to data space, all of which are used at any given time. This way, any piece of data would be at many different places at any given time, one for each reality. The likelihood of the data disappearing in all realities simultaneously is slim. CAN, like Gnutella, does not have a tamper-resistance scheme. Given the high level of complexity already, it seems like it wouldn't be a problem to try to implement some kind of signing method like Freenet uses, especially since each node already has its own "zone" that could correspond to namespaces. From smw17@cornell.edu Tue Oct 29 12:01:45 2002 Received: from cornell.edu (cornell.edu [132.236.56.6]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TH1jh22268 for ; Tue, 29 Oct 2002 12:01:45 -0500 (EST) Received: from cornell.edu ([128.84.84.84]) by cornell.edu (8.9.3/8.9.3) with ESMTP id MAA02065 for ; Tue, 29 Oct 2002 12:01:45 -0500 (EST) Message-ID: <3DBEBF3A.709@cornell.edu> Date: Tue, 29 Oct 2002 12:02:50 -0500 From: Sean Welch User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Emin Gun Sirer Subject: 615 PAPER 60 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Gnutella Gnutella is in many ways the simplest of the P2P schemes presented, in part because it was not quite complete before its release. Gnutella's operation is a basic limited broadcast mechanism, where a node specifies a search and a Time To Live (TTL). This message is then broadcast through a portion of the overall network. Search hits (or Pong responses) are returned over the same path they travelled from the source. Gnutella is by far the simplest of the schemes presented for this class. Unfortunately, the price for this simplicity is relatively poor behavior in almost any other respect. Gnutella is not secure: not only can anyone simply look at your queries to determine your IP address, there is also no protection against forwarding nodes logging packets, modifying packets, or cluttering the network with faked files. Gnutella provides no means for Quality of Service routing, and lacks (in the common implementation) a quality measure to select a good download source. Performance in theory can be good, as a flooded broadcast should find every copy of the data reachable by the host node, but the broadcast also consumes significant network bandwidth.
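To put a rough number on that bandwidth cost, a toy calculation of how many query messages one TTL-limited flood can generate if every node forwards to roughly the same number of additional neighbors. The fan-out and TTL below are illustrative assumptions, not measurements from the paper.

```python
def flood_messages(fanout: int, ttl: int) -> int:
    """Loose upper bound on messages from one TTL-limited broadcast, ignoring duplicate
    suppression: each hop multiplies the query frontier by the per-node fan-out."""
    return sum(fanout ** hop for hop in range(1, ttl + 1))

# With 4 forwarding neighbors per node and a TTL of 7, a single query can spawn
# on the order of tens of thousands of messages before it dies out.
print(flood_messages(4, 7))
```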
This overhead results in the simple but true observation that Gnutella does not scale well. Freenet Freenet is a P2P network with significantly different aims from a generic P2P network such as Gnutella. The main goal of Freenet is to provide a P2P network capable of maintaining user privacy and enhancing information availability. To accomplish this goal while still maintaining a useful network, the authors created a system where nodes have only local knowledge of their surroundings. In order to prevent key-squatting or intentional pollution of popular keyspaces by malicious users, each end key is composed of a hash of the XOR of a descriptive-string hash and a namespace key hash. While this does not prevent key-squatting of popular names, it does allow a user to specify a known data originator and ignore the malicious nodes. To actually retrieve the data, a steepest-ascent algorithm with backtracking is used in conjunction with a TTL. Searches continue until the desired key value is located, at which time it is returned to the requestor. During the request, nodes will randomly re-designate themselves as the requestor, and upon retrieval, nodes will randomly re-designate themselves as data sources. In addition, the data is replicated throughout the successful routing path. This helps both to obscure the source/destination of the data and to improve availability of data based on request frequency. Upon expiration of the TTL, the message is not discarded, but instead is forwarded with some finite probability. Freenet delivers a number of features that offer substantial improvements in tamper resistance, user security, and resilience against malicious nodes. It is not a particularly simple system to implement, as the wide array of security mechanisms requires significant overhead to maintain. Lookup time in Freenet becomes heavily dependent on the caching mechanisms described. If the nodes are persistent enough to allow the network to settle, the lookup time runs at a reasonable 10-20 hops for a 1000-node system. While this is significantly longer than CAN (for reasonable d), the Freenet system was not designed for high performance. Overall system performance in Freenet has been sacrificed to improve the overall security of the system. CAN The CAN work presents a P2P routing protocol obtained by mapping the P2P keyspace onto a d-torus. Each node is considered to have 2d nearest neighbors in the d-space, and owns a portion of the total keyspace. New nodes are accommodated by partitioning a pre-existing zone into equal portions, granting one of those partitions to the new node and informing neighbors to update routing information. Nodes route by forwarding to neighboring nodes closer to the desired keyspace destination than themselves. This gives a reasonable performance of O(n^(1/d)) in hop count, and gives nodes multiple potential paths in many cases, allowing for extension to some basic load-balancing mechanism if desired. Extensions of the basic protocol to include multiple realities (multiple replicated coordinate spaces) and overloaded coordinate zones are presented to address the issues of load balancing and reliability. CAN is an attempt to improve the performance of P2P networks. It does an admirable job, reducing the average routing length to O(n^(1/d)) unicast hops. CAN also divides up the keyspace in a deterministic fashion, allowing for relatively simple route determination and eliminating the need for node broadcasts.
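A back-of-envelope check of those figures, using the (d/4) * n^(1/d) average path length given in the CAN paper and the 2d neighbor entries of routing state per node; n is set to the 260,000-node scale another review mentions, purely for illustration.

```python
# Tradeoff between hop count and per-node routing state in CAN:
# average path length is (d/4) * n^(1/d) hops, routing state is 2d neighbor entries.
n = 260_000
for d in (2, 3, 4, 6, 10):
    hops = (d / 4) * n ** (1 / d)
    print(f"d={d:2d}: ~{hops:6.1f} hops on average, {2 * d} neighbors per node")
```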
Basic CAN has some issues with suboptimal keyspace volume distributions, but these can be ameliorated reasonably well through either zone overloading or multiple-reality implementations. CAN is definitely not simple, as it seeks to build structure out of the network instead of working with the network as is (as per Gnutella). The basic implementation is somewhat vulnerable to failures, as a failed partition will result in a loss of the data from that partition until the owners refresh that data. There also exists the potential for redirection attacks, where, regardless of the desired key, the query is forwarded to a destination of the attacker's choosing (requiring authentication at some higher level as well, and potentially vulnerable to tampering with chosen popular keys for which a hash is known). Both of these problems can be lessened somewhat through overloading zones and permitting multiple realities, but the fact remains that intentionally malicious nodes may have a significant detrimental effect on overall network performance. From adam@graphics.cornell.edu Tue Oct 29 12:05:25 2002 Received: from bach.graphics.cornell.edu (bach.graphics.cornell.edu [128.84.247.50]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TH5Ph23093 for ; Tue, 29 Oct 2002 12:05:25 -0500 (EST) Received: from envy.graphics.cornell.edu (envy.graphics.cornell.edu [128.84.247.206]) by bach.graphics.cornell.edu (8.12.1/8.12.1) with ESMTP id g9TH5J0k044617 for ; Tue, 29 Oct 2002 12:05:19 -0500 (EST) Date: Tue, 29 Oct 2002 12:04:57 -0500 (EST) From: Adam Kravetz To: egs@CS.Cornell.EDU Subject: 615 Paper 60 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Freenet: A collection of nodes that work as peers, which attempts to provide anonymity for the provider, receiver, and storer of data. Basically, information is passed around via public text keys which can be hashed to provide a location; the location is searched for using a "smartish" search scheme (look at entries close to this one and you might find it; if that fails, look at everything). It is important to realize that each system dedicates a certain amount of storage in Freenet. This anonymous file space is simply a cache of files on the network; the original file is not affected by LRU algorithms in these situations. The problem with this is that even though the original data is around, if the LRU has removed it in all instances (LRU runs independently on each machine) then the data is gone and subsequent searches will not be able to find it, even if the original is still available (and the author chooses to provide it). Freenet scales relatively well, supporting possibly tens of thousands of nodes. Fault tolerance is a slight issue, however; I would like to see better performance when more than 30% of the nodes fail. The situation in Napster in which a user gets on the network, looks for one thing, and then gets off comes to mind, although in this network we want to encourage users to stay on for long periods of time. I would further be interested in what sort of "restart" happens when there is a temporary break in connectivity (say, moving just out of range of an AP for one minute). Freenet could possibly be coupled with other secure communication models to ensure higher levels of anonymity (SSH to ensure that eavesdroppers are excluded from data transmissions).
Further, I would be interested in looking at some sort of hierarchical or overlapping networks (some nodes in both Freenet A and Freenet B) to increase scalability and possibly stability. Further, a "smarter" caching metric could be used to ensure that a situation like the following doesn't happen. There are 100 computers, each with room for 2 files, and each starts with 1 unique file in its cache. One computer has a new file which everyone (or nearly everyone) wants, ensuring that every node caches it. Another computer then has a file which everyone wants; again, 2 files are each replicated 100 times on all the nodes, giving fast access to the popular files but no access (LRU has removed all of the other 98 files on every system) to files that might be useful later. Every machine doesn't need a copy of the file, just a "close neighbor." I think it would be interesting to find some sort of randomized metric to ensure that someone, but not everyone, caches the files (a small sketch of one such metric appears after this review).

Gnutella: A popular P2P level-2 (relatively anonymous, yet not completely so, since we have a known path to the file, and the user knows what files are on its system as well) sharing protocol. The immediate problems with Gnutella have been reliability and scalability. Limewire and Morpheus have tried to ensure reliability by matching multiple streams with the same file attributes (name, size) and trying to continue transmissions from a new stream if one fails. The broadcast search is a huge issue, however; it replicates a search over potentially the whole network and provides no way of killing it off. Further, there is no QoS for files in the network: no comparison of path length (which we at least can bound), no coherence of name to actual file (dummy data inserted under a popular file name), etc. The network can be extremely finicky and not resistant to flooding, but it can be very resilient to overall failure. There is no caching of data, so the popularity of a file is the only assurance that the file is replicated across the network. I think the user rating system is a good idea, but it might not work in this case. I am not sure that the eBay system works either; if you make a transaction you almost always get a "Great customer, fast response, fast pay A+++++" as a comment. Negative feedback obviously can hurt, but if there are only two ends of the spectrum (really good or really bad) then it will simply segment users into "can get data" and "cannot get data."

CAN: Scalability, an issue that plagues current systems, is what this paper tries to address. The single point of failure and non-distributed expenses of a centralized network provide plenty of motivation for CAN. Further motivations for CAN include building a wide-area naming scheme which, unlike DNS, provides name resolution disjoint from the naming scheme. CAN is a complicated d-dimensional space in which each system lies, a design which took a few looks to understand. CAN seems to have high fault tolerance, but that might just be in the design of the tests. CAN doesn't address security of the network in terms of tampering, etc., but other algorithms and schemes could be implemented on top of it to ensure these things.
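The randomized-caching suggestion in the review above could look something like the following minimal sketch; the probability rule and cache layout are assumptions for illustration, not something proposed in the papers. Each node on the return path keeps a passing file only with some probability, so popular files spread without evicting everything else everywhere:

    import random
    from collections import OrderedDict

    CACHE_PROB = 0.25   # assumed tuning knob: fraction of return-path nodes that keep a copy

    def maybe_cache(local_store, capacity, key, data):
        """Cache `data` with probability CACHE_PROB, evicting LRU entries if full.
        `local_store` is an OrderedDict used as an LRU cache (oldest entries first)."""
        if random.random() > CACHE_PROB:
            return False                      # most nodes just forward; a few keep a copy
        local_store[key] = data
        local_store.move_to_end(key)          # mark as most recently used
        while len(local_store) > capacity:
            local_store.popitem(last=False)   # evict the least-recently-used entry
        return True

    store = OrderedDict()
    maybe_cache(store, capacity=2, key="popular-song", data=b"...")

Whether this actually preserves the "close neighbor" property would depend on the topology; it only illustrates the shape of the idea.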
From pj39@CS.Cornell.EDU Tue Oct 29 12:14:17 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9THEHh24785 for ; Tue, 29 Oct 2002 12:14:17 -0500 (EST) Received: from pj39.cornell.edu (syr-24-59-67-50.twcny.rr.com [24.59.67.50]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id MAA17608 for ; Tue, 29 Oct 2002 12:14:16 -0500 (EST) Message-Id: <5.1.0.14.2.20021029115725.01fec110@postoffice2.mail.cornell.edu> X-Sender: pj39@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 29 Oct 2002 12:14:11 -0500 To: egs@CS.Cornell.EDU From: Piyoosh Jalan Subject: 615 PAPER 60 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed

Freenet: A Distributed Anonymous Information Storage and Retrieval System
Improving Gnutella Protocol: Protocol Analysis And Research Proposals
A Scalable Content-Addressable Network

First I will give an overview of all three papers, and then touch on the points (lookup, failure resilience, simplicity, tamper-resistance) we were asked about, at the end.

The Freenet paper addresses some of the basic problems with traditional file storage systems. In traditional file storage systems, files are stored on a few hosts, which are thus central points of failure and are also susceptible to DoS attacks. Since files are not location independent, there is little user privacy, and the nodes that provide storage cannot conceal their identity. Since the location of a file is publicly known, a malicious third party can alter the stored information. Freenet addresses all of the above problems. It provides anonymity for both producers and consumers of information, deniability for storers of information, resistance to attempts by third parties to deny access, efficient dynamic storage and routing of information, and decentralization of network functions. Some salient features of Freenet that make all of the above possible are:
- Each lookup and insert has a hops-to-live counter and a transaction ID.
- Data is replicated over every node along the path of a request.
- Hops-to-live is incremented and decremented with random probability so that the original sender of the request can be obscured.
- The request key and the file content are encrypted and hashed, so the nodes storing someone else's files cannot access the stored information.
- An LRU algorithm is used to replace files at nodes that have exceeded their storage capacity, so files are not guaranteed to live forever, and frequent requests are necessary to keep files in the network.

Gnutella is another protocol that provides decentralized file-sharing capabilities to its users. This paper proposes strategies for the improvement of the protocol. The authors identify two schemes that would increase the performance of the protocol, namely user registration and user rating. Among the issues with the Gnutella network are download failures, scalability, fragmented development, encouragement of file sharing by users, reducing unnecessary traffic, and addressing security concerns. The authors propose that establishing the user registration and user rating system would solve all of the above problems. During registration, a user provides a username, a password, and some other information such as geographical location.
This helps in serving clients from their nearest server, thus solving many issues inherent to Gnutella, such as download failures, poor bandwidth, etc. Having the user rating and contact information allows users to contact other users and confirm the file content and the rating of the user. Certain privileges can be based on rating; for example, only higher-rated users are allowed to download certain files when there is contention between higher-rated and lower-rated users. This encourages users to share their files to improve their rating and hence solves many other issues with Gnutella, such as download failures where people list files for download but do not provide sufficient upload bandwidth.

The next paper, A Scalable Content-Addressable Network (CAN), describes distributed hash-table-like functionality at Internet-like scales so that files are content addressable. It considers a d-dimensional Cartesian coordinate space on a d-torus. This coordinate space is dynamically partitioned among all the nodes in the system, with every node owning a distinct zone. The (key, value) pairs map to a point P in the space using a uniform hash function. The value corresponding to a key is retrieved from the node that owns the zone containing point P. A new node joining the CAN results in a zone splitting, with one half of the zone being assigned to the new node. CAN thus addresses two key problems of content-addressable networks, i.e., scalable routing and indexing.

Now, coming to how these distributed hash tables differ in terms of lookup time, failure resilience, simplicity, and tamper-resistance: the three papers discussed today differ a lot along these axes. In Freenet (and in a design-enhanced CAN with caching and replication), lookup time is heavily dependent on the pattern of data lookups requested. As the data is cached at each node in the path, if the same file is requested at intermediate nodes in the path, then the lookup time can be substantially faster than in Gnutella. Overall, CAN seems to be the better protocol for lookup time through its various design improvements, such as overloading coordinate zones, using multiple hash functions, more uniform partitioning, and topologically sensitive construction of the CAN overlay network. The paper also talks about caching and replication techniques which would be very similar to those in Freenet, or to those applied on the web using proxy servers and caching in the web browser. With these enhancements, CAN would definitely be the most superior protocol in terms of lookup time.

In Freenet, since the data is replicated over nodes along the path of a request, it is more resistant to failures compared to Gnutella and CAN: if some nodes fail, the data can be served from other nodes where it is replicated. Gnutella focuses more on user registration and user rating to improve the chances of locating the data at a geographically closer node, and to encourage users to share data and provide proper bandwidth for sharing. Thus Gnutella is not very failure resistant, as the protocol itself doesn't try to replicate the data among nodes, and it is up to the users whether to share it or not. Moreover, the Gnutella paper identifies failure resilience as one of the key problems with Gnutella. Similarly, base CAN doesn't have any mechanism that makes it resistant to failures.
In the CAN paper, though, the suggested design improvements make it highly resilient to failures; since each node has a large number of neighbors, it is highly unlikely that all of them would die. Using multiple hash functions, say k different ones, would replicate the data to k nodes, making it failure resistant. I think all three schemes provide some tamper-resistance for the data at the nodes. Freenet focuses on anonymity and distributed file storage that is transparent to the users, so it is to some extent tamper-resistant, as users do not explicitly know where the data is located and so cannot tamper with it. In both Freenet and CAN, since the request key and the file content are encrypted and hashed, the nodes storing someone else's files cannot access the stored information and hence cannot tamper with it. In Gnutella too, I believe the data at the nodes is read-only, and users accessing the data are generally not permitted to modify it. As far as simplicity is concerned, Freenet seems to be the simplest to implement, followed by Gnutella and CAN.

From ag75@cornell.edu Tue Oct 29 12:17:57 2002 Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9THHvh25247 for ; Tue, 29 Oct 2002 12:17:57 -0500 (EST) Received: from travelers.mail.cornell.edu (travelers.mail.cornell.edu [132.236.56.13]) by travelers.mail.cornell.edu (8.9.3/8.9.3) with SMTP id MAA21353 for ; Tue, 29 Oct 2002 12:17:54 -0500 (EST) Date: Tue, 29 Oct 2002 12:17:54 -0500 (EST) From: ag75@cornell.edu X-Sender: ag75@travelers.mail.cornell.edu To: egs@CS.Cornell.EDU Subject: 615 PAPER 60 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

This week we are presented with three papers in which the systems make use of distributed hash tables. The systems differ in their goals and in their performance characteristics.

Freenet is a P2P system that concentrates on keeping the identities of producers and consumers of information secret. This is done by referring to files in a location-independent manner and replicating them in different locations, thus making it very difficult to discover the true origin or destination of a file and to know what is stored on a node's hard drive. The lookup time for files can vary a lot, since the user decides on the hops-to-live limit. Also, the user is not guaranteed that the system will find a file, even if it's located somewhere on the network. Finally, users that access a file frequently will benefit from caching. Freenet should be pretty resilient against failure, as shown in the experiments. However, some of the data may be lost because there is no permanent storage and the data is kept as an LRU cache. Freenet is fairly complex because of everything that needs to be done to keep users anonymous. Lastly, the data on Freenet is tamper-resistant under content-hash keys and signed-subspace keys. The system is also resilient against a number of DoS attacks.

Gnutella is a P2P system that concentrates on providing decentralized file-sharing capabilities to its users. The idea behind Gnutella is to prevent the system from being brought down by the failure of a single centralized entry point into the network, such as what happened to Napster under legal pressure. The lookup time for Gnutella is probably slower than that of the other two systems because it needs to do a network flood to find files. Additionally, it does not make use of caching to improve performance.
Like Freenet, Gnutella makes the user select a time-to-live for the packets, so the performance also depends on this value. Gnutella is simpler to implement than the other two systems; however, it is not very resilient to either failure or tampering with data.

CAN is a distributed infrastructure that provides hash-table-like functionality on a large scale. The idea behind CAN is to make it scalable, fault-tolerant, and completely self-organizing. The lookup time on CAN is quite fast thanks to its multidimensional and multi-reality structure. Its structure also makes it very fault-tolerant and scalable. On the other hand, CAN would be quite complex to implement, and it has no protection against data tampering.

From liuhz@CS.Cornell.EDU Tue Oct 29 12:18:49 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9THInh25637 for ; Tue, 29 Oct 2002 12:18:49 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615 PAPER 60 Date: Tue, 29 Oct 2002 12:18:49 -0500 Message-ID: <706871B20764CD449DB0E8E3D81C4D4302CEE6AD@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 60 Thread-Index: AcJ/bz7R3cQ+YmwkRuWFsE6bAA+1WQ== From: "Hongzhou Liu" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by sundial.cs.cornell.edu id g9THInh25637

These three papers introduce Freenet, Gnutella, and content-addressable networks (CAN), respectively.

Freenet is a distributed anonymous information storage and retrieval system. In Freenet, a file is identified by a location-independent key obtained by applying a hash function. Each file is signed by a private key and encrypted using some descriptive string. Thus it will be very difficult for an attacker to modify files during transmission, unless he knows how to forge the cryptographic signature, which turns out to be infeasible since the private key is not publicized. Each node in Freenet maintains a routing table that maps keys to remote nodes. When a node is requesting a key, the request is forwarded to the node that corresponds to the most similar key in the routing table. The process is repeated until a node that owns the key is reached. Then the requested file is sent back and duplicated along the reverse path. In this way, files are copied to locations near the requestor dynamically, and the mean path length is reduced accordingly. Node storage is managed by an LRU algorithm, so out-of-date data can be removed automatically. An important goal for Freenet is protecting the anonymity of requestors, inserters, and storers of files. During transmission, a node can unilaterally modify the source node ID of a packet. To reduce the information that an attacker can obtain from the hops-to-live field of the packet, packets do not automatically terminate after hops-to-live reaches 1 but are forwarded on with finite probability. Thus it's infeasible to discover the true origin or destination of a file being transmitted over the network. Freenet is scalable and fault-tolerant due to its small-world model properties.

The Gnutella paper analyzes some disadvantages of the Gnutella protocol and proposes some solutions to them. Two serious problems of Gnutella are download failures and the unwillingness of users to share their files through the network.
The authors suggest user registration as well as user rating schemes to solve these problems. With the user rating scheme, users can choose to download files with higher ratings, and thus have a better chance of retrieving complete files successfully. For users that offer resources with a high rating, the protocol will give bonuses, like download priority, more bandwidth, etc., to encourage more users to provide high quality service. The Gnutella protocol relies on broadcast to discover resources, which limits its scalability while improving lookup time, because we can always find the optimal path by broadcast given a high enough TTL.

The fundamental goal of CAN is to provide a hash table that maps "keys" onto "values" in a totally distributed way. The design of CAN is based on a virtual d-dimensional Cartesian coordinate space. The entire virtual space is partitioned dynamically among all the nodes in the system such that every node "owns" its individual, distinct zone within the overall space. All the keys are mapped onto points in the coordinate space using a uniform hash function. The (key, value) pair is then stored at the node that owns the zone within which the point corresponding to the key lies. The routing protocol in CAN is also very straightforward. First calculate the point corresponding to the requested key, and then forward the request along the straight-line path through the coordinate space from the source to the destination coordinate. At each step, the node chooses the neighbor that is closest to the destination to forward the request to. The paper introduces many schemes that can improve the performance of CAN beyond the basic architecture, like multi-dimensional spaces, multiple realities, better CAN routing metrics, etc. The most outstanding characteristic of CAN is its scalability. For a CAN with 260,000 nodes, we can route with a latency that is less than twice the IP path latency. Multiple realities, overloading coordinate zones, and other schemes also make it failure resilient. However, CAN is not secure at all. In particular, it is vulnerable to DoS attacks; a malicious node can modify and redirect packets passing through it in any way. Future work on CAN will focus on how to improve the security of the protocol.

From sc329@cornell.edu Tue Oct 29 12:37:02 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9THb2h28711 for ; Tue, 29 Oct 2002 12:37:02 -0500 (EST) Received: from sangeeth.cornell.edu (syr-24-58-36-135.twcny.rr.com [24.58.36.135]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with ESMTP id MAA19371 for ; Tue, 29 Oct 2002 12:36:59 -0500 (EST) Message-Id: <5.1.0.14.2.20021029123305.028af978@postoffice2.mail.cornell.edu> X-Sender: sc329@postoffice2.mail.cornell.edu (Unverified) X-Mailer: QUALCOMM Windows Eudora Version 5.1 Date: Tue, 29 Oct 2002 12:37:03 -0500 To: egs@CS.Cornell.EDU From: Sangeeth Chandrakumar Subject: 615 PAPER 60 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed

Submitted by Sangeeth Chandrakumar

Freenet is an adaptive peer-to-peer network application that permits the publication, replication, and retrieval of data while protecting the anonymity of both authors and senders. All the network functions are decentralized, and it offers dynamic storage and routing of information as well as resistance to attempts by third parties to deny access to information.
No permanent file storage is guaranteed, as Freenet purges outdated files which have not been requested for a long time. There is no central file repository; files are cached at nodes which are closer to the requesting nodes. Each node maintains its own datastore of data files, which are named by location-independent keys, and also maintains a dynamic routing table containing the addresses of other nodes and the keys that they are thought to hold. Routing decisions are made on the basis of local, rather than global, knowledge. Files in Freenet are identified by a binary file key. The file keys are generated by one of three schemes: keyword-signed, signed-subspace, or content-hash keys. Content-hash keys are most useful in conjunction with signed-subspace keys, used as an indirection mechanism. Data is retrieved by sending out requests for a key. A node which receives this request can either reply with the data or forward it to other nodes as long as the TTL is satisfied. Once the data is found, it is returned via the same path and is also cached at all the intermediate nodes. When inserting a new file, a proposed key is sent out to all the neighbors within a certain hop count. If there is no collision, the file is inserted under the given name. Since the cache has a finite storage capacity, efficient replacement algorithms are used to manage the data. Once all the nodes have decided to drop a particular file, it will no longer be available to the network. The simulation results show that the network converges as the nodes discover more nodes over time. Over time, the request path length has been shown to drop to a value of just six hops. Also, the system showed good resiliency even with 30% of nodes removed randomly. Key and sender anonymity can be achieved by adding mix-style pre-routing of messages. Tampering with existing files by inserting alternate versions is not possible against signed-subspace or content-hash keys, as it requires finding a hash collision.

Gnutella is a decentralized file-sharing protocol of the peer-to-peer type. Gnutella was suggested as an improvement over Napster, which used a centralized server for query resolution. A centralized scheme like this suffers from a single point of failure. Today a number of Gnutella client applications based on the Gnutella protocol have been developed. Every Gnutella client acts as a server as well as a client. All Gnutella communication happens on top of the TCP/IP protocol. IP addresses are used as part of Gnutella descriptors for node identification. A node initially gets connected to the network by initiating connections with a few known peers. Once a connection is established, the nodes communicate among themselves by exchanging any of these descriptors: Ping, Pong, Query, QueryHit, and Push. Ping and Query descriptors are broadcast to all the neighbors, and the servent that is recognized as the target of a particular descriptor will not forward that descriptor further. The main drawbacks attributed to the Gnutella protocol have been its scalability and its download failures. Also, since it exposes IP addresses, security is a problem with this protocol. Its decentralized nature helps to achieve fault tolerance.
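As a rough illustration of the Ping/Query flooding just described (TTL-limited broadcast, with nodes refusing to re-forward descriptors they have already seen), here is a small Python sketch; the graph representation and the duplicate-suppression rule are simplifications, not the actual wire protocol:

    from collections import deque

    def flood_query(graph, origin, ttl):
        """TTL-limited flood of a Query descriptor.
        graph: node -> list of neighbor nodes; returns the set of nodes that saw the query."""
        seen = {origin}                       # stands in for per-descriptor duplicate detection
        frontier = deque([(origin, ttl)])
        while frontier:
            node, hops_left = frontier.popleft()
            if hops_left == 0:
                continue                      # descriptor dies here
            for peer in graph.get(node, []):
                if peer not in seen:
                    seen.add(peer)
                    frontier.append((peer, hops_left - 1))
        return seen

    graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
    print(flood_query(graph, "a", ttl=2))     # {'a', 'b', 'c', 'd'}

Even this toy version shows why the traffic grows with the size of the reachable neighborhood rather than with the number of matching files.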
From yao@CS.Cornell.EDU Tue Oct 29 13:02:55 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TI2th04015 for ; Tue, 29 Oct 2002 13:02:55 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615 PAPER 60 Date: Tue, 29 Oct 2002 13:02:55 -0500 Message-ID: <706871B20764CD449DB0E8E3D81C4D4302479821@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615 PAPER 60 Thread-Index: AcJ/dWgG8T692HENTbqQSspjyWE3/w== From: "Yong Yao" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id g9TI2th04015

In Freenet, there is a key for each file stored in the network. The paper discusses three different key generation schemes: keyword-signed key, signed-subspace key, and content-hash key. One general property of all three schemes is that there is no direct connection between the physical distance of the nodes that hold the files and the closeness of those files' keys. The routing table saves the address of the next node that has the highest probability of knowing where the exact file for the key is. It is very possible that a node with the entry for a particular key will receive queries about the location of other files with similar keys, so it is going to accumulate knowledge about those files. Such a cluster consists of files distributed in the network, and it makes the whole system more resilient to node failures. Freenet makes many copies of a file after a successful lookup operation. One problem is how to keep them consistent if updates to the original file are frequent.

Gnutella is a very simple protocol compared to other peer-to-peer file systems. Although it is a decentralized system, it requires flooding the network to look up a file given its name. To reduce the network workload, each request has a lifetime, the maximum number of hops for the message. A negative consequence is that a search may fail even if copies of the file are available in the network, if they are far away from the source. A file will not get replicated or cached at other places, so the lookup time is high and the reliability is low. Its hash table is very similar to the routing table, if we replace the file name by the node address.

The CAN paper proposes how to construct a content-addressable index on peer-to-peer networks. It has a virtual d-dimensional space, divides it into small regions, and assigns each node a piece. A node maintains its neighbors in the virtual space. CAN hashes each file to a location in the virtual space, and ships the file to the node that holds the region covering that location. However, there is no guarantee on the physical distance between two neighbors; in the worst case, it can be the diameter of the whole network. The cost of maintaining such a structure is also expensive if neighbors are far away from each other.
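To make the "hash each file to a location and ship it to the node that holds the covering region" step concrete, here is a tiny illustrative Python sketch; the zone layout is invented toy data, and the hashed point would come from a uniform hash of the key as in the earlier sketch:

    def owns(zone, point):
        """A zone is a box ((lo0, hi0), ..., (lo_d-1, hi_d-1)); true if point is inside."""
        return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

    def owner_of(zones, point):
        """zones maps node name -> box; return the node whose zone covers the point."""
        for node, zone in zones.items():
            if owns(zone, point):
                return node
        return None

    # Toy example: two nodes splitting the unit square down the middle.
    zones = {"A": ((0.0, 0.5), (0.0, 1.0)),
             "B": ((0.5, 1.0), (0.0, 1.0))}
    print(owner_of(zones, (0.7, 0.3)))   # -> "B"

Note that "B" owning the point says nothing about where B sits in the physical network, which is exactly the neighbor-distance concern raised above.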
Yong Yao

From xz56@cornell.edu Tue Oct 29 13:12:55 2002 Received: from postoffice2.mail.cornell.edu (postoffice2.mail.cornell.edu [132.236.56.10]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TICsh05819 for ; Tue, 29 Oct 2002 13:12:55 -0500 (EST) Received: from Xin (dhcp-128-84-93-92-wifi.ece.cornell.edu [128.84.93.92]) by postoffice2.mail.cornell.edu (8.9.3/8.9.3) with SMTP id NAA12044 for ; Tue, 29 Oct 2002 13:12:48 -0500 (EST) Message-ID: <00c201c27f76$bac50de0$5c5d5480@Xin> From: "Xin Zhang" To: "Emin Gun Sirer" Subject: 615 PAPER 60 Date: Tue, 29 Oct 2002 13:12:22 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2720.3000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000

Freenet is a distributed peer-to-peer network model. Each node maintains its own local datastore and dynamic routing table. It allows publication, replication, and retrieval of data. What is specific to it is that it protects the anonymity of both authors and readers. It achieves this mainly by using three hashing schemes to get three kinds of keys corresponding to a specific file. First, the KSK is derived by hashing the public half of a public/private key pair, which is obtained deterministically from the description of the file. Second, the SSK is obtained by hashing the XOR of the independent hashes of the public key and the description. Third, hashing the contents of the file yields the CHK. The way data is retrieved is much like a depth-first search, so the lookup time may grow quickly in the worst case, but it should have good failure resilience.

The Gnutella protocol improved over the traditional centralized P2P model by using a distributed strategy. The improved Gnutella protocol further introduced the idea of rating into file sharing. Since it uses broadcast to retrieve the data, the lookup time should be shorter than Freenet's, with good failure resilience as well, but it may be vulnerable to tampering.

CAN's effort was to improve the scalability of a peer-to-peer file distribution system. It maps a file (perhaps described by (key, value) pairs) to a point in a d-dimensional Cartesian coordinate space using a uniform hashing function. Then it dynamically partitions the entire space into zones; each node owns one entire zone. New nodes can join the system by finding a node in the CAN, splitting the corresponding zone, and updating the routing info. Thus, CAN is self-organizing. When routing, it follows the shortest path through the coordinate space to the data. The lookup time can be short, but CAN is vulnerable to both failure and tampering.

From ashieh@CS.Cornell.EDU Tue Oct 29 13:39:36 2002 Received: from zinger.cs.cornell.edu (zinger.cs.cornell.edu [128.84.96.55]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TIdZh11625 for ; Tue, 29 Oct 2002 13:39:36 -0500 (EST) Received: from localhost (ashieh@localhost) by zinger.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id g9TIdZc07682 for ; Tue, 29 Oct 2002 13:39:35 -0500 (EST) Date: Tue, 29 Oct 2002 13:39:35 -0500 (EST) From: Alan Shieh To: Subject: 615 PAPER 60 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

** Lookup time
- Gnutella
Files are not relocated, and keys are not guaranteed to be cached at closer nodes. No theoretical performance guarantees.
- Freenet
Like-valued keys and files tend to cluster together on nodes or groups of nodes, so repeated searches by clients cause data with close keys (not necessarily related data) to gather together; this also moves files around the network to create closer local copies. No theoretical performance guarantees.

** Failure resistance
- Gnutella
Gnutella does not hash the distribution of files (it is content-based); hence, there may be a high correlation in the position of related files, allowing those nodes to be selectively targeted for a DoS.
- Freenet
Replication is enforced during searches. Once the correct hash key for a desired file is found, it is difficult for an adversary to purge the network of those files, or to propagate an incorrect copy. Related files may be inserted into vastly different parts of the network, due to the fact that locality is defined in terms of the hash value.

** Simplicity
- Gnutella
HTTP-based protocol, simple flooding algorithms for queries. Loose correctness requirements, since messages are sent en masse to all neighbors.
- Freenet
Designed-in cryptography and anonymity. Correctness is more stringent due to depth-first propagation of queries.

** Tamper-resistance
- Gnutella
A node may join the network at any point. Arbitrary query responses can be generated, since there is a very loose affiliation between keys and files.
- Freenet
A node does not choose its initial peers. The file store is signed and encrypted. Attempts to provide privacy.

Freenet

The Freenet paper contributes a number of ideas for providing a secure, peer-to-peer distributed file system. Freenet uses hashes to locate and position files within the overlay network, and to direct queries. Hashing may be performed on a keyword description of the file (keyword-signed key); this is vulnerable to collisions, spamming, and dictionary attacks since the space of keys is small. Signed-subspace keys combine the keyword description with a particular subspace to which only a subset of users may publish. Content-hash keys, based on the contents of files, are used to maintain concurrent versions of files within the network and to support file splitting. The query routing and reply algorithm in Freenet is structured to improve the performance of subsequent requests. Each node stores a routing table mapping known hash values to nodes. Searches are dispatched steepest-ascent (closest hash value to the one being queried for) and depth-first (wait for a queried node to reply before proceeding). All nodes on the return path of a query or file transfer will cache the results of that query and file transfer in both their routing tables and file stores. The query propagation pattern is designed to cause closely related data (in the hash value space) to cluster together, and hence once a single query for an item in a cluster is successful, subsequent queries are more likely to return faster or succeed. Although the choice of query order does not provide any theoretical guarantees, measurements in the paper showed that the average hop count within the network tended to decrease, and hence this design appears to converge to a favorable distribution of hash keys within the network. Freenet provides a number of security features. All files and traffic are encrypted, with jitter introduced to routing fields to discourage traffic analysis. When a node enters the network, its initial set of peers is determined in a distributed manner, thus preventing nodes from infiltrating a specific key range.
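The three key types described above come up in several of these reviews; the following is a loose Python sketch of how they are constructed (signature steps are omitted, and the hash choice, byte handling, and faked deterministic key generation are assumptions for illustration, not Freenet's exact encoding):

    import hashlib

    def h(data: bytes) -> bytes:
        return hashlib.sha1(data).digest()

    def xor(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    def keyword_signed_key(description: str) -> bytes:
        # KSK: a key pair is derived deterministically from the descriptive string;
        # the public half is hashed to give the file key (keygen faked here).
        fake_public_key = h(description.encode())
        return h(fake_public_key)

    def signed_subspace_key(namespace_public_key: bytes, description: str) -> bytes:
        # SSK: hash the namespace public key and the description independently,
        # XOR the two hashes, and hash the result; the private key signs the file.
        return h(xor(h(namespace_public_key), h(description.encode())))

    def content_hash_key(contents: bytes) -> bytes:
        # CHK: simply a hash of the file contents.
        return h(contents)

The small keyword keyspace that makes KSKs vulnerable to dictionary attacks is visible here: anyone who can guess the description can recompute the key.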
It is argued empirically that Freenet is a small-world network, namely that the number of nodes with a particular degree follows a power law, and hence there are a few nodes with high degree and many nodes with low degree. As the high-degree nodes are more important for maintaining connectivity, and are far fewer in number, a random denial-of-service attack is unlikely to find these nodes directly.

** Shortcomings
The replication scheme is not particularly efficient, since files are unconditionally propagated to all agents on the query response path. This design decision was not evaluated to consider whether any bandwidth was actually saved. It seems plausible for an attacker to find the small number of high-connectivity nodes through directed querying, and then launch a selective denial-of-service attack to cause query performance to degenerate.

** Future work
- A key chain (or tree) mechanism could be introduced to track the relationship between different versions of files on the network.

Gnutella

This paper contributes a description of the Gnutella protocol, its shortcomings, and possible directions for future research. In Gnutella, nodes choose an existing node in the network as a peer. A Ping message is unconditionally flooded through the network, up to a preset depth; this establishes the initial network connectivity. Queries are flooded through the network in a similar manner. Replies (Pongs and QueryHits) are unicast back towards the originating node, along the reverse of the original broadcast tree, possibly with aggregation at intervening nodes. The paper mentions several shortcomings of Gnutella, thus establishing a set of evaluation axes against which to design a P2P network. These are reliability of file transfers, scalability, the need for economic or other incentives to promote fair load distribution, query control, and topology management. The paper proposes collecting user information out of band in registration servers to provide better QoS guarantees. User information would be tracked at registration servers and then used to track user karma (used for load balancing and enforcing fairness guarantees), or to direct a query control mechanism.

** Shortcomings
The proposed user management scheme requires some degree of centralization, which inherently adds centralized points of failure to the network.

** Future work
- Implement a method for authenticated distributed karma tracking.
- Evaluate methods for patching Gnutella with structured key distribution.

From linga@CS.Cornell.EDU Tue Oct 29 14:53:03 2002 Received: from snoball.cs.cornell.edu (snoball.cs.cornell.edu [128.84.96.54]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TJr3h26015 for ; Tue, 29 Oct 2002 14:53:03 -0500 (EST) Received: from localhost (linga@localhost) by snoball.cs.cornell.edu (8.11.3/8.11.3/C-3.2) with ESMTP id g9TJr3500360 for ; Tue, 29 Oct 2002 14:53:03 -0500 (EST) Date: Tue, 29 Oct 2002 14:53:02 -0500 (EST) From: Prakash Linga To: Emin Gun Sirer Subject: 615 PAPER 60 Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII

FREENET
-------
This is a peer-to-peer application that supports insert, delete, and lookup operations for files (data) while protecting the anonymity of both authors and readers. Files are identified by file keys. Basically, file keys are obtained by hashing file names. Keyword-signed key (KSK), signed-subspace key (SSK), and content-hash key (CHK) are the three types of file keys discussed in the paper.
In KSK, a descriptive text is used as input to generate a public/private key pair; the public half is hashed to get the file key. Each node makes a local decision about where to forward a request (completely distributed). Each node has knowledge of only its immediate upstream and downstream neighbors in the proxy chain, to maintain privacy. Nodes closer together store files with closer file keys. So when a node gets a request for a file key, it checks if it knows where this file key is stored. If not, it will forward the request to the neighbor which holds file keys closest to this file key. This forwarding continues until the file is found or HTL (hops-to-live) becomes one (forwarding then continues with some probability, to prevent attackers from getting information from the HTL). This mechanism can be used to insert or search for a file, assuming you can obtain or calculate the file key. Nodes en route can also cache the information about where the file is stored. An LRU cache policy is used. Requests have an associated depth field for the replying node to set the HTL of the reply accordingly. The depth field is initialized to some small random value to obscure the location of the requestor. Performance numbers show that the system is reasonably scalable and fault-tolerant.

Pros and Cons: The main goal of Freenet is anonymity, which it achieves to a decent extent (I don't think it is very difficult for an adversary to find out who the author and the reader are for a given request; it is reasonably difficult, but not provably difficult). Lookup time is terrible initially, but as the system stabilizes, experiments show that it tends to a logarithmic number of hops. There are no guarantees here, hops are application-level hops (there is no talk of locality), and the system is decently (but not provably) fault tolerant. Tampering has not been investigated, and it is fairly easy to do in this system.

Gnutella
--------
This is one of the earliest decentralized P2P file sharing protocols and is based on flooding. Each node exports whatever data it wants to. Each request is associated with a TTL, and the request is forwarded to all the neighbors of the node. Forwarding continues until the TTL becomes 0. Meanwhile, if a node contains the requested file, it sends it to the requestor. Some research problems associated with this model are proposed in this paper: security, enforcing registration and controlling the registration server load, utilizing the registration information (e.g., geographic location), file segmentation (to improve download speeds), and prioritizing query results based on load balancing requirements.

Pros and Cons: This is one of the first decentralized P2P file sharing protocols. Lookup takes a small number of hops, but the bandwidth requirement is tremendous because we are just flooding the request. Failure resilience, tamper-resistance, anonymity, etc. are non-existent.

CAN
---
CAN is a scalable, fault-tolerant, and self-organizing distributed hash table where each node stores a part of the hash table. CAN is a scalable P2P indexing system with applications in other areas like large-scale storage management systems. The basic idea is to consider a virtual d-dimensional Cartesian coordinate space on a d-torus, with each node owning a region of this space. This space is used to store (key, value) pairs. Each such pair is mapped onto a point P in the coordinate space using a hash function (the hash value of the key). If the current node owns the part of the space containing P, the pair is stored at this node.
Otherwise, the request is routed along the straight-line path from the source (this node) to the destination (the node which owns the space containing point P). Lookup for a (key, value) pair works similarly, given that you know the key. Coming to node insertions (the node picks a random point P which it decides to cover): the new node first finds a node already in the CAN; then, using the CAN routing mechanism, it finds the node whose zone will be split; the neighbors of the split zone are notified so that routing can include the new node. Node deletions: when a node is deleted, the space it used to own is merged with that of one of its neighbors (the one which detects this node is dead, or the one with the least space), and the rest of the neighbors are informed about it. Simulation results show that the system is scalable and fault-tolerant.

Pros and Cons: A scalable, fault-tolerant, and self-organizing distributed hash table. Lookup time here is measured in terms of average path length, and that grows as n^(1/d). Zone overloading and replication improve fault tolerance. Anonymity, security issues, and locality issues are not discussed in this paper.

From vivi@CS.Cornell.EDU Tue Oct 29 14:58:44 2002 Received: from exchange.cs.cornell.edu (exchange.cs.cornell.edu [128.84.97.8]) by sundial.cs.cornell.edu (8.11.3/8.11.3/M-3.10) with ESMTP id g9TJwih27249 for ; Tue, 29 Oct 2002 14:58:44 -0500 (EST) content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-MimeOLE: Produced By Microsoft Exchange V6.0.6249.0 Subject: 615paper60 Date: Tue, 29 Oct 2002 14:58:44 -0500 Message-ID: <47BCBC2A65D1D5478176F5615EA7976D11AF9D@opus.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: 615paper60 Thread-Index: AcJ/hZW1mJKsPiDfQcmjWC4Cs8lAuw== From: "Vivek Vishnumurthy" To: "Gun Sirer" Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id g9TJwih27249

Freenet is a peer-to-peer system designed with the aim of enabling distributed data sharing while providing anonymity for both authors and readers. File keys that identify the files are generated by hashing (keyword-signed key / signed-subspace key / content-hash key schemes). Requests for files are handled based on the file keys; a node that receives a request for a certain file key checks if it has a copy of the file asked for. If it does not, it forwards the request to the node that has the file corresponding to the file key lexicographically closest to the requested file key. File inserts are implemented in a similar fashion. Simulation results show that look-up times improve over time; as nodes come to specialize in locating sets of similar keys, the number of hops taken to reach the node that has the file goes down. The system seems to be fairly fault-tolerant; the authors conjecture that this is because the Freenet system is a small-world model. Implementation of this system is fairly complex (generation of keys, signing the files, etc.). The system provides some protection against tampering of files: files are signed, so detection of modification of the file contents is made possible.

Gnutella is a decentralized file-sharing system riddled with problems (as the paper suggests). The system does not seem to have been fully developed. It does not provide good look-up times, and it offers no support to detect file tampering. Gnutella seems to have some amount of fault tolerance (not clear in the paper): failure of a small number of nodes should not cripple the network as a whole.
Implementation of Gnutella seems to be much simpler than that of Freenet.

The scalable Content-Addressable Network (CAN) stores files based on a hashing scheme. File keys are hashed to a d-dimensional coordinate space. This multi-dimensional space is split into different zones, and each node owns one zone. Every node maintains the addresses of the neighboring nodes in the coordinate space. Routing is implemented in a greedy fashion: requests are forwarded to the node whose coordinates are closest to the coordinates of the required destination. Node additions and removals are handled by splitting a zone and merging zones, respectively. CAN provides good look-up times: the end-to-end latency is within a factor of two of the underlying network latency. The authors suggest the incorporation of more dimensions or multiple realities to achieve better fault tolerance; these features have the overhead of added state maintenance at each node. CAN does not provide any protection against file tampering.
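Several of the reviews above describe a node join as splitting an existing zone in half; a minimal Python sketch of just that step follows (toy box-shaped zones as in the earlier sketch, with the split dimension simply alternating; all names are illustrative):

    def split_zone(zone, depth):
        """Split a box zone in half along dimension depth % d.
        Returns (kept_half, new_half) for the existing node and the joining node."""
        d = len(zone)
        axis = depth % d
        lo, hi = zone[axis]
        mid = (lo + hi) / 2.0
        kept_half = tuple((lo, mid) if i == axis else zone[i] for i in range(d))
        new_half = tuple((mid, hi) if i == axis else zone[i] for i in range(d))
        return kept_half, new_half

    # Joining node J picks a random point, routes to the current owner A of that point,
    # and takes half of A's zone; both nodes then update their neighbor sets.
    zone_a = ((0.0, 1.0), (0.0, 1.0))
    zone_a, zone_j = split_zone(zone_a, depth=0)
    print(zone_a, zone_j)   # ((0.0, 0.5), (0.0, 1.0)) ((0.5, 1.0), (0.0, 1.0))

A real join would also hand over the (key, value) pairs that fall in the new half and notify the affected neighbors, which this sketch leaves out.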