From kash Wed Feb 1 15:58:11 2006 Return-Path: Received: from exchfe2.cs.cornell.edu (exchfenlb-2.cs.cornell.edu [128.84.97.34]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k11KwB412538 for ; Wed, 1 Feb 2006 15:58:11 -0500 (EST) Received: from EXCHVS1.cs.cornell.edu ([128.84.97.23]) by exchfe2.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Wed, 1 Feb 2006 15:58:11 -0500 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 X-Security: message sanitized on sundial.cs.cornell.edu See http://www.impsec.org/email-tools/sanitizer-intro.html for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 X-Security: The postmaster has not enabled quarantine of poisoned messages. Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C62772.355AD0C8" Subject: PAPER 3 Date: Wed, 1 Feb 2006 15:58:09 -0500 Message-ID: <2EE48095D8C21643B0B70EC95F9BFBAFFBC179@EXCHVS1.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: PAPER 3 Thread-Index: AcYncjSx0iM57l5sR5uT5XK1zyznzg== From: "Ian Kash" To: X-OriginalArrivalTime: 01 Feb 2006 20:58:11.0227 (UTC) FILETIME=[35FE96B0:01C62772] X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on sundial.cs.cornell.edu X-Spam-Status: No, score=-5.7 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, HTML_MESSAGE,MIME_QP_LONG_LINE autolearn=ham version=3.0.2 X-Spam-Level: This is a multi-part message in MIME format. ------_=_NextPart_001_01C62772.355AD0C8 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable CAN is a structured p2p network based on ownership of zones of a multidimensional space. Each key hashes to a point in the space and the owner of the zone that point is in controls that value. To route, each node keeps track of its 2d neighbors in the d dimensional space and routes it to the one closest to the destination giving an average path length of O(n^{1/d}). Long hops can be achieved by using multiple realities (essentially several different copies of the space with each node assigned a different zone in each reality. Thus each node has neighbors in a variety of locations so can cover a large distance with a single forward. =20 The use of multiple realities is not well justified theoretically (and this manifests itself in limited performance improvements in the simulations. The problem essentially boils down to that of local search in a social network. While the existence of long hops guarantees the existence of very short paths in expectation, there is no practical way to find them because you do not know which of your neighbors leads to them because you do not know what the long hops of your neighbors and their neighbors etc are. Thus the only improvement comes from stumbling across hops that move you more quickly in a greedy fashion which is not a major improvement. =20 Also, CAN seems to have trouble handling multiple simultaneous failures. They claim that they have a mechanism that eventually resolves the takeover issues, but it seems like it ultimately has to involve identifying the neighbors and reaching a consensus among them which seems extremely problematic in practice. ------_=_NextPart_001_01C62772.355AD0C8 Content-Type: text/html; charset="us-ascii" Content-Transfer-Encoding: quoted-printable -->

CAN is a structured p2p network based on ownership of = zones of a

multidimensional space.  Each key hashes to a = point in the space and

the owner of the zone that point is in controls that = value.  To route,

each node keeps track of its 2d neighbors in the d dimensional space

and routes it to the one closest to the destination = giving an average

path length of O(n^{1/d}).  Long hops can be = achieved by using

multiple realities (essentially several different = copies of the space

with each node assigned a different zone in each = reality.  Thus each

node has neighbors in a variety of locations so can = cover a large

distance with a single = forward.

 

The use of multiple realities is not well justified theoretically (and

this manifests itself in limited performance = improvements in the

simulations.  The problem essentially boils down = to that of local

search in a social network.  While the existence = of long hops

guarantees the existence of very short paths in = expectation, there is

no practical way to find them because you do not know = which of your

neighbors leads to them because you do not know what = the long hops of

your neighbors and their neighbors etc are.  = Thus the only improvement

comes from stumbling across hops that move you more = quickly in a

greedy fashion which is not a major = improvement.

 

Also, CAN seems to have trouble handling multiple simultaneous

failures.  They claim that they have a mechanism = that eventually

resolves the takeover issues, but it seems like it = ultimately has to

involve identifying the neighbors and reaching a = consensus among them

which seems extremely problematic in = practice.

------_=_NextPart_001_01C62772.355AD0C8-- From okennedy Wed Feb 1 16:46:12 2006 Return-Path: Received: from exchfe1.cs.cornell.edu (exchfenlb-1.cs.cornell.edu [128.84.97.33]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k11LkC400347 for ; Wed, 1 Feb 2006 16:46:12 -0500 (EST) Received: from exchfe1.cs.cornell.edu ([128.84.97.27]) by exchfe1.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Wed, 1 Feb 2006 16:46:12 -0500 Received: from [128.84.98.36] ([128.84.98.36]) by exchfe1.cs.cornell.edu over TLS secured channel with Microsoft SMTPSVC(6.0.3790.1830); Wed, 1 Feb 2006 16:46:11 -0500 Mime-Version: 1.0 (Apple Message framework v746.2) Content-Transfer-Encoding: 7bit Message-Id: Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed To: egs+summary From: Oliver Kennedy Subject: PAPER 3 Date: Wed, 1 Feb 2006 16:46:33 -0500 X-Mailer: Apple Mail (2.746.2) X-OriginalArrivalTime: 01 Feb 2006 21:46:11.0725 (UTC) FILETIME=[EAE7B3D0:01C62778] X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.0.2 X-Spam-Level: CAN is a distributed hash table with the same basic idea as Chord, Pastry, and Tapestry. All nodes are mapped into ranges over the space of all keys (or more specifically keys are mapped onto numbers, and nodes are mapped onto ranges of numbers). Nodes are aware of other nodes with numerically similar ranges to their own. However, while the original four papers use a 1 dimensional cyclical keyspace, CAN uses a multi-dimensional cyclical keyspace. This changes the routing scheme somewhat. Due to the multi-dimensional nature, the exponential jump approach is less viable. Nodes would have to keep a three dimensional routing table, rather than just a 2d one. However, the single hop at a time approach becomes more viable. Since nodes are scattered throughout this multidimensional space, average distances drop from O(N) to an inverse power of the number of dimensions. This dramatically reduces the state each node is required to keep and simplifies the routing protocol. Adding nodes to the network is also much simpler. As with Chord, only one other node need be contacted in order to obtain all the data to be managed by a new node. Unlike Chord, all necessary routing state can also be obtained from that node. That said, CAN's toroidal topology (as written) doesn't scale as well as a ring topology with respect to routing. While at a given dimension CAN can outperform a ring network of a certain size, as the network grows, the exponential jumps become more and more important. This idea could potentially be integrated into CAN (at the cost of increased state). CAN also suffers from most of the same problems that Pastry et.al do. -Oliver Kennedy Computer science isn't about computers any more than astronomy is about telescopes. -- Dijkstra From km266 Wed Feb 1 21:13:12 2006 Return-Path: Received: from authusersmtp.mail.cornell.edu (granite1.mail.cornell.edu [128.253.83.141]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k122DC411347 for ; Wed, 1 Feb 2006 21:13:12 -0500 (EST) Received: from KEVSTOY (cpe-69-207-37-68.twcny.res.rr.com [69.207.37.68]) (authenticated bits=0) by authusersmtp.mail.cornell.edu (8.13.1/8.12.10) with ESMTP id k122DBgx004119 (version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NOT) for ; Wed, 1 Feb 2006 21:13:11 -0500 (EST) Message-ID: <000601c6279e$43459ef0$4425cf45@KEVSTOY> Reply-To: "Kevin" From: "Kevin" To: Subject: PAPER 3 Date: Wed, 1 Feb 2006 21:13:30 -0500 MIME-Version: 1.0 X-Security: message sanitized on sundial.cs.cornell.edu See http://www.impsec.org/email-tools/sanitizer-intro.html for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 X-Security: The postmaster has not enabled quarantine of poisoned messages. Content-Type: multipart/alternative; boundary="----=_NextPart_000_0003_01C62774.59FD2610" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.2527 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2527 X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.4 required=5.0 tests=AWL,BAYES_00, FORGED_OUTLOOK_TAGS,HTML_10_20,HTML_MESSAGE autolearn=no version=3.0.2 X-Spam-Level: This is a multi-part message in MIME format. ------=_NextPart_000_0003_01C62774.59FD2610 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable CAN (Content-Addressable Network) constantly makes parallels against its = infrastructure and a hash-table. It is a torus shaped overlay that is = d-dimensional. A packet is routed in one direction along each dimension = until it reaches its destination. This allows a routing path length of = (d/4)(n^(1/d)) hops. A node picks a random location along each = dimension to put itself into and gets associated with a bunch of = key,value pairs. It inherits these pairs from a node that was taking up = part of that volume. To route a packets to any specific node, you just = contact your neighbor in the dimension that leads to that node and relay = the message to it. To do this you have to maintain a neighbor list = which is O(d) in size. A lot of the joining is similar to those of = other networks: you have to know a node which is already in the system = and then find a place to put yourself. You then inherit a large amount = of information from your new-neighbor and tell your neighbor nodes about = your place in the system. It seems like a lot of the interesting parts = of the paper were about the improvements and modifications you could = make to this scheme. The first obvious improvement would be to increase the dimensionality in = which you operate. The more dimensions you have, the fewer = application-level hops you will have to take to get a message to any = other node. Furthermore, you have to store more neighbor information = which allows for more redundancy. The next improvement is called = 'realities.' Each reality is an entire network or coordinate space of = its own. Having multiple realities is akin to being connected to = multiple networks. In this case, each node has a location in each = reality therefore making data redundant. You insert yourself into a = random location in each reality and therefore you can be closer to a = peice of location in one reality as compared to another, possibly = allowing you to send a message to a particular node quicker (and never = slower). FUrthermore, the redundancy helps prevent network outages and = since all data is replicated r times, where r is the number of = realities, sudden node departure is not nearly as big an issue. The = third improvement is quite simple: it sends out a ping to each neighbor = and finds the neighbor that will get your message closer to your = destination with respect to its ping time. A fourth improvement is = allowing each zone in the coordinate space to have multiple peers -- = this provides redundancy and lower delays as you route through the = fastest peer. It is similar to reducing the overall number of nodes in = the system, which makes the network delay lower. The fifth improvement = is allowing multiple hash functions which would create distinct = pairs which therefore would put more redundancy into the = system. There was a section on being geographically or topologically = aware of the underlying internet structure, but it was not tested. The = seventh improvement allows nodes to be inserted in locations where zone = volume is bigger, this allows more even distribution of nodes in the = system (alloing all nodes to be within a factor of 4 from each other). = This is accomplished by having a node give the join command to a = neighbor if the neighbor occupies a higher volume than it does. Caching = and replication are standard in other p2p systems and was also detailed = here. This paper's main concentration seemed to be on improvements to = peer-to-peer systems, specifically the one introduced early in the = paper. These improvements could well be put into other systems. = Despite, or more probably because of this, there were large ommisions in = terms of security and other big issues. Security was not mentioned at = all in the paper and there are loopholes. A node could try to join in a = specific zone to try and take over a specific pair in order = to drop the packets on the ground. For example, the RIAA could hash = "Nirvana" and try to put itself in the zone associated with that, and = drop packets or send them back with faulty data. Furthermore, it seemed = as if failure was not taken into account as much as it should have been. = The solution seems to be "just wait for the data to be refreshed." = This seems to be a bad solution -- there is a compromise that has to be = made. Either data will be unavailable for a period of time or the = network will constantly be bombarded with refreshes. Furthermore, the = paper seems to think that most disconnections will be intentional, which = is not reasonable in a p2p file sharing system where people drop = unexpectadly. One part of the paper I am confused about is the neighbor = data. In the TAKEOVER part of the paper, it seems as if a node has = access to its neighbor's neighbors. The only way this could be true is = if the node held O(d^2) information about its surroundings. The paper = very clearly states that it only keeps track of O(d) neighbors. If this = is the case then obtaining the disconnected neighbor's neighbors will be = tricky and require sending potentially many packets. This part was = either omitted from the paper or I misread it. ------=_NextPart_000_0003_01C62774.59FD2610 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable
CAN (Content-Addressable Network) = constantly makes=20 parallels against its infrastructure and a hash-table.  It is a = torus=20 shaped overlay that is d-dimensional.  A packet is routed in one = direction=20 along each dimension until it reaches its destination.  This allows = a=20 routing path length of (d/4)(n^(1/d)) hops.  A node picks a random = location=20 along each dimension to put itself into and gets associated with a bunch = of=20 key,value pairs.  It inherits these pairs from a node that was = taking up=20 part of that volume.  To route a packets to any specific node, you = just=20 contact your neighbor in the dimension that leads to that node and relay = the=20 message to it.  To do this you have to maintain a neighbor list = which is=20 O(d) in size.  A lot of the joining is similar to those of other = networks:=20 you have to know a node which is already in the system and then find a = place to=20 put yourself.  You then inherit a large amount of information from = your=20 new-neighbor and tell your neighbor nodes about your place in the = system. =20 It seems like a lot of the interesting parts of the paper were about the = improvements and modifications you could make to this scheme.
The = first=20 obvious improvement would be to increase the dimensionality in which you = operate.  The more dimensions you have, the fewer application-level = hops=20 you will have to take to get a message to any other node.  = Furthermore, you=20 have to store more neighbor information which allows for more = redundancy. =20 The next improvement is called 'realities.'  Each reality is an = entire=20 network or coordinate space of its own.  Having multiple realities = is akin=20 to being connected to multiple networks.  In this case, each node = has a=20 location in each reality therefore making data redundant.  You = insert=20 yourself into a random location in each reality and therefore you can be = closer=20 to a peice of location in one reality as compared to another, possibly = allowing=20 you to send a message to a particular node quicker (and never = slower). =20 FUrthermore, the redundancy helps prevent network outages and since all = data is=20 replicated r times, where r is the number of realities, sudden node = departure is=20 not nearly as big an issue.  The third improvement is quite simple: = it=20 sends out a ping to each neighbor and finds the neighbor that will get = your=20 message closer to your destination with respect to its ping time.  = A fourth=20 improvement is allowing each zone in the coordinate space to have = multiple peers=20 -- this provides redundancy and lower delays as you route through the = fastest=20 peer.  It is similar to reducing the overall number of nodes in the = system,=20 which makes the network delay lower.  The fifth improvement is = allowing=20 multiple hash functions which would create distinct <key,value> = pairs=20 which therefore would put more redundancy into the system.  There = was a=20 section on being geographically or topologically aware of the underlying = internet structure, but it was not tested.  The seventh improvement = allows=20 nodes to be inserted in locations where zone volume is bigger, this = allows more=20 even distribution of nodes in the system (alloing all nodes to be within = a=20 factor of 4 from each other).  This is accomplished by having a = node give=20 the join command to a neighbor if the neighbor occupies a higher volume = than it=20 does.  Caching and replication are standard in other p2p systems = and was=20 also detailed here.
This paper's main concentration seemed to be on=20 improvements to peer-to-peer systems, specifically the one introduced = early in=20 the paper.  These improvements could well be put into other = systems. =20 Despite, or more probably because of this, there were large ommisions in = terms=20 of security and other big issues.  Security was not mentioned at = all in the=20 paper and there are loopholes.  A node could try to join in a = specific zone=20 to try and take over a specific <key,value> pair in order to drop = the=20 packets on the ground.  For example, the RIAA could hash "Nirvana" = and try=20 to put itself in the zone associated with that, and drop packets or send = them=20 back with faulty data.  Furthermore, it seemed as if failure was = not taken=20 into account as much as it should have been.  The solution seems to = be=20 "just wait for the data to be refreshed."  This seems to be a bad = solution=20 -- there is a compromise that has to be made.  Either data will be=20 unavailable for a period of time or the network will constantly be = bombarded=20 with refreshes.  Furthermore, the paper seems to think that most=20 disconnections will be intentional, which is not reasonable in a p2p = file=20 sharing system where people drop unexpectadly.  One part of the = paper I am=20 confused about is the neighbor data.  In the TAKEOVER part of the = paper, it=20 seems as if a node has access to its neighbor's neighbors.  The = only way=20 this could be true is if the node held O(d^2) information about its=20 surroundings.  The paper very clearly states that it only keeps = track of=20 O(d) neighbors.  If this is the case then obtaining the = disconnected=20 neighbor's neighbors will be tricky and require sending potentially many = packets.  This part was either omitted from the paper or I misread=20 it.
------=_NextPart_000_0003_01C62774.59FD2610-- From lackhand Wed Feb 1 23:53:21 2006 Return-Path: Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.206] (may be forged)) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k124rK427489 for ; Wed, 1 Feb 2006 23:53:21 -0500 (EST) Received: by zproxy.gmail.com with SMTP id s18so305091nze for ; Wed, 01 Feb 2006 20:53:20 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type; b=gIdeDDkdzamdPpiMDFOrJmORhiszUUHjn4zH7+UkUxskDirsdj50Hnz/dtnQ9q+fuhPEn1cH7kTi3GgS+wSsfUzZSTsCOQvADJR1VR93Q7pMU+gK7DxAqq1PGsdbIiMxdnLFrw4lYJlxAoOLv1D2Vhx8SW/1ZntyWq9etEMEi+s= Received: by 10.36.74.16 with SMTP id w16mr318993nza; Wed, 01 Feb 2006 20:53:20 -0800 (PST) Received: by 10.36.141.3 with HTTP; Wed, 1 Feb 2006 20:53:20 -0800 (PST) Message-ID: <9aa7a97d0602012053r3c9b126dm95d988135d8e0bd4@mail.gmail.com> Date: Wed, 1 Feb 2006 23:53:20 -0500 From: Andrew Cunningham To: egs+summary Subject: PAPER 3 MIME-Version: 1.0 X-Security: message sanitized on sundial.cs.cornell.edu See http://www.impsec.org/email-tools/sanitizer-intro.html for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 X-Security: The postmaster has not enabled quarantine of poisoned messages. Content-Type: multipart/alternative; boundary="----=_Part_10361_3766995.1138856000307" X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on sundial.cs.cornell.edu X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,HTML_00_10, HTML_MESSAGE,RCVD_BY_IP autolearn=no version=3.0.2 X-Spam-Level: ------=_Part_10361_3766995.1138856000307 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Andrew Cunningham arc39 _A_Scalable_Content_Addressable_Network_ Sylvia Ratnasamy Paul Francis Mark Handley Richard Karp Scott Shenker CAN is the ring based network to end all ring based networks: rather than enumerating points around a one dimensional closed address space, it makes addresses d-vectors in a d-dimensional taurus. At first blush, then, it suffers as a ring based system; it lacks the logN pointers of previous iterations, and likewise routes only directly, using a greedy distance function (leveraging the metric-space properties of the address space to give sensible answers about "direction" without absolute routing knowledge)= . Nodes that share a high number of dimensions can be thought of as "in a ring" or "in a taurus" with each other -- with analagous higher order versions -- with the result that depending on node placement, a single zombie or node failure can cause searches to fail. It has a mixed blessing, as well: nodes may decide that their neighbors have perished, and thus seiz= e their zones; it's unclear what happens in a conflict, though I would presum= e that the original holder of the zone is honored! The presence of multiple coordinate spaces is, I feel, something of a loss: though it is true that they achieve good data replication, which is definitely a win, they preclud= e the easiest way of joining CANs, in that if you have two operational CANs, the only way to join them is to place all the members of each on both; by using realities with tangent points or a similar portal scheme, a union of CANs could have been acheived. In general, in fact, this kind of "going forward" scalability doesn't seem to have been applied to CAN: the number of dimensions is set at the beginning (altering it requires restructuring the address space and distribution of zones, points in zones, etc); inconsistent states can be created via partitions but, due to the low likelihood of occurrence, not healed; this means that it is difficult to perform a union of CAN nets. The ability of CAN to scale well in the "sweet spot" shouldn't be underestimated: due to it's nth-root characteristic searches, and 2*d neighbor tables, with an approximate guess as to it's size, it performs quite well. It's only unfortunate that a bad guess requires rebuilding the network. The various modifications to boost performance of the algorithm do not seem to be specific to this; they seem to help any algorithm perform better, though admittedly the low overhead of CAN works well with the required state duplication of multiple realities. As a closing note, I too hope to some day write a paper where I can route messages through another dimension to reach my alternate twin (ie, holding my space in the address space) in another reality. ------=_Part_10361_3766995.1138856000307 Content-Type: text/html; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Andrew Cunningham
arc39

_A_Scalable_Content_Addressable_Network_
Sylvia Ratnasamy Paul Francis Mark Handley Richard Karp
Scott Shenker

    CAN is the ring based network to end all ring based networks: rather than enumerating points around a one dimensional closed address space, it makes addresses d-vectors in a d-dimensional taurus. At first blush, then, it suffers as a ring based system; it lacks the logN pointers of previous iterations, and likewise routes only directly, using a greedy distance function (leveraging the metric-space properties of the address space to give sensible answers about "direction" without absolute routing knowledge).
    Nodes that share a high number of dimensions can be thought of as "in a ring" or "in a taurus" with each ot= her -- with analagous higher order versions -- with the result that depending on node placement, a single zombie or node failure can cause searches to fail. It has a mixed blessing, as well: nodes may decide that their neighbors have perished, and thus seize their zones; it's unclear what happens in a conflict, though I would presume that the original holder of the zone is honored! The presence of multiple coordinate spaces is, I feel, something of a loss: though it is true that they achieve good data replication, which is definitely a win, they preclude the easiest way of joining CANs, in that if you have two operational CANs, the only way to join them is to place all the members of each on both; by using realities with tangent points or a similar portal scheme, a union of CANs could have been acheived.
    In general, in fact, this kind of "going forward&qu= ot; scalability doesn't seem to have been applied to CAN: the number of dimensions is set at the beginning (altering it requires restructuring the address space and distribution of zones, points in zones, etc); inconsistent states can be created via partitions but, due to the low likelihood of occurrence, not healed; this means that it is difficult to perform a union of CAN nets.
    The ability of CAN to scale well in the "sweet spot= " shouldn't be underestimated: due to it's nth-root characteristic searches, and 2*d neighbor tables, with an approximate guess as to it's size, it performs quite well. It's only unfortunate that a bad guess requires rebuilding the network. The various modifications to boost performance of the algorithm do not seem to be specific to this; they seem to help any algorithm perform better, though admittedly the low overhead of CAN works well with the required state duplication of multiple realities.
    As a closing note, I too hope to some day write a paper where I can route messages through another dimension to reach my alternate twin (ie, holding my space in the address space) in another reality.
------=_Part_10361_3766995.1138856000307-- From asr32 Thu Feb 2 00:01:18 2006 Return-Path: Received: from authusersmtp.mail.cornell.edu (granite1.mail.cornell.edu [128.253.83.141]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1251H429709 for ; Thu, 2 Feb 2006 00:01:17 -0500 (EST) Received: from dreadnought.cornell.edu (r253240123.resnet.cornell.edu [128.253.240.123]) (authenticated bits=0) by authusersmtp.mail.cornell.edu (8.13.1/8.12.10) with ESMTP id k1251Hrh025894 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Thu, 2 Feb 2006 00:01:17 -0500 (EST) Message-Id: <6.2.1.2.2.20060201174629.03180ea8@postoffice8.mail.cornell.edu> X-Mailer: QUALCOMM Windows Eudora Version 6.2.1.2 Date: Wed, 01 Feb 2006 23:59:17 -0500 To: egs+summary From: Ari Rabkin Subject: PAPER 3 Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.0.2 X-Spam-Level: Whereas Pastry, Chord, and Tapestry are all variants of ring-based topology, CAN uses a d-dimensional torus. Instead of having pointers across a ring, data flows in a "straight" line through the address space to the destination. The key new features are the use of a multidimensional addressing space, and "Cartesian" distance in address space as a metric. Replication is also handled in a much better way than in other systems: the data spreads "outwards", and queries will hit the various replicas before the central node. CAN has a number of weaknesses. Malfunctioning or malicious nodes can drop or mangle traffic and are impossible to eject; as long as a node is marked as up, it will be routed through. Since the path between two nodes can be predicted in advance, an attacker able to choose where to join the network can intervene to block it. The dimension of the torus cannot be changed while the system is running, making it hard to correct if the size of the system differs from expectation. Different CAN networks cannot be readily merged. --Ari Ari Rabkin asr32 Risley Hall 454 3-2842 The resources of civilization are not yet exhausted. --William Gladstone From pjk25 Thu Feb 2 00:33:34 2006 Return-Path: Received: from authusersmtp.mail.cornell.edu (granite1.mail.cornell.edu [128.253.83.141]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k125XY407721 for ; Thu, 2 Feb 2006 00:33:34 -0500 (EST) Received: from [192.168.0.103] (user-10mt73g.cable.mindspring.com [65.110.156.112]) (authenticated bits=0) by authusersmtp.mail.cornell.edu (8.13.1/8.12.10) with ESMTP id k125XWeC028970 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Thu, 2 Feb 2006 00:33:33 -0500 (EST) Mime-Version: 1.0 (Apple Message framework v746.2) To: egs+summary Message-Id: <2BBDF62B-B221-451C-9EAA-B088BA8F09E4> Content-Type: multipart/signed; micalg=sha1; boundary=Apple-Mail-7--509910823; protocol="application/pkcs7-signature" From: Philip Kuryloski Subject: PAPER 3 Date: Thu, 2 Feb 2006 00:36:01 -0500 X-Mailer: Apple Mail (2.746.2) X-Spam-Checker-Version: SpamAssassin 3.0.2 (2004-11-16) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.0.2 X-Spam-Level: --Apple-Mail-7--509910823 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed The authors propose a distributed hash table implementation designed for peer-to-peer file sharing, which provides what they call a content addressable network (CAN). Their can functions in the following manner: A hash function maps a key to some point on a d- dimensional torus. The torus is broken up into a set of zones, each of which is owned by a CAN node. The node which owns the space where the key falls is responsible for the value corresponding to that key. Each node maintains a table of it's neighbors (owners of abutting zones in the toroidal space). This allows the owner of a key to be contacted by forwarding a key request to a neighbor which is closest to the key in the toroidal space. Nodes join the network in the following manner: A node obtains the IP address of some existing CAN node. It then chooses a random point the toroidal space and sends a join message to that point. The node who owns this space splits its space in half, giving one half to the new node. Routing tables are easily obtained from the original owner of the space, as the new node shares most neighbors with that node. Node failure is address in the following manner: All nodes periodically announce their existence and zone information to their neighbors. When an update is missed, a neighboring node will take over that space, with preference given to nodes with the smallest zones. The authors introduce a number of enhancements to this scheme to improve performance. One such enhancement is to replicate the routing scheme over several "realities" which correspond to separate toroidal spaces and zones. Keys has to a point differently in each zone, allowing nodes to route in the reality which places them closest to the message target. Additional dimensions and realities increase routing table size and system maintenance traffic but decrease the average number of hops to deliver a message. The round trip time of packets through the underlying network can also be considered in message forwarding to further improve lookup speed. The authors suggest optimal tradeoff points for these factors. Overlapping of zones is also introduced, multiple hash functions on a single toroidal space, and during join operations, splitting the largest zone nearby the randomly chosen point. Caching and replication can also be used with this scheme to improve performance. The large number of optimization methods results in the primary limitation of CAN, in that these methods of improving the speed of message transfer all increase the amount of maintenance traffic and size of routing tables in the system. The optimal value for factors such as routing space dimension are difficult to determine a priori, and background processes must be running in the network to avoid partitioning of the the mapping space and to maintain uniformity of zone size between CAN nodes. Philip Kuryloski 444 Rhodes Hall Cornell University Ithaca, NY 14853-5112 607.255.4222 pjk25 --Apple-Mail-7--509910823 Content-Transfer-Encoding: base64 Content-Type: application/pkcs7-signature; name=smime.p7s Content-Disposition: attachment; filename=smime.p7s MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIGFjCCAs8w ggI4oAMCAQICAw+L7TANBgkqhkiG9w0BAQQFADBiMQswCQYDVQQGEwJaQTElMCMGA1UEChMcVGhh d3RlIENvbnN1bHRpbmcgKFB0eSkgTHRkLjEsMCoGA1UEAxMjVGhhd3RlIFBlcnNvbmFsIEZyZWVt YWlsIElzc3VpbmcgQ0EwHhcNMDUwOTI2MTc1NzM0WhcNMDYwOTI2MTc1NzM0WjBDMR8wHQYDVQQD ExZUaGF3dGUgRnJlZW1haWwgTWVtYmVyMSAwHgYJKoZIhvcNAQkBFhFwamsyNUBjb3JuZWxsLmVk dTCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBALFkgyhHSufUWaYxKh+wvUSDmrM8cViE JjeRS7Ssdd5tf0ckH2iwktNuSkxSsavsmAY+8zahwJjwk/JWTVyOGW/QsjgA5zoTJeAz4ah/QcKZ hou20lN6NvlFZWA43b4/jwtpVVa2RMS1fitkEs7YA9N16akGyXCpJR2i6EVTk7tx8/zf7i7bqg4t tmbJaySQyMQ4QV1O+F00m+zms0WZN5XRDqPwU2/WZfUE5BK/pGLkkFheBGSJJssuOsct8ctup0AI fJLlLZZhBCEdNeM2x9KfQEm+Tk3Ty0zl0pOewe7oW9vgwBJ2LwTVurzQ7qXeq1VhkDmkQJOwjcxM ssGAPXMCAwEAAaMuMCwwHAYDVR0RBBUwE4ERcGprMjVAY29ybmVsbC5lZHUwDAYDVR0TAQH/BAIw ADANBgkqhkiG9w0BAQQFAAOBgQBanW/NR5+pfeGOS7lM21kLObfzzGKtHvTFZ/RS0cSgWSKaCZfx aLLhqC9EFFFxh0b4wn0zCTv4CQWhrpaPZZC7oroP70kqWypQdjFbQ2rlLrVVS8pE4gtjZnRPFMr0 BEH+1K7kWB6kTHvg2eI1EotCI92yARGzlzKrXjPonHppijCCAz8wggKooAMCAQICAQ0wDQYJKoZI hvcNAQEFBQAwgdExCzAJBgNVBAYTAlpBMRUwEwYDVQQIEwxXZXN0ZXJuIENhcGUxEjAQBgNVBAcT CUNhcGUgVG93bjEaMBgGA1UEChMRVGhhd3RlIENvbnN1bHRpbmcxKDAmBgNVBAsTH0NlcnRpZmlj YXRpb24gU2VydmljZXMgRGl2aXNpb24xJDAiBgNVBAMTG1RoYXd0ZSBQZXJzb25hbCBGcmVlbWFp bCBDQTErMCkGCSqGSIb3DQEJARYccGVyc29uYWwtZnJlZW1haWxAdGhhd3RlLmNvbTAeFw0wMzA3 MTcwMDAwMDBaFw0xMzA3MTYyMzU5NTlaMGIxCzAJBgNVBAYTAlpBMSUwIwYDVQQKExxUaGF3dGUg Q29uc3VsdGluZyAoUHR5KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVyc29uYWwgRnJlZW1haWwg SXNzdWluZyBDQTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAxKY8VXNV+065yplaHmjAdQRw nd/p/6Me7L3N9VvyGna9fww6YfK/Uc4B1OVQCjDXAmNaLIkVcI7dyfArhVqqP3FWy688Cwfn8R+R NiQqE88r1fOCdz0Dviv+uxg+B79AgAJk16emu59l0cUqVIUPSAR/p7bRPGEEQB5kGXJgt/sCAwEA AaOBlDCBkTASBgNVHRMBAf8ECDAGAQH/AgEAMEMGA1UdHwQ8MDowOKA2oDSGMmh0dHA6Ly9jcmwu dGhhd3RlLmNvbS9UaGF3dGVQZXJzb25hbEZyZWVtYWlsQ0EuY3JsMAsGA1UdDwQEAwIBBjApBgNV HREEIjAgpB4wHDEaMBgGA1UEAxMRUHJpdmF0ZUxhYmVsMi0xMzgwDQYJKoZIhvcNAQEFBQADgYEA SIzRUIPqCy7MDaNmrGcPf6+svsIXoUOWlJ1/TCG4+DYfqi2fNi/A9BxQIJNwPP2t4WFiw9k6GX6E sZkbAMUaC4J0niVQlGLH2ydxVyWN3amcOY6MIE9lX5Xa9/eH1sYITq726jTlEBpbNU1341YheILc IRk13iSx0x1G/11fZU8xggLnMIIC4wIBATBpMGIxCzAJBgNVBAYTAlpBMSUwIwYDVQQKExxUaGF3 dGUgQ29uc3VsdGluZyAoUHR5KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVyc29uYWwgRnJlZW1h aWwgSXNzdWluZyBDQQIDD4vtMAkGBSsOAwIaBQCgggFTMBgGCSqGSIb3DQEJAzELBgkqhkiG9w0B BwEwHAYJKoZIhvcNAQkFMQ8XDTA2MDIwMjA1MzYwMlowIwYJKoZIhvcNAQkEMRYEFNqKphkcjDzD 6f3nSA0fIUkt75fTMHgGCSsGAQQBgjcQBDFrMGkwYjELMAkGA1UEBhMCWkExJTAjBgNVBAoTHFRo YXd0ZSBDb25zdWx0aW5nIChQdHkpIEx0ZC4xLDAqBgNVBAMTI1RoYXd0ZSBQZXJzb25hbCBGcmVl bWFpbCBJc3N1aW5nIENBAgMPi+0wegYLKoZIhvcNAQkQAgsxa6BpMGIxCzAJBgNVBAYTAlpBMSUw IwYDVQQKExxUaGF3dGUgQ29uc3VsdGluZyAoUHR5KSBMdGQuMSwwKgYDVQQDEyNUaGF3dGUgUGVy c29uYWwgRnJlZW1haWwgSXNzdWluZyBDQQIDD4vtMA0GCSqGSIb3DQEBAQUABIIBAJhmsnNAr6/K LbQekKHE9ki8EkDdwrPeHIsSNg6ni2svdSMj/50gMnjJ2fhNlAfmM2UAqxcssLEW3weWNyPGyQ8n uJBlbYQ9O+pzrZhXAvKZQcQ9FG3AQf4GCd46RmKVxViUAO/pSjxw3GcH/rF+oFwqx/940HG1O/7J nUFCLSnWhmFuZZND+r36/72OyRHmL2bnsiK1rMWAv7jrkX+lCvg1irbWnm7J9dvG6pLVr6RJiFLq 4Z4edFuXvSD09gnBGeyfg6TWm8MIjbdURmaRWISJAU8yp04D0rvKpMnJjLoalas+q7QkZ6PUSd3b R0l+xBMxe7us8yBxhYUlokEqTKoAAAAAAAA= --Apple-Mail-7--509910823-- From nsg7 Thu Feb 2 09:34:26 2006 Return-Path: Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12EYPj23797 for ; Thu, 2 Feb 2006 09:34:25 -0500 (EST) Received: from webmail.cornell.edu (hermes21.mail.cornell.edu [132.236.56.20]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k12EYNdU019268 for ; Thu, 2 Feb 2006 09:34:24 -0500 (EST) Received: from 132.236.227.119 by webmail.cornell.edu with HTTP; Thu, 2 Feb 2006 09:34:24 -0500 (EST) Message-ID: <1359.132.236.227.119.1138890864.squirrel@webmail.cornell.edu> Date: Thu, 2 Feb 2006 09:34:24 -0500 (EST) Subject: PAPER 3 From: "Nicholas S Gerner" To: egs+summary User-Agent: SquirrelMail/1.4.5 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal "A Scalable Content-Addressable Netork" presents a distributed hash-table similar in concept to the ring based DHTs we've already seen (Pastry, Chord, etc.). In these ring-based DHTs a single dimensional overlay is produced (with wraparound). Routing succeeds as long as every node knows it's next closest neighbor in one direction (or in both directions). To achieve better than O(N) routing, other nodes are known to each node at exponentially increasing distance (achieving O(log N) routing). CAN uses a d-dimensional overlay (with wraparound producing a torus). Neighbors in both directions along each dimension are known to each node. No other information is used and routing can be performed in O(N^1/d) steps. For an appropriate choice of d routing can be as efficient as the ring based DHTs we've already studied. In addition to presenting all the necessary algorithms (routing, noid join, node failure, etc.), CAN goes on to present several additions to a torus overlay. One of these additions is the concept of "realities". A reality is an additional torus overlay with different "zone" assignment (the set of points for which a node is responsible). Each reality distributes zones independently to nodes. Because CAN is presented as an information storage network, each reality introduces another copy of each (key, value) pair, increasing availability and with high probability decreasing the distance between a request (key) and it's response (value). Experimental results are presented that indicate that introducing additional realities does significantly improve availability and significantly reduce route path lengths. The CAN paper does have several weaknesses. Similarly to the Pastry, Chord and Tapestry systems, malicious nodes could break the overlay. Also, path lengths in CAN are O(n^1/d). To get O(log N) path lengths, d must be chosen as a function of N, which might be unknown or change over time (perhaps dramatically). in this case a new d must be chosen and the overlay must be rebuilt. Analytical and experimental results presented in the paper suggest that even a moderate choice of d allows for efficient routing with even very large numbers of nodes. The paper does say that only O(d) neighbor state must be stored. However, this statement is contradicted by the TAKEOVER algorithm presented. This algorithm indicates that nodes must keep track of their neighbors' neighbors to tolerate arbitrary node failure. This means that O(d^2) neighbor state must be maintained. O(d^2) will not be overly burdensome for the kinds of d that allow for very large networks. --Nick From ids3 Thu Feb 2 09:51:41 2006 Return-Path: Received: from authusersmtp.mail.cornell.edu (granite1.mail.cornell.edu [128.253.83.141]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12Epfj29373 for ; Thu, 2 Feb 2006 09:51:41 -0500 (EST) Received: from [128.253.212.208] (r253212208.resnet.cornell.edu [128.253.212.208]) (authenticated bits=0) by authusersmtp.mail.cornell.edu (8.13.1/8.12.10) with ESMTP id k12Epecv021281 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT) for ; Thu, 2 Feb 2006 09:51:40 -0500 (EST) Message-ID: <43E21C7E.6070005> Date: Thu, 02 Feb 2006 09:51:42 -0500 From: Ivan Stoyanov User-Agent: Mozilla Thunderbird 1.0.6 (Windows/20050716) X-Accept-Language: en-us, en MIME-Version: 1.0 To: egs+summary Subject: PAPER 3 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit The main idea of the paper is to organize peers in an multidimentional virual space. Similar to Chord, keys to data are uniformly distributed accross the participating peers using a hash function mapping. Nodes need to maintain links for their immediate neighbors only, so the size of the routing table is 2d. Routing in CAN takes O(n1/d). Node joins affect only O(d) nodes. CAN uses soft-state routing table updates. Absence of such update messages signals that the node has failed. The system does not have a good way of handling crashed nodes and can quickly become inconsistent when neighbors disagree on which nodes and up or down. In addition, an attacker can repeatedly report its neighbors as failed and thus easily disrupt the system. Another problem with the paper is that the parameter d cannot be adjusted dynamically. The paper, however, provides good theoretical base and does address locality issues, replication and caching. From ryanp Thu Feb 2 10:30:26 2006 Return-Path: Received: from exchfe2.cs.cornell.edu (exchfenlb-2.cs.cornell.edu [128.84.97.34]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12FUQj13132 for ; Thu, 2 Feb 2006 10:30:26 -0500 (EST) Received: from exchfe2.cs.cornell.edu ([128.84.97.28]) by exchfe2.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 2 Feb 2006 10:30:26 -0500 Received: from [128.253.211.203] ([128.253.211.203]) by exchfe2.cs.cornell.edu over TLS secured channel with Microsoft SMTPSVC(6.0.3790.1830); Thu, 2 Feb 2006 10:30:25 -0500 Mime-Version: 1.0 (Apple Message framework v746.2) Content-Transfer-Encoding: 7bit Message-Id: <876B0EF9-136A-43C0-BF95-2B4B2ED37847> Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed To: egs+summary From: "Ryan S. Peterson" Subject: PAPER 3 Date: Thu, 2 Feb 2006 10:30:24 -0500 X-Mailer: Apple Mail (2.746.2) X-OriginalArrivalTime: 02 Feb 2006 15:30:25.0752 (UTC) FILETIME=[96DEFD80:01C6280D] The Content-Addressable Network (CAN), presented by Ratnasamy, et al., introduces another peer-to-peer, self-organizing distributed hash table design and implementation. Unlike Pastry and Chord, CAN does not organize its nodes in a ring structure. Instead, nodes reside in a d-dimensional space, where each node is "responsible" for a zone within the coordinate space such that all the nodes exactly cover the space. An object in the system is hashed to a point in the coordinate space, and the node that owns the zone containing the point owns the object. Each node keeps track of its neighbors in the space, where neighbors is defined in the natural geometric sense. To route an object lookup, a node need only calculate a line from itself to the coordinate target, and send the message to the neighbor through which the line passes. The coordinate space is treated as a torus, so the line may wrap around the space. CAN handles node joins by placing the new node at a random point in the space, splitting the zone in which it lies in half. Each neighbors of the zone must be notified of the update so each zone maintains its set of neighbors. CAN's clean, elegant DHT formulation has several implications. First, it is intuitively simple, making it easier to reason about various attributes. For example, its geometric nature allows the authors to reason about the average number of node hops required to route to a particular object based on the corresponding line in the coordinate space. One must take care, however, to verify that the intuitive geometric properties do not imply misleading results in the network. As the authors point out, the nodes in neighboring zones, while close together in the coordinate space, may be far apart in terms of latency of geographic location. On the other hand, the geometric formulation leads to a nice property that distinguishes this system from other peer-to-peer DHTs: the ability to separate the amount of bookkeeping at each node from the number of nodes in the network. By increasing or decreasing the dimensionality of the coordinate space, each node will have more or fewer neighboring nodes, which translates to a different amount of state that a node must know. This provides a knob that other systems do not provide, enabling a tradeoff between lookup time and space consumed. This knob has a side effect: as the dimensionality decreases, the number of neighbors each node has decreases, decreasing the robustness of the system. Fewer nodes need to fail before the network becomes unstable, meaning a node loses all information about its neighbors, creating a hole in the coordinate space. Thus in general CAN makes it more difficult to reason about the stability of the system since it doesn't provide a guarantee of the number of failed nodes that the system can withstand. Overall, CAN provides an intriguing new formulation of DHTs, but leaves me with the question of whether, although it is conceptually quite different from other systems, results in a system that is nearly identical in practice--a new mathematical model of the same idea. Ryan From niranjan.sivakumar Thu Feb 2 10:32:12 2006 Return-Path: Received: from xproxy.gmail.com (xproxy.gmail.com [66.249.82.194] (may be forged)) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12FWBj14211 for ; Thu, 2 Feb 2006 10:32:11 -0500 (EST) Received: by xproxy.gmail.com with SMTP id h30so304556wxd for ; Thu, 02 Feb 2006 07:32:11 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:reply-to:to:subject:date:user-agent:organization:mime-version:content-type:content-transfer-encoding:content-disposition:message-id:from; b=N/Bh5czI4YpqLlSGmdsjXPTjsSrt3d+hIAZfouBp8eY/cFECkEdw63+lJFDcdWuRwupLHcVAaWrqLRU9vmceoVAZ1fchPloDIl5gts2Jt6F8ATeNhyWlhUfCIUW7bTtIpN9wUIZ3g7jzyckybqaZTmEAkUSvos5zhMfvqRA/ShA= Received: by 10.70.108.19 with SMTP id g19mr952873wxc; Thu, 02 Feb 2006 07:32:11 -0800 (PST) Received: from ?192.168.0.101? ( [69.207.63.116]) by mx.gmail.com with ESMTP id h35sm2952147wxd.2006.02.02.07.32.10; Thu, 02 Feb 2006 07:32:10 -0800 (PST) Reply-To: ns253 To: egs+summary Subject: PAPER 3 Date: Thu, 2 Feb 2006 10:32:09 -0500 User-Agent: KMail/1.9 Organization: Cornell University MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Content-Disposition: inline Message-Id: <200602021032.09750.ns253> From: Niranjan Sivakumar Niranjan Sivakumar CAN is slightly different approach to a distributed hash table based network than the previous ring-based structures that we have considered. In CAN, the network space is considered to be a virtual Cartesian coordinate space. Objects are provided hash-based key and mapped to a certain point in the coordinate space when they are enter the network. Nodes select a random point as they join the network, and are given a virtual "zone" to cover and will be responsible for objects in their zone. Nodes then keep track of neighboring zones (those that are adjacent, but not merely abutting). When nodes leave the network, if possible they will notify their neighbors before leaving and hand over their zone. In the alternate, since nodes periodically send update messages, if messages are not received beyond a certain threshold time, then the node is considered to have died and the network will automatically attempt to take actions to mend the network. Routing in the network is simply based on finding the shortest path in the coordinate space to the desired location. First, the neighbors are checked. If the neighbors do not contain the target, then a greedy algorithm is used to route the message to another node that is closes to the target. A node can route on multiple paths in the event that one path is not available. If all neighbors in a given direction fail, then the system falls back on an expanding ring search to find a node to forward the request to. One weakness seen in this paper is in the bootstrapping mechanism. Although the system is largely distributed, if bootstrapping is based on DNS this provides a target to be attacked. First, it is possible to direct DNS based attacks (e.g. poisoning) in order to dupe nodes into thinking that they are joining a particular network. Also, the bootstrap nodes of the network provide a discrete target to be attacked. Another flaw that was seen is in the recovery mechanism. It is possible for objects on the network to be unreachable until a "refresh" is performed. This is a potential problem, although there are measures such as overloading zones and replication that have been proposed that could help to alleviate this issue. From XX Thu Feb 2 11:32:57 2006 X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,HTML_00_10, HTML_MESSAGE autolearn=no version=3.1.0 X-Spam-Level: Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.207] (may be forged)) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12GWur02312 for ; Thu, 2 Feb 2006 11:32:56 -0500 (EST) Received: by zproxy.gmail.com with SMTP id 4so485241nzn for ; Thu, 02 Feb 2006 08:32:56 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:sender:to:subject:mime-version:content-type; b=guJOkoFyIANdh2kBg400//a5G3Q84LjIIGMHg6Bi+FsEHkAp2vVuMoloLHUk6VmJWJaDXLx2Z44lm7AvigDNg7zvECKb7J35gQ3RjB+WtPQxWzVDVh9hDHWlim+HtdWQJNHdSM41XAPxDUOcA1C08z9OPglpXTKbzu6syhUM0Tc= Received: by 10.64.251.2 with SMTP id y2mr482729qbh; Thu, 02 Feb 2006 08:32:56 -0800 (PST) Received: by 10.65.205.2 with HTTP; Thu, 2 Feb 2006 08:32:56 -0800 (PST) Message-ID: <554904a70602020832h16872caev6bcdd1f06e587a5f@mail.gmail.com> Date: Thu, 2 Feb 2006 11:32:56 -0500 From: "Takayuki Hoshi (TK)" Sender: tkhoshi To: egs+summary Subject: PAPER 3 MIME-Version: 1.0 X-Security: message sanitized on sundial.cs.cornell.edu See http://www.impsec.org/email-tools/sanitizer-intro.html for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 X-Security: The postmaster has not enabled quarantine of poisoned messages. Content-Type: multipart/alternative; boundary="----=_Part_15883_25450243.1138897976219" ------=_Part_15883_25450243.1138897976219 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable The CAN Paper In this well-written paper, authors present a distributed hash table scheme called the content-addressable network (or the CAN). It is a scalable, fault-tolerant, self-organizing system that provides hash table-like operations under a virtual d-dimensional Cartesian coordinate space on a d-torus. In this network, the whole space functions as one big hash table whose basic operations include insertion, lookup and deletion of (key,value) pairs. Thus, each nod= e holds a zone of the entire hash table -- i.e., a subset of the (key,value) pairs -- as well as the state information of its adjacent neighbor zones (2d zones). Routing in a CAN is done by a simple greedy forwarding of following a neighbor closest to the destination node through the Cartesian space. The average routing path is (d/4)(n^(1/d). Upon request, the node in zone containing th= e requested point splits the zone into two and allocates a half of its information to the joining node; similarly, upon departure, the leaving node allocates its information to the neighboring nodes. In case of failed nodes, recovery is done by a takeover algorithm that allocates a failed node's zone to its neighbor= . Authors also introduces many techniques for improvements in performance. In particuular, both increasing the dimensions of the coordinate space and maintaining multiple coordinate spaces (realities) with each node are clearly shown to increase the number of routing hops significantly. Also, having a RTT-based routing metrics, overloading coordinate zones, and maintaining multiple hash functions, are shown to yield better performance. Main contributions of the authors include the introduction of a hash table as a virtual d-dimensional Cartesian coordinate space on a d-torus idea, and of the "chunking" idea of letting each node responsible for a chunk of the entire hash table. In this respect, the ontology of CANs is different from those in other ring-shaped networks like Tapestry, Pastry, and Chord (in addition to the fact that IP addresses of each node is not coded). As authors discuss, there ar= e still issues that are left as future works like designing a secure CAN that is resistant to DoS attacks and extending CAN algorithms to handle mutable content. ------=_Part_15883_25450243.1138897976219 Content-Type: text/html; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable The CAN Paper

In this well-written paper, authors present a distribu= ted hash table scheme
called the content-addressable network (or the CAN= ). It is a scalable,
fault-tolerant, self-organizing system that provide= s hash table-like operations
under a virtual d-dimensional Cartesian coordinate space on a d-torus.&= nbsp; In this
network, the whole space functions as one big hash table w= hose basic operations
include insertion, lookup and deletion of (key,val= ue) pairs. Thus, each node
holds a zone of the entire hash table -- i.e., a subset of the (key,val= ue) pairs
-- as well as the state information of its adjacent neighbor z= ones (2d zones).
Routing in a CAN is done by a simple greedy forwarding = of following a neighbor
closest to the destination node through the Cartesian space.  The = average
routing path is (d/4)(n^(1/d). Upon request, the node in zone co= ntaining the
requested point splits the zone into two and allocates a ha= lf of its information
to the joining node; similarly, upon departure, the leaving node alloca= tes its
information to the neighboring nodes. In case of failed nodes, r= ecovery is done
by a takeover algorithm that allocates a failed node's z= one to its neighbor.

Authors also introduces many techniques for improvements in perform= ance. In
particuular, both increasing the dimensions of the coordinate s= pace and
maintaining multiple coordinate spaces (realities) with each no= de are clearly
shown to increase the number of routing hops significantly. Also, havin= g a
RTT-based routing metrics, overloading coordinate zones, and maintai= ning
multiple hash functions, are shown to yield better performance.=20

Main contributions of the authors include the introduction of a has= h table as a
virtual d-dimensional Cartesian coordinate space on a d-tor= us idea, and of the
"chunking" idea of letting each node respo= nsible for a chunk of the entire hash
table.  In this respect, the ontology of CANs is different from th= ose in other
ring-shaped networks like Tapestry, Pastry, and Chord (in a= ddition to the fact
that IP addresses of each node is not coded).  = As authors discuss, there are
still issues that are left as future works like designing a secure CAN = that is
resistant to DoS attacks and extending CAN algorithms to handle = mutable content.

------=_Part_15883_25450243.1138897976219-- From gp72 Thu Feb 2 11:42:13 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from sunup.cs.cornell.edu (sunup.cs.cornell.edu [128.84.96.116]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12GgCr06220; Thu, 2 Feb 2006 11:42:12 -0500 (EST) Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14]) by sunup.cs.cornell.edu (8.11.7-20031020/8.11.7/R-3.18) with ESMTP id k12GMcn13224; Thu, 2 Feb 2006 11:22:38 -0500 (EST) Received: from orpheus3.dataserver.cornell.edu (orpheus3.dataserver.cornell.edu [128.253.161.167]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k12GM6dU029008; Thu, 2 Feb 2006 11:22:06 -0500 (EST) Message-ID: <103627068.1138897326162.JavaMail.webber@orpheus3.dataserver.cornell.edu> Date: Thu, 2 Feb 2006 11:22:06 -0500 (EST) From: Gopal Parameswaran To: egs Subject: PAPER 3 Cc: egs+summary Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: uPortal WEB email client 3.0 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k12GgCr06220 Dear Prof. Sirer, I am not currently in the list for the class and hence I am sending a cc to you too. My critique on the paper is given below. Thanks and Regards Gopal This article explores a concept for a scalable indexing and search mechanism that indexes and maps filenames to their location in the system that is central to any peer to peer system. The author tries to find solution in this paper to the problem that Gnutella network has with flooding and Napster's central storing and indexing schema. Content Addressable Network or CAN is designed as a d dimensional torus broken up into zones that are mapped into a co-ordinate system. Each message is routed to the destination co-ordinates by using the neighbouring co-ordinate set of each node that the message traverses through and is a simple greedy search algorithm. Using a greedy search algorithm by gradient descent based on a distance metric based on co-ordinates is not a great way to route messages. As the author himself states that the average path increases as O(n^1/d). So as n increases the number of nodes or zones traversed increases. To reduce the path at any time the only way in this network is to increase the dimensionality of the co-ordinate map and would be difficult while the system is running. Increasing dimensionality of the co-ordinate system can be also stated in simpler terms as increasing number of neighborhood zones which would result in changes in the neighborhood tables for all the nodes. I think this network also resembles a standard ring topology with the additional feature that you can move along the circumference of the ring instead of one node at a time but up to a maximum of d nodes at a time. Increasing dimensionality is just another way of saying that the message can jump more nodes at a time as it goes along the edge of the ring. Increasing realities can imagined as placing rings on top of the existing rings that have the same data arranged differently along it. The greedy search algorithm fails when a void is created in the co-ordinate space when a node loses all its neighbors and starts with a node that is closer to the destination than itself. This network uses caching to maintain the data key pairs that are commonly accessed at each node. The idea is good but is not really effective as it now has to keep track of a huge number of messages and have them indexed properly since he is caching individual end point key values. Instead he should be caching the path to the particular zone or another level of abstraction that has the least latency. The author even though he introduces latency into the discussion of the network, does not really explore all avenues to reduce it. If this network uses latency cost to each zone instead of individual elements in his cache , his cache would be smaller and also would result in faster and effective paths to the destination node. This network introduces the concept of realities which is a replication of the network to provide alternate paths to destination zones and the messages can be send across all the realities. Realities are implemented by using a different hash function for each reality. Thus the copies of the objects in each reality would be found in different zones and shorter paths to them could be found out by how near the destination is in terms of its key. This would mean that different routing tables would exist at each node for each reality. This also introduces fault tolerance into the system since copies exist in different realities and even if the destination node in one reality goes down it would exist in another reality. However if the realities are twisted (shifted and seared in the co-ordinate space), then there could exist shorter paths to the destination through a different reality since the destination would be closer in a different reality. Though it would mean a bigger routing table or set of tables the path cost of this system can be reduced considerably. From gp72 Thu Feb 2 11:42:31 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.6 required=5.0 tests=AWL,BAYES_00 autolearn=unavailable version=3.1.0 X-Spam-Level: Received: from sunup.cs.cornell.edu (sunup.cs.cornell.edu [128.84.96.116]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12GgCr06220; Thu, 2 Feb 2006 11:42:12 -0500 (EST) Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14]) by sunup.cs.cornell.edu (8.11.7-20031020/8.11.7/R-3.18) with ESMTP id k12GMcn13224; Thu, 2 Feb 2006 11:22:38 -0500 (EST) Received: from orpheus3.dataserver.cornell.edu (orpheus3.dataserver.cornell.edu [128.253.161.167]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k12GM6dU029008; Thu, 2 Feb 2006 11:22:06 -0500 (EST) Message-ID: <103627068.1138897326162.JavaMail.webber@orpheus3.dataserver.cornell.edu> Date: Thu, 2 Feb 2006 11:22:06 -0500 (EST) From: Gopal Parameswaran To: egs Subject: PAPER 3 Cc: egs+summary Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: uPortal WEB email client 3.0 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k12GgCr06220 Dear Prof. Sirer, I am not currently in the list for the class and hence I am sending a cc to you too. My critique on the paper is given below. Thanks and Regards Gopal This article explores a concept for a scalable indexing and search mechanism that indexes and maps filenames to their location in the system that is central to any peer to peer system. The author tries to find solution in this paper to the problem that Gnutella network has with flooding and Napster's central storing and indexing schema. Content Addressable Network or CAN is designed as a d dimensional torus broken up into zones that are mapped into a co-ordinate system. Each message is routed to the destination co-ordinates by using the neighbouring co-ordinate set of each node that the message traverses through and is a simple greedy search algorithm. Using a greedy search algorithm by gradient descent based on a distance metric based on co-ordinates is not a great way to route messages. As the author himself states that the average path increases as O(n^1/d). So as n increases the number of nodes or zones traversed increases. To reduce the path at any time the only way in this network is to increase the dimensionality of the co-ordinate map and would be difficult while the system is running. Increasing dimensionality of the co-ordinate system can be also stated in simpler terms as increasing number of neighborhood zones which would result in changes in the neighborhood tables for all the nodes. I think this network also resembles a standard ring topology with the additional feature that you can move along the circumference of the ring instead of one node at a time but up to a maximum of d nodes at a time. Increasing dimensionality is just another way of saying that the message can jump more nodes at a time as it goes along the edge of the ring. Increasing realities can imagined as placing rings on top of the existing rings that have the same data arranged differently along it. The greedy search algorithm fails when a void is created in the co-ordinate space when a node loses all its neighbors and starts with a node that is closer to the destination than itself. This network uses caching to maintain the data key pairs that are commonly accessed at each node. The idea is good but is not really effective as it now has to keep track of a huge number of messages and have them indexed properly since he is caching individual end point key values. Instead he should be caching the path to the particular zone or another level of abstraction that has the least latency. The author even though he introduces latency into the discussion of the network, does not really explore all avenues to reduce it. If this network uses latency cost to each zone instead of individual elements in his cache , his cache would be smaller and also would result in faster and effective paths to the destination node. This network introduces the concept of realities which is a replication of the network to provide alternate paths to destination zones and the messages can be send across all the realities. Realities are implemented by using a different hash function for each reality. Thus the copies of the objects in each reality would be found in different zones and shorter paths to them could be found out by how near the destination is in terms of its key. This would mean that different routing tables would exist at each node for each reality. This also introduces fault tolerance into the system since copies exist in different realities and even if the destination node in one reality goes down it would exist in another reality. However if the realities are twisted (shifted and seared in the co-ordinate space), then there could exist shorter paths to the destination through a different reality since the destination would be closer in a different reality. Though it would mean a bigger routing table or set of tables the path cost of this system can be reduced considerably. From sh366 Thu Feb 2 11:45:27 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12GjQr06979 for ; Thu, 2 Feb 2006 11:45:26 -0500 (EST) Received: from orpheus3.dataserver.cornell.edu (orpheus3.dataserver.cornell.edu [128.253.161.167]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k12GjPdU015328 for ; Thu, 2 Feb 2006 11:45:25 -0500 (EST) Message-ID: <295547118.1138898724862.JavaMail.webber@orpheus3.dataserver.cornell.edu> Date: Thu, 2 Feb 2006 11:45:25 -0500 (EST) From: Huang Shiang-Jia To: egs+summary Subject: PAPER 3 Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8 X-Mailer: uPortal WEB email client 3.0 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k12GjQr06979 • Different from Pastry/Chord/Tapestry’s ring-based topology, a Content-Address Network (CAN) is a distributed hash table based on a virtual d-dimensional Cartesian coordinate space. • CAN routes in O(dn^(1/d)) steps with routing table of size O(dr). Setting d = (logn)/2 allows CAN to match Pastry/Chord/Tapestry’s scaling properties (routing in O(logn) steps with routing table of size O(logn)). • The basic model of CAN may implement: multi-dimensional coordinate space, multiple independent coordinate spaces, RTT-weighted routing, multiple peer nodes per zone, multiple hash functions, topological-sensitive construction of overlay, uniform partitioning, caching and replication for popular data (hot spots), to reduce path latency and improve fault tolerance and load balance. • To prevent the coordinate space from becoming further fragmented (caused by the normal node leaving procedure and the immediate takeover scheme), CAN runs a background zone reassignment approach to ensure that the system tends back toward one zone per node. • The “multiple hash functions” and “multiple coordinate spaces” proposed by CAN each can map a key onto multiple points and accordingly replicate a key/value pair at multiple nodes. Pastry/Chord/Tapestry relies on applications to replicate data. • The routing of CAN described in 2.1 and 3.3 may traverse farther than necessary (in the number of hops) because each node has information of only its neighbors. • That a node can randomly choose a point P in the space and join the zone which P belongs to exposes a vulnerability of the system. An attacker may keep inserting nodes around a target node Q until Q is surrounded by malicious nodes; the message routing to and from Q is then under control of the attacker. • This paper omitted two issues of those nodes responsible for the zones around the center of the coordinate space: (i) they tend to experience more loads as intermediate nodes for other routes, and (ii) they tend to be favored in the sense that they are closer to most zones than the nodes in the corners. • Koorde combines Chord with de Bruijn graph. It demonstrates the lookup using a de Bruijn graph in a sparse-populated identifier space (that is, an incomplete graph containing a lot of imaginary nodes). • Koorde proves and achieves the lower bounds of (i) O(logn) hops per lookup with 2 neighbors per node and, to provide fault tolerance, (ii) O(logn/loglogn) hops per lookup with O(logn) neighbors per node. • Koorde optimizes the properties mentioned above at the cost of potential load imbalance. • Koorde’s de Bruijn pointer is a performance optimization; it can’t adapt node arrivals and departures/failures (that is, self-stabilize) as well as Pastry/Chord/Tapestry. From kelvinso Thu Feb 2 11:52:25 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-3.6 required=5.0 tests=ALL_TRUSTED,AWL,BAYES_00, HTML_10_20,HTML_MESSAGE autolearn=ham version=3.1.0 X-Spam-Level: Received: from exchfe1.cs.cornell.edu (exchfenlb-1.cs.cornell.edu [128.84.97.33]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12GqPr09868 for ; Thu, 2 Feb 2006 11:52:25 -0500 (EST) Received: from EXCHVS2.cs.cornell.edu ([128.84.97.24]) by exchfe1.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 2 Feb 2006 11:33:30 -0500 X-MimeOLE: Produced By Microsoft Exchange V6.5 Content-class: urn:content-classes:message MIME-Version: 1.0 X-Security: message sanitized on sundial.cs.cornell.edu See http://www.impsec.org/email-tools/sanitizer-intro.html for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 X-Security: The postmaster has not enabled quarantine of poisoned messages. Content-Type: multipart/alternative; boundary="----_=_NextPart_001_01C62816.66735679" Subject: Paper 3 Date: Thu, 2 Feb 2006 11:33:29 -0500 Message-ID: <2AA53C9C489B0049B43E56D28088677B2B58D4@EXCHVS2.cs.cornell.edu> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Paper 3 Thread-Index: AcYoFmZu33NezQoOQJiao9JzfmjxDQ== From: "Kelvin So" To: X-OriginalArrivalTime: 02 Feb 2006 16:33:30.0445 (UTC) FILETIME=[66B94FD0:01C62816] This is a multi-part message in MIME format. ------_=_NextPart_001_01C62816.66735679 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable =93A Scalable Content-Addressable Network=94 by Ratnasamy presents scalable, fault-tolerant and self-organizing distributed hash table = structure which is similar to Chord and Pastry. However, in CAN, the node id and = object id lie on a d-dimensional Castesian space, whereas Chord and Pastry node = id and object id lie on a circular id space. Using a uniform hash function, = a node is mapped to a point P and it is assigned to node which also maps = to point P. Each node maintains a coordinate routing table to hold each of = its immediate neighbors in the coordinate space. Using the routing table, = one can route from one point to the other by simple greedy forwarding to the = neighbor with coordinates closest to the destination. The average routing path is O(n^(1/d) and only need to maintains 2d neighbors. As d grows bigger, = the size of routing table grows bigger while the average routing path = becomes smaller.=20 To join the network, a new node randomly chooses a point P and send JOIN request. When it is routed to point P, it will simply split the = space with the node in whose zone P lies. It also gets the neighbor set from previous occupants. If a node dies, the neighbor will take over the zone which the dead node owns. To prevent a node to own more the one zone of = a time, there is zone reassignment algorithm running in the background to prevent this. There are also other different techniques to improve the = CAN algorithm, such as multiple coordinate space for replication, RTT as = routing metric, using more than one node to own the coordinate zones, multiple = hash functions, and uniform partitioning. One of the weaknesses of CAN is = that when a node joins the network, the node itself randomly picks a point P = to send JOIN request to. However, malicious nodes can pick a pick a set of points themselves to split the space to interrupt the routing. Also, it = is not as straightforward to recover from node failures. ------_=_NextPart_001_01C62816.66735679 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Paper 3

        “A = Scalable Content-Addressable Network” by Ratnasamy presents = scalable, fault-tolerant and self-organizing distributed hash table = structure which is similar to Chord and Pastry. However, in CAN, the = node id and object id lie on a d-dimensional Castesian space, whereas = Chord and Pastry node id and object id lie on a circular id space. Using = a uniform hash function, a node is mapped to a point P and it is = assigned to node which also maps to point P. Each node maintains a = coordinate routing table to hold each of its immediate neighbors in the = coordinate space. Using the routing table, one can route from one point = to the other by simple greedy forwarding to the neighbor with = coordinates closest to the destination. The average routing path is = O(n^(1/d) and only need to maintains 2d neighbors. As d grows bigger, = the size of routing table grows bigger while the average routing path = becomes smaller.
        To join the network, a new = node randomly chooses a point P and send JOIN request. When it is routed = to point P, it will simply split the space with the node in whose zone P = lies. It also gets the neighbor set from previous occupants. If a node = dies, the neighbor will take over the zone which the dead node owns. To = prevent a node to own more the one zone of a time, there is zone = reassignment algorithm running in the background to prevent this. There = are also other different techniques to improve the CAN algorithm, such = as multiple coordinate space for replication, RTT as routing metric, = using more than one node to own the coordinate zones, multiple hash = functions, and uniform partitioning. One of the weaknesses of CAN is = that when a node joins the network, the node itself randomly picks a = point P to send JOIN request to. However, malicious nodes can pick a = pick a set of points themselves to split the space to interrupt the = routing. Also, it is not as straightforward to recover from node = failures.

------_=_NextPart_001_01C62816.66735679-- From tudorm Thu Feb 2 11:53:20 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from exchfe2.cs.cornell.edu (exchfenlb-2.cs.cornell.edu [128.84.97.34]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12GrKr10226 for ; Thu, 2 Feb 2006 11:53:20 -0500 (EST) Received: from exchfe1.cs.cornell.edu ([128.84.97.33]) by exchfe2.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830); Thu, 2 Feb 2006 11:53:20 -0500 Received: from [192.168.0.6] ([65.110.147.185]) by exchfe1.cs.cornell.edu over TLS secured channel with Microsoft SMTPSVC(6.0.3790.1830); Thu, 2 Feb 2006 11:53:19 -0500 Message-ID: <43E238D2.5010308> Date: Thu, 02 Feb 2006 11:52:34 -0500 From: Tudor Marian User-Agent: Thunderbird 1.5 (X11/20051201) MIME-Version: 1.0 To: egs+summary Subject: PAPER 3 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-OriginalArrivalTime: 02 Feb 2006 16:53:19.0765 (UTC) FILETIME=[2B9D2450:01C62819] CAN is another scalable, fault-tolerant self-organizing p2p overlay that provides a DHT interface. The way it works is by splitting the space represented as a logical d-dimensional carthesian torus into d-dimension hypercubes. Each node is responsible for such a chunk, more precisely a key-value pair (K, V) is deterministically mapped by taking the hash of K to a point in the torus, and the pair is stored at the node that is responsible for the hypercube containing the point. Each node maintains a table with 2d neighbors and the coordinate space they are responsible for, and as such for a d-dim space partitioned in n *equal* chunks the average routing path length is (d/4)(n^(1/d)). Nodes that arrive in the network split the zone of some previously extant node in the overlay, upon node departure zones are merged (if possible) into some of the neighbor's zone. The CAN logical structure suffers from fragmentation when faced with churn. Suppose one or more nodes depart, in that case at some point some node left in the region that was depleted is faced with the acquisition of the empty space after the TAKEOVER protocol (suppose for a moment that the gap is not very large). With very low probability this node will be able to seamlessly merge the new zone with its previous one, hence the node will be responsible for 2 zones that in the best case share a sub-hypercube. Therefore the node in question must maintain more than 2d routing information about the immediate neighboring zones. Moreover, without having the space split into *equal* zones, there are no asymptotic guarantees about the average routing path - worst case it could degenerate in something of a linear sequence of hypercubes (by this I mean a bunch of zones that follow one another in the same dimensions). Going back to the assumption, now suppose the gap is indeed large, and the metric of gluing the empty space to the neighbor responsible for the smallest volume zone, moreover suppose multiple nodes depart and arrive at the same time - one can notice the problem being exacerbated. Another issue is that if multiple realities are stored at the nodes, plus more than just the immediate neighbors, then the storage required by a node is quite large, especially since the ratio of the storage to the overlay-hop latency was not great to begin with - if compared to Pastry or Chord. The paper also mentions that overloading coordinate zones may be done either by splitting contents of hash tables or by replicating across the nodes in the zone, though it doesn't mention anything about the possible consistency models - this is not a trivial problem. With respect to the simulation results, Figure 6 is unreadable - it contains too many variables. Tudor From asg46 Thu Feb 2 11:58:55 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham version=3.1.0 X-Spam-Level: Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12Gwsr12634 for ; Thu, 2 Feb 2006 11:58:54 -0500 (EST) Received: from webmail.cornell.edu (hermes21.mail.cornell.edu [132.236.56.20]) by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k12GwqdU024268 for ; Thu, 2 Feb 2006 11:58:53 -0500 (EST) Received: from 128.84.98.90 by webmail.cornell.edu with HTTP; Thu, 2 Feb 2006 11:58:53 -0500 (EST) Message-ID: <2313.128.84.98.90.1138899533.squirrel@webmail.cornell.edu> Date: Thu, 2 Feb 2006 11:58:53 -0500 (EST) Subject: paper 3 From: "Abhishek Santosh Gupta" To: egs+summary User-Agent: SquirrelMail/1.4.5 MIME-Version: 1.0 Content-Type: text/plain;charset=iso-8859-1 Content-Transfer-Encoding: 8bit X-Priority: 3 (Normal) Importance: Normal CAN- Content Addressable Network CAN achieves scalable peer-to-peer routing without imposing any form of rigid hierarchial structure. A CAN node maintains a routing table that holds the IP address and virtual zone of each of its immediate neighbours in a d-dimensional co-ordinate space. The routing policy is a greedy one which forwards to the neighbour node with the closest destination co-ordinates.In case a node loses all its neighbours, the protocol may temporarily fail until an expanding ring search rebuilds the tables. A better routing metric would be to forward it to the node with the maximum ratio of progress to RTT. Node Insertion : the new node randomly chooses a point in space and sends a join request for that point. This causes a part of the co-ordinate space(zone) to be split and the neighbours also have to be informed of this change. To balance zone volumes, the zone with the largest volume(in the neighbourhood) is split. Node departure : the leaving node has to explicitly hand over its zone to a neighbour node. In the case that it cannot do so ( failure), periodic update messages will detect this failure eventually and one of the neighbour nodes will takeover thiz zone ( the node which does the takeover will have the smallest zone volume amongst the neighbour nodes) DESIGN IMPROVEMENTS: multi-dimensional co-ordinate space reduces path length and path latency for a small increase in routing table size. multiple realities : Multiple independent coordinate spaces can be maintained such that each node has zones in each of these independent coordinate spaces. This enables to further reduce path lengths as a node can now choose amongst different zones. multiple realities improve fault tolerance abilities because if one co-ordinate space fails, the node can use the other independent spaces for routing. To improve data availability it suggests the use of multiple hash functions which map a single to different points on the coordinate space. Caching and Replication can be used to balance loads due to hotspots. CAN does not deal with nodes who are alive but do not forward packets. From ymir.vigfusson Thu Feb 2 12:47:22 2006 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on sundial.cs.cornell.edu X-Spam-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00,HTML_00_10, HTML_MESSAGE autolearn=no version=3.1.0 X-Spam-Level: Received: from uproxy.gmail.com (uproxy.gmail.com [66.249.92.193] (may be forged)) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k12HlLr02066 for ; Thu, 2 Feb 2006 12:47:21 -0500 (EST) Received: by uproxy.gmail.com with SMTP id u40so390604ugc for ; Thu, 02 Feb 2006 09:47:21 -0800 (PST) DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type; b=L7Qx0j/a3poMLPsOfkvIQi9NUYsY6S+jJUghNWKbjJECjzKW5TWuQ2701/NptUjsUSVPdKGGaikKakbrB350zV+OwgfYNCOyIfTstyR5tph50khnfe5Le57NQAfl3mApfZsZmteVOocRKRFijsKPkcM11F8GoSrhzgEUhaUSNkA= Received: by 10.48.217.10 with SMTP id p10mr219203nfg; Thu, 02 Feb 2006 09:47:20 -0800 (PST) Received: by 10.49.43.1 with HTTP; Thu, 2 Feb 2006 09:47:20 -0800 (PST) Message-ID: <9302f1e20602020947w2692643cr5187dc9eef9e55d5@mail.gmail.com> Date: Thu, 2 Feb 2006 12:47:20 -0500 From: Ymir Vigfusson To: egs+summary Subject: PAPER 3 MIME-Version: 1.0 X-Security: message sanitized on sundial.cs.cornell.edu See http://www.impsec.org/email-tools/sanitizer-intro.html for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 X-Security: The postmaster has not enabled quarantine of poisoned messages. Content-Type: multipart/alternative; boundary="----=_Part_2185_22954689.1138902440051" ------=_Part_2185_22954689.1138902440051 Content-Type: text/plain; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable I didn't know exactly what you wanted us to write on CAN, so I'm giving a brief overview of the protocol, and superficial comparison to the previous entries. Like the previous work we have seen, CAN is a scheme for a distributed hash-table with support for insertion, deletion and lookups of keys to get values. A hash-function maps a (key,value) pair deterministically to a point P in the coordinate space of the network. Say a node N wants to join a CAN network. It looks up the CAN domain name in a DNS to get the IP address of a so-called bootstrap node. (To minimize the latency of routing, this step can incorporate timing measurements to some of these bootstrap nodes so you are placed in the network near physically nearby nodes). It now chooses a point P at random and sends a JOIN request to the network destined for that point. It will find a zone (area in the network) where P resides (or should reside= ) and settles there. The network structure is grid-like (or torus if we take the modulus into account) and seemingly different from the ring-based networks that we have been taking a look at. A node is responsible for a such a zone (collection of keys) and knows its immediate neighbours on all four sides. Node N now takes one of these zones (selected carefully to distribute the load) and splits it in half either horizontally or vertically. The previous occupant and N are now responsible for half as many keys as before. The neighbours are notified or this change, N retrieve= s them from the previous occupant of the zone. In a d-dimensional space this affects O(d) existing nodes. If a node N wants to part the network (or dies) either the zone will be merged back to a neighbour, or in the worst case a takeover node will be temporarily assigned to the zone entries. To avoid that hash information gets lost, holders of the actual data will periodically advertise that they are indeed holding it. As an alternative, CAN can support replication of data so that it will not get lost. The discovery of takeover nodes is based on the prefixes of the node IDs, which makes the fallback mechanism similar to that of Pastry et al, except that it's used for network structur= e recovery instead of actual routing. As for routing, we greedily forward packets to the neighbours that are closer by than we are w.r.t. key names. The main thing to notice is that fault-resistance comes naturally because there are multiple paths between nodes so if a node on the way crashes, the routing mechanism will bypass it= . Some numbers: If the d-dimensional space is uniformly partitioned, every node has 2d neighbours and the average pathlength between a pair of nodes is dn^{1/d} / 4. Picking d =3D log n we will get the special case where routing table and average length are both O(log n). In addition to the essay, I consulted Sylvia Ratnasamy's Ph.D. thesis at http://www.icir.org/sylvia/thesis.ps (which actually has the paper as a subset). - Ymir ------=_Part_2185_22954689.1138902440051 Content-Type: text/html; charset=ISO-8859-1 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable I didn't know exactly what you wanted us to write on CAN, so I'm
giving = a brief overview of the protocol, and superficial comparison
to the prev= ious entries.

Like the previous work we have seen, CAN is a scheme f= or a distributed
hash-table with support for insertion, deletion and lookups of keys
= to get values. A hash-function maps a (key,value) pair deterministicallyto a point P in the coordinate space of the network.

Say a node N w= ants to join a CAN network. It looks up the CAN domain name
in a DNS to get the IP address of a so-called bootstrap node.
(To mi= nimize the latency of routing, this step can incorporate timing
measurem= ents to some of these bootstrap nodes so you are placed in the
network n= ear physically nearby nodes). It now chooses a point P at
random and sends a JOIN request to the network destined for that point.=
It will find a zone (area in the network) where P resides (or should re= side)
and settles there. The network structure is grid-like (or torus if= we
take the modulus into account) and seemingly different from the ring-ba= sed
networks that we have been taking a look at. A node is responsible f= or a
such a zone (collection of keys) and knows its immediate neighbours= on
all four sides. Node N now takes one of these zones (selected carefully=
to distribute the load) and splits it in half either horizontally orvertically. The previous occupant and N are now responsible for half as
many keys as before. The neighbours are notified or this change, N retr= ieves
them from the previous occupant of the zone. In a d-dimensional sp= ace this
affects O(d) existing nodes.

If a node N wants to part t= he network (or dies) either the zone will be merged
back to a neighbour, or in the worst case a takeover node will be
te= mporarily assigned to the zone entries. To avoid that hash information
g= ets lost, holders of the actual data will periodically advertise that they
are indeed holding it. As an alternative, CAN can support replication o= f
data so that it will not get lost. The discovery of takeover nodes is<= br>based on the prefixes of the node IDs, which makes the fallback mechanis= m
similar to that of Pastry et al, except that it's used for network stru= cture
recovery instead of actual routing.

As for routing, we gree= dily forward packets to the neighbours that are closer
by than we are=20 w.r.t. key names. The main thing to notice is that
fault-resistance come= s naturally because there are multiple paths between
nodes so if a node = on the way crashes, the routing mechanism will bypass it.

Some numbe= rs: If the d-dimensional space is uniformly partitioned, every
node has 2d neighbours and the average pathlength between a pair of nod= es
is dn^{1/d} / 4. Picking d =3D log n we will get the special case whe= re
routing table and average length are both O(log n).

In additio= n to the essay, I consulted Sylvia Ratnasamy's=20 Ph.D. thesis
at http://= www.icir.org/sylvia/thesis.ps (which actually has the paper
as a sub= set).

- Ymir

------=_Part_2185_22954689.1138902440051-- From victoria Tue Feb 7 10:40:21 2006 Return-Path: Received: from turing.cs.hmc.edu (turing.cs.hmc.edu [134.173.42.99]) by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k17FeKr10794 for ; Tue, 7 Feb 2006 10:40:20 -0500 (EST) Received: by turing.cs.hmc.edu (Postfix, from userid 34382) id CD55C5323F; Tue, 7 Feb 2006 07:40:14 -0800 (PST) Date: Tue, 7 Feb 2006 07:40:14 -0800 From: Victoria Krafft To: egs+summary Subject: PAPER 3 Message-ID: <20060207154014.GA11797> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.4.2.1i Sorry this is late - I had typed it up before class Thursday, but apparently it never got emailed out. CAN is a very interesting system, which provides a content-addressable, scalable, network. It is based on a d-dimensional coordinate system on a torus, and each node in the system is assigned to a section of the coordinate space. All the content is hashed, and stored on the node responsible for the corresponding section of the coordinate space. To pass information between nodes, each node keeps track of all the nodes which share an edge in the coordinate space. This means that each node only needs to track 2d neighbors, and the average path length across the network grows at O(n^(1/d)). Nodes join by finding any node in the network, and splitting the coordinate space of that gateway node. Because joins do not require messages across the network to get routing information, CAN may not have as many problems with churn and sudden popularity. Node failures are detected by pinging neighbors, and one of the neighbors of the failed node replaces it. There are several optimizations which are suggested. Some of them could be used on the other p2p networks we've looked at, increasing performance and reliability. One interesting proposal is the idea of multiple realities. Here, each node is part of several coordinate spaces, providing multiple copies of each piece of data, and multiple routes to that data. This reduces the number of hops, and improves reliability. In a similar fashion, sections of the coordinate space (zones) can be overloaded, so that 3 nodes share a zone. Each of those nodes stores a route into all the neighboring zones, but it only stores the fastest route. The network can be constructed in a way which takes into account latencies in the underlying IP network. Also, the load balance on the network can be improved by splitting the largest zone in the vicinity of the gateway node. The DHTs we've already studied could take advantage of some of these ideas. Joining by getting given a random node in the network, then copying its routing table and leaf set, would cut down on the traffic generated by nodes joining the network. Multiple realities and assigning multiple nodes to the same area can both be done with DHTs, and they would improve reliability. -- Victoria Krafft