From lackhand@gmail.com Wed Mar 8 22:14:22 2006
From: "Andrew Cunningham"
To: egs+summary@cs.cornell.edu
Subject: PAPER 13
Date: Wed, 8 Mar 2006 22:14:18 -0500
Andrew Cunningham
arc39

_Resilient_Overlay_Networks_
David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris

RON lets distributed Internet applications detect and recover from path outages and periods of degraded performance much faster than current routing systems (a few seconds versus several minutes, at minimum). Though it sits on top of the Internet, it operates as a kind of application-level switch, deciding whether a given packet is better routed over the Internet directly or through the overlay to reach its destination. This allows improvement over normal Internet routing; moreover, the authors found that forwarding packets through at most one intermediate overlay hop (a->b->c versus a->c) was sufficient in most cases. The improvement comes at a price, namely a more complex view of the network topology separating two points: rather than the current system of summaries, RON requires a richer view of connectivity. A RON is explicitly designed to be limited in size, to permit aggressive path maintenance via probing without excessive bandwidth overhead, which (as the paper properly notes) suits a specifically targeted platform. Despite countless comparisons, RON is not intended to replace BGP but to work alongside it, since BGP is far more scalable. Another interesting feature is that because this work is done at a 'local' scale, the heuristics used to define "failures" and "faults" may be tuned to the application, which yields much better behavior than traditional BGP for certain domains.

The gains of RON are narrow and slight, though present.
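The one-hop decision the review describes can be sketched in a few lines. This is a minimal illustration, not RON's implementation: it assumes each node holds a table of probed latencies to every peer (RON actually combines latency, loss rate and throughput), and all names here are hypothetical.

```python
def best_route(src, dst, latency):
    """Pick the direct Internet path or a single-intermediate overlay path.

    latency[a][b] is the probed cost between overlay nodes a and b.
    Returns ([], cost) for the direct path, or ([hop], cost) for the
    cheapest one-hop detour, whichever is better.
    """
    direct = latency[src][dst]
    best_hop, best_cost = None, direct
    for hop in latency:
        if hop in (src, dst):
            continue  # only consider true intermediaries
        cost = latency[src][hop] + latency[hop][dst]
        if cost < best_cost:
            best_hop, best_cost = hop, cost
    return ([best_hop] if best_hop is not None else []), best_cost
```

When the direct link degrades (say a->c costs 100 while a->b and b->c cost 10 each), the overlay detour through b wins; otherwise the direct path is kept, so the overlay only adds a hop when probing says it pays.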
The basic conceit is reasonable, and so long as RON picks no incorrect routes, there is no reason to suppose that introducing delay in the form of user-level redirection would be expensive. However, the good qualities of RON rely on the correct functioning of BGP, and the paper (quietly) admits that the system cannot scale to replace the current status quo. Thus it must be used in concert with present systems as an additional layer; this too is reasonable, though significantly less impressive once the central idea is recognized as "route around trouble spots in a reactive fashion". There are reasons to believe the gains must be as slight as they are: for one thing, finding truly optimal paths would be more expensive than simply using a sub-optimal path; for another, since this must run over the Internet (not dedicated fiber), there is a natural limit to the benefit that controlled switching can obtain. Security is addressed only to be dismissed, which is something of a shortfall: the Internet already exists, and anything intending to improve its reliability must address all of its shortcomings. This system is more easily spoofed than the Internet, since less physical labor is involved (people actually request that you forward packets for them!), and while applications can successfully layer any defenses they like onto an insecure delivery mechanism, the system is in a unique position to provide secure primitives for not much more work.

_One_Hop_Lookups_for_Peer-to-Peer_Overlays_
Anjali Gupta, Barbara Liskov, Rodrigo Rodrigues

The take-away from this paper is that it is unnecessary to assume that routing information at each member node must be kept small to ensure that the information is fresh. Indeed, the bookkeeping required to keep this information current can be handled through dissemination trees, scaling well even with very large rates of joins and leaves.
This in turn gives lookups lower latency, since we need not contact several nodes, only at most one other node. We've seen the algorithm used here before, and the same comments hold: they implement a three-level hierarchy (slices and units of the address space, with each of k equal slices having a slice leader that manages its k' units, each with its own leader; each unit leader informs the members of its unit of all information passed down from the slice leader) through a gossip-like protocol. This has lower overhead than broadcasting every message, and it uses caching to combine several small messages into a single large one, but it ultimately places enormous strain on the unit and slice leaders, who must collate a great deal of information; similarly, if a slice or unit leader fails, it must be replaced, which is expensive because leaders maintain a lot of state.

The paper's (legitimate) point is that you can reduce the slice leader's enormous output requirement by tuning the length of time it waits to accumulate messages, decreasing the number of messages it must send. While this helps, it is not sufficient: the authors choose 23 seconds as their accumulation time, which is very long on a computer scale and defeats the point of one-hop lookup in many applications. For an extreme example, in the previous paper we were impressed by a few seconds for routing around minutes-long IP outages; this paper falls firmly between the two. They also make the point that many environments see much less churn than the general Internet. This is a boon for the paper, because the system only suffers when there are multiple concurrent events; in that case, the slice and unit leaders are forced to question why they were willing to serve in that position, since their workload suddenly increases enormously.
In the stable case, however, this is a beautiful, sedentary algorithm -- which forces one to ask why anyone in a churn-heavy environment would volunteer for slice-leader duty.
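The slice/unit partition of the identifier space described above is easy to make concrete. The sketch below is illustrative only: the ring size and the values of k and k' are toy parameters, not the paper's, and leader election (e.g. picking the node nearest a slice's midpoint) is elided.

```python
RING = 2**16   # size of the circular identifier space (toy value)
K = 8          # k: number of equal slices
K_PRIME = 4    # k': units per slice

def locate(node_id):
    """Map an identifier to its (slice, unit) coordinates.

    Events in a unit flow up to the slice leader, across to the other
    slice leaders, then down through unit leaders to every member --
    the three-level dissemination tree the paper relies on.
    """
    slice_size = RING // K
    unit_size = slice_size // K_PRIME
    s = node_id // slice_size
    u = (node_id % slice_size) // unit_size
    return s, u
```

The point of the arithmetic is that any node can compute which leader is responsible for any identifier locally, with no lookup traffic; all the cost is shifted onto the leaders who collate and batch membership events.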
From sh366@cornell.edu Thu Mar 9 00:06:31 2006
From: Huang Shiang-Jia
To: egs+summary@cs.cornell.edu
Subject: PAPER 13
Date: Thu, 9 Mar 2006 00:06:29 -0500 (EST)

* Both of the papers discussed today are overlay routing systems that aim to improve Internet reliability by detecting and recovering from path outages and degraded performance.
* RON works by deploying nodes in different domains that monitor the quality of the underlying Internet and route packets for each other. SOSR, instead, routes messages through a small set of randomly chosen intermediaries when a failure occurs.
* The background path-monitoring overhead in RON is high, so it is suitable only for small overlays. (As the paper indicates, it scales to only about 50 nodes.)
* SOSR is less successful at routing around last-hop network failures; its alternative routing scheme therefore suits only some cases (servers rather than broadband hosts, for instance).
* RON (Resilient Overlay Network) is an application-level overlay designed (1) to quickly (within about 20 seconds) detect and recover from path outages and (2) to improve loss rate, latency and throughput.
* Nodes residing in different routing domains comprise a RON. They (1) aggressively probe and monitor the paths connecting them, (2) exchange information about path quality among themselves, (3) build forwarding tables on the basis of several path metrics, such as latency, loss rate and throughput, and (4) forward data on behalf of any communicating nodes in the RON.
* RON tightly integrates routing and path selection with applications. Applications can prioritize some metrics over others in their path selection. This demonstrates the benefits of moving some control over routing to end systems.
* The experimental results show that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases, so part of the design focuses on finding better paths via a single intermediate RON node.
* The second paper first conducts a measurement study of Internet path failures and finds that most Internet paths work well (availability ranges from 99.6% for servers down to 94.4% for broadband hosts).
* Since no failure is the common case, the authors propose an approach that avoids path-monitoring overhead and scales well: SOSR (Scalable One-hop Source Routing). SOSR attempts to recover from path failures by routing indirectly through a small set of randomly chosen intermediaries. Because the intermediaries are randomly chosen, no a priori knowledge of Internet state is needed.
* They show that a stateless policy called 'random-4' quickly finds alternative paths around failures.
It works as follows: when a node detects a path failure, it selects 4 intermediaries and attempts to reroute messages through them until (1) one of the intermediaries succeeds, (2) the path self-repairs, or (3) four attempts have failed.
* According to their survey, broadband hosts experience significantly more last-hop failures than servers. SOSR is less suitable for broadband hosts since it cannot route around such last-hop failures.

From lackhand@gmail.com Thu Mar 9 00:23:07 2006
From: "Andrew Cunningham"
To: egs+summary@cs.cornell.edu
Subject: PAPER 13
Date: Thu, 9 Mar 2006 00:23:04 -0500
I *knew* something looked familiar!

Andrew Cunningham
arc39
_Improving_the_Reliability_of_Internet_Paths_With_One_Hop_Source_Routing_
Krishna P. Gummadi, Harsha V. Madhyastha, Steven D. Gribble, Henry M. Levy, and David Wetherall

This paper proposes a simple, scalable approach to recovering from Internet path failures. It reiterates the take-away of the previous paper, i.e., that most path failures can be mitigated through the simple one-hop rerouting technique. It improves on RON's storm of background monitoring, thus increasing scalability. Both are bounded by the discovery that many failures lie so close to the destination that no alternative routing or overlay scheme can avoid them: 16% of failures on paths to servers and 60% of failures on paths to broadband hosts were last-hop or end-system failures, which neither scheme can address. However, 66% of popular-server path failures and 39% of broadband-host path failures are potentially soluble with a single extra hop. More attention is also given to analyzing how to choose alternative routes; the evidence ultimately shows that attempting one-hop routing through each of 4 separate nodes, iteratively, is effective.

No attention is paid to security, which is something of a problem since this paper builds upon current Internet standards; built on top of the current Internet stratum, it should make guarantees at least as strong as the Internet's for the integrity, not just the speed, of transmitted data, and this is not done. There is no workload sharing -- I presume that repeated connections to a given node are disallowed by that node refusing to host more routes than some maximum.
Since we try only 4 random-4 samples, this is a total of 16 nodes out of however many participate in the system; this is unlikely to scale out of control, but it is possible for the system to deny service it could have granted (due to a poor choice of random intermediaries).
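The random-4 recovery policy both reviews describe fits in a few lines. A hedged sketch, not the paper's code: `try_via` stands in for the actual one-hop forwarding attempt (which the paper leaves to the transport layer), and all names here are hypothetical.

```python
import random

def recover(dst, peers, try_via, attempts=4):
    """Stateless random-k recovery: on a detected path failure, pick
    `attempts` random intermediaries and try one-hop indirection
    through each in turn.

    Returns the intermediary that reached dst, or None if every
    attempt failed (at which point the sender can only wait for the
    path to self-repair).
    """
    candidates = random.sample(peers, min(attempts, len(peers)))
    for hop in candidates:
        if try_via(hop, dst):
            return hop
    return None
```

Because the intermediaries are chosen blindly, no background probing or shared state is needed -- the source of SOSR's scalability, and also of the failure mode noted above, where an unlucky draw of intermediaries denies a recovery the overlay could have provided.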
Andrew Cunningham
arc39
    _Resilient_Overlay_Networks_
    David Andersen, Hari Balakrishnan, Frans Kaashoek, and R= obert Morris
    
    RON lets distributed internet applications detect and recover from path outages and periods of degraded performance much faster than current performance tuning systems (scant seconds versus several minutes, minimum). Though it exists on top of the internet, it operates as a sort of application level switch, determining whether it would be better to route a given packet over the internet directly, or through the overlay, to reach its destination. This allows improvement over normal Internet routing, and moreover, they discovered that forwarding packets by at most one hop along this intermediate protocol (a->b->c versus a->c) was (sufficiently) beneficial in most cases. This improvement comes at a price, namely a more complex view of the network topology separating two points; rather than the current system of using summaries, RON requires a richer view of connectivity. The RON is explicitly designed to be limited in size, to facilitate aggressive path maintenance via probing, without excessive bandwidth overhead, thus lending this (as is properly noted) to a specifically targetted platform -- despite countless comparisons, this is not intended to replace BGP, but to work alongside it, since BGP is far more scalable. Another interesting feature is that as this work is being done on a 'local' scale, the heuristics used to define "failures= " and "faults" may be tuned to the application, which provides much better behavior than traditional BGP for certain domains.
    The gains of RON are narrow and slight, though present. The basic conceit is reasonable, and so long as RON picks no incorrect route, there is no reason to suppose that introducing 'delay' in the form of user-level redirection would be expensive. However, the good qualities of RON rely on the correct functioning of BGP, and the paper (quietly) admits that the system is not scalable to replace the current status quo. Thus this must be used in concert with present systems as an additional layer; this too is reasonable, though significantly less impressive when the central idea is realized to be "route around trouble spots in a reactive fashion". There are rea= sons to believe that the gains must be as slight as they are: for one thing, to get truly optimal paths would be more expensive than simply using a sub-optimal path; for another, since this must run over the internet (and not dedicated fiber), there is a natural limit to the amount of benefit that controlled switching can gain. Security is addressed only to be dismissed, which is something of a shortfall: the internet already exists, and so anything intending to improve its reliability must address all of its shortcomings. This system is more easily spoofed than the internet, since less physical labor is involved -- people actually request that you forward packets for them!-- and while applications can (successfully) layer any defenses they'd like on to insecure delivery mechanisms, the system is in the unique position of providing secure primitives for not-much-more-work.
    
    _One_Hop_Lookups_for_Peer-to-Peer-Overlays
    Anjali Gupta, Barbara Liskov, Rodrigo Rdrigues
    
    The take-away from this paper is that it is unnecessary to assume that routing information at each member node must be kept small to ensure that the information is fresh. Indeed, the bookkeeping required to keep this information current can be handled through dissemenation trees, scaling well with even very large rates of joins/leaves. This then dictates that lookups have a lower latency, since we need not contact several nodes, but simply at most one other node. We've seen the algorithm used here before, and the same comments hold: namely, they do this by implementing a three level hierarchy (slices and units of the address space, with each of k equal slices having a slice leader which manages its k' units (each with leader); each unit leader informs the members of its unit of all information passed down from the slice leader) through a gossip-like protocol. This is lower overhead than broadcasting every message, and utilizes caching to combine several small messages into a single large message, but suffers in the end as it places enormous strain on the unit- and slice- leaders, who must collate a lot of information; similarly, if a slice- or unit- leader fails, then it must be replaced --  this is expensive as they must maintain a lot of state.
    The paper's (legitimate) point is that you can reduce the slice leader's enormous output requirement by tuning the length of time it waits to accumulate messages in order to decrease the amount of messages it must send. While this helps, it is not sufficient: They choose 23 seconds for their accumulation time, which is very long on a computer-scale, which defeats the point in many applications of one hop. For an extreme example, in the previous paper we were impressed by a few seconds for routing around minutes-long IP blocks; this paper is firmly between the two. They also make the point that in many environments, there is much less churn than in the general internet. This is a boon for the paper, because the system will only suffer when there are multiple concurrent events; in that case, the slice- and unit- leaders are forced to question why they were willing to serve in that position, since they are suddenly making enormous increases to their workload. In the stable case, however, this is a beautiful, sedentary algorithm -- which forces one to question why anyone in a churn-heavy environment would volunteer for slice leader duty?
    
    _Improving_the_Reliability_of_Internet_Paths_With_One_Hop_Source_Routing_
    Krishna P. Gummadi, Harsha V. Madhyastha, Steven D. Gribble, Henry M. Levy, and David Wetherall
    
    This paper proposes a simple, scalable approach to recovering from Internet path failures. It reiterates the take-away of the previous paper, i.e., that most path failures can be mitigated by a simple one-hop rerouting technique. It improves on RON's storm of background monitoring, and so scales further. Both schemes are bounded by the discovery that many failures lie so close to the destination that no alternative routing or overlay scheme can avoid them: 16% of failures on paths to servers and 60% of failures on paths to broadband hosts were last-hop or end-system failures, which neither scheme can address. However, 66% of all popular-server path failures and 39% of all broadband-host path failures are potentially solvable with a single extra hop. More attention is also given to how alternative routes should be chosen; in the end, the evidence shows that attempting one-hop routing through each of 4 randomly chosen nodes is effective.
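The random-4 recovery policy summarized above can be sketched as follows. This is a schematic of the idea rather than the paper's SOSR code; `send_via` and the intermediary list are hypothetical placeholders:

```python
# Sketch of random-k one-hop recovery (an illustration, not SOSR itself):
# on a direct-path failure, try k randomly chosen intermediaries, each
# giving a single extra hop a -> i -> b.

import random

def send_with_random_k(send_via, src, dst, intermediaries, k=4):
    """Try the direct path, then k random one-hop detours.
    Returns the route that worked, or None if every attempt fails."""
    if send_via(src, dst):            # direct Internet path first
        return (src, dst)
    for hop in random.sample(intermediaries, k):
        if send_via(src, hop) and send_via(hop, dst):
            return (src, hop, dst)    # recovered via one intermediate hop
    return None                       # e.g. last-hop failure: unrecoverable
```

Note how a last-hop failure defeats every detour: no choice of `hop` helps when the final link itself is down, which is exactly the 16%/60% bound the paper measures.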
    No attention is paid to security, which is a real problem given that this scheme builds on current Internet standards; sitting on top of the existing Internet stratum, it should make guarantees at least as strong as the Internet's for the integrity, not just the speed, of transmitted data, and it does not. No workload sharing is done -- presumably repeated connections to a given node are disallowed by that node refusing to host more than some maximum number of routes. Since we try only 4 random-4 samples, this is a total of 16 nodes out of however many participate in the system; that is unlikely to scale out of control, but the system can deny service it could have granted because of a poor choice of random intermediaries.

From ids3@cornell.edu Thu Mar 9 00:30:16 2006
From: Ivan Stoyanov
Subject: PAPER 13

Resilient Overlay Networks

Lower-level routing algorithms like BGP are
sometimes slow to respond to failures and to reconfigure themselves to use alternative paths. The paper proposes a "do-it-yourself" solution to this problem, in which the nodes of a distributed application form an overlay that can be used as an alternative routing substrate. Nodes monitor the state of the links between themselves and choose an alternative path when an outage occurs, much faster than the adaptation of the underlying routing. Since the system maintains all n^2 links, a single intermediate node is enough to form an alternative path. The tight coupling between the client application and the routing provides additional benefits, such as the ability to choose a specific path metric for each application and application-specific notions of what constitutes a network failure. Overlay routing also lets routes use private peering links between autonomous systems that are unavailable to traditional routing. The paper presents a great and simple idea. Its biggest drawback is that it requires every node to know every other node, which does not scale well.

Improving the Reliability of Internet Paths with One-hop Source Routing

The paper presents an optimization over RON. Its main point is that maintaining n^2 links is unnecessary. Instead, the authors suggest that keeping a smaller number of links is enough, because the source node can try its peers in sequence until it finds a path to the destination. The problem the paper solves looks very much like searching an unstructured p2p network. Perhaps a simple flood would be more efficient; it is not expensive in this case, because it would happen rarely. In addition, flooding guarantees that the destination is reached in the shortest possible time, and the path found would also most likely be the one with the lowest latency.
From asr32@cornell.edu Thu Mar 9 00:30:30 2006
From: Ari Rabkin
Subject: PAPER 13

RON: A RON is a resilient overlay network: an overlay network designed to have data tunneled through it. By doing routing at the application layer instead of the network layer, a RON can adapt to bad links much more quickly than a lower-level protocol such as BGP. In addition, a RON can employ much more flexible routing criteria, allowing the use of routes that are not publicly advertised. It is unclear how well RONs would work if deployed at a larger scale. How would Internet routing perform if many nodes rapidly altered their traffic patterns in response to perceived poor service? And if it would work well, why not simply alter BGP rather than do this at the application layer? Also, using a RON requires trusting that nodes are not lying about the origin of their data to get around awkward usage policies.
This, as the authors say, is workable in small systems, but would presumably become awkward as RON systems grow larger and more common.

One-hop: How much benefit can we get from clever routing? The authors of this paper ran extensive path traces from PlanetLab nodes to triangulate routing failures. As it happens, a significant fraction of server failures, and a large preponderance of broadband user-machine failures, are end-system or last-link failures, which no routing algorithm can ameliorate. For those failures that can be fixed by routing, picking four nodes at random and trying to route through them ("one-hop" indirection) worked 92% of the time. The authors conclude that trying to route through a small number of random peers is often an effective technique for routing around failures. Interestingly, beyond 2 or 3 indirect routes, cleverness does not seem to help; it is hard to do better than choosing indirection points at random. This suggests that a RON may in fact be overkill for most applications needing higher reliability. How well would this system work for those of us not running on PlanetLab? For the system to work, nodes must have some set of 'peers' who are willing to route for them, with whom they have tolerably fast connections, but who are far enough away in the network graph that routing through them misses the damaged link(s). Moreover, those peers must be willing to forward data. This is by no means impossible, but neither is it straightforward. Unlike disk space, bandwidth is an ephemeral resource, and in this case one used seldom, so a straightforward swap seems inapplicable.

Ari Rabkin asr32@cornell.edu Risley Hall 454 3-2842
The resources of civilization are not yet exhausted.
--William Gladstone

From ns253@cornell.edu Thu Mar 9 01:18:41 2006
From: Niranjan Sivakumar
Subject: PAPER 13

Niranjan Sivakumar
Resilient Overlay Networks
Improving the Reliability of Internet Paths with One-hop Source Routing

A Resilient Overlay Network (RON) is designed to run on top of the existing Internet and to help applications avoid, and recover from, instability and outages that existing Internet protocols do not resolve efficiently. Nodes in different segments of the network are selected to be part of the RON. These nodes actively monitor path conditions in order to aid their forwarding of packets received from other RON clients. Forwarding can be driven by particular metrics or tailored to specified applications, and the types of traffic allowed on links can be limited through a policy-routing mechanism.
Experimental results show that this approach deals with some types of outages and performance problems better than relying solely on BGP and the current Internet routing infrastructure.

Scalable One-hop Source Routing (SOSR) is another technique for routing around problems, in a way that is more scalable and efficient than a RON. First, the researchers investigated failures on the Internet to understand what problem they were actually trying to solve. They found that a large number of failures cannot be routed around because they occur too close to the destination. To deal with the failures that can be routed around, traffic is, upon a failure, routed through k randomly chosen intermediaries (random-k); random-4 was determined to be a balanced choice that performed reasonably well. The theory is that if traffic can be routed to an intermediary sufficiently far from the problem area of the network, another route to the destination will be found. Some more sophisticated selection methods were tested (history-k and BGP-paths-k), but random-k was shown to perform comparably while being less complex. As noted in the SOSR paper, one of the main issues with RON is its inability to scale well. SOSR's simplicity, in turn, gives up some of RON's advantages, such as adjusting routing based on chosen metrics. Also, for a large-scale deployment on the Internet, it is not clear what the incentives to participate are, even though the overhead is considered "negligible."
From nsg7@cornell.edu Thu Mar 9 11:07:27 2006
From: Nicholas S Gerner
Subject: PAPER 13

Resilient Overlay Networks (RON) provides an application-level overlay for general data routing on the Internet. Andersen et al. identify problems with BGP, the underlying Internet routing protocol: in particular, path failures frequently go undetected and unrepaired by BGP for tens of seconds. RON seeks to address some of these issues. RON is implemented using FreeBSD's divert-socket facility to intercept IP traffic so that RON can perform its own application-level routing (providing an IP abstraction to higher-level applications). RON forwards packets to the final destination using standard IP in most cases; however, if a link or path failure is detected, RON can use an intermediary RON node to forward packets.
This failure detection and intermediary route discovery is accomplished through a full-connectivity overlay in which every node exchanges probes with every other node (incurring O(N^2) overhead) to detect failures and measure several metrics (latency, packet loss, and throughput) for route selection. These metrics can be augmented by application-level metrics, and intermediate nodes that do not support a given metric can fall back on an application-defined default. RON also includes routing policies that decide which paths are acceptable when choosing an indirection (such as disallowing commercial traffic on Internet2 links). Andersen et al. show that RON can route around most underlying network failures (50-60% of such failures) in two test deployments of 12 and 16 nodes. However, this improvement comes at the cost of significant overhead (estimated at 2.2 Kbps for N=10 up to 33 Kbps for N=50, scaling as O(N^2)) and latency (200 ms for their implementation). Andersen et al. characterize this cost as a tradeoff for performance, but the analysis and evaluation of that tradeoff and its parameters are not shown in detail. In the end, RON is not presented as a scalable solution for many nodes; it is instead suggested for small deployments such as conferencing applications. Yet many such applications may not be able to tolerate the added overhead for failures that are unlikely over their short lifetimes.

In "Improving the Reliability of Internet Paths..." Gummadi et al. make two important contributions. First, a simple study of Internet connectivity explores routing failures from different vantage points, and the data from this study is used to analyze the effectiveness of several indirection strategies in the face of routing failures. Specifically, given a set of "vantage points," probes are sent from them to a diverse set of Internet destinations.
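The full-mesh route selection that RON performs over its probed metrics might look roughly like the following minimal sketch. It is latency-only, with `None` marking a failed virtual link; the names and data layout are assumptions for illustration, not RON's code:

```python
# Illustrative sketch (not RON's implementation) of choosing between the
# direct path and every one-hop indirect path, using probed latencies.

def best_route(latency, src, dst, peers):
    """latency[(u, v)] is the probed latency of virtual link u->v, or
    None if that link is currently failed. Returns the lowest-latency
    route, either direct or via a single RON intermediary."""
    candidates = []
    direct = latency.get((src, dst))
    if direct is not None:
        candidates.append((direct, (src, dst)))
    for p in peers:
        leg1, leg2 = latency.get((src, p)), latency.get((p, dst))
        if leg1 is not None and leg2 is not None:
            candidates.append((leg1 + leg2, (src, p, dst)))
    return min(candidates)[1] if candidates else None
```

Swapping the latency table for loss or throughput tables gives the per-application metric choice the review mentions; the O(N^2) cost lives in keeping every `latency[(u, v)]` entry fresh by probing.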
When a connectivity failure is encountered, alternative probes are sent from different vantage points, and the path from source to destination is probed via a modified traceroute. Gummadi et al. conclude that only between 40 and 60% of connectivity failures can be recovered from (the rest involve failed last-hop connectivity, for which no indirect routing can help). To address the recoverable ones, random-4 indirection is presented: four random other hosts are chosen, in parallel, as one-hop indirection hosts through which recovery is attempted. This stateless, low-cost indirection mechanism is shown to be very competitive with stateful and much more expensive alternative strategies. Second, the SOSR implementation is presented: a kernel-level routing indirection that, upon failure, automatically contacts a configured intermediary chosen according to random-4 to forward packets NAT-style. The indirection is transparent to both the user application and the destination host. Experimentation suggests that SOSR recovers from 56% of network-level failures (slightly below the measured 66% that could have been recoverable). "Improving the Reliability of Internet Paths..." makes important contributions to this space: a careful survey of network failures characterizing their frequency and the percentage recoverable by indirection, and the random-4 protocol, which suggests these failures can be overcome by simple, stateless mechanisms. However, the experimental study considers only between 383 and 486 failure instances over a 72-hour period, accounting for between 0.14 and 0.18% of the interactions attempted. The improvement shown by SOSR is a 0.04% increase in end-to-end reliability with a 0.01% increase in application-level failures. These margins may be statistically insignificant, and no hypothesis test is presented to show otherwise or to indicate what margin can be generalized.
The increase in application-level failures is worrisome and is not explored in depth; such failures could be much larger for other classes of applications.

From tc99@cornell.edu Thu Mar 9 11:35:33 2006
From: Theodore Ming Shiuan Chao
Subject: paper 13

Neither of the two papers, RON and One-Hop OSDI, deals with P2P as such. Rather, they deal with the underlying Internet routing between physical links that the P2P papers gloss over when they treat virtual links. Both papers seek to improve the fault detection and recovery of current Internet routing protocols (BGP), but they go about it in different ways: RON uses active monitoring and stored link state to route around failures, while One-Hop OSDI uses multiple redundant alternative paths once a failure occurs.
While RON gains policy flexibility and actively routes around problematic paths, it does so at the expense of scalability: the background traffic required to keep the RON network updated grows polynomially in the number of nodes. One-Hop OSDI requires no background traffic, but it does not proactively route around failed paths. Instead, when a node detects a failed path, it sends traffic to a number of randomly selected intermediaries to attempt to route around the failed link(s). One-Hop therefore generates additional traffic only when a path failure is detected (on the order of the k intermediaries it concurrently routes through).

From kelvinso@gmail.com Thu Mar 9 11:42:11 2006
From: Chiu Wah Kelvin So
Subject: Paper 13
The first paper, "Resilient Overlay Networks," presents an architecture that detects and recovers from path failures using overlay routing. To detect and recover from path failures quickly, each node in a Resilient Overlay Network (RON) actively probes and monitors the paths to all other members, tracking latency, loss rate, and throughput. This aggressive probing can lead to excessive bandwidth overhead when the membership is large, so RON limits participation to under 50 nodes. As in a link-state routing protocol, nodes construct routing tables by periodically exchanging routing information about the different performance metrics with the other nodes. When a node detects the failure of a path, it can route through other nodes to get around the failure. RON not only lets an application recover from path failures, it also allows application-specific routing: since RON tracks several performance metrics per path, an application can choose paths according to its needs. For example, some applications find low latency most important, while others care most about loss rate. Also, when a packet enters the network it is given a policy tag and may only be routed along paths where that type of traffic is allowed. The drawbacks of the system are that it does not scale as membership grows, because of the aggressive probing, and that it assumes all RON members are cooperative.

The second paper, "Improving the Reliability of Internet Paths with One-hop Source Routing," presents an architecture that recovers from path failures by routing through one extra hop. It first presents a measurement study of 67 PlanetLab nodes probing about 3000 other nodes.
The main result of the study is that 66% of path failures can be recovered from with one extra hop through one of the 67 nodes. It also shows that broadband hosts tend to suffer more last-hop failures, which no extra hop can recover. Based on the study, the authors implement an architecture in which a node that fails to reach another node randomly picks k of the 67 nodes through which to route the packet. Since about 66% of path failures can be recovered with one extra hop, this can reduce path failures by roughly 60%; the simple technique avoids most recoverable path failures. Because SOSR generates no background traffic, it scales much better than RON. However, the intermediate nodes need to be server-class; otherwise it is not reliable to route packets through them.

From km266@cornell.edu Thu Mar 9 11:44:48 2006
From: Kevin
Subject: PAPER 13

RON creates an overlay on top of the
internet that actively looks for better paths to route packets. It keeps a complete list of neighbors and pings them frequently to determine whether they are down. If a peer appears down, another burst of packets is sent to determine whether this was a fluke (some minimal packet loss) or a real outage; once the path is judged down, an alternate path to the destination is picked. Because of this active approach and its application-level implementation, RON can decide how to send packets to a given host depending on latency, throughput, or packet loss. FTP, for example, wants high throughput while latency matters little; HTTP, on the other hand, cares about latency but generally not throughput, since most web pages are small. One can also imagine an application that simply needs its packets to arrive eventually, however slowly, and would favor a low-loss link. RON reroutes packets through just one intermediary node, the authors claiming this is sufficient to see marked improvement. In the results section, RON is shown to route packets much more reliably than the current BGP system: it delivers packets faster and recovers from faults much more quickly (on average under 20 seconds, as opposed to several minutes). The major downfall of RON seems to be the massive amount of background traffic flowing through it. The authors chart bandwidth usage versus the number of nodes in the system: at 50 nodes, the system uses ~35 kbps, which is incredible. They note that this is one tenth of typical broadband speed; that seems too high for the gains the system produces. My cable modem has trouble reaching the internet about once a week for half an hour (as a pessimistic estimate; it is usually more like once a month for 15 minutes).
My upload bandwidth is close to 150 kbps, and I would not want to allocate almost a quarter of it to cover those thirty minutes of lost connectivity. Furthermore, it is usually the gateway I am connected to at my ISP that is down; no internet traffic can reach me and vice versa, and RON cannot solve that problem. RON buys quicker and more robust communication at the cost of bandwidth, a tradeoff that might be worth making once bandwidth is cheaper.

SOSR takes a different approach from RON, sitting on the passive rather than the overly active side of things. The paper first delves into a detailed study of link failures and packet loss. The authors created their own traceroute tool to locate the failed node and determine exactly whether the failure could be routed around. They found that a large fraction of failures occur at the last hop or at end systems; these cannot be routed around -- if my computer or my ISP goes down, there is no way to reach me. They suggest multi-homing (having multiple ISPs) as a remedy, noting that no routing protocol can handle that kind of failure. Finding that active RON-like systems have huge overhead, they develop a system with no a-priori knowledge of internet state, called random-k: when a path outage is detected, you route through k random intermediaries that forward packets for you. Unless all k intermediaries fail to route the packet (or the path repairs itself), the packet is successfully delivered, and you can keep talking through the successful intermediary to reach your destination. They found random-4 to be the best tradeoff. In the end, the system's passive approach scales better. SOSR is not as successful as RON at finding a path to a destination -- unsurprising, given RON's active pinging and fault recovery -- but SOSR has minimal bandwidth overhead.
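The per-application path choice discussed above (FTP favoring throughput, HTTP favoring latency) can be illustrated with a small sketch; the metric names, values, and dictionary layout are assumptions for illustration, not RON's API:

```python
# Illustrative sketch: each candidate path carries the three probed
# metrics the review mentions, and each application ranks paths by the
# one it cares about. Latency and loss are minimized; throughput is
# maximized. Values below are made up.

def pick_path(paths, metric):
    """paths: {name: {"latency": ms, "loss": fraction, "throughput": kbps}}.
    Return the name of the best path under the given metric."""
    if metric == "throughput":
        return max(paths, key=lambda p: paths[p]["throughput"])
    return min(paths, key=lambda p: paths[p][metric])
```

An HTTP-like client and an FTP-like client can thus legitimately disagree about which path is "best," which is exactly the flexibility BGP's single route per destination cannot offer.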
The lack of security also seems to be an issue, but one perhaps better left for a follow-up paper.

From victoria@cs.hmc.edu Thu Mar 9 11:49:28 2006
From: Victoria Krafft
Subject: PAPER 13

With the current BGP routing protocols, the Internet can take several minutes to reconfigure its routes when failures occur. Andersen et al. present a scheme for building resilient overlay networks (RONs) that detect failures, and recover from them, much more rapidly. Essentially, a RON is made up of several nodes in different routing domains; these nodes form virtual links to each other and route data between themselves, keeping track of link performance and rerouting both to get around virtual-link failures and to improve performance. While their studies show that RONs improve performance for the nodes connected to the RON, the authors have not examined the extra load a RON places on the network. The aggressive probing of link conditions is likely to slow down other network traffic and increase network load; while this may be a worthwhile tradeoff for some applications, it should probably not be used to improve general Internet traffic. In addition, RONs only work for a relatively small set of nodes.
The authors propose running multiple RONs to get around this limitation, but that would consume even more resources from the underlying networks. Gummadi et al. present a much less resource-intensive scheme for handling failures: one-hop source routing. In this plan, if the direct route to a host fails, the machine trying to reach it attempts to route the packet through some number of intermediary nodes; if none of the intermediaries can reach the destination either, the packet is lost. Their experiments show that if four intermediaries (out of the 39) are randomly chosen and the packet is routed through them, about 60% of failures to reach popular servers can be routed around; only 66% of these failures could possibly be routed around, the rest being failures too close to the server. The characterization of Internet failures in this paper is also interesting: it shows that for most broadband hosts, and many popular servers, routing protocols cause temporary and unnecessary outages fairly frequently. Network reliability for popular servers is only about 99.6%, which suggests that the Internet is still inherently unreliable, even for very popular sites that presumably have expensive, high-quality connections to the rest of the network.
-- Victoria Krafft

From: "Abhishek Santosh Gupta" <asg46@cornell.edu>
Date: Thu, 9 Mar 2006 12:15:11 -0500 (EST)
Subject: paper 13 - routing

RON

The main goal of a RON is to enable a group of nodes to communicate with each other in the face of problems with the underlying Internet paths connecting them. RON nodes exchange information about the quality of the paths among themselves via a link-state routing protocol, and build forwarding tables based on a variety of metrics. In order to constrain the bandwidth overhead, the authors suggest limiting the RON to fewer than 50 nodes. This scheme does not scale to networks with a very large number of peers. It also requires each node, or a local group of nodes, to maintain databases storing samples for the various metrics. These databases (in the case of a group of nodes) serve as a single point of failure and attack.
The paper assumes that nodes in a RON will not be malicious, since malicious behavior could easily be detected in such a small world; it does not explicitly deal with such detection mechanisms, especially for nodes that might forward garbage information about path quality. Optimizations are achieved using flow-cache entries.

One-hop Source Routing

The paper carries out experiments to determine the location of path failures during routing. The authors distinguish between a true path failure and short-lived congestion using multiple probes. TCP ACK packets were used instead of UDP probes for various reasons, such as firewalls and routers dropping UDP. Basic idea: once a node detects a path failure, it selects k intermediaries and attempts to reroute packets through them. If the new indirect path is sufficiently disjoint from the old direct path, the faulty component will be avoided, resulting in successful communication. Better policies: maintain a history of the intermediary that correctly forwarded to a particular destination, and set this intermediary as the k-th intermediary for subsequent routing to the same destination. (The authors do not discuss how a node detects that a particular intermediary was successful, given that it forwards to all the intermediaries in parallel -- does the response returned from the destination indicate the intermediary, or does the intermediary itself report successful routing?) Experiments show that for roughly 3000 destinations, random-4 (k=4 here) provides good results, and that it should be invoked after observing just a single packet drop. The value of k would differ for different-sized networks; the authors do not provide equations for k, or for the frequency/eagerness of invocation, as a function of network size. k should be set based on the number of nodes in the network for optimal performance.
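The "history" refinement described above -- remember which intermediary last worked for a destination and pin it into the next candidate set -- can be sketched as follows. All names here are hypothetical; the paper does not give this as code.

```python
import random

class HistorySOSR:
    """Random-k intermediary selection with a per-destination history.

    Hypothetical sketch: the remembered last-good relay replaces the
    k-th random pick, as the policy in the paper suggests.
    """

    def __init__(self, intermediaries, k=4):
        self.intermediaries = list(intermediaries)
        self.k = k
        self.last_good = {}            # destination -> relay that worked

    def pick(self, dst, rng=random):
        picks = rng.sample(self.intermediaries, self.k)
        known = self.last_good.get(dst)
        if known is not None and known not in picks:
            picks[-1] = known          # pin the remembered relay as the k-th
        return picks

    def record_success(self, dst, intermediary):
        self.last_good[dst] = intermediary

s = HistorySOSR(["i%d" % n for n in range(20)], k=4)
s.record_success("server.example", "i13")
picks = s.pick("server.example")
# "i13" is always among the four candidates tried first
```

Note that this keeps only O(destinations) soft state and no path-quality measurements, which is the scaling argument the authors make against RON-style monitoring.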
(Randomly choosing k will only improve performance by a small amount.) Knowing the number of nodes in the network seems difficult, since it can change dynamically.

From: Tudor Marian <tudorm@cs.cornell.edu>
Date: Thu, 09 Mar 2006 12:37:29 -0500
Subject: PAPER 13

RON is an architecture that improves the reliability of the underlying Internet packet transport by detecting and recovering from transient path failures more rapidly than BGP does. It works by deploying a set of nodes as an application-level overlay network, and having them aggressively probe and monitor the quality of the underlying links between themselves. Packets are routed according to application-specified metrics, either directly over the Internet or by way of some other RON node.
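The routing decision just described -- send directly, or detour through one other RON node, whichever currently looks better -- can be sketched as follows. The names and the single scalar latency metric are illustrative; RON itself keeps separate tables per metric.

```python
def best_route(src, dst, latency, nodes):
    """Pick the best path from src to dst: direct, or via one overlay hop.

    `latency` maps (a, b) pairs to the most recent probe measurement;
    an infinite entry means the virtual link is currently considered down.
    """
    INF = float("inf")
    direct = latency.get((src, dst), INF)
    best_path, best_cost = [src, dst], direct
    for mid in nodes:
        if mid in (src, dst):
            continue
        via = latency.get((src, mid), INF) + latency.get((mid, dst), INF)
        if via < best_cost:
            best_path, best_cost = [src, mid, dst], via
    return best_path, best_cost

# Example: the direct link A->C has failed, so traffic detours via B.
lat = {("A", "B"): 20, ("B", "C"): 30, ("A", "C"): float("inf")}
path, cost = best_route("A", "C", lat, ["A", "B", "C"])
# path == ["A", "B", "C"]
```

Because every candidate detour is a single intermediate node, this scan is O(N) per destination, which is one reason the one-hop restriction is cheap to exploit.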
The paper found that forwarding packets via at most one RON node was sufficient to overcome faults and improve performance in most cases; moreover, link-failure detection and recovery took 18 seconds on average, as opposed to the many minutes it can take BGP to recover. One can argue that the architecture is as dirty as it gets; even if it is a good hack that works, it does so only because of the shortcomings of BGP. The second paper proposes a simple approach to recovering from Internet path failures, called one-hop source routing. It works by routing indirectly through a small set of randomly chosen peers and, in contrast to RON, performs no background path monitoring, thereby scaling well and avoiding overhead in the common case of no failures. To motivate the approach, the paper starts by presenting a measurement study of Internet path failures, and then evaluates a prototype implementation on PlanetLab. The authors conclude that although 16% of the failures on paths to servers and 60% on paths to broadband hosts are unrecoverable, since it is impossible to route around the last hop, the simple random-4 technique was able to recover from 61% of the path failures for popular servers and 35% for broadband hosts. Arguably, the netfilter-based implementation is much more elegant than the one chosen by RON.
Tudor

From: "Ymir Vigfusson" <ymir.vigfusson@gmail.com>
Date: Thu, 9 Mar 2006 12:39:39 -0500
Subject: PAPER 13

RON (Resilient Overlay Network) is an application-layer overlay network designed to detect and recover from network failures (namely link failures and performance failures) quickly.
While the underlying AS/BGP architecture of the Internet is designed for the same purpose, its reroute recovery time is on the order of several minutes due to scalability issues, whereas RON does this on the order of seconds (18 seconds on average in the experiment with 12/16 RON nodes and 132/240 distinct paths). This agility is accomplished by aggressive probing and monitoring of the paths that connect the RON nodes. The paper cites results from Labovitz et al. that 10% of the considered Internet routes were available less than 95% of the time, and that less than 35% of all routes were available more than 99.99% of the time; furthermore, 40% of path outages take longer than 30 minutes to repair. RON nodes exchange information about the quality of the paths they know, and the authors present results showing that at most one intermediate RON node is sufficient to improve routing paths. RON, unlike BGP-4, is capable of expressive fine-grained policies aimed at users or hosts. Another feature of RON is integration with applications, since different applications may prioritize different metrics (e.g. latency, loss rate, or throughput), and RON maintains routes for these metrics simultaneously. The implementation of RON was explicitly designed to be limited in size, and an underlying assumption in the paper is that a RON is small relative to the Internet. A large amount of background data about paths needs to be recorded, so scalability seems to be a large drawback. Additionally, RON provides a means of 'selfish routing', which may be susceptible to downsides like Braess's paradox, where adding a cheap link to the network slows down traffic. However, Tim Roughgarden and Eva Tardos showed that, for linear latency functions, selfish routing makes latency at most 4/3 times that of the best coordinated routing. The second paper, "Improving the Reliability of Internet Paths with One-hop Source Routing", starts off by investigating path failures on the Internet.
From seven days of data probing paths from PlanetLab servers to a set of destination hosts, it turns out that the vast majority of paths suffered at least one failure. Many failures (16% of failures to servers and 60% of failures to broadband hosts) occurred close to the destination (on the last hop), so alternative routing/overlay schemes cannot alleviate them. The paper then proposes a 'one-hop source routing' scheme called SOSR, in which a node tries to reroute packets through one or more intermediaries after it discovers a path failure. If the paths are disjoint and the intermediary also avoids the fault, end-to-end communication is restored. It turns out that the most effective policy is to pick the intermediaries at random (smarter policies don't buy much extra), and the paper picks 4 intermediaries as a good value, deduced from the failure sample. If all of the intermediaries fail, the policy is simply to wait for the default path to recover by itself. Notice that the policy does not need to keep track of state. The paper concludes by evaluating a Linux implementation of SOSR, which was able to recover from 56% of failures, close to the predicted value. The paper does not give any theoretical insight into the failures observed, or a theoretical basis for why one-hop source routing is viable.
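One crude way to build the missing intuition is an independence model (my own back-of-the-envelope sketch, not from the paper): if a fraction `recoverable` of failures can be routed around at all, and each random relay independently avoids the fault with probability q, then random-k succeeds with probability recoverable * (1 - (1 - q)^k), so a handful of relays already gets close to the ceiling.

```python
def random_k_success(recoverable, q, k):
    """Toy model: probability random-k recovers a failure.

    recoverable: fraction of failures that are not last-hop (the ceiling)
    q: probability a single random relay's two path legs avoid the fault
    k: number of relays tried in parallel
    """
    return recoverable * (1 - (1 - q) ** k)

# Illustrative numbers only: an 84% ceiling and a 50% per-relay chance.
p1 = random_k_success(0.84, 0.5, 1)   # one relay
p4 = random_k_success(0.84, 0.5, 4)   # random-4
# p4 is already within about 7% of the 0.84 ceiling; diminishing
# returns explain why smarter-than-random policies buy little.
```

The exponent does all the work here: each extra relay multiplies the residual miss probability by (1 - q), which is why k grows so slowly with the desired recovery rate.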
From: Oliver Kennedy <xthemage@mac.com>
Date: Thu, 9 Mar 2006 13:15:09 -0500
Subject: PAPER 13

RON is based on the assumption that the current inter-domain routing protocol, BGP, is crap. To an extent, this is true. BGP is designed to operate with minimal overhead, and has to deal with the economic policies of all the organizations that provide connections along a particular route. Consequently, sub-optimal routes are often chosen due to out-of-date information or unusual policy decisions. RON attempts to out-think BGP by choosing its own path, and forcing BGP to route packets accordingly. It does this by creating an overlay network: clients insert packets into the overlay at the closest node, and the node closest to the destination forwards them appropriately. Since a RON is a fully connected graph, in the ideal situation a route will be only one hop.
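The full-mesh property just mentioned is what drives the cost of the design: every node probes every other node, so probing traffic grows quadratically. A quick estimate, with an illustrative probe size and rate rather than RON's exact parameters:

```python
def probe_overhead_bps(n_nodes, probe_bytes=69, probes_per_sec=0.1):
    """Aggregate probing bandwidth (bits/sec) for a full mesh.

    Each node probes every peer, so there are N*(N-1) directed links.
    probe_bytes and probes_per_sec are assumed values for illustration.
    """
    directed_links = n_nodes * (n_nodes - 1)
    return directed_links * probe_bytes * 8 * probes_per_sec

# At 50 nodes the mesh has 2450 directed links; doubling the
# membership roughly quadruples the total probing load.
fifty = probe_overhead_bps(50)
hundred = probe_overhead_bps(100)
```

The quadratic growth, not the absolute numbers, is the point: it is why the paper caps membership at a few tens of nodes.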
If the network connection between two nodes begins to degrade, the RON will attempt to reroute packets through another node. There are a handful of concerns with this system. First, the fact that the RON is a fully connected graph means that each node holds O(N^2) link-state. This could cause significant scaling issues as the RON gains more users. Second, since each RON node is constantly pinging every other RON node, the network overhead associated with maintaining that state is likely to get out of hand quickly. The paper places a limit of roughly 50 nodes on the network, but as the traffic flowing through the RON grows, the individual nodes will become swamped unless more router nodes are added. Moreover, if the Internet were to shift over to RON completely, a large number of existing routes would become obsolete and unnecessary. One-hop source routing takes a similar approach. Instead of using a network of nodes, it places a set of relay nodes on the Internet. If a routing failure occurs between the source and the destination, the source tries routing through an intermediary node rather than using a complex overlay network. The downside is that the relays must be known to each source in advance: a centralized discovery service provides a single point of failure, and a distributed protocol still requires some form of bootstrap host. Furthermore, deployment of these nodes could be difficult. Deploying them over a global testbed is trivial, but it would be hard to get a corporation to deploy such a relay. Why should they pay for someone else's bandwidth?

-Oliver Kennedy

Computer science isn't about computers any more than astronomy is about telescopes. -- Dijkstra

- Oliver Kennedy

They laughed at Einstein. They laughed at the Wright Brothers. But they also laughed at Bozo the Clown.
-- Carl Sagan

From: Philip Kuryloski <pjk25@cornell.edu>
Date: Tue, 14 Mar 2006 02:33:19 -0500
Subject: PAPER 13

PREDICTING INTERNET NETWORK DISTANCE WITH COORDINATES-BASED APPROACHES

We would like to be able to predict network distances (RTTs) on the Internet without having to send probes each time. One potential solution is to have a set of servers maintain a simplified view of the network topology with RTTs; a node can request data from such a server and predict the desired RTT. This, however, makes those servers a bottleneck, and additional latency is incurred requesting the data. A second solution is to use triangulated heuristic coordinates, or hop counts to some base station nodes in the network, but the accuracy is often less than desired. The authors propose Global Network Positioning (GNP), in which there is a set of well-known landmark nodes in the network, and the node in question trilaterates its position in a 3-D Euclidean space relative to the landmark nodes.
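The coordinate-fitting idea just described can be sketched as a least-squares problem: place the new host at the point whose Euclidean distances to the landmarks best match the measured RTTs. Plain gradient descent stands in here for GNP's actual solver, and the 2-D setup and landmark positions are illustrative.

```python
import math

def embed(landmarks, rtts, steps=2000, lr=0.01):
    """Fit a 2-D position so distances to landmarks match measured RTTs.

    landmarks: list of (x, y) landmark coordinates
    rtts: measured RTT to each landmark (same units as coordinates)
    Minimizes sum over landmarks of (distance - rtt)^2 by gradient descent.
    """
    x, y = 0.0, 0.0                      # initial guess
    for _ in range(steps):
        gx = gy = 0.0
        for (lx, ly), d in zip(landmarks, rtts):
            dist = math.hypot(x - lx, y - ly) or 1e-9
            err = dist - d               # positive if we sit too far away
            gx += err * (x - lx) / dist
            gy += err * (y - ly) / dist
        x -= lr * gx
        y -= lr * gy
    return x, y

# With consistent measurements, the fit recovers the true position.
lms = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true = (3.0, 4.0)
meas = [math.hypot(true[0] - lx, true[1] - ly) for lx, ly in lms]
est = embed(lms, meas)
```

Real RTTs violate the triangle inequality, so the residual error never reaches zero; how GNP trades off conflicting landmark measurements is exactly the question the slides leave open.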
The authors find that this scheme works relatively well, although the slides do not mention how inconsistent localization from more than 3 landmark nodes (due to inconsistencies in link latencies) is handled.

VIVALDI

Vivaldi, like GNP, places nodes in a virtual 3D space (more precisely, a 2D space with a directionless height, or "2.5D") where Euclidean distance between nodes is indicative of RTT. The primary difference between the two is that Vivaldi does not use a set of landmark nodes like GNP. Vivaldi implements a decentralized version of the following optimization: consider the network as a complete graph, with each edge a spring whose resting length is the true RTT between its two vertices (nodes). Nodes are positioned optimally for a given virtual space when the sum of the potential energies of all springs is minimized (the springs will remain stretched to the extent that the low-dimensional virtual space cannot accommodate the true RTTs). This configuration will arise naturally if the mass of nodes and springs is allowed to float freely in space and stabilize under its own forces. Since the potential energy of a spring is proportional to its distortion squared, this solves a least-squares fitting problem. It is possible to simulate this self-stabilization in both a centralized and a decentralized manner; Vivaldi uses the latter. Optimizations of the distributed simulation allow the system to react gracefully to the introduction of a large number of new nodes. The authors find the system to be comparable in accuracy to GNP, a landmark-based scheme. It is further refined by the use of a 2D-plus-height vector space rather than a 3D space.

MERIDIAN

Meridian does not use a virtual coordinate system like GNP or Vivaldi, despite serving the same goals. Each node keeps O(log N) state in the form of a set of peers organized into concentric rings of increasing network-distance radii.
Much in the spirit of prefix routing's ability to move exponentially closer to a target in an identifier space, Meridian jumps exponentially shorter distances toward a target. Membership in the rings is loose, allowing a gossip protocol to be used to refresh and fill them. Simulation demonstrates that Meridian produces superior performance to GNP and Vivaldi.

From: "Gopal Parameswaran" <gp72@cornell.edu>
Date: Sat, 13 May 2006 01:10:43 -0400 (EDT)
Subject: PAPER 13

Resilient Overlay Networks:

A Resilient Overlay Network (RON) is an architecture that allows distributed applications across a network to detect and recover from path outages, to choose optimal paths for routing and redirect around a path outage, and to provide a path for communication when the underlying network has routing problems. RON nodes maintain and exchange information about the existing routing and network status with other RON nodes, and use this information to build forwarding tables based on path metrics, which include latency, packet loss, and the currently available throughput. Each RON node periodically assimilates path metrics from its neighbors and, in case of outages, transmits the information to other RON nodes. Thus the protocol maintains performance metrics on itself while ensuring that all RON nodes have complete information unless they are isolated from the RON network. RON does not, however, try to find the optimal routing path for maximum throughput; instead it strives to avoid paths that have low throughput. Thus the system is subject to selfish routing, which can cause fluctuations in network usage based on throughput. If the system were instead also to try to establish a Nash equilibrium among the network routes, it would lead to a more stable routing protocol, where a taxation policy based on a utility function could be used to ensure that the system converges to an optimal equilibrium.
One-Hop Routing

In this paper the authors discuss a routing protocol in which an alternate route is discovered by sending packets through a set of randomly chosen intermediate nodes, and claim that its performance matches that obtained by schemes that monitor path performance metrics on the network. It is based on detecting path failures and taking corrective action by rerouting through an alternate route. Policies are formulated at the level of the intermediary nodes: whenever a failed path is detected, an alternate path is chosen, and by selecting more than one random intermediate node, the chance of a bad re-route is reduced.