From lackhand@gmail.com Wed Mar 8 22:14:22 2006
From: "Andrew Cunningham"
To: egs+summary@cs.cornell.edu
Subject: PAPER 13
Date: Wed, 8 Mar 2006 22:14:18 -0500
Andrew Cunningham
arc39

_Resilient_Overlay_Networks_
David Andersen, Hari Balakrishnan, Frans Kaashoek, and Robert Morris

RON lets distributed Internet applications detect and recover from path outages and periods of degraded performance much faster than current routing systems (a few seconds versus several minutes, at minimum). Though it sits on top of the Internet, it operates as a kind of application-level switch, deciding whether a given packet is better routed over the Internet directly or through the overlay to reach its destination. This allows improvement over normal Internet routing; moreover, the authors found that forwarding packets through at most one intermediate overlay hop (a->b->c versus a->c) was sufficient in most cases. The improvement comes at a price, namely a more complex view of the network topology separating two points: rather than the current system of summaries, RON requires a richer view of connectivity. A RON is explicitly designed to be limited in size, to permit aggressive path maintenance via probing without excessive bandwidth overhead, which (as the paper properly notes) suits a specifically targeted platform. Despite countless comparisons, RON is not intended to replace BGP but to work alongside it, since BGP is far more scalable. Another interesting feature is that because this work is done at a 'local' scale, the heuristics used to define "failures" and "faults" may be tuned to the application, which yields much better behavior than traditional BGP for certain domains.

The gains of RON are narrow and slight, though present.
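The one-hop decision the review describes can be sketched in a few lines. This is a minimal illustration, not RON's implementation: it assumes each node holds a table of probed latencies to every peer (RON actually combines latency, loss rate and throughput), and all names here are hypothetical.

```python
def best_route(src, dst, latency):
    """Pick the direct Internet path or a single-intermediate overlay path.

    latency[a][b] is the probed cost between overlay nodes a and b.
    Returns ([], cost) for the direct path, or ([hop], cost) for the
    cheapest one-hop detour, whichever is better.
    """
    direct = latency[src][dst]
    best_hop, best_cost = None, direct
    for hop in latency:
        if hop in (src, dst):
            continue  # only consider true intermediaries
        cost = latency[src][hop] + latency[hop][dst]
        if cost < best_cost:
            best_hop, best_cost = hop, cost
    return ([best_hop] if best_hop is not None else []), best_cost
```

When the direct link degrades (say a->c costs 100 while a->b and b->c cost 10 each), the overlay detour through b wins; otherwise the direct path is kept, so the overlay only adds a hop when probing says it pays.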
The basic conceit is reasonable, and so long as RON picks no incorrect routes, there is no reason to suppose that introducing delay in the form of user-level redirection would be expensive. However, the good qualities of RON rely on the correct functioning of BGP, and the paper (quietly) admits that the system cannot scale to replace the current status quo. Thus it must be used in concert with present systems as an additional layer; this too is reasonable, though significantly less impressive once the central idea is recognized as "route around trouble spots in a reactive fashion". There are reasons to believe the gains must be as slight as they are: for one thing, finding truly optimal paths would be more expensive than simply using a sub-optimal path; for another, since this must run over the Internet (not dedicated fiber), there is a natural limit to the benefit that controlled switching can obtain. Security is addressed only to be dismissed, which is something of a shortfall: the Internet already exists, and anything intending to improve its reliability must address all of its shortcomings. This system is more easily spoofed than the Internet, since less physical labor is involved (people actually request that you forward packets for them!), and while applications can successfully layer any defenses they like onto an insecure delivery mechanism, the system is in a unique position to provide secure primitives for not much more work.

_One_Hop_Lookups_for_Peer-to-Peer_Overlays_
Anjali Gupta, Barbara Liskov, Rodrigo Rodrigues

The take-away from this paper is that it is unnecessary to assume that routing information at each member node must be kept small to ensure that the information is fresh. Indeed, the bookkeeping required to keep this information current can be handled through dissemination trees, scaling well even with very large rates of joins and leaves.
This in turn gives lookups lower latency, since we need not contact several nodes, only at most one other node. We've seen the algorithm used here before, and the same comments hold: they implement a three-level hierarchy (slices and units of the address space, with each of k equal slices having a slice leader that manages its k' units, each with its own leader; each unit leader informs the members of its unit of all information passed down from the slice leader) through a gossip-like protocol. This has lower overhead than broadcasting every message, and it uses caching to combine several small messages into a single large one, but it ultimately places enormous strain on the unit and slice leaders, who must collate a great deal of information; similarly, if a slice or unit leader fails, it must be replaced, which is expensive because leaders maintain a lot of state.

The paper's (legitimate) point is that you can reduce the slice leader's enormous output requirement by tuning the length of time it waits to accumulate messages, decreasing the number of messages it must send. While this helps, it is not sufficient: the authors choose 23 seconds as their accumulation time, which is very long on a computer scale and defeats the point of one-hop lookup in many applications. For an extreme example, in the previous paper we were impressed by a few seconds for routing around minutes-long IP outages; this paper falls firmly between the two. They also make the point that many environments see much less churn than the general Internet. This is a boon for the paper, because the system only suffers when there are multiple concurrent events; in that case, the slice and unit leaders are forced to question why they were willing to serve in that position, since their workload suddenly increases enormously.
In the stable case, however, this is a beautiful, sedentary algorithm -- which forces one to ask why anyone in a churn-heavy environment would volunteer for slice-leader duty.
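The slice/unit partition of the identifier space described above is easy to make concrete. The sketch below is illustrative only: the ring size and the values of k and k' are toy parameters, not the paper's, and leader election (e.g. picking the node nearest a slice's midpoint) is elided.

```python
RING = 2**16   # size of the circular identifier space (toy value)
K = 8          # k: number of equal slices
K_PRIME = 4    # k': units per slice

def locate(node_id):
    """Map an identifier to its (slice, unit) coordinates.

    Events in a unit flow up to the slice leader, across to the other
    slice leaders, then down through unit leaders to every member --
    the three-level dissemination tree the paper relies on.
    """
    slice_size = RING // K
    unit_size = slice_size // K_PRIME
    s = node_id // slice_size
    u = (node_id % slice_size) // unit_size
    return s, u
```

The point of the arithmetic is that any node can compute which leader is responsible for any identifier locally, with no lookup traffic; all the cost is shifted onto the leaders who collate and batch membership events.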
From sh366@cornell.edu Thu Mar 9 00:06:31 2006
From: Huang Shiang-Jia
To: egs+summary@cs.cornell.edu
Subject: PAPER 13
Date: Thu, 9 Mar 2006 00:06:29 -0500 (EST)

* Both of the papers discussed today are overlay routing systems that aim to improve Internet reliability by detecting and recovering from path outages and degraded performance.
* RON works by deploying nodes in different domains that monitor the quality of the underlying Internet and route packets for each other. SOSR, instead, routes messages through a small set of randomly chosen intermediaries when a failure occurs.
* The background path-monitoring overhead in RON is high, so it is suitable only for small overlays. (As the paper indicates, it scales to only about 50 nodes.)
* SOSR is less successful at routing around last-hop network failures; its alternative routing scheme therefore suits only some cases (servers rather than broadband hosts, for instance).
* RON (Resilient Overlay Network) is an application-level overlay designed (1) to quickly (within about 20 seconds) detect and recover from path outages and (2) to improve loss rate, latency and throughput.
* Nodes residing in different routing domains comprise a RON. They (1) aggressively probe and monitor the paths connecting them, (2) exchange information about path quality among themselves, (3) build forwarding tables on the basis of several path metrics, such as latency, loss rate and throughput, and (4) forward data on behalf of any communicating nodes in the RON.
* RON tightly integrates routing and path selection with applications. Applications can prioritize some metrics over others in their path selection. This demonstrates the benefits of moving some control over routing to end systems.
* The experimental results show that forwarding packets via at most one intermediate RON node is sufficient to overcome faults and improve performance in most cases, so part of the design focuses on finding better paths via a single intermediate RON node.
* The second paper first conducts a measurement study of Internet path failures and finds that most Internet paths work well (availability ranges from 99.6% for servers down to 94.4% for broadband hosts).
* Since no failure is the common case, the authors propose an approach that avoids path-monitoring overhead and scales well: SOSR (Scalable One-hop Source Routing). SOSR attempts to recover from path failures by routing indirectly through a small set of randomly chosen intermediaries. Because the intermediaries are randomly chosen, no a priori knowledge of Internet state is needed.
* They show that a stateless policy called 'random-4' quickly finds alternative paths around failures.
It works as follows: when a node detects a path failure, it selects 4 intermediaries and attempts to reroute messages through them until (1) one of the intermediaries succeeds, (2) the path self-repairs, or (3) four attempts have failed.
* According to their survey, broadband hosts experience significantly more last-hop failures than servers. SOSR is less suitable for broadband hosts since it cannot route around such last-hop failures.

From lackhand@gmail.com Thu Mar 9 00:23:07 2006
From: "Andrew Cunningham"
To: egs+summary@cs.cornell.edu
Subject: PAPER 13
Date: Thu, 9 Mar 2006 00:23:04 -0500
I *knew* something looked familiar!

Andrew Cunningham
arc39
_Improving_the_Reliability_of_Internet_Paths_With_One_Hop_Source_Routing_
Krishna P. Gummadi, Harsha V. Madhyastha, Steven D. Gribble, Henry M. Levy, and David Wetherall

This paper proposes a simple, scalable approach to recovering from Internet path failures. It reiterates the take-away of the previous paper, i.e., that most path failures can be mitigated through the simple one-hop rerouting technique. It improves on RON's storm of background monitoring, thus increasing scalability. Both are bounded by the discovery that many failures lie so close to the destination that no alternative routing or overlay scheme can avoid them: 16% of failures on paths to servers and 60% of failures on paths to broadband hosts were last-hop or end-system failures, which neither scheme can address. However, 66% of popular-server path failures and 39% of broadband-host path failures are potentially soluble with a single extra hop. More attention is also given to analyzing how to choose alternative routes; the evidence ultimately shows that attempting one-hop routing through each of 4 separate nodes, iteratively, is effective.

No attention is paid to security, which is something of a problem since this paper builds upon current Internet standards; built on top of the current Internet stratum, it should make guarantees at least as strong as the Internet's for the integrity, not just the speed, of transmitted data, and this is not done. There is no workload sharing -- I presume that repeated connections to a given node are disallowed by that node refusing to host more routes than some maximum.
Since we try only 4 random-4 samples, this is a total of 16 nodes out of however many participate in the system; this is unlikely to scale out of control, but it is possible for the system to deny service it could have granted (due to a poor choice of random intermediaries).
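The random-4 recovery policy both reviews describe fits in a few lines. A hedged sketch, not the paper's code: `try_via` stands in for the actual one-hop forwarding attempt (which the paper leaves to the transport layer), and all names here are hypothetical.

```python
import random

def recover(dst, peers, try_via, attempts=4):
    """Stateless random-k recovery: on a detected path failure, pick
    `attempts` random intermediaries and try one-hop indirection
    through each in turn.

    Returns the intermediary that reached dst, or None if every
    attempt failed (at which point the sender can only wait for the
    path to self-repair).
    """
    candidates = random.sample(peers, min(attempts, len(peers)))
    for hop in candidates:
        if try_via(hop, dst):
            return hop
    return None
```

Because the intermediaries are chosen blindly, no background probing or shared state is needed -- the source of SOSR's scalability, and also of the failure mode noted above, where an unlucky draw of intermediaries denies a recovery the overlay could have provided.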
Andrew Cunningham
arc39
    _Resilient_Overlay_Networks_
    David Andersen, Hari Balakrishnan, Frans Kaashoek, and R= obert Morris
    
    RON lets distributed internet applications detect and recover from path outages and periods of degraded performance much faster than current performance tuning systems (scant seconds versus several minutes, minimum). Though it exists on top of the internet, it operates as a sort of application level switch, determining whether it would be better to route a given packet over the internet directly, or through the overlay, to reach its destination. This allows improvement over normal Internet routing, and moreover, they discovered that forwarding packets by at most one hop along this intermediate protocol (a->b->c versus a->c) was (sufficiently) beneficial in most cases. This improvement comes at a price, namely a more complex view of the network topology separating two points; rather than the current system of using summaries, RON requires a richer view of connectivity. The RON is explicitly designed to be limited in size, to facilitate aggressive path maintenance via probing, without excessive bandwidth overhead, thus lending this (as is properly noted) to a specifically targetted platform -- despite countless comparisons, this is not intended to replace BGP, but to work alongside it, since BGP is far more scalable. Another interesting feature is that as this work is being done on a 'local' scale, the heuristics used to define "failures= " and "faults" may be tuned to the application, which provides much better behavior than traditional BGP for certain domains.
    The gains of RON are narrow and slight, though present. The basic conceit is reasonable, and so long as RON picks no incorrect route, there is no reason to suppose that introducing 'delay' in the form of user-level redirection would be expensive. However, the good qualities of RON rely on the correct functioning of BGP, and the paper (quietly) admits that the system is not scalable to replace the current status quo. Thus this must be used in concert with present systems as an additional layer; this too is reasonable, though significantly less impressive when the central idea is realized to be "route around trouble spots in a reactive fashion". There are rea= sons to believe that the gains must be as slight as they are: for one thing, to get truly optimal paths would be more expensive than simply using a sub-optimal path; for another, since this must run over the internet (and not dedicated fiber), there is a natural limit to the amount of benefit that controlled switching can gain. Security is addressed only to be dismissed, which is something of a shortfall: the internet already exists, and so anything intending to improve its reliability must address all of its shortcomings. This system is more easily spoofed than the internet, since less physical labor is involved -- people actually request that you forward packets for them!-- and while applications can (successfully) layer any defenses they'd like on to insecure delivery mechanisms, the system is in the unique position of providing secure primitives for not-much-more-work.
    
    _One_Hop_Lookups_for_Peer-to-Peer-Overlays
    Anjali Gupta, Barbara Liskov, Rodrigo Rdrigues
    
    The take-away from this paper is that it is unnecessary to assume that routing information at each member node must be kept small to ensure that the information is fresh. Indeed, the bookkeeping required to keep this information current can be handled through dissemenation trees, scaling well with even very large rates of joins/leaves. This then dictates that lookups have a lower latency, since we need not contact several nodes, but simply at most one other node. We've seen the algorithm used here before, and the same comments hold: namely, they do this by implementing a three level hierarchy (slices and units of the address space, with each of k equal slices having a slice leader which manages its k' units (each with leader); each unit leader informs the members of its unit of all information passed down from the slice leader) through a gossip-like protocol. This is lower overhead than broadcasting every message, and utilizes caching to combine several small messages into a single large message, but suffers in the end as it places enormous strain on the unit- and slice- leaders, who must collate a lot of information; similarly, if a slice- or unit- leader fails, then it must be replaced --  this is expensive as they must maintain a lot of state.
    The paper's (legitimate) point is that you can reduce the slice leader's enormous output requirement by tuning the length of time it waits to accumulate messages in order to decrease the amount of messages it must send. While this helps, it is not sufficient: They choose 23 seconds for their accumulation time, which is very long on a computer-scale, which defeats the point in many applications of one hop. For an extreme example, in the previous paper we were impressed by a few seconds for routing around minutes-long IP blocks; this paper is firmly between the two. They also make the point that in many environments, there is much less churn than in the general internet. This is a boon for the paper, because the system will only suffer when there are multiple concurrent events; in that case, the slice- and unit- leaders are forced to question why they were willing to serve in that position, since they are suddenly making enormous increases to their workload. In the stable case, however, this is a beautiful, sedentary algorithm -- which forces one to question why anyone in a churn-heavy environment would volunteer for slice leader duty?
    
    _Improving_the_Reliability_of_Internet_Paths_With_One_Hop_Source_Routing_
    Krishna P. Gummadi, Harsha V. Madhyastha, Steven D. Gribble, Henry M. Levy, and David Wetherall
    
    This paper proposes a simple, scalable approach to recovering from Internet path failures. It reiterates the take-away of the previous paper, i.e., that most path failures can be mitigated by a simple one-hop rerouting technique. It improves on RON's storm of background monitoring, and so scales further. Both schemes are bounded by the discovery that many failures lie so close to the destination that no alternative routing or overlay scheme can avoid them: 16% of failures on paths to servers and 60% of failures on paths to broadband hosts were last-hop or end-system failures, which neither scheme can address. However, 66% of all popular-server path failures and 39% of all broadband-host path failures are potentially solvable with a single extra hop. More attention is also given to how alternative routes should be chosen; in the end, the evidence shows that attempting one-hop routing through each of 4 randomly chosen nodes is effective.
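The random-4 recovery policy summarized above can be sketched as follows. This is a schematic of the idea rather than the paper's SOSR code; `send_via` and the intermediary list are hypothetical placeholders:

```python
# Sketch of random-k one-hop recovery (an illustration, not SOSR itself):
# on a direct-path failure, try k randomly chosen intermediaries, each
# giving a single extra hop a -> i -> b.

import random

def send_with_random_k(send_via, src, dst, intermediaries, k=4):
    """Try the direct path, then k random one-hop detours.
    Returns the route that worked, or None if every attempt fails."""
    if send_via(src, dst):            # direct Internet path first
        return (src, dst)
    for hop in random.sample(intermediaries, k):
        if send_via(src, hop) and send_via(hop, dst):
            return (src, hop, dst)    # recovered via one intermediate hop
    return None                       # e.g. last-hop failure: unrecoverable
```

Note how a last-hop failure defeats every detour: no choice of `hop` helps when the final link itself is down, which is exactly the 16%/60% bound the paper measures.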
    No attention is paid to security, which is a real problem given that this scheme builds on current Internet standards; sitting on top of the existing Internet stratum, it should make guarantees at least as strong as the Internet's for the integrity, not just the speed, of transmitted data, and it does not. No workload sharing is done -- presumably repeated connections to a given node are disallowed by that node refusing to host more than some maximum number of routes. Since we try only 4 random-4 samples, this is a total of 16 nodes out of however many participate in the system; that is unlikely to scale out of control, but the system can deny service it could have granted because of a poor choice of random intermediaries.

From ids3@cornell.edu Thu Mar 9 00:30:16 2006
From: Ivan Stoyanov
Subject: PAPER 13

Resilient Overlay Networks

Lower-level routing algorithms like BGP are
sometimes slow to respond to failures and to reconfigure themselves to use alternative paths. The paper proposes a "do-it-yourself" solution to this problem, in which the nodes of a distributed application form an overlay that can be used as an alternative routing substrate. Nodes monitor the state of the links between themselves and choose an alternative path when an outage occurs, much faster than the adaptation of the underlying routing. Since the system maintains all n^2 links, a single intermediate node is enough to form an alternative path. The tight coupling between the client application and the routing provides additional benefits, such as the ability to choose a specific path metric for each application and application-specific notions of what constitutes a network failure. Overlay routing also lets routes use private peering links between autonomous systems that are unavailable to traditional routing. The paper presents a great and simple idea. Its biggest drawback is that it requires every node to know every other node, which does not scale well.

Improving the Reliability of Internet Paths with One-hop Source Routing

The paper presents an optimization over RON. Its main point is that maintaining n^2 links is unnecessary. Instead, the authors suggest that keeping a smaller number of links is enough, because the source node can try its peers in sequence until it finds a path to the destination. The problem the paper solves looks very much like searching an unstructured p2p network. Perhaps a simple flood would be more efficient; it is not expensive in this case, because it would happen rarely. In addition, flooding guarantees that the destination is reached in the shortest possible time, and the path found would also most likely be the one with the lowest latency.
From asr32@cornell.edu Thu Mar 9 00:30:30 2006
From: Ari Rabkin
Subject: PAPER 13

RON: A RON is a resilient overlay network: an overlay network designed to have data tunneled through it. By doing routing at the application layer instead of the network layer, a RON can adapt to bad links much more quickly than a lower-level protocol such as BGP. In addition, a RON can employ much more flexible routing criteria, allowing the use of routes that are not publicly advertised. It is unclear how well RONs would work if deployed at a larger scale. How would Internet routing perform if many nodes rapidly altered their traffic patterns in response to perceived poor service? And if it would work well, why not simply alter BGP rather than do this at the application layer? Also, using a RON requires trusting that nodes are not lying about the origin of their data to get around awkward usage policies.
This, as the authors say, is workable in small systems, but would presumably become awkward as RON systems grow larger and more common.

One-hop: How much benefit can we get from clever routing? The authors of this paper ran extensive path traces from PlanetLab nodes to triangulate routing failures. As it happens, a significant fraction of server failures, and a large preponderance of broadband user-machine failures, are end-system or last-link failures, which no routing algorithm can ameliorate. For those failures that can be fixed by routing, picking four nodes at random and trying to route through them ("one-hop" indirection) worked 92% of the time. The authors conclude that trying to route through a small number of random peers is often an effective technique for routing around failures. Interestingly, beyond 2 or 3 indirect routes, cleverness does not seem to help; it is hard to do better than choosing indirection points at random. This suggests that a RON may in fact be overkill for most applications needing higher reliability. How well would this system work for those of us not running on PlanetLab? For the system to work, nodes must have some set of 'peers' who are willing to route for them, with whom they have tolerably fast connections, but who are far enough away in the network graph that routing through them misses the damaged link(s). Moreover, those peers must be willing to forward data. This is by no means impossible, but neither is it straightforward. Unlike disk space, bandwidth is an ephemeral resource, and in this case one used seldom, so a straightforward swap seems inapplicable.

Ari Rabkin asr32@cornell.edu Risley Hall 454 3-2842
The resources of civilization are not yet exhausted.
--William Gladstone

From ns253@cornell.edu Thu Mar 9 01:18:41 2006
From: Niranjan Sivakumar
Subject: PAPER 13

Niranjan Sivakumar
Resilient Overlay Networks
Improving the Reliability of Internet Paths with One-hop Source Routing

A Resilient Overlay Network (RON) is designed to run on top of the existing Internet and to help applications avoid, and recover from, instability and outages that existing Internet protocols do not resolve efficiently. Nodes in different segments of the network are selected to be part of the RON. These nodes actively monitor path conditions in order to aid their forwarding of packets received from other RON clients. Forwarding can be driven by particular metrics or tailored to specified applications, and the types of traffic allowed on links can be limited through a policy-routing mechanism.
Experimental results show that this approach deals with some types of outages and performance problems better than relying solely on BGP and the current Internet routing infrastructure.

Scalable One-hop Source Routing (SOSR) is another technique for routing around problems, in a way that is more scalable and efficient than a RON. First, the researchers investigated failures on the Internet to understand what problem they were actually trying to solve. They found that a large number of failures cannot be routed around because they occur too close to the destination. To deal with the failures that can be routed around, traffic is, upon a failure, routed through k randomly chosen intermediaries (random-k); random-4 was determined to be a balanced choice that performed reasonably well. The theory is that if traffic can be routed to an intermediary sufficiently far from the problem area of the network, another route to the destination will be found. Some more sophisticated selection methods were tested (history-k and BGP-paths-k), but random-k was shown to perform comparably while being less complex. As noted in the SOSR paper, one of the main issues with RON is its inability to scale well. SOSR's simplicity, in turn, gives up some of RON's advantages, such as adjusting routing based on chosen metrics. Also, for a large-scale deployment on the Internet, it is not clear what the incentives to participate are, even though the overhead is considered "negligible."
From nsg7@cornell.edu Thu Mar 9 11:07:27 2006
From: Nicholas S Gerner
Subject: PAPER 13

Resilient Overlay Networks (RON) provides an application-level overlay for general data routing on the Internet. Andersen et al. identify problems with BGP, the underlying Internet routing protocol: in particular, path failures frequently go undetected and unrepaired by BGP for tens of seconds. RON seeks to address some of these issues. RON is implemented using FreeBSD's divert-socket facility to intercept IP traffic so that RON can perform its own application-level routing (providing an IP abstraction to higher-level applications). RON forwards packets to the final destination using standard IP in most cases; however, if a link or path failure is detected, RON can use an intermediary RON node to forward packets.
This failure detection and intermediary route discovery is accomplished through a full-connectivity overlay in which every node exchanges probes with every other node (incurring O(N^2) overhead) to detect failures and measure several metrics (latency, packet loss, and throughput) for route selection. These metrics can be augmented by application-level metrics, and intermediate nodes that do not support a given metric can fall back on an application-defined default. RON also includes routing policies that decide which paths are acceptable when choosing an indirection (such as disallowing commercial traffic on Internet2 links). Andersen et al. show that RON can route around most underlying network failures (50-60% of such failures) in two test deployments of 12 and 16 nodes. However, this improvement comes at the cost of significant overhead (estimated at 2.2 Kbps for N=10 up to 33 Kbps for N=50, scaling as O(N^2)) and latency (200 ms for their implementation). Andersen et al. characterize this cost as a tradeoff for performance, but the analysis and evaluation of that tradeoff and its parameters are not shown in detail. In the end, RON is not presented as a scalable solution for many nodes; it is instead suggested for small deployments such as conferencing applications. Yet many such applications may not be able to tolerate the added overhead for failures that are unlikely over their short lifetimes.

In "Improving the Reliability of Internet Paths..." Gummadi et al. make two important contributions. First, a simple study of Internet connectivity explores routing failures from different vantage points, and the data from this study is used to analyze the effectiveness of several indirection strategies in the face of routing failures. Specifically, given a set of "vantage points," probes are sent from them to a diverse set of Internet destinations.
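The full-mesh route selection that RON performs over its probed metrics might look roughly like the following minimal sketch. It is latency-only, with `None` marking a failed virtual link; the names and data layout are assumptions for illustration, not RON's code:

```python
# Illustrative sketch (not RON's implementation) of choosing between the
# direct path and every one-hop indirect path, using probed latencies.

def best_route(latency, src, dst, peers):
    """latency[(u, v)] is the probed latency of virtual link u->v, or
    None if that link is currently failed. Returns the lowest-latency
    route, either direct or via a single RON intermediary."""
    candidates = []
    direct = latency.get((src, dst))
    if direct is not None:
        candidates.append((direct, (src, dst)))
    for p in peers:
        leg1, leg2 = latency.get((src, p)), latency.get((p, dst))
        if leg1 is not None and leg2 is not None:
            candidates.append((leg1 + leg2, (src, p, dst)))
    return min(candidates)[1] if candidates else None
```

Swapping the latency table for loss or throughput tables gives the per-application metric choice the review mentions; the O(N^2) cost lives in keeping every `latency[(u, v)]` entry fresh by probing.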
When a connectivity failure is encountered, alternative probes are sent from different vantage points, and the path from source to destination is probed via a modified traceroute. Gummadi et al. conclude that only between 40 and 60% of connectivity failures can be recovered from (the rest involve failed last-hop connectivity, for which no indirect routing can help). To address the recoverable ones, random-4 indirection is presented: four random other hosts are chosen, in parallel, as one-hop indirection hosts through which recovery is attempted. This stateless, low-cost indirection mechanism is shown to be very competitive with stateful and much more expensive alternative strategies. Second, the SOSR implementation is presented: a kernel-level routing indirection that, upon failure, automatically contacts a configured intermediary chosen according to random-4 to forward packets NAT-style. The indirection is transparent to both the user application and the destination host. Experimentation suggests that SOSR recovers from 56% of network-level failures (slightly below the measured 66% that could have been recoverable). "Improving the Reliability of Internet Paths..." makes important contributions to this space: a careful survey of network failures characterizing their frequency and the percentage recoverable by indirection, and the random-4 protocol, which suggests these failures can be overcome by simple, stateless mechanisms. However, the experimental study considers only between 383 and 486 failure instances over a 72-hour period, accounting for between 0.14 and 0.18% of the interactions attempted. The improvement shown by SOSR is a 0.04% increase in end-to-end reliability with a 0.01% increase in application-level failures. These margins may be statistically insignificant, and no hypothesis test is presented to show otherwise or to indicate what margin can be generalized.
The increase in application-level failures is worrisome and is not explored in depth; such failures could be much larger for other classes of applications.

From tc99@cornell.edu Thu Mar 9 11:35:33 2006
From: Theodore Ming Shiuan Chao
Subject: paper 13

Neither of the two papers, RON and One-Hop OSDI, deals with P2P as such. Rather, they deal with the underlying Internet routing between physical links that the P2P papers gloss over when they treat virtual links. Both papers seek to improve the fault detection and recovery of current Internet routing protocols (BGP), but they go about it in different ways: RON uses active monitoring and stored link state to route around failures, while One-Hop OSDI uses multiple redundant alternative paths once a failure occurs.
While RON gains policy flexibility and actively routes around problematic paths, it does so at the expense of scalability: the background traffic required to keep the RON network updated grows polynomially in the number of nodes. One-Hop OSDI requires no background traffic, but it does not proactively route around failed paths. Instead, when a node detects a failed path, it sends traffic to a number of randomly selected intermediaries to attempt to route around the failed link(s). One-Hop therefore generates additional traffic only when a path failure is detected (on the order of the k intermediaries it concurrently routes through).

From kelvinso@gmail.com Thu Mar 9 11:42:11 2006
From: Chiu Wah Kelvin So
Subject: Paper 13
The first paper, "Resilient Overlay Networks," presents an architecture that detects and recovers from path failures using overlay routing. To detect and recover from path failures quickly, each node in a Resilient Overlay Network (RON) actively probes and monitors the paths to all other members, tracking latency, loss rate, and throughput. This aggressive probing can lead to excessive bandwidth overhead when the membership is large, so RON limits participation to under 50 nodes. As in a link-state routing protocol, nodes construct routing tables by periodically exchanging routing information about the different performance metrics with the other nodes. When a node detects the failure of a path, it can route through other nodes to get around the failure. RON not only lets an application recover from path failures, it also allows application-specific routing: since RON tracks several performance metrics per path, an application can choose paths according to its needs. For example, some applications find low latency most important, while others care most about loss rate. Also, when a packet enters the network it is given a policy tag and may only be routed along paths where that type of traffic is allowed. The drawbacks of the system are that it does not scale as membership grows, because of the aggressive probing, and that it assumes all RON members are cooperative.

The second paper, "Improving the Reliability of Internet Paths with One-hop Source Routing," presents an architecture that recovers from path failures by routing through one extra hop. It first presents a measurement study of 67 PlanetLab nodes probing about 3000 other nodes.
The main result of the study is that 66% of path failures can be recovered from with one extra hop through one of the 67 nodes. It also shows that broadband hosts tend to suffer more last-hop failures, which no extra hop can recover. Based on the study, the authors implement an architecture in which a node that fails to reach another node randomly picks k of the 67 nodes through which to route the packet. Since about 66% of path failures can be recovered with one extra hop, this can reduce path failures by roughly 60%; the simple technique avoids most recoverable path failures. Because SOSR generates no background traffic, it scales much better than RON. However, the intermediate nodes need to be server-class; otherwise it is not reliable to route packets through them.

From km266@cornell.edu Thu Mar 9 11:44:48 2006
From: Kevin
Subject: PAPER 13

RON creates an overlay on top of the
internet that actively looks for better paths to route packets. It keeps a complete list of neighbors and pings them frequently to determine whether they are down. If a peer appears down, another burst of packets is sent to determine whether this was a fluke (some minimal packet loss) or a real outage; once the path is judged down, an alternate path to the destination is picked. Because of this active approach and its application-level implementation, RON can decide how to send packets to a given host depending on latency, throughput, or packet loss. FTP, for example, wants high throughput while latency matters little; HTTP, on the other hand, cares about latency but generally not throughput, since most web pages are small. One can also imagine an application that simply needs its packets to arrive eventually, however slowly, and would favor a low-loss link. RON reroutes packets through just one intermediary node, the authors claiming this is sufficient to see marked improvement. In the results section, RON is shown to route packets much more reliably than the current BGP system: it delivers packets faster and recovers from faults much more quickly (on average under 20 seconds, as opposed to several minutes). The major downfall of RON seems to be the massive amount of background traffic flowing through it. The authors chart bandwidth usage versus the number of nodes in the system: at 50 nodes, the system uses ~35 kbps, which is incredible. They note that this is one tenth of typical broadband speed; that seems too high for the gains the system produces. My cable modem has trouble reaching the internet about once a week for half an hour (as a pessimistic estimate; it is usually more like once a month for 15 minutes).
My upload bandwidth is close to 150 kbps, and I would not want to allocate almost a quarter of it to cover those thirty minutes of lost connectivity. Furthermore, it is usually the gateway I am connected to at my ISP that is down; no internet traffic can reach me and vice versa, and RON cannot solve that problem. RON buys quicker and more robust communication at the cost of bandwidth, a tradeoff that might be worth making once bandwidth is cheaper.

SOSR takes a different approach from RON, sitting on the passive rather than the overly active side of things. The paper first delves into a detailed study of link failures and packet loss. The authors created their own traceroute tool to locate the failed node and determine exactly whether the failure could be routed around. They found that a large fraction of failures occur at the last hop or at end systems; these cannot be routed around -- if my computer or my ISP goes down, there is no way to reach me. They suggest multi-homing (having multiple ISPs) as a remedy, noting that no routing protocol can handle that kind of failure. Finding that active RON-like systems have huge overhead, they develop a system with no a-priori knowledge of internet state, called random-k: when a path outage is detected, you route through k random intermediaries that forward packets for you. Unless all k intermediaries fail to route the packet (or the path repairs itself), the packet is successfully delivered, and you can keep talking through the successful intermediary to reach your destination. They found random-4 to be the best tradeoff. In the end, the system's passive approach scales better. SOSR is not as successful as RON at finding a path to a destination -- unsurprising, given RON's active pinging and fault recovery -- but SOSR has minimal bandwidth overhead.
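The per-application path choice discussed above (FTP favoring throughput, HTTP favoring latency) can be illustrated with a small sketch; the metric names, values, and dictionary layout are assumptions for illustration, not RON's API:

```python
# Illustrative sketch: each candidate path carries the three probed
# metrics the review mentions, and each application ranks paths by the
# one it cares about. Latency and loss are minimized; throughput is
# maximized. Values below are made up.

def pick_path(paths, metric):
    """paths: {name: {"latency": ms, "loss": fraction, "throughput": kbps}}.
    Return the name of the best path under the given metric."""
    if metric == "throughput":
        return max(paths, key=lambda p: paths[p]["throughput"])
    return min(paths, key=lambda p: paths[p][metric])
```

An HTTP-like client and an FTP-like client can thus legitimately disagree about which path is "best," which is exactly the flexibility BGP's single route per destination cannot offer.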
The lack of security also seems to be an issue, but one perhaps better left for a follow-up paper.

From victoria@cs.hmc.edu Thu Mar 9 11:49:28 2006
From: Victoria Krafft
Subject: PAPER 13

With the current BGP routing protocols, the Internet can take several minutes to reconfigure its routes when failures occur. Andersen et al. present a scheme for building resilient overlay networks (RONs) that detect failures, and recover from them, much more rapidly. Essentially, a RON is made up of several nodes in different routing domains; these nodes form virtual links to each other and route data between themselves, keeping track of link performance and rerouting both to get around virtual-link failures and to improve performance. While their studies show that RONs improve performance for the nodes connected to the RON, the authors have not examined the extra load a RON places on the network. The aggressive probing of link conditions is likely to slow down other network traffic and increase network load; while this may be a worthwhile tradeoff for some applications, it should probably not be used to improve general Internet traffic. In addition, RONs only work for a relatively small set of nodes.
The authors propose running multiple RONs to get around this limitation, but that would consume even more resources from the underlying networks. Gummadi et al. present a much less resource-intensive scheme for handling failures: one-hop source routing. In this plan, if the direct route to a host fails, the machine trying to reach it attempts to route the packet through some number of intermediary nodes; if none of the intermediaries can reach the destination either, the packet is lost. Their experiments show that if four intermediaries (out of the 39) are randomly chosen and the packet is routed through them, about 60% of failures to reach popular servers can be routed around; only 66% of these failures could possibly be routed around, the rest being failures too close to the server. The characterization of Internet failures in this paper is also interesting: it shows that for most broadband hosts, and many popular servers, routing protocols cause temporary and unnecessary outages fairly frequently. Network reliability for popular servers is only about 99.6%, which suggests that the Internet is still inherently unreliable, even for very popular sites that presumably have expensive, high-quality connections to the rest of the network.
-- Victoria Krafft

From: "Abhishek Santosh Gupta" <asg46@cornell.edu>
Date: Thu, 9 Mar 2006 12:15:11 -0500 (EST)
Subject: paper 13 - routing

RON

The main goal of a RON is to enable a group of nodes to communicate with each other in the face of problems with the underlying Internet paths connecting them. RON nodes exchange information about the quality of the paths among themselves via a link-state routing protocol, and build forwarding tables based on a variety of metrics. In order to constrain the bandwidth overhead, the authors suggest limiting the RON to fewer than 50 nodes. This scheme does not scale to networks with a very large number of peers. It also requires each node, or a local group of nodes, to maintain databases storing samples for the various metrics. These databases (in the case of a group of nodes) serve as a single point of failure and attack.
The paper assumes that nodes in a RON will not be malicious, since malicious behavior could easily be detected in such a small world; it does not explicitly deal with such detection mechanisms, especially for nodes that might forward garbage information about path quality. Optimizations are achieved using flow-cache entries.

One-hop Source Routing

The paper carries out experiments to determine the location of path failures during routing. The authors distinguish between a true path failure and short-lived congestion using multiple probes. TCP ACK packets were used instead of UDP probes for various reasons, such as firewalls and routers dropping UDP. Basic idea: once a node detects a path failure, it selects k intermediaries and attempts to reroute packets through them. If the new indirect path is sufficiently disjoint from the old direct path, the faulty component will be avoided, resulting in successful communication. Better policies: maintain a history of the intermediary that correctly forwarded to a particular destination, and set this intermediary as the k-th intermediary for subsequent routing to the same destination. (The authors do not discuss how a node detects that a particular intermediary was successful, given that it forwards to all the intermediaries in parallel -- does the response returned from the destination indicate the intermediary, or does the intermediary itself report successful routing?) Experiments show that for roughly 3000 destinations, random-4 (k=4 here) provides good results, and that it should be invoked after observing just a single packet drop. The value of k would differ for different-sized networks; the authors do not provide equations for k, or for the frequency/eagerness of invocation, as a function of network size. k should be set based on the number of nodes in the network for optimal performance.
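The "history" refinement described above -- remember which intermediary last worked for a destination and pin it into the next candidate set -- can be sketched as follows. All names here are hypothetical; the paper does not give this as code.

```python
import random

class HistorySOSR:
    """Random-k intermediary selection with a per-destination history.

    Hypothetical sketch: the remembered last-good relay replaces the
    k-th random pick, as the policy in the paper suggests.
    """

    def __init__(self, intermediaries, k=4):
        self.intermediaries = list(intermediaries)
        self.k = k
        self.last_good = {}            # destination -> relay that worked

    def pick(self, dst, rng=random):
        picks = rng.sample(self.intermediaries, self.k)
        known = self.last_good.get(dst)
        if known is not None and known not in picks:
            picks[-1] = known          # pin the remembered relay as the k-th
        return picks

    def record_success(self, dst, intermediary):
        self.last_good[dst] = intermediary

s = HistorySOSR(["i%d" % n for n in range(20)], k=4)
s.record_success("server.example", "i13")
picks = s.pick("server.example")
# "i13" is always among the four candidates tried first
```

Note that this keeps only O(destinations) soft state and no path-quality measurements, which is the scaling argument the authors make against RON-style monitoring.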
(Randomly choosing k will only improve performance by a small amount.) Knowing the number of nodes in the network seems difficult, since it can change dynamically.

From: Tudor Marian <tudorm@cs.cornell.edu>
Date: Thu, 09 Mar 2006 12:37:29 -0500
Subject: PAPER 13

RON is an architecture that improves the reliability of the underlying Internet packet transport by detecting and recovering from transient path failures more rapidly than BGP does. It works by deploying a set of nodes as an application-level overlay network, and having them aggressively probe and monitor the quality of the underlying links between themselves. Packets are routed according to application-specified metrics, either directly over the Internet or by way of some other RON node.
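The routing decision just described -- send directly, or detour through one other RON node, whichever currently looks better -- can be sketched as follows. The names and the single scalar latency metric are illustrative; RON itself keeps separate tables per metric.

```python
def best_route(src, dst, latency, nodes):
    """Pick the best path from src to dst: direct, or via one overlay hop.

    `latency` maps (a, b) pairs to the most recent probe measurement;
    an infinite entry means the virtual link is currently considered down.
    """
    INF = float("inf")
    direct = latency.get((src, dst), INF)
    best_path, best_cost = [src, dst], direct
    for mid in nodes:
        if mid in (src, dst):
            continue
        via = latency.get((src, mid), INF) + latency.get((mid, dst), INF)
        if via < best_cost:
            best_path, best_cost = [src, mid, dst], via
    return best_path, best_cost

# Example: the direct link A->C has failed, so traffic detours via B.
lat = {("A", "B"): 20, ("B", "C"): 30, ("A", "C"): float("inf")}
path, cost = best_route("A", "C", lat, ["A", "B", "C"])
# path == ["A", "B", "C"]
```

Because every candidate detour is a single intermediate node, this scan is O(N) per destination, which is one reason the one-hop restriction is cheap to exploit.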
The paper found that forwarding packets via at most one RON node was sufficient to overcome faults and improve performance in most cases; moreover, link-failure detection and recovery took 18 seconds on average, as opposed to the many minutes it can take BGP to recover. One can argue that the architecture is as dirty as it gets; even if it is a good hack that works, it does so only because of the shortcomings of BGP. The second paper proposes a simple approach to recovering from Internet path failures, called one-hop source routing. It works by routing indirectly through a small set of randomly chosen peers and, in contrast to RON, performs no background path monitoring, thereby scaling well and avoiding overhead in the common case of no failures. To motivate the approach, the paper starts by presenting a measurement study of Internet path failures, and then evaluates a prototype implementation on PlanetLab. The authors conclude that although 16% of the failures on paths to servers and 60% on paths to broadband hosts are unrecoverable, since it is impossible to route around the last hop, the simple random-4 technique was able to recover from 61% of the path failures for popular servers and 35% for broadband hosts. Arguably, the netfilter-based implementation is much more elegant than the one chosen by RON.
Tudor

From: "Ymir Vigfusson" <ymir.vigfusson@gmail.com>
Date: Thu, 9 Mar 2006 12:39:39 -0500
Subject: PAPER 13

RON (Resilient Overlay Network) is an application-layer overlay network designed to detect and recover from network failures (namely link failures and performance failures) quickly.
While the underlying AS/BGP architecture of the Internet is designed for the same purpose, its reroute recovery time is on the order of several minutes due to scalability issues, whereas RON does this on the order of seconds (18 seconds on average in the experiment with 12/16 RON nodes and 132/240 distinct paths). This agility is accomplished by aggressive probing and monitoring of the paths that connect the RON nodes. The paper cites results from Labovitz et al. that 10% of the considered Internet routes were available less than 95% of the time, and that less than 35% of all routes were available more than 99.99% of the time; furthermore, 40% of path outages take longer than 30 minutes to repair. RON nodes exchange information about the quality of the paths they know, and the authors present results showing that at most one intermediate RON node is sufficient to improve routing paths. RON, unlike BGP-4, is capable of expressive fine-grained policies aimed at users or hosts. Another feature of RON is integration with applications, since different applications may prioritize different metrics (e.g. latency, loss rate, or throughput), and RON maintains routes for these metrics simultaneously. The implementation of RON was explicitly designed to be limited in size, and an underlying assumption in the paper is that a RON is small relative to the Internet. A large amount of background data about paths needs to be recorded, so scalability seems to be a large drawback. Additionally, RON provides a means of 'selfish routing', which may be susceptible to downsides like Braess's paradox, where adding a cheap link to the network slows down traffic. However, Tim Roughgarden and Eva Tardos showed that, for linear latency functions, selfish routing makes latency at most 4/3 times that of the best coordinated routing. The second paper, "Improving the Reliability of Internet Paths with One-hop Source Routing", starts off by investigating path failures on the Internet.
From seven days of data probing paths from PlanetLab servers to a set of destination hosts, it turns out that the vast majority of paths suffered at least one failure. Many failures (16% of failures to servers and 60% of failures to broadband hosts) occurred close to the destination (on the last hop), so alternative routing/overlay schemes cannot alleviate them. The paper then proposes a 'one-hop source routing' scheme called SOSR, in which a node tries to reroute packets through one or more intermediaries after it discovers a path failure. If the paths are disjoint and the intermediary also avoids the fault, end-to-end communication is restored. It turns out that the most effective policy is to pick the intermediaries at random (smarter policies don't buy much extra), and the paper picks 4 intermediaries as a good value, deduced from the failure sample. If all of the intermediaries fail, the policy is simply to wait for the default path to recover by itself. Notice that the policy does not need to keep track of state. The paper concludes by evaluating a Linux implementation of SOSR, which was able to recover from 56% of failures, close to the predicted value. The paper does not give any theoretical insight into the failures observed, or a theoretical basis for why one-hop source routing is viable.
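One crude way to build the missing intuition is an independence model (my own back-of-the-envelope sketch, not from the paper): if a fraction `recoverable` of failures can be routed around at all, and each random relay independently avoids the fault with probability q, then random-k succeeds with probability recoverable * (1 - (1 - q)^k), so a handful of relays already gets close to the ceiling.

```python
def random_k_success(recoverable, q, k):
    """Toy model: probability random-k recovers a failure.

    recoverable: fraction of failures that are not last-hop (the ceiling)
    q: probability a single random relay's two path legs avoid the fault
    k: number of relays tried in parallel
    """
    return recoverable * (1 - (1 - q) ** k)

# Illustrative numbers only: an 84% ceiling and a 50% per-relay chance.
p1 = random_k_success(0.84, 0.5, 1)   # one relay
p4 = random_k_success(0.84, 0.5, 4)   # random-4
# p4 is already within about 7% of the 0.84 ceiling; diminishing
# returns explain why smarter-than-random policies buy little.
```

The exponent does all the work here: each extra relay multiplies the residual miss probability by (1 - q), which is why k grows so slowly with the desired recovery rate.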
From: Oliver Kennedy <xthemage@mac.com>
Date: Thu, 9 Mar 2006 13:15:09 -0500
Subject: PAPER 13

RON is based on the assumption that the current inter-domain routing protocol, BGP, is crap. To an extent, this is true. BGP is designed to operate with minimal overhead, and has to deal with the economic policies of all the organizations that provide connections along a particular route. Consequently, sub-optimal routes are often chosen due to out-of-date information or unusual policy decisions. RON attempts to out-think BGP by choosing its own path, and forcing BGP to route packets accordingly. It does this by creating an overlay network: clients insert packets into the overlay at the closest node, and the node closest to the destination forwards them appropriately. Since a RON is a fully connected graph, in the ideal situation a route will be only one hop.
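The full-mesh property just mentioned is what drives the cost of the design: every node probes every other node, so probing traffic grows quadratically. A quick estimate, with an illustrative probe size and rate rather than RON's exact parameters:

```python
def probe_overhead_bps(n_nodes, probe_bytes=69, probes_per_sec=0.1):
    """Aggregate probing bandwidth (bits/sec) for a full mesh.

    Each node probes every peer, so there are N*(N-1) directed links.
    probe_bytes and probes_per_sec are assumed values for illustration.
    """
    directed_links = n_nodes * (n_nodes - 1)
    return directed_links * probe_bytes * 8 * probes_per_sec

# At 50 nodes the mesh has 2450 directed links; doubling the
# membership roughly quadruples the total probing load.
fifty = probe_overhead_bps(50)
hundred = probe_overhead_bps(100)
```

The quadratic growth, not the absolute numbers, is the point: it is why the paper caps membership at a few tens of nodes.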
If the network connection between two nodes begins to degrade, the RON will attempt to reroute packets through another node. There are a handful of concerns with this system. First, the fact that the RON is a fully connected graph means that each node holds O(N^2) link-state. This could cause significant scaling issues as the RON gains more users. Second, since each RON node is constantly pinging every other RON node, the network overhead associated with maintaining that state is likely to get out of hand quickly. The paper places a limit of roughly 50 nodes on the network, but as the traffic flowing through the RON grows, the individual nodes will become swamped unless more router nodes are added. Moreover, if the Internet were to shift over to RON completely, a large number of existing routes would become obsolete and unnecessary. One-hop source routing takes a similar approach. Instead of using a network of nodes, it places a set of relay nodes on the Internet. If a routing failure occurs between the source and the destination, the source tries routing through an intermediary node rather than using a complex overlay network. The downside is that the relays must be known to each source in advance: a centralized discovery service provides a single point of failure, and a distributed protocol still requires some form of bootstrap host. Furthermore, deployment of these nodes could be difficult. Deploying them over a global testbed is trivial, but it would be hard to get a corporation to deploy such a relay. Why should they pay for someone else's bandwidth?

-Oliver Kennedy

Computer science isn't about computers any more than astronomy is about telescopes. -- Dijkstra

- Oliver Kennedy

They laughed at Einstein. They laughed at the Wright Brothers. But they also laughed at Bozo the Clown.
-- Carl Sagan

From: Philip Kuryloski <pjk25@cornell.edu>
Date: Tue, 14 Mar 2006 02:33:19 -0500
Subject: PAPER 13

PREDICTING INTERNET NETWORK DISTANCE WITH COORDINATES-BASED APPROACHES

We would like to be able to predict network distances (RTTs) on the Internet without having to send probes each time. One potential solution is to have a set of servers maintain a simplified view of the network topology with RTTs; a node can request data from such a server and predict the desired RTT. This, however, makes those servers a bottleneck, and additional latency is incurred requesting the data. A second solution is to use triangulated heuristic coordinates, or hop counts to some base station nodes in the network, but the accuracy is often less than desired. The authors propose Global Network Positioning (GNP), in which there is a set of well-known landmark nodes in the network, and the node in question trilaterates its position in a 3-D Euclidean space relative to the landmark nodes.
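The coordinate-fitting idea just described can be sketched as a least-squares problem: place the new host at the point whose Euclidean distances to the landmarks best match the measured RTTs. Plain gradient descent stands in here for GNP's actual solver, and the 2-D setup and landmark positions are illustrative.

```python
import math

def embed(landmarks, rtts, steps=2000, lr=0.01):
    """Fit a 2-D position so distances to landmarks match measured RTTs.

    landmarks: list of (x, y) landmark coordinates
    rtts: measured RTT to each landmark (same units as coordinates)
    Minimizes sum over landmarks of (distance - rtt)^2 by gradient descent.
    """
    x, y = 0.0, 0.0                      # initial guess
    for _ in range(steps):
        gx = gy = 0.0
        for (lx, ly), d in zip(landmarks, rtts):
            dist = math.hypot(x - lx, y - ly) or 1e-9
            err = dist - d               # positive if we sit too far away
            gx += err * (x - lx) / dist
            gy += err * (y - ly) / dist
        x -= lr * gx
        y -= lr * gy
    return x, y

# With consistent measurements, the fit recovers the true position.
lms = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
true = (3.0, 4.0)
meas = [math.hypot(true[0] - lx, true[1] - ly) for lx, ly in lms]
est = embed(lms, meas)
```

Real RTTs violate the triangle inequality, so the residual error never reaches zero; how GNP trades off conflicting landmark measurements is exactly the question the slides leave open.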
The authors find that this scheme works relatively well, although the slides do not mention how inconsistent localization from more than 3 landmark nodes (due to inconsistencies in link latencies) is handled.

VIVALDI

Vivaldi, like GNP, places nodes in a virtual 3D space (more precisely, a 2D space with a directionless height, or "2.5D") where Euclidean distance between nodes is indicative of RTT. The primary difference between the two is that Vivaldi does not use a set of landmark nodes like GNP. Vivaldi implements a decentralized version of the following optimization: consider the network as a complete graph, with each edge a spring whose resting length is the true RTT between its two vertices (nodes). Nodes are positioned optimally for a given virtual space when the sum of the potential energies of all springs is minimized (the springs will remain stretched to the extent that the low-dimensional virtual space cannot accommodate the true RTTs). This configuration will arise naturally if the mass of nodes and springs is allowed to float freely in space and stabilize under its own forces. Since the potential energy of a spring is proportional to its distortion squared, this solves a least-squares fitting problem. It is possible to simulate this self-stabilization in both a centralized and a decentralized manner; Vivaldi uses the latter. Optimizations of the distributed simulation allow the system to react gracefully to the introduction of a large number of new nodes. The authors find the system to be comparable in accuracy to GNP, a landmark-based scheme. It is further refined by the use of a 2D-plus-height vector space rather than a 3D space.

MERIDIAN

Meridian does not use a virtual coordinate system like GNP or Vivaldi, despite serving the same goals. Each node keeps O(log N) state in the form of a set of peers organized into concentric rings of increasing network-distance radii.
Much in the spirit of prefix routing's ability to move exponentially closer to a target in an identifier space, Meridian jumps exponentially shorter distances toward a target. Membership in the rings is loose, allowing a gossip protocol to be used to refresh and fill them. Simulation demonstrates that Meridian produces superior performance to GNP and Vivaldi.

From: "Gopal Parameswaran" <gp72@cornell.edu>
Date: Sat, 13 May 2006 01:10:43 -0400 (EDT)
Subject: PAPER 13

Resilient Overlay Networks:

A Resilient Overlay Network (RON) is an architecture that allows distributed applications across a network to detect and recover from path outages, to choose optimal paths for routing and redirect around a path outage, and to provide a path for communication when the underlying network has routing problems. RON nodes maintain and exchange information about the existing routing and network status with other RON nodes, and use this information to build forwarding tables based on path metrics, which include latency, packet loss, and the currently available throughput. Each RON node periodically assimilates path metrics from its neighbors and, in case of outages, transmits the information to other RON nodes. Thus the protocol maintains performance metrics on itself while ensuring that all RON nodes have complete information unless they are isolated from the RON network. RON does not, however, try to find the optimal routing path for maximum throughput; instead it strives to avoid paths that have low throughput. Thus the system is subject to selfish routing, which can cause fluctuations in network usage based on throughput. If the system were instead also to try to establish a Nash equilibrium among the network routes, it would lead to a more stable routing protocol, where a taxation policy based on a utility function could be used to ensure that the system converges to an optimal equilibrium.
One-Hop Routing

In this paper the authors discuss a routing protocol in which an alternate route is discovered by sending packets through a set of randomly chosen intermediate nodes, and claim that its performance matches that obtained by schemes that monitor path performance metrics on the network. It is based on detecting path failures and taking corrective action by rerouting through an alternate route. Policies are formulated at the level of the intermediary nodes: whenever a failed path is detected, an alternate path is chosen, and by selecting more than one random intermediate node, the chance of a bad re-route is reduced.