From asg46@cornell.edu  Tue Feb 14 09:52:37 2006
Return-Path: <asg46@cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1EEqat18801
	for <egs+summary@cs.cornell.edu>; Tue, 14 Feb 2006 09:52:36 -0500 (EST)
Received: from webmail.cornell.edu (hermes21.mail.cornell.edu [132.236.56.20])
	by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k1EEqZle026306
	for <egs+summary@cs.cornell.edu>; Tue, 14 Feb 2006 09:52:35 -0500 (EST)
Received: from 128.84.98.90
        by webmail.cornell.edu with HTTP;
        Tue, 14 Feb 2006 09:52:35 -0500 (EST)
Message-ID: <1938.128.84.98.90.1139928755.squirrel@webmail.cornell.edu>
Date: Tue, 14 Feb 2006 09:52:35 -0500 (EST)
Subject: paper 7 - search and replication
From: "Abhishek Santosh Gupta" <asg46@cornell.edu>
To: egs+summary@cs.cornell.edu
User-Agent: SquirrelMail/1.4.5
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal

The authors discuss a query algorithm based on random walks which reduces
the amount of network traffic by two orders of magnitude (when compared to
flooding) along with with a square-root replication strategy.

They point out to the fact that when replication strategy is
uniform, the query distribution is irrelevant whereas when the query
distribution is uniform, all replication strategies are  equivalent.

LIMITATIONS OF FLOODING:

the load on each individual node is controlled by the value of TTL. But
choosing the appropriate value of TTL is not easy. Higher TTL results in
unnecessary burdens on the network traffic while a low TTL may result in
not finding an object although a copy existed. TTL must be set high as the
replication ratio of an object is generally unknown so as to prevent the
previous scenario.

In flooding, it is not possible to increase the number of nodes covered
without increasing the duplication in the search. Expanding ring technique
solves the TTL selection problem but does not address the message
duplication inherent in flooding.


RANDOM WALKS:

a node forwards a query message to a randomly chosen subset of its neighbors.
The node which follows the above policy is termed a "walker".
note: in  a system, only some walkers exist (other nodes follow flooding
policy) - experimental results indicate that 16 to 64 walkers give good
results.

in order to terminate the walks 2 methods have been suggested (TTL and
checking)
TTL is the same as in flooding.
"checking" means that a walker periodically checks with the original
requester before walking to the next node. The checking method also uses a
TTL which is used to prevent loops( note that TTL is not so significant in
this case as opposed to flooding).

The authors have not provided a discussion as to what the above periodic
interval must be especially in the case that no direct path exists between
current node and the original requester.( the message could be appended
with the path it has taken so far and this path must be retraced   to the
original requester- this check message must also be routed in a different
manner than ordinary   query messages -- no random subset to be chosen if
message appended with path)


a further improvement involves storing the state for a particular query -
i.e. the walker remembers the nodes to which it had earlier forwarded the
same message preventing it from sending the same message again to the same
node(since the node to which forwarding takes place is selected randomly)


INSIGHTS DEVELOPED INTO SCALABLE SEARCHING TECHNIQUES:

1) Adaptive termination is important (checking being a good example)
2) message duplication should be minimized

3) granularity of coverage should be small - this is based on the concept
that if an additional step  results in covering a large number of nodes,
some of these nodes covered would be extra - so   covering extra nodes
should be prevented.

REPLICATION THEORY:

the paper draws light on the fact that uniform and proportional
replication strategies yield exactly the same average search size
independent of the query distribution. these different replication
strategies differ only in the average search size of each object vs the
utilization rate for each object.

they suggest that Square-Root Replication is optimal based on a result
from another paper. Square-Root strategy results in  smaller variance in
average search size as compared to Proportional.


IMPLEMENTATION:

they derive the result that replicating proportional to number of sites
probed would result in a square root distribution.

path replication results in replication on the nodes that are
topologically on the same path.This would make the system fragile and
increase traffic on these nodes for popular objects. The authors suggest
"random replication" as a solution to the above situation.


 POTENTIAL FLAWS:

while deriving the result for square-root replication using a differential
equation, it assumes that it knows ri (replication ratio). Assuming that
the destination node knows ri, it would have to communicate with other
nodes for replication and this would involve another set of messages
especially in the case of random replication( the destination node would
also have to provide each node(that replicates i), with a value
of ri in the case that the destination node fails -- assuming non-pointer
based replication of objects. In case of pointer based replication - the
amount of "keep-alive" messages would increase as each node that
replicates i must check status of node hosting i periodically -thereby
increasing the
traffic in case of increased replication)


The discussion on the square-root replication schemes ignores the case
where a node fails and the protocol that must be carried out to replicate
the replicated objects stored at this failed node (especially during the
initial phase where equilibrium has not been achieved - i.e creation rate
is
not equal to deletion rate).

The authors assume that they can somehow control the deletion rate which
does not seem possible.


From lackhand@gmail.com  Wed Feb 15 16:32:49 2006
Return-Path: <lackhand@gmail.com>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.5 required=5.0 tests=AWL,HTML_00_10,HTML_MESSAGE 
	autolearn=no version=3.1.0
X-Spam-Level: 
Received: from zproxy.gmail.com (zproxy.gmail.com [64.233.162.195] (may be forged))
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1FLWmt04099
	for <egs+summary@cs.cornell.edu>; Wed, 15 Feb 2006 16:32:49 -0500 (EST)
Received: by zproxy.gmail.com with SMTP id z31so24313nzd
        for <egs+summary@cs.cornell.edu>; Wed, 15 Feb 2006 13:32:48 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
        s=beta; d=gmail.com;
        h=received:message-id:date:from:to:subject:mime-version:content-type;
        b=Ps3foNqhMjIESzlLMJhNra6dSIPxMDTLSdfQSxLNbXbebDEyNQT/O6SPH2NmLT5DK8/+4kOBcr4VnfjZdfVg5pscL8TbZJCBsf9juNwK3m5foZfdFxQQuDXvwhqChLc+k6IFUlsGft77h0y6zuu4wvG8C5UpwpTsWLHvA9iKL7w=
Received: by 10.36.65.19 with SMTP id n19mr230642nza;
        Wed, 15 Feb 2006 13:32:46 -0800 (PST)
Received: by 10.36.146.8 with HTTP; Wed, 15 Feb 2006 13:32:45 -0800 (PST)
Message-ID: <9aa7a97d0602151332pd6a83eama5bde998e0a69ef7@mail.gmail.com>
Date: Wed, 15 Feb 2006 16:32:45 -0500
From: Andrew Cunningham <lackhand@gmail.com>
To: egs+summary@cs.cornell.edu
Subject: PAPER 7
MIME-Version: 1.0
X-Security: message sanitized on sundial.cs.cornell.edu
	See http://www.impsec.org/email-tools/sanitizer-intro.html
	for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 
X-Security: The postmaster has not enabled quarantine of poisoned messages.
Content-Type: multipart/alternative; 
	boundary="----=_Part_8512_21758432.1140039165873"

------=_Part_8512_21758432.1140039165873
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Andrew Cunningham
arc39
    _Geopeer_

    Unusual in that it focuses mostly on overall network topology with
relatively little attention paid to individual node state, Geopeer is a pee=
r
to peer system for dissemenating geographically relevant information throug=
h
zones (in a fashion analgous, but unrelated, to CAN's zone-splitting in
n-space; Geopeer uses triangle boundaries from Delauny triangulations of th=
e
system). We can determine to which node a point in space belongs by using
modified Voronoi cells that define the set of points which are closer to a
given node than to any other node, without crossing triangle boundaries.
Each node uses, for its identifier, some function of its location in
physical space, ensuring a non-evenly distributed covering of the space.
    For a system that relies on such detailed knowledge of geographical
proximity, Geopeer is vulnerable to several sorts of attacks. The first is
that the construction of Long Range Connections must be triggered, which
means that attempts at widening the network rely on putting the load onto
other nodes -- an attacker with more resources can spray the network with
long range requests, bringing it to its knees. The paper also addresses a
variety of specific applications/optimizations of the network; as a whole,
the picture of per-node state is so sparse as to be nearly useless.

    _PTrees_

    Unlike other systems, which search for a specific point in node-space
corresponding to a datum or node, the P Trees system allows ranges to be
efficiently searched for. It does this by maintaining parts of
semi-independant B+ trees at each peer. Each peer maintains only the left
most root-to-leaf path of its corresponding B+ tree, and relies on other
peers to complete its tree. It builds this over a ring of connected peers,
and achieves O(m+log[d](N)) search cost for range queries, for N the number
of peers in the system, m the number of peers in the range, and d the order
of the tree. It requires O(d*log[d](N)) space at each peer, and resilient t=
o
even large failures. It is a P2P application, not routing scheme, and thus
usable over a system such as Chord.
    The paper allows peers to function under relatively loose constraints t=
o
optimize performance for insertions/deletions; it then aggravates this
condition by permitting local inconsistency; it has a 'consumer' protocol
running in the background -- Stabilization Process -- which smoothes out th=
e
errors. The reason that this is relevant is that the logarithmic performanc=
e
of queries relies on reaching a new B+ tree level / Chord hop, which in an
inconsistent system is not guaranteed -- this reduces to the normal case of
searching around a ring. Finally, as presented, the entire system is
required per quality searched for, which means that the entire overhead -- =
a
separate B+ Tree portion -- must be maintained per file, so the performance
is actually multiplied by some M/N, for M the number of files in the system
-- precisely what this factor is is a question that this paper doesn't deal
with, but anecdotally is likely to be large. However, the system *is*
scalable in number of peers, as they do not assign peers roles as nodes of
the tree, but rather as leaves, and each maintains a full path to the root
(node, not peer).

    _Mercury_

    Another protocol for supporting multi-attribute range-based searches, i=
t
supports multiple attributes better, and performs load balancing. It does
this through sampling random nodes in a dynamic overlay network, enabling
logarithmic-hop routing and nigh-uniform load balancing. The test
application is a publish-subscribe mechanism over the Mercury protocol, and
the authors provide a range-based query language to higher layer
applications. It does this by partitioning the nodes in the system into
attribute hubs (logical groupings), and each node may participate in
several; it does not scale well in number of types of attributes. Then, eac=
h
node in the hub connects in a circular overlay, with each node responsible
for a contiguous range of attributes -- this is similar in many ways to CAN=
,
in that it is multidimensional and has each node responsible for some range
in a given direction, with neighbors in each direction along a single ring
dimension; it is different in that each ring is logically separate from the
others; we do not operate in n-space but instead n (mostly!) disjoint
1-spaces. Each hub then broadcasts the query to the correct sector along it=
s
ring.
    This paper is difficult to spot (other than security) holes in. The bes=
t
performance is achieved by using the most selective attribute of the query;
however, there is clearly a tradeoff here, as "most selective" doesn't mean
most specific -- there is a commonality to consider as well. The number of
queries for "Linux", though it is a precise string which must be matched
exactly, is clearly not as precise as looking for Linux news from, say, a
single date. The method of load balancing, however, explicitly creates an
opportunity for sybil attacks, as we invite lightly loaded nodes to share
our workload -- thus giving them an opportunity to seize control of any
segment they please by spraying that area with requests while executing a
sybil attack.

------=_Part_8512_21758432.1140039165873
Content-Type: text/html; charset=ISO-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: quoted-printable

Andrew Cunningham<br>arc39<br>&nbsp;&nbsp; &nbsp;_Geopeer_<br>&nbsp;&nbsp; =
&nbsp;<br>&nbsp;&nbsp; &nbsp;Unusual in that it focuses mostly on overall n=
etwork topology with relatively little attention paid to individual node st=
ate, Geopeer is a peer to peer system for dissemenating geographically rele=
vant information through zones (in a fashion analgous, but unrelated, to CA=
N's zone-splitting in n-space; Geopeer uses triangle boundaries from Delaun=
y triangulations of the system). We can determine to which node a point in =
space belongs by using modified Voronoi cells that define the set of points=
 which are closer to a given node than to any other node, without crossing =
triangle boundaries. Each node uses, for its identifier, some function of i=
ts location in physical space, ensuring a non-evenly distributed covering o=
f the space.
<br>&nbsp;&nbsp; &nbsp;For a system that relies on such detailed knowledge =
of geographical proximity, Geopeer is vulnerable to several sorts of attack=
s. The first is that the construction of Long Range Connections must be tri=
ggered, which means that attempts at widening the network rely on putting t=
he load onto other nodes -- an attacker with more resources can spray the n=
etwork with long range requests, bringing it to its knees. The paper also a=
ddresses a variety of specific applications/optimizations of the network; a=
s a whole, the picture of per-node state is so sparse as to be nearly usele=
ss.
<br>&nbsp;&nbsp; &nbsp;<br>&nbsp;&nbsp; &nbsp;_PTrees_<br>&nbsp;&nbsp; &nbs=
p;<br>&nbsp;&nbsp; &nbsp;Unlike other systems, which search for a specific =
point in node-space corresponding to a datum or node, the P Trees system al=
lows ranges to be efficiently searched for. It does this by maintaining par=
ts of semi-independant B+ trees at each peer. Each peer maintains only the =
left most root-to-leaf path of its corresponding B+ tree, and relies on oth=
er peers to complete its tree. It builds this over a ring of connected peer=
s, and achieves O(m+log[d](N)) search cost for range queries, for N the num=
ber of peers in the system, m the number of peers in the range, and d the o=
rder of the tree. It requires O(d*log[d](N)) space at each peer, and resili=
ent to even large failures. It is a P2P application, not routing scheme, an=
d thus usable over a system such as Chord.
<br>&nbsp;&nbsp; &nbsp;The paper allows peers to function under relatively =
loose constraints to optimize performance for insertions/deletions; it then=
 aggravates this condition by permitting local inconsistency; it has a 'con=
sumer' protocol running in the background -- Stabilization Process -- which=
 smoothes out the errors. The reason that this is relevant is that the loga=
rithmic performance of queries relies on reaching a new B+ tree level / Cho=
rd hop, which in an inconsistent system is not guaranteed -- this reduces t=
o the normal case of searching around a ring. Finally, as presented, the en=
tire system is required per quality searched for, which means that the enti=
re overhead -- a separate B+ Tree portion -- must be maintained per file, s=
o the performance is actually multiplied by some M/N, for M the number of f=
iles in the system -- precisely what this factor is is a question that this=
 paper doesn't deal with, but anecdotally is likely to be large. However, t=
he system *is* scalable in number of peers, as they do not assign peers rol=
es as nodes of the tree, but rather as leaves, and each maintains a full pa=
th to the root (node, not peer).
<br>&nbsp;&nbsp; &nbsp;<br>&nbsp;&nbsp; &nbsp;_Mercury_<br>&nbsp;&nbsp; &nb=
sp;<br>&nbsp;&nbsp; &nbsp;Another protocol for supporting multi-attribute r=
ange-based searches, it supports multiple attributes better, and performs l=
oad balancing. It does this through sampling random nodes in a dynamic over=
lay network, enabling logarithmic-hop routing and nigh-uniform load balanci=
ng. The test application is a publish-subscribe mechanism over the Mercury =
protocol, and the authors provide a range-based query language to higher la=
yer applications. It does this by partitioning the nodes in the system into=
 attribute hubs (logical groupings), and each node may participate in sever=
al; it does not scale well in number of types of attributes. Then, each nod=
e in the hub connects in a circular overlay, with each node responsible for=
 a contiguous range of attributes -- this is similar in many ways to CAN, i=
n that it is multidimensional and has each node responsible for some range =
in a given direction, with neighbors in each direction along a single ring =
dimension; it is different in that each ring is logically separate from the=
 others; we do not operate in n-space but instead n (mostly!) disjoint 1-sp=
aces. Each hub then broadcasts the query to the correct sector along its ri=
ng.
<br>&nbsp;&nbsp; &nbsp;This paper is difficult to spot (other than security=
) holes in. The best performance is achieved by using the most selective at=
tribute of the query; however, there is clearly a tradeoff here, as &quot;m=
ost selective&quot; doesn't mean most specific -- there is a commonality to=
 consider as well. The number of queries for &quot;Linux&quot;, though it i=
s a precise string which must be matched exactly, is clearly not as precise=
 as looking for Linux news from, say, a single date. The method of load bal=
ancing, however, explicitly creates an opportunity for sybil attacks, as we=
 invite lightly loaded nodes to share our workload -- thus giving them an o=
pportunity to seize control of any segment they please by spraying that are=
a with requests while executing a sybil attack.
<br>&nbsp;&nbsp;&nbsp; <br>

------=_Part_8512_21758432.1140039165873--

From kash@cs.cornell.edu  Wed Feb 15 17:56:18 2006
Return-Path: <kash@cs.cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=-1.3 required=5.0 tests=ALL_TRUSTED,AWL,HTML_MESSAGE 
	autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from exchfe1.cs.cornell.edu (exchfenlb-1.cs.cornell.edu [128.84.97.33])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1FMuHt26958
	for <egs@unix.cucs>; Wed, 15 Feb 2006 17:56:17 -0500 (EST)
Received: from EXCHVS2.cs.cornell.edu ([128.84.97.24]) by exchfe1.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830);
	 Wed, 15 Feb 2006 17:56:17 -0500
Content-class: urn:content-classes:message
MIME-Version: 1.0
X-Security: message sanitized on sundial.cs.cornell.edu
	See http://www.impsec.org/email-tools/sanitizer-intro.html
	for details. $Revision: 1.148 $Date: 2004-12-19 11:59:17-08 
X-Security: The postmaster has not enabled quarantine of poisoned messages.
Content-Type: multipart/alternative;
	boundary="----_=_NextPart_001_01C63283.072C1EB0"
Subject: PAPER 7
X-MimeOLE: Produced By Microsoft Exchange V6.5
Date: Wed, 15 Feb 2006 17:56:15 -0500
Message-ID: <2EE48095D8C21643B0B70EC95F9BFBAF011011BA@EXCHVS1.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: PAPER 7
Thread-Index: AcYygwaNAJiTsXA9Tc6ZIs8KRzCkWw==
From: "Ian Kash" <kash@cs.cornell.edu>
To: <egs+summary@cs.cornell.edu>
X-OriginalArrivalTime: 15 Feb 2006 22:56:17.0264 (UTC) FILETIME=[07628700:01C63283]

This is a multi-part message in MIME format.

------_=_NextPart_001_01C63283.072C1EB0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

GeoPeer builds a structured P2P network based on geography.  The

lowest level routing infrastructure is a Delaunay triangulation which

ensures that a peer's neighbors are nearby nodes.  This allows it to

handle location based queries or multicasts restricted to specific

locations.  To efficiently route long range messages, the paper

explores several different techniques for establishing long range

contacts.  The methods for maintaining the Delaunay triangulation do

not seem to be robust against churn.  They essentially require nodes

to reach agreement about the correct triangulation.  If all the nodes

agree about who is present and the nodes are in general position, this

is unique so not an issue.  However when things are not in general

position and / or several nodes make changes in the same area that

are seen at different times by different nodes it may be very messy to

get the triangulation established.  Also, this system is vulnerable to

coordinated failures.  If an ISP loses connectivity, many or all of

the people in a particular geographic region may disappear.  As a

result they may also all attempt to rejoin at about the same time.

Finally, the 24 hour cycle means that the population of an area will

vary greatly over the course of the day while systems that map people

randomly may have a varying number of users but each region should

stay proportional.

=20

P-trees introduces a distributed version of a B+tree to allow for

range queries.  It achieves this by having each node store an

incomplete tree and relying on requests to peers to retrieve the

portions needed to complete a query.  Mercury is designed to support

range queries on multiple attributes.  It does this by breaking the

system into logical "attribute hubs", with each hub handling range

queries on a single attribute.  The nodes in that hub can then answer

the multi-attribute query by filtering their results.  Neither of

these papers appeared to present a compelling application for this

technology.  The example given by Ptrees seems like it could be done

as easily with discretizing the value range.  For the example Mercury

uses, it seems a bad idea to build this in a P2P fashion at all

because of the possibility of cheating.  Additionally, Mercury's

solution seems highly inefficient.  They not that it does not scale

well in the number of attributes because each attribute hub has to

maintain all the records and queries are done by searching through all

the matches to filter on the other attribute.  If this is only

workable on small sets of attributes, why not just find an efficient

way to discretize that small set?


------_=_NextPart_001_01C63283.072C1EB0
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns=3D"http://www.w3.org/TR/REC-html40">

<head>
<DEFANGED_META HTTP-EQUIV=3D"Content-Type" CONTENT=3D"text/html; =
charset=3Dus-ascii">
<DEFANGED_meta name=3DGenerator content=3D"Microsoft Word 11 (filtered medium)">
 <!-- <DEFANGED_STYLE>
<!--
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	margin-bottom:.0001pt;
	font-size:12.0pt;
	font-family:"Times New Roman";}
a:link, span.MsoHyperlink
	{color:blue;
	text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
	{color:purple;
	text-decoration:underline;}
span.EmailStyle17
	{mso-style-type:personal-compose;
	font-family:Arial;
	color:windowtext;}
@page Section1
	{size:8.5in 11.0in;
	margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
	{page:Section1;}
-->
 --> </DEFANGED_STYLE>

</head>

<body lang=3DEN-US link=3Dblue vlink=3Dpurple>

<div class=3DSection1>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>GeoPeer builds a structured P2P network based on =
geography.&nbsp;
The<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>lowest level routing infrastructure is a Delaunay
triangulation which<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>ensures that a peer's neighbors are nearby =
nodes.&nbsp; This
allows it to<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>handle location based queries or multicasts =
restricted to
specific<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>locations.&nbsp; To efficiently route long range =
messages, the
paper<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>explores several different techniques for =
establishing long
range<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>contacts.&nbsp; The methods for maintaining the =
Delaunay triangulation
do<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>not seem to be robust against churn.&nbsp; They =
essentially
require nodes<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>to reach agreement about the correct =
triangulation.&nbsp; If all
the nodes<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>agree about who is present and the nodes are in =
general
position, this<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>is unique so not an issue.&nbsp; However when things =
are not in
general<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>position and / or several nodes make changes in the =
same
area that<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>are seen at different times by different nodes it may =
be
very messy to<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>get the triangulation established.&nbsp; Also, this =
system is
vulnerable to<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>coordinated failures.&nbsp; If an ISP loses =
connectivity, many or
all of<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>the people in a particular geographic region may =
disappear.&nbsp;
As a<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>result they may also all attempt to rejoin at about =
the same
time.<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>Finally, the 24 hour cycle means that the population =
of an
area will<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>vary greatly over the course of the day while systems =
that
map people<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>randomly may have a varying number of users but each =
region
should<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>stay proportional.<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'><o:p>&nbsp;</o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>P-trees introduces a distributed version of a B+tree =
to
allow for<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>range queries.&nbsp; It achieves this by having each =
node store
an<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>incomplete tree and relying on requests to peers to =
retrieve
the<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>portions needed to complete a query.&nbsp; Mercury is =
designed to
support<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>range queries on multiple attributes.&nbsp; It does =
this by
breaking the<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>system into logical &quot;attribute hubs&quot;, with =
each
hub handling range<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>queries on a single attribute.&nbsp; The nodes in =
that hub can
then answer<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>the multi-attribute query by filtering their =
results.&nbsp;
Neither of<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>these papers appeared to present a compelling =
application
for this<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>technology.&nbsp; The example given by Ptrees seems =
like it could
be done<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>as easily with discretizing the value range.&nbsp; =
For the
example Mercury<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>uses, it seems a bad idea to build this in a P2P =
fashion at
all<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>because of the possibility of cheating.&nbsp; =
Additionally,
Mercury's<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>solution seems highly inefficient.&nbsp; They not =
that it does
not scale<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>well in the number of attributes because each =
attribute hub
has to<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>maintain all the records and queries are done by =
searching
through all<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>the matches to filter on the other attribute.&nbsp; =
If this is
only<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>workable on small sets of attributes, why not just =
find an
efficient<o:p></o:p></span></font></p>

<p class=3DMsoNormal><font size=3D2 face=3DArial><span =
style=3D'font-size:10.0pt;
font-family:Arial'>way to discretize that small =
set?<o:p></o:p></span></font></p>

</div>

</body>

</html>

------_=_NextPart_001_01C63283.072C1EB0--

From niranjan.sivakumar@gmail.com  Thu Feb 16 00:22:15 2006
Return-Path: <niranjan.sivakumar@gmail.com>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from xproxy.gmail.com (xproxy.gmail.com [66.249.82.205] (may be forged))
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1G5MFt02784
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 00:22:15 -0500 (EST)
Received: by xproxy.gmail.com with SMTP id s6so60692wxc
        for <egs+summary@cs.cornell.edu>; Wed, 15 Feb 2006 21:22:15 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
        s=beta; d=gmail.com;
        h=received:reply-to:to:subject:date:user-agent:organization:mime-version:content-type:content-transfer-encoding:content-disposition:message-id:from;
        b=LC5hcj4KvanUtFNV9DIKQ1niRp7iYgL6iW0UmEcu/DZXP4uRAYSBI1M85lQv0lBY+qJtyA3Gc9gz52CjOzEg+lW8q4YmVysohRXX57mAKN60DkQyoJ8DZXafHIgmb2eOg1+jX3rKD5HuDx/LDFe8c3ePbXH3h9uew3iy+h5fz70=
Received: by 10.70.34.13 with SMTP id h13mr595047wxh;
        Wed, 15 Feb 2006 21:22:14 -0800 (PST)
Received: from ?192.168.0.101? ( [69.207.63.116])
        by mx.gmail.com with ESMTP id h11sm415032wxd.2006.02.15.21.22.13;
        Wed, 15 Feb 2006 21:22:14 -0800 (PST)
Reply-To: ns253@cornell.edu
To: egs+summary@cs.cornell.edu
Subject: PAPER 7
Date: Thu, 16 Feb 2006 00:22:10 -0500
User-Agent: KMail/1.9
Organization: Cornell University
MIME-Version: 1.0
Content-Type: text/plain;
  charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <200602160022.11755.ns253@cornell.edu>
From: Niranjan Sivakumar <niranjan.sivakumar@gmail.com>

Niranjan Sivakumar

GeoPeer:  A Location-Aware Peer-to-Peer System
Querying Peer-to-Peer Networks Using P-Trees
Mercury:  Supporting Scalable Multi-Attribute Range Queries

GeoPeer presents a slightly different routing concept than we have seen in 
previous papers.  Rather than having randomly placing nodes in the network, 
GeoPeer purposefully places nodes based on their geographic location in order 
to aid range querying based on location.  The system relies on Delaunay 
triangulation for creating local neighbor groups and routing in them.  The 
system also maintains long range contacts in order to minimize network 
diameter and avoid some of the issues, such as latency, that would be faced 
if neighbor groups grew to be very large and the whole network was based on 
only Delaunay triangulation.

The P-Trees approach presents a method to facilitate queries that are more 
semantically rich than simple lookups based on a key that is mapped to an 
object.  This system is designed to be adapted to a system like Chord.  The 
P-Trees approach is based on B+ trees.  Nodes in the system essentially hold 
parts of the tree, and rely on other nodes in the system to complete the 
tree.  A stabilization process is presented to deal with nodes that leave and 
join the system.  The system is able to continue to work even in the face of 
changes to the network, but with some detriments to performance.

Mercury is a system for multi-attribute range queries.  Mercury is based on 
creating routing hubs that are used to handle different attributes that may 
be present in the data that is stored in the system.  Mercury also arranges 
routing hubs into a ring, but like GeoPeer, this cannot be a random ring and 
must be ordered in a way to have data organized contiguously to facilitate 
the range searching.  The system maintains a mechanism to poll sample nodes 
to get an idea of different system metrics and another mechanism for 
load-balancing.  The load-balancing is particularly important in a system 
like Mercury because there may not be a uniform distribution of objects in 
this system.  Another feature offered by the system is for caching objects at 
long-distance nodes.

One of the issues that may arise with both GeoPeer and the P-Trees system 
deals with high churn systems.  GeoPeer seems to have a fairly complicated 
procedure to deal with nodes joining or leaving the network and many messages 
seem to be passed between the nodes in a given locality.  This could become 
an issue if there are rapid joins and nodes in a given neighborhood.  The 
P-Trees system has mechanisms to deal with nodes leaving and joining the 
system, but it seems that performance may be noticeably degraded in a system 
where there are a lot of inconsistencies.  One issue seen in Mercury deals 
with scalability.  In the example of the game provided in the paper, the 
authors only seem to be confident in the scalability of the system up to 
"thousands" of nodes if game specific caching was implemented.  However, it 
is not unimaginable that popular applications such as games may incorporate 
more information to be shared, stricter latency requirements, and many more 
participants, perhaps in the millions.

From tc99@cornell.edu  Thu Feb 16 01:13:33 2006
Return-Path: <tc99@cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1G6DXt15951
	for <egs@cs.cornell.edu>; Thu, 16 Feb 2006 01:13:33 -0500 (EST)
Received: from webmail.cornell.edu (hermes21.mail.cornell.edu [132.236.56.20])
	by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k1G6DWRj008397
	for <egs@cs.cornell.edu>; Thu, 16 Feb 2006 01:13:32 -0500 (EST)
Received: from 24.59.114.243
        by webmail.cornell.edu with HTTP;
        Thu, 16 Feb 2006 01:13:32 -0500 (EST)
Message-ID: <1532.24.59.114.243.1140070412.squirrel@webmail.cornell.edu>
Date: Thu, 16 Feb 2006 01:13:32 -0500 (EST)
Subject: paper 7
From: "Theodore Ming Shiuan Chao" <tc99@cornell.edu>
To: egs@cs.cornell.edu
User-Agent: SquirrelMail/1.4.5
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal

Mercury and P-Trees both address the issue of DHT's only allowing equality
lookups  on objects because the nature of consistent hashing makes it
impossible to resolve  range queries.
P-Trees are made of nodes that store semi-independent B+-Trees of only the
 left-most root-to-leaf path, with the current node viewing its value as
the  smallest for the B+-Tree that it maintains (since the values are
organized in a  looping ring, any arbitrary value can be set as the
smallest). Routing range  queries are done by forwarding to the node with
the largest value less than the  lower bound in the partial B+-Tree
maintained at the current node. Once that node  becomes at the lowest
level of the B+-Tree, routing becomes a straight-forward  traversal of the
ring until it surpasses the upper bound of the query.
If the tree is consistent, then the search for the lower bound takes O(log
N)  steps.
Mercury also maintains the values of attributes in a ring-based topology.
However,  whereas the nodes in P-Trees each is a unique value, the nodes
in a Mercury hub are  responsible for contiguous ranges of values. Each
node stores several long-distance  pointers to nodes responsible for other
ranges, and several pointers to nodes in  other hubs (of other
attributes), and routing is just done on a single hub (picked  using some
criteria) to closest clockwise distance to the lower bound. To detect  and
deal with potential load imbalances within a hub, Mercury uses a (log n)
TTL  random sampling and moves nodes from low-load regions to high-load
regions when it  finds imbalances.

Given the horrendous performance of the ValueLink of non-uniform ranges,
it is a  wonder they didn't put more emphasis on harmonic distribution of
the long-distance  links over the number of links (NodeLink) instead.
Further more, they maintain that  the random sampling will converge to a
distribution with value widths inversely  proportional to the
popularity... but offer no analysis of the time required for  it, or the
cost expectancy of it.
In P-Trees, the values are stored in the nodes themselves. While it allows
for  range queries, the implementation outlined in the paper does not
allow for a  mapping of objects onto nodes.
Furthermore, neither of the papers investigates what would happen if an
object's  value changes. In P-Trees, it seems like the node would have to
be removed and  reinserted every time. Since the changing of the value
would require rebuilding the  affected partial B+-trees, there is probably
no other way to handle it. In Mercury,  the handling of this issue seems
more robust as objects are merely placed along a  ring. All changing the
value would do would be to change the node responsible for  the object,
and possibly require a redistribution of load that would be handled by 
the random sampling.

From sh366@cornell.edu  Thu Feb 16 02:05:16 2006
Return-Path: <sh366@cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1G75Ft00815
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 02:05:15 -0500 (EST)
Received: from orpheus3.dataserver.cornell.edu (orpheus3.dataserver.cornell.edu [128.253.161.167])
	by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k1G75FLY023153
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 02:05:16 -0500 (EST)
Message-ID: <257471798.1140073514584.JavaMail.webber@orpheus3.dataserver.cornell.edu>
Date: Thu, 16 Feb 2006 02:05:14 -0500 (EST)
From: Huang Shiang-Jia <sh366@cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 7
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Mailer: uPortal WEB email client 3.0
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k1G75Ft00815

The three papers present routing protocols for range queries in peer-to-peer systems.
* Previous works such as Pastry/Chord/Tapestry support equality queries. The hash function to distribute data uniformly in these systems destroys the order in the key value space, making them impractical for processing range queries. Elimination of the hash function makes load balance in the works of range queries particular a concern.
* Range query is common in relational database. These papers adapt several concepts and solutions in database systems.
* GeoPeer is particularly well-suited to support location-aware applications. P-tree supports range queries in addition to equality queries, and Mercury furthermore supports multi-attribute range queries. The search cost is between O(log(N)) and O(log^2(N)) in GeoPeer, O(log(N))/O(m+log(N)) in P-tree and O(log^2(N)/k) in Mercury. Each of them is summarized below.

<GeoPeer>
* GeoPeer is a peer-to-peer system that is particularly well-suited to support location-aware applications such as geographically-scoped multicasts/queries. In GeoPeer nodes self-organize into a Delaunay triangulation augmented with carefully selected long range contacts to reduce network diameter. Each GeoPeer node is responsible for (communicate on behalf of) a region of points.
* To create and maintain Delaunay triangulations in the face of nodes joining and leaving the system, nodes exchange specific messages with their geographically neighbors in different steps accordingly. The space division mechanism (nodes determine the circumcircle and perpendicular bisectors for each of its Delaunay triangles) ensures that any key is held by exactly one existing node.
* GeoPeer uses the greedy algorithm, rather than compass, randomized compass or Voronoi, to route messages because of its simplicity. Its problem is large network diameter (latency). The use of long range contacts (LRC) can help reduce the latency: all Delaunay neighbors and LRCs are eligible to be the next hop, and the closest one to the destination is chosen for routing.
* The LRCs proposed in GeoPeer include Hop Level, Hit Count Balancing, Small-World mechanisms and an eCAN-like mechanism for comparison. In the hop level mechanism, nodes maintain levels of LRCs; no more than b-1 hops are allowed in each level without creating a LRC in the next upper level. The hit count balancing mechanism takes into account the popularity of the LRCs; LRCs with very high references are split while LRCs with very low references are discarded. In the small-world mechanism, the space is divided into n squares, and a random process is run to outcome centers of the squares as LRCs. The eCAN-like mechanism repeatedly divides the space into four squares and keeps LRCs to two of them.
* The experimental results show that all the LRC schemes achieve between log(N) and log^2(N) query hops even in the presence of unbalanced node distribution. Among the LRC schemes, hop level mechanism presents the best performance with a small number of LRCs as the network size increases. As it precludes the need of a priori configuration, hop level mechanism is a preferable choice when the environment is unknown.

<P-tree>
* P-tree is a peer-to-peer index structure that supports range queries in addition to equality queries. In consistent states, a P-tree provides O(logN) and O(m+logN) search cost for equality queries and range queries, where m is the number of peers in the selected range and d, the order of P-tree, serves as the base of the log. P-tree requires O(d*logN) space at each peer.
* In P-tree, the search key values (attributes of data items) are regarded as a ring. The underlying ring structure is maintained by a successor-maintenance algorithm (Chord in the implementation of this paper).
* The key idea behind P-tree is to maintain parts of semi-independent B+-trees at each peer; that is, each peer maintains the left-most root-to-leaf path of its B+-tree and relies on a subset of other peers to complete its tree. The sub-trees have overlapping ranges; the same data values are indexed by multiple sub-trees. In this way peers don’t need global coordination and be notified for every insertion/deletion.
* Search in P-tree takes as input the lower-bound (lb) and upper-bound (ub) of the range query as well as the source peer originating the query. Each peer, when routing the query, forwards the message to the farthest pointer that doesn’t overshoot lb. Once reaching the lowest level of the P-tree, it traverses the successor list until the value of a peer exceeds ub. Any node along the route storing a value which falls in the desired range returns its value to the source peer. In a consistent state, the search procedure goes down one level of the P-tree every time the message is forwarded, so O(logN) query steps are guaranteed.
* The consistency of the P-tree nodes is maintained by two co-operating processes Ping and Stabilization: Ping Process detects the inconsistencies of peer deletions/failures as well as the entries with respect to the coverage and separation properties, and Stabilization Process repairs them (in a bottom-to-top and left-to-right manner). Peer insertions are also handled based on the two processes. There is a tradeoff in deciding the frequency of Ping/Stabilization Processes (search cost vs. maintenance overhead for peer insertions/deletions). From the experimental results it claims that it’s impractical for P-tree to have high orders.

<Mercury>
* Mercury is a protocol that supports scalable multi-attribute range queries and explicit load balancing. It achieves logarithmic-hop routing and near-uniform load balancing.
* In Mercury, each query is a conjunction of range in one or more typed attributes. Mercury handles multi-attribute queries by creating a routing/attribute hub for each attribute. It organizes each hub into a circular overlay of nodes with each node responsible for a continuous range of attribute values. A data record is sent to all hubs corresponding to the attributes of the data record. Queries are passed (using query selectivity) to exactly one routing hub corresponding to the attributes that are queried, and new data items are sent to all hubs for which it has an associate attribute.
* Each Mercury node maintains a small number of predecessor and successor links within a hub and links to each of the other hubs. Besides, k long-distant links are selected according to a histogram maintenance scheme based on the sampling of nodes distribution (random walks over the system to determine the ranges the nodes are responsible for). It then routes more efficiently by choosing the link which takes the message closest to the destination.
* From the experimental results it claims that caching (LRU replacement or direct-mapped) is an important optimization that Mercury can easily incorporate into its basic protocol. In addition, Mercury achieves load balance by moving around nodes and changing their responsibilities according to the loads.
* This paper also presents a distributed object management framework based on Mercury. By providing a range-based query language, Mercury allows applications to express their publications and subscriptions to objects lookups/updates flexibly. The framework can be used as a building block for distributed applications, in which instances are interested in a subset of the entire application state.

From pjk25@cornell.edu  Thu Feb 16 02:22:35 2006
Return-Path: <pjk25@cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from authusersmtp.mail.cornell.edu (granite1.mail.cornell.edu [128.253.83.141])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1G7MZt04154
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 02:22:35 -0500 (EST)
Received: from [10.0.1.2] (cpe-69-207-41-159.twcny.res.rr.com [69.207.41.159])
	(authenticated bits=0)
	by authusersmtp.mail.cornell.edu (8.13.1/8.12.10) with ESMTP id k1G7MZBO010977
	(version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT)
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 02:22:35 -0500 (EST)
Mime-Version: 1.0 (Apple Message framework v746.2)
Content-Transfer-Encoding: 7bit
Message-Id: <3149C7F1-1A96-454B-8DFC-91E7192E7207@cornell.edu>
Content-Type: text/plain; charset=US-ASCII; delsp=yes; format=flowed
To: egs+summary@cs.cornell.edu
From: Philip Kuryloski <pjk25@cornell.edu>
Subject: PAPER 7
Date: Thu, 16 Feb 2006 02:25:16 -0500
X-Mailer: Apple Mail (2.746.2)

GEOPEER
GeoPeer is Peer-to-Peer overlay network designed to support location  
aware operations.  Rather than randomly hashing node IP addresses  
into an identifier space, node locations in GeoPeer are the node's  
identifier.  A number of long range contacts keeps the diameter of  
the GeoPeer network low.  Objects to be stored in the network are  
hashed to virtual positions in the physical space.  Routing as well  
as partitioning the object identifier space for ownership in GeoPeer  
is based on Delaunay triangulation.  Directly correlating location in  
the GeoPeer identifier space to the underlying network allows at  
least two interesting services, including geographically scoped  
multicast, and geographically scoped queries.

GeoPeer has several mechanisms  to reduce the diameter of the  
network, all using long range connections (LRCs).  The first is what  
the authors call hop level, where by which a maximum number of hops  
per query is arbitrarily set.  If a node cannot reach an object, it  
forms a LRC with the node furthest towards the target and retries the  
query.  Another mechanism is called hit count balancing.  LRCs are  
evenly distributed throughout the network.  Periodically, often used  
LRCs are split and less used are removed.  The final mechanism,  
called small world, divides the  identifier space into a number of  
smaller regular segments and chooses a random subset of centers of  
these segments as LRCs.

Unfortunately GeoPeer produces a range of O(log N) to O(log^2 N)  
lookups depending on the LRC scheme, which is not notably good  
compared to other DHT schemes.


P-TREES
The primary purpose of the P-Tree structure is to allow structured  
queries such as range queriesand equality queries, which cannot be  
achieved in many DHT schemes due to the random hashing of object IDs  
into the identifier space.  The P-Tree is in essence a distributed  
set of semi-independent partial B-Trees.  Partial overlap between sub- 
trees is allowed to allow independent tree changes and keep  
coordination traffic low.  A node stores keys which fall on it's  
leftmost root-to-leaf path.  This results in a predecessor/successor  
situation where the Chord algorithm is used to maintain the leftmost  
root-to-leaf path criteria.  Periodic stabilization and ping  
processes are used to detect node failures and keep the tree  
structure even to maintain O(log N) queries.  There is a tradeoff in  
this structure between the width of a tree at each node and the depth  
of the entire P-tree, trading off query latency for node insertion/ 
deletion cost.  Dynamically optimizing aspect ratio of the P-tree is  
an issue not addressed by the authors.  P-Trees do, however, allow  
for efficient range queries, which can similarly be achieved by  
GeoPeer with geographically scoped queries.

MERCURY
Mercury differs from GeoPeer and P-Trees in that is supports multi- 
attribute range queries.  It does however yield O(log N) lookup times  
and does not hash objects randomly across the identifier space.   
Therefore, it also requires a periodic load balancing system to  
maintain an even distribution of objects across nodes.  Mercury  
routing functions as follows:  Nodes are organized into attribute  
hubs for an attribute, with hubs as orthogonal in a virtual space  
with one dimension in that space for each attribute.  Nodes within a  
hub are arranged in a ring structure.  The first hop of a query is  
towards the appropriate hub, with further matching taking place  
around the ring of nodes associated with that hub.  Within these  
rings, there are long distance links which reduce the object search  
times.

Nodes within this ring keep a number of successor and predecessor  
links, allowing maintenance of the ring when nodes join or leave the  
network.  Also, a random walk scheme is used to periodically estimate  
the state of the network and potentially rebalance the hubs.

From victoria@cs.hmc.edu  Thu Feb 16 03:35:43 2006
Return-Path: <victoria@cs.hmc.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from turing.cs.hmc.edu (turing.cs.hmc.edu [134.173.42.99])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1G8Zht21248
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 03:35:43 -0500 (EST)
Received: by turing.cs.hmc.edu (Postfix, from userid 34382)
	id 1B74C53256; Thu, 16 Feb 2006 00:35:37 -0800 (PST)
Date: Thu, 16 Feb 2006 00:35:36 -0800
From: Victoria Krafft <vmk@cs.hmc.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 7
Message-ID: <20060216083536.GA10232@cs.hmc.edu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.4.2.1i

For this Thursday, the papers focus on p2p networks which can 
handle range queries, such as "All the objects with a value 
between 7 and 16".  So far, the networks we've looked have 
generally used a hash function to assign a key to each object.  
This makes range queries expensive, since the hash function will 
scatter similar objects across the network, resulting in a lot 
of network traffic.

The first system, GeoPeer, is actually designed to be a system 
which can support location-aware services.  In many ways, this 
is a specialized version or range queries, where you have two 
variables, latitude and longitude.  In this system, the basic 
routing structure is a planar Delaunay triangulation, with 
long-range links to reduce distances.  This gives O(1) 
connections from each node, and the experimental results suggest 
that routing a message through the network is O(log N) to 
O(log^2 N).  Load balancing is managed to some extent by 
duplicating heavily used long range links to share the workload.

One area which might be a problem is the process of re-building 
the Delaunay structure after a node fails.  The amount of work 
needed to fix the structure could become too expensive in a 
network with high churn. 

The second system, P-Trees, are somewhat more complicated.  The 
search key values put into a ring structure, with the highest 
value wrapping around to the lowest.  Then, each node stores a 
partial B+-tree, the root-to-leaf path of its search key.  This 
results in O(log N) search when the system is in a consistent 
state, and only requires O(d*log N) storage at each node.  In 
order to maintain the network, nodes periodically ping each 
other to see who is still alive.  Also, a Stabilization Process 
is run periodically to fix the tree structure at a given node.

One potential problem here is that when the network is in an 
unstable state because of nodes joining and leaving, query 
messages take much longer.  If network churn gets too high, 
then many queries will take longer, and potentially need to be 
resent, further slowing the already stressed network.

The third system is Mercury, which supports multiple attributes 
and uses a ring-based topology, with one ring hub for each 
attribute.  It cannot use a hash function to distribute the 
objects, so instead it just stores the objects in order, and 
runs a load balancing algorithm to prevent nodes from being 
overloaded.  The message routing algorithm provides O(log^2 
(N/k)), where k is the number of links each node has to other 
nodes.  The authors also show how Mercury can be used to create 
a publish/subscribe system.

The load balancing algorithm which they propose is that nodes 
which are heavily loaded find nodes with a light load, and ask 
them to move to take some of the load off the heavily loaded 
node.  If the network is experiencing heavy enough churn, then 
nodes will constantly be moving to balance the load, which will 
only increase the churn on the network.


-- Victoria Krafft

From nsg7@cornell.edu  Thu Feb 16 09:32:47 2006
Return-Path: <nsg7@cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1GEWlt04057
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 09:32:47 -0500 (EST)
Received: from webmail.cornell.edu (hermes21.mail.cornell.edu [132.236.56.20])
	by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k1GEWjnp028695
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 09:32:46 -0500 (EST)
Received: from 132.236.227.119
        by webmail.cornell.edu with HTTP;
        Thu, 16 Feb 2006 09:32:46 -0500 (EST)
Message-ID: <1752.132.236.227.119.1140100366.squirrel@webmail.cornell.edu>
Date: Thu, 16 Feb 2006 09:32:46 -0500 (EST)
Subject: PAPER 7
From: "Nicholas S Gerner" <nsg7@cornell.edu>
To: egs+summary@cs.cornell.edu
User-Agent: SquirrelMail/1.4.5
MIME-Version: 1.0
Content-Type: text/plain;charset=iso-8859-1
Content-Transfer-Encoding: 8bit
X-Priority: 3 (Normal)
Importance: Normal

"Mercury...", "P-Trees..." and "GeoPeer..." all provide a range query
extension to the routing mechanisms we've seen provided by DHTs. 
"Mercury..." builds a set of ring overlays very similar to the work of
other DHTs we've seen so far.  However, the identifier space in Mercury is
not distributed by a hash function.  Instead values are laid on the ring
in contigous ranges (e.g. ids 2,3,4,... correspond to sequential integers
2,3,4,... in some attribute-value space).  Routing occurs by first
choosing an appropriate ring for an attribute range (each ring in Mercury
corresponds to one attribute) and then routing along that ring to the
first value in the range using ring mechanisms we've seen already.  The
ring is then scanned sequentially (using successor pointers) to retrieve
all nodes containing relevant data-items.  Mercury also provids
system-wide metric estimation techniques for load distribution, node
distribution and query selectivity.  These estimates allow for non-uniform
ring-range assignment to nodes to balance load, allow for selection of
long-range contacts (similar to Chord's finger pointers), and allow for
selection of the most selective attribute ring for a query to minimize
query flooding.  This achieves O(m' + log^2 n) routing performance (where
m' is the number of nodes holding data-items relevant to the most
selective range in a query). "P-Trees" also builds on a ring-based DHT.  A
P-Tree acts like a distributed B+ tree, where each peer in the network
holds tree-nodes along the left-most root-to-leaf path.  A P-Tree loosens
some requirements of a B+ tree to allow for more tolerance in the face of
a dynamic network.

GeoPeer is a very different system.  GeoPeer focusses on geographical
attributes (specifically x-y coordinates) and builds a Delaunay
Triangulation overlay on that space.  The triangulation connects nodes to
their "nearest neighbors" in the Delaunay triangulation sense.  Given that
basic topology, Geopeer adds k long-range contacts (motivated by
Klienberg's Small Worlds) to achieve O(1/k log^2 n) routing.  Geographical
queries are answered by sending the query into the desired region and
doing limited flooding within that region.

"Mercury..." and "P-Trees..." build on existing DHT work and point to the
properties provided by these systems.  The contribution of these systems
is to provide a range-query mechanism (and in the case of Mercury, metric
estimation techniques).  It's not clear that Mercury's O(m' + log^2 n)
routing performance is somehow a lower bound that is difficult to
overcome.  And Mercury's application (Caduceus), while whimsical doesn't
motivate scalability and wide applicability.  "P-Trees..." is well rooted
in database theory, and provides O(m + log n) routing performance which is
an improvement, but no presentation of the true cost of the overhead of
maintaining the tree is presented (in a way comparable to the overhead of
DHTs for example).  GeoPeer takes a wholly new approach to range based
query fulfillment, and a construction mechanism resiliant to unbalanced
node distribution is also presented, but methods to address hotspots (an
important aspect of systems given GeoPeers suggested applications) are not
discussed and these issues may weaken some of GeoPeer's long range contact
Management techniques (a key contribution of "GeoPeer...").

From km266@cornell.edu  Thu Feb 16 11:49:38 2006
Return-Path: <km266@cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from authusersmtp.mail.cornell.edu (granite1.mail.cornell.edu [128.253.83.141])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1GGnct18520
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 11:49:38 -0500 (EST)
Received: from KEVSTOY (cpe-69-207-37-68.twcny.res.rr.com [69.207.37.68])
	(authenticated bits=0)
	by authusersmtp.mail.cornell.edu (8.13.1/8.12.10) with ESMTP id k1GGnbP4025950
	(version=TLSv1/SSLv3 cipher=RC4-MD5 bits=128 verify=NOT)
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 11:49:38 -0500 (EST)
Message-ID: <000301c63318$f95e3410$4425cf45@KEVSTOY>
Reply-To: "Kevin" <yobz@cornell.edu>
From: "Kevin" <km266@cornell.edu>
To: <egs+summary@cs.cornell.edu>
Subject: PAPER 7
Date: Thu, 16 Feb 2006 11:49:37 -0500
MIME-Version: 1.0
Content-Type: text/plain;
	format=flowed;
	charset="iso-8859-1";
	reply-type=original
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2527
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2527

All of the systems we looked at today are different from what we have seen 
before.  They discuss range queries: basically queries that specify a lower 
bound and an upper bound on some attribute and expect to see everything in 
the middle (well, not necessarily, you can specify different ranges, but the 
idea is the same).

Geopeer builds a Delaunay triangulation system such that two nodes are 
neighbors if they are close in space.  Space here can be longitude and 
latitude, therefore nodes are close in the system if they are close 
geographically.  To keep latency down, long range contacts are also kept. 
Each node is responsible for a region of points.  Overall GeoPeer seems like 
a fragile system.  Since everything is based on geography, if part of the 
system goes down or there is latency on the east coast then many queries 
will be significantly slower.  In the end, GeoPeer gets between log and log 
squared (in the total number of nodes)  performance, which doesn't seem 
optimal.

Mercury is a system that is built upon current ring-based DHTs.  The idea is 
to have multiple logical rings that each represent one attribute.  Each node 
is in one ring and has many pointers to its own ring (in a pre-specified 
distribution) in order to achieve good performance, and pointers to random 
other rings.  A node is put into the ring not through a hash function but 
through a direct mapping to it.  This is clearly a downside since the load 
balancing immediately suffers.  Because of this, the authors come up with a 
scheme to load balance and keep the pointers in a correct distribution.  The 
paper details how to get m+log squared routing performance out of their 
system.  To retreive a specific range, a packet is routed to the node 
corresponding to the lower bound and then sequentially traverses the rest of 
the pointers in order to get at every intermediary node.  To help with 
latency and flooding concerns, Mercury uses query selectivity to try to 
search based on the most selective part of the query.  Mercury's worse than 
P-Trees performance leaves the system lacking.  The only other major issue I 
have with it is its poor scaling with multiple attributes.

P-trees are a database inspired range queries system that build upon B+ 
trees to acheive order log N performance.  Each node keeps the left most 
path of its B+ tree, relying on other nodes in the system which collectively 
maintain the tree.  This system, therefore, is a distributed tree instead of 
a distributed hash table.  The system is order log N+m for routing 
complexity and order log N space at each paper.  The system has a complex 
stabilization protocol and in general seems pretty complex.  I would expect 
an actual real-world implementation to be unstable, not nearly as trivial as 
Gnutella or Pastry.  Furthermore, I didn't see any clue as to how much 
overhead there was in maintaining the tree. 

From tudorm@cs.cornell.edu  Thu Feb 16 12:50:24 2006
Return-Path: <tudorm@cs.cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from exchfe2.cs.cornell.edu (exchfenlb-2.cs.cornell.edu [128.84.97.34])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1GHoOt06859
	for <egs@unix.cucs>; Thu, 16 Feb 2006 12:50:24 -0500 (EST)
Received: from exchfe1.cs.cornell.edu ([128.84.97.33]) by exchfe2.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830);
	 Thu, 16 Feb 2006 12:50:23 -0500
Received: from [128.84.223.148] ([128.84.223.148]) by exchfe1.cs.cornell.edu over TLS secured channel with Microsoft SMTPSVC(6.0.3790.1830);
	 Thu, 16 Feb 2006 12:45:27 -0500
Message-ID: <43F4BA33.6070906@cs.cornell.edu>
Date: Thu, 16 Feb 2006 12:45:23 -0500
From: Tudor Marian <tudorm@cs.cornell.edu>
User-Agent: Thunderbird 1.5 (X11/20051201)
MIME-Version: 1.0
To: egs+summary@cs.cornell.edu
Subject: PAPER 7
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-OriginalArrivalTime: 16 Feb 2006 17:45:27.0372 (UTC) FILETIME=[C59898C0:01C63320]

GeoPeer:

This paper presents a p2p system  capable of supporting location-aware 
services. The overlay network lays out nodes such that a Delaunay 
triangulation is created, and the structure is augmented with long range 
contacts for better lookup performance. Each peer has constant 
connectivity degree, and because of the long range nodes the bound on 
the routing hops is O(log^2 N) (small world networks). Choosing and 
maintaining the long range links is done using various schemes. Queries 
are answered by first sending them to the appropriate geographical 
region and then performing a local search.

Problem arise in the case when the system cannot maintaining the 
triangulation under heavy churn.

P-Trees:

This work builds upon the B+ trees and the ability to perform range 
queries efficiently while using such a structure. P-Trees are basically 
a distributed index built on top of a p2p system like Chord, in which 
peers maintain parts of semi-independent B++ trees. Each peer is 
responsible for the left most path from itself to the root of the 
correspondnding B+ index. A query is routed to the lower value of the 
range using the index, and the whole range is then retrieved by walking 
on the neighbor pointers of the ring.

The system does not scale well since there's an index for each attribute 
that one might want to index, therefore maintaining multiple indexes 
could be a major problem. The ping and stabilization protocols are 
rather complex and one would expect the system to perform poorly under 
churn.

Mercury:

The paper presents a routing protocol that supports multi-attribute 
range queries. In doing so, each attribute is associated with a routing 
hub, that is basically a ring of peers (a subset of all the peers in the 
system). Within each ring, data is placed contiguously therefore each 
node is responsible for a range of values. This has the downside that 
node joining must consider load ballancing constraints. Detecting load 
imballance is done using a random walk and relocation of nodes from 
regions of less load to "hot" regions. A node that joins is to be 
inserted such that it will take first half of the interval of a highly 
loaded node.

The authors admit that their system does not scale with the increase in 
number of attributes. The scheme employed is very wasteful, in that the 
first hop is the one that chooses the hub, and then all the rest of the 
routing is done within that hub -- this means that if the attribute that 
will determine the hub to be chosen is not picked appropriately, 
consequent filtering will incurr a high cost.


Tudor

From gp72@cornell.edu  Thu Feb 16 12:51:40 2006
Return-Path: <gp72@cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from postoffice10.mail.cornell.edu (postoffice10.mail.cornell.edu [132.236.56.14])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1GHpet07098
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 12:51:40 -0500 (EST)
Received: from orpheus3.dataserver.cornell.edu (orpheus3.dataserver.cornell.edu [128.253.161.167])
	by postoffice10.mail.cornell.edu (8.12.10/8.12.6) with ESMTP id k1GHpcjD011157;
	Thu, 16 Feb 2006 12:51:38 -0500 (EST)
Message-ID: <368894474.1140112297095.JavaMail.webber@orpheus3.dataserver.cornell.edu>
Date: Thu, 16 Feb 2006 12:51:38 -0500 (EST)
From: Gopal Parameswaran <gp72@cornell.edu>
To: egs+summary@cs.cornell.edu
Subject: PAPER 7
Cc: gp72@cornell.edu
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Mailer: uPortal WEB email client 3.0
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k1GHpet07098

Mercury is a scalable routing protocol for range queries by creating a set of routing hubs where a routing hub is a logical collection of nodes in the system and queries are passed to one of the hubs corresponding to the attributes that are being passed and a new data item is sent to all hubs for which it has an associated attribute. The routing protocol is  implemented by first partitioning the nodes in the system into a circular contiguous overlay separated into groups called attribute hubs which are in fact logical partitions with each attribute hub corresponding to each  each attribute. The logical partitions are called dimensions. When a node gets a query it resolves all queries in its range or else passes it on.  Each node in the routing hub has a link to its predecessor and to its successor and also maintains a link to all other dimensions or hubs to support the selection of a range query for a set of attributes. Routing is done in a hub with the query entering the system  at the point where the maximum value of the attribute is stored and then traversed link by link till it reaches a value of the attribute at a node that is lesser than the least value in the range query for the attribute. Also the node traversals are reduced by maintaining in its routing table a distance matrix that specify the nodes at a particular range distance from the node. All values that correspond to the data elements that match the query are selected. This design imposes two restrictions. Firstly since the each attribute has its own hub as the number of dimensions increase the links to be maintained increases. Secondly for range queries the advantages of distributed hash tables viz load balancing cannot be utilized since the node id  should have a correlation with the range of the values. Mercury ensures load balancing by implementing load balancing by explicit load balancing by moving around nodes and changing the ranges of the nodes according to the load. 
Every hub in the system has a representative node and nodes join the system by querying it and then getting hub information from a set of representative nodes that represent each hub in the system. The new node chooses a hub at random and contacts members of that hub and inserts itself by taking half of the nodes range of values and used the nodes predecessor pointer as its own and the node as the successor pointer and  changes the nodes predecessor pointer to itself thus adding itself the the network. It also copies the routing table from the old node. Another observation from this is that every new addition is a form of localized load balancing. If three or more successive nodes leave the system then the links in the nodes break down and till the sections are repaired certain sections of the queries would get affected that correspond to the broken links.

P-Tree
The paper deals with a peer to peer network like Mercury that evaluates range queries in addition to equality queries using P-Trees. A P-trees is a spatial access method that defines hyperplanes, in addition to the orthogonal dimensions, which node boundaries may parallel. It can also be imagined to be a collection of b+ trees at every node with each b+ tree corresponding to a range of values. The sub trees have overlapping values indexed by multiple sub trees that helps in reducing excessive co-ordination between the nodes. The P-tree network implements routing different from Mercury in the sense that it also tries to start from a maximum value for the search by first looking for the farthest away pointer that does not overshoot the lower bound of the search and then proceeds from there node by node for each successor till it reaches the lower bound and searches for values till it just crosses the upper bound specified in the rang query. The behaviour of P-Trees is similar to B+ Trees and when consistent would involve logN to the base b. Node insertions are performed by contacting existing nodes and determining the position of the new node and taking the predecessor's pointer and changing the predecessor's pointer to its own. Node deletions are automatically taken care of by two processes called Ping process and stablization process. In the Ping process checks at periodic intervals for node failures or deletions and they are detected and then used by the stablization process where the network is repaired. 

Geo Peer
This paper mainly deals with the scalability of networks of stationary nodes and how support can be provided to very large scale location aware systems. GeoPeer is a location aware Peer to Peer network that supports location constrained queries and information dissemination also taking into account the unbalanced distribution of nodes in the geographic area. It is based on a concept defined as geographic ping that works by storing the path to a request that is made periodical and then using the path for subsequent requests thus bypassing the next queries for the same object. This would be equivalent to caching the address of the objects instead of querying for it. Geo Peer uses a Delaunay triangulation for the topology and routing is done by a neighbour discovery process and choosing an angle in the direction of the destination node with the angle that it makes with the destination node being reduced with each jump from node to node. Thus it can be also imagined as a control system that oscillates around its mean as it flows along time. Thus this algorithm can be changed by making jumps that decrease the angles in proportion to the slope. Also this algorithm uses a concept of Long range contacts that are addresses in its routing table that point to nodes that are at a distance. It implements this using three mechanisms, the first of which the Hop level mechanism looks at the number of hops it has made so far in its search and if the number of hops exceed a certain value it stores the current node in its routing table and again starts the search from that node. Thus for all further queries that are in the same region in space the queries can start from that node. Geo peer implements this routing mechanism by not taking random hashes for node id’s but making hashes that have a correlation with the node location. This algorithm can be further improved by taking hashes that include a distance metric in its hash function and would help in making the finger pointer tables more meaningful. This paper also discusses ab
ining statistics about the hits via hit counters whose values are used to adjust the node distribution.

From ymir.vigfusson@gmail.com  Thu Feb 16 13:07:15 2006
Return-Path: <ymir.vigfusson@gmail.com>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=0.0 required=5.0 tests=none autolearn=ham 
	version=3.1.0
X-Spam-Level: 
Received: from nproxy.gmail.com (nproxy.gmail.com [64.233.182.199] (may be forged))
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1GI7Et13493
	for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 13:07:14 -0500 (EST)
Received: by nproxy.gmail.com with SMTP id a4so150732nfc
        for <egs+summary@cs.cornell.edu>; Thu, 16 Feb 2006 10:07:14 -0800 (PST)
DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
        s=beta; d=gmail.com;
        h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition;
        b=Uke5qvVwLz9oXxEDUEJbZKhBU1D9w87DgLP1CDKVT/xE4scc9yIW4DQEyR/ekAtlaqgKle2OS0AsbNz0+wgSYC2EIzkWytt9RXSgLnomemNw+oEkmZXxOoHsRepWMDwmCYcySXd6mnu5RO+Fmi9Y5h+jpzxDxMX0Q1kQNJ1331c=
Received: by 10.49.80.13 with SMTP id h13mr222274nfl;
        Thu, 16 Feb 2006 10:07:13 -0800 (PST)
Received: by 10.48.217.10 with HTTP; Thu, 16 Feb 2006 10:07:13 -0800 (PST)
Message-ID: <9302f1e20602161007p304d02fdxc425f6c3228ca15f@mail.gmail.com>
Date: Thu, 16 Feb 2006 13:07:13 -0500
From: Ymir Vigfusson <ymir.vigfusson@gmail.com>
To: egs+summary@cs.cornell.edu
Subject: PAPER 7
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k1GI7Et13493

GeoPeer is a location-aware peer-to-peer architecture. Instead of
using distributed hash tables to and equating a hashed identifier
to position, it creates a geographical network and uses machinery
to find long range contacts (LRCs) in order to keep the network
diameter small. The process of self-organization used in GeoPeer
network is done by maintaining a Delaunay triangulation of the
geography of the physical network (meaning that the circumcircles
of triangles do not contain any of the nodes in the network,
essentially the dual of Voronoi cells). Furthermore, for each point
in the space there is exactly one responsible node. What we get is
a geometric partition of the network where nodes are assigned
authority over physically nearby areas (the Voronoi cell), instead
of the identifier ranges as in the DHT P2P networks. This means
that maintenance of the network is done by exchanging messages with
your geographic neighbours. The paper talks about the details of
the coordination required for dynamic triangulation (node arrivals
and departures and corresponding division of space), which is
complicated by the fact that nodes could have different views of the
network topology.
So far, the use of Delaunay does not keep the network diameter small.
GeoPeer tries to battle this by comparing four different ways of
providing long range contacts. The ones that seemed to be experimentally
worse were the Hit count balancing mechanism where you count how often
LRC links are used, and override the ones that are not much used, and
the Small-world mechanism where we divide the space into squares and
randomly connect squares according to an r-harmonic probability
distribution. The ones that turned out better were Hop level mechanism
where you shortcuts are created on commonly used paths (so we can take
bigger steps later), and an eCAN-like recursive division of the space into
four spaces and randomly connecting nodes in one square to two of the
other. Routing in GeoPeer then is done by a greedy algorithm.
The paper lacks theoretical models and proofs, and only shows empirically
that the diameter is kept at about log(n) to log^2(n). It seems to me
that densily populated areas in the world would be less suspectible to
large traffic than intermediate nodes between population clusters.

The P-Tree paper talks about an index structure for peer-to-peer
substrates that enables one to do range queries. At current, DHT based
systems allow for locating of data using only equality lookups.
The main idea in the paper is that B+-trees can be distributed
in a semi-independent yet fault-tolerant fashion (called P-trees)
among nodes so that range queries cost only O(m + log_d n) messages,
where m is the number of peers in the range, d is the order of the
P-trees and n is the number of nodes in the system. (Or as the authors
describe it: "a non-trivial adaptation of Chord to skewed data
distributions.")
The size of P-trees stored on a node is O(d log_d n). Basically, a
peer stores and maintains the leftmost root-to-leaf path of its
corresponding B+ tree, and keeps track of pointers to other peers to
complete the missing subtrees.
It is important to notice that to get fault-tolerance we require some
level of redundency, so subtrees have overlapping ranges and data
values can be indexed by multiple subtrees. The paper describes the
required properties and time/space/reliability guarantees in details
for network maintenance and search, along with proofs. Two of the
properties are coverage and separation, respectively ensuring that
no values are missing in the index, and that subtrees are not too
far apart. The properties are maintained by two processes, the
Ping Process which detects inconsistencies in the network, and
the Stabilization Process which repairs them bottom-up.
The paper also includes experimental evaluation of the P-trees.
As for a downside, most of the theorems talk about guarantees in a
stable network, or eventual consistency. It is not clear how
efficient the protocol is when the network has high churn.

From kelvinso@cs.cornell.edu  Fri Feb 17 01:46:26 2006
Return-Path: <kelvinso@cs.cornell.edu>
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on 
	sundial.cs.cornell.edu
X-Spam-Status: No, score=-0.9 required=5.0 tests=ALL_TRUSTED,AWL 
	autolearn=ham version=3.1.0
X-Spam-Level: 
Received: from exchfe2.cs.cornell.edu (exchfenlb-2.cs.cornell.edu [128.84.97.34])
	by sundial.cs.cornell.edu (8.11.7-20031020/8.11.7/M-3.22) with ESMTP id k1H6kPt26567
	for <egs@unix.cucs>; Fri, 17 Feb 2006 01:46:25 -0500 (EST)
Received: from mail pickup service by exchfe2.cs.cornell.edu with Microsoft SMTPSVC;
	 Fri, 17 Feb 2006 01:46:17 -0500
Received: from mail pickup service by exchfe1.cs.cornell.edu with Microsoft SMTPSVC;
	 Fri, 17 Feb 2006 00:46:48 -0500
Received: from EXCHVS2.cs.cornell.edu ([128.84.97.24]) by exchfe1.cs.cornell.edu with Microsoft SMTPSVC(6.0.3790.1830);
	 Thu, 16 Feb 2006 12:05:48 -0500
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
X-MimeOLE: Produced By Microsoft Exchange V6.5
Subject: Paper 7
Date: Thu, 16 Feb 2006 12:04:37 -0500
Message-ID: <2AA53C9C489B0049B43E56D28088677B52CFD3@EXCHVS2.cs.cornell.edu>
X-MS-Has-Attach: 
X-MS-TNEF-Correlator: 
Thread-Topic: Paper 7
Thread-Index: AcYzGxGPW5x/n9P+RU+pFwTR8CSa2w==
From: "Kelvin So" <kelvinso@cs.cornell.edu>
To: <egs+summary@cs.cornell.edu>
X-OriginalArrivalTime: 16 Feb 2006 17:05:48.0561 (UTC) FILETIME=[3BB6D810:01C6331B]
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by sundial.cs.cornell.edu id k1H6kPt26567

	"GeoPeer: A Location-Aware Peer-to-Peer System" presents a system
which can easily support location-aware services. "Querying Peer-to-Peer
Networks Using P-Trees" presents an index structure called P-tree, which
efficiently provides range queries while "Mercury: Supporting Scalable
Multi-Attribute range Queries" presents a system other than DHT to support
efficient multi-attribute range queries. 
	Geopeer is a system which supports location-aware operations. It uses
Delaunay triangulation as the basic routing structure. The identification of
nodes is correlated to the node location, so the queries and broadcast can be
easily implemented as location-aware operation. It also uses a light-weight
and effective schemes to manage long-range contacts to improve performance by
keeping low diameter of the network. This system is different than the next
two papers. This paper focuses on location-aware operations while the other
two papers focus on range search in peer-to-peer systems.
	P-tree uses a similar indexing structure as B+-tree in database on
top of DHT to provide efficient range query. The search keys lay in a
circular logical space. Also, each P-tree node stores only the left-most
root-to-leaf path of the B+-tree, and relies on other node to complete the
rest of the B+-tree. Using search a structure, it provides O(logN) search
performance. In addition, the range queries can be answer efficiently by
looking at the smallest value in the range, and then scanning the search key
in the ring clockwise. Since the structure allows peer insertion, deletion
and failures, P-tree nodes have to allow local inconsistency. Under local
inconsistency, it still allows search to proceed correctly with degradation
in performance. But with the Ping Process and Stabilization Process, it will
eventually transform the P-tree back into consistency state. However, P-tree
does not store values which are close together into the same node, and it may
require one hope to get to other node to retrieve each data in the range
query.
	Mercury is a system which supports scalable multi-attribute range
query. It builds a completely new structure called Mercury which uses many
similar techniques in DHT. Mercury does not use DHT as underlying structure
because in DHT the hash of a range is not correlated to the hash of the
values. Mercury uses a hub, logical ring-based structure, for each attribute
in a query. Each query will be routed in one of the hub depending on the
query selectivity on each attribute. Each node is corresponded to a range of
values in each hub. Since Mercury does not use hash of the key to provide
load-balancing and usually the search key space is not uniformly distributed,
it has to uses some other techniques to provide load-balancing. If there is a
highly uneven node in the network, it will leave the hub and rejoin in highly
dense range to split the load. To provide performance, it also has long range
links similar to the one in Symphony with knowledge of node distribution. To
gather statistic of load distribution, node distribution and query
selectivity, it uses random sampling of nodes to estimate the distributions.
This structure is better than P-tree because values which are close together
will be stored in the same node and the lookup will be more efficient.