Paper 3

CAN (Content-Addressable Network) constantly draws parallels between its infrastructure and a hash table. It is a torus-shaped overlay that is d-dimensional. A packet is routed in one direction along each dimension until it reaches its destination. For n nodes, this allows an average routing path length of (d/4)(n^(1/d)) hops. A node picks a random location along each dimension to put itself into and becomes associated with a set of <key,value> pairs. It inherits these pairs from the node that was previously occupying that part of the volume. To route a packet to any specific node, you just contact your neighbor in the dimension that leads toward that node and relay the message to it. To do this you have to maintain a neighbor list, which is O(d) in size. Much of the join procedure is similar to that of other networks: you have to know a node that is already in the system and then find a place to put yourself. You then inherit a large amount of information from your new neighbor and tell your neighboring nodes about your place in the system. It seems like a lot of the interesting parts of the paper were about the improvements and modifications you could make to this scheme.
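To make the hop-count figure concrete, here is a minimal sketch of the dimension-by-dimension routing described above. It assumes a simplification that the paper does not require: the n nodes sit on a regular grid with k = n^(1/d) equal zones per dimension. The function names (torus_hops, next_hop, route, average_hops) are made up for illustration.

```python
from itertools import product

def torus_hops(src, dst, k):
    """Hops between two grid cells on a torus of side k, going the short way around each dimension."""
    return sum(min((b - a) % k, (a - b) % k) for a, b in zip(src, dst))

def next_hop(cur, dst, k):
    """Step one cell along the first dimension that still differs from the destination."""
    for i, (a, b) in enumerate(zip(cur, dst)):
        if a != b:
            # Pick the direction with the shorter wrap-around distance.
            step = 1 if (b - a) % k <= (a - b) % k else -1
            return cur[:i] + ((a + step) % k,) + cur[i + 1:]
    return cur  # already at the destination

def route(src, dst, k):
    """Full list of cells visited from src to dst, inclusive."""
    path = [src]
    while path[-1] != dst:
        path.append(next_hop(path[-1], dst, k))
    return path

def average_hops(d, k):
    """Average hop count from one cell to every cell (by symmetry, the same for any source)."""
    origin = (0,) * d
    cells = list(product(range(k), repeat=d))
    return sum(torus_hops(origin, c, k) for c in cells) / len(cells)

if __name__ == "__main__":
    d, k = 2, 8                      # n = 64 nodes
    print(route((0, 0), (3, 6), k))  # 5 hops: 3 forward in dim 0, 2 backward in dim 1
    print(average_hops(d, k), (d / 4) * (k ** d) ** (1 / d))
```

Under this equal-zone assumption with even k, the enumerated average matches (d/4)(n^(1/d)) exactly; with the uneven zones a real CAN produces, the formula is only an approximation.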
The first obvious improvement would be to increase the dimensionality in which you operate. The more dimensions you have, the fewer application-level hops you will have to take to get a message to any other node. Furthermore, you have to store more neighbor information, which allows for more redundancy. The next improvement is called 'realities.' Each reality is an entire network or coordinate space of its own. Having multiple realities is akin to being connected to multiple networks. In this case, each node has a location in each reality, which makes the data redundant. You insert yourself at a random location in each reality, so you can be closer to a given location in one reality than in another, possibly allowing you to send a message to a particular node more quickly (and never more slowly). Furthermore, the redundancy helps guard against network outages, and since all data is replicated r times, where r is the number of realities, sudden node departure is not nearly as big an issue. The third improvement is quite simple: a node pings each neighbor and picks the neighbor that gets the message closest to the destination relative to its ping time. A fourth improvement is allowing each zone in the coordinate space to have multiple peers -- this provides redundancy and lower delays, as you route through the fastest peer. It is similar to reducing the overall number of nodes in the system, which makes the network delay lower. The fifth improvement is allowing multiple hash functions, which would create distinct <key,value> pairs and therefore put more redundancy into the system. There was a section on being geographically or topologically aware of the underlying Internet structure, but it was not tested.
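The "never slower" point about realities can be sketched as follows. The sketch assumes that a key hashes to the same point in every reality while the node holds a different random position in each, and it measures closeness as Manhattan distance on a unit torus. Both the distance measure and the names torus_distance and best_reality are illustrative assumptions, not taken from the paper.

```python
def torus_distance(a, b):
    """Manhattan distance between two points on the unit torus [0, 1)^d."""
    return sum(min(abs(x - y), 1 - abs(x - y)) for x, y in zip(a, b))

def best_reality(my_positions, target):
    """Return (index, distance) of the reality in which this node sits closest to target.

    my_positions holds one coordinate tuple per reality (hypothetical layout).
    """
    distances = [torus_distance(p, target) for p in my_positions]
    best = min(range(len(distances)), key=distances.__getitem__)
    return best, distances[best]

if __name__ == "__main__":
    # One node, three realities, one target point shared by all realities.
    positions = [(0.1, 0.1), (0.9, 0.5), (0.45, 0.55)]
    target = (0.5, 0.5)
    print(best_reality(positions, target))  # the third reality (index 2) is closest
```

Because the choice is a minimum over all realities, and the first reality is always one of the candidates, adding realities cannot make the starting distance worse than it is with a single reality.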
The seventh improvement allows nodes to be inserted in locations where zone volume is bigger; this allows a more even distribution of nodes in the system (allowing all nodes to be within a factor of 4 of each other in zone volume). This is accomplished by having a node hand the join request to a neighbor if the neighbor occupies a larger volume than it does. Caching and replication are standard in other p2p systems and were also detailed here.
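A minimal sketch of the volume-balancing join described above, assuming each zone knows the volumes of its immediate neighbors. The Zone class, its fields, and the simplified neighbor bookkeeping are hypothetical; a real implementation would also have to split the coordinate ranges and the stored <key,value> pairs.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    name: str
    volume: float
    neighbors: list = field(default_factory=list)

def pick_zone_to_split(owner):
    """Largest zone among the owner of the join point and its neighbors (ties keep the owner)."""
    return max([owner] + owner.neighbors, key=lambda z: z.volume)

def join(owner, new_name):
    """Hand the join to the largest nearby zone and split that zone in half."""
    target = pick_zone_to_split(owner)
    target.volume /= 2
    newcomer = Zone(new_name, target.volume)
    # Simplified bookkeeping: only the two halves are linked to each other here.
    newcomer.neighbors.append(target)
    target.neighbors.append(newcomer)
    return newcomer

if __name__ == "__main__":
    a, b, c = Zone("a", 0.125), Zone("b", 0.5), Zone("c", 0.25)
    a.neighbors = [b, c]
    d = join(a, "d")  # the join point landed in a, but b is larger
    print(a.volume, b.volume, c.volume, d.volume)  # 0.125 0.25 0.25 0.25
```

In the usage example the random join point falls in the smallest zone, yet the split is applied to the largest neighbor, which is what evens out the volumes over time.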
This paper's main concentration seemed to be on improvements to peer-to-peer systems, specifically the one introduced early in the paper. These improvements could well be put into other systems. Despite this, or more probably because of it, there were large omissions in terms of security and other big issues. Security was not mentioned at all in the paper, and there are loopholes. A node could try to join in a specific zone in order to take over a specific <key,value> pair and drop the packets on the ground. For example, the RIAA could hash "Nirvana" and try to put itself in the zone associated with that, and drop packets or send them back with faulty data. Furthermore, it seemed as if failure was not taken into account as much as it should have been. The solution seems to be "just wait for the data to be refreshed." This seems to be a bad solution -- there is a compromise that has to be made. Either data will be unavailable for a period of time or the network will constantly be bombarded with refreshes. Furthermore, the paper seems to think that most disconnections will be intentional, which is not reasonable in a p2p file-sharing system where people drop unexpectedly. One part of the paper I am confused about is the neighbor data. In the TAKEOVER part of the paper, it seems as if a node has access to its neighbor's neighbors. The only way this could be true is if the node held O(d^2) information about its surroundings. The paper very clearly states that it only keeps track of O(d) neighbors. If this is the case, then obtaining the disconnected neighbor's neighbors will be tricky and will potentially require sending many packets. This part was either omitted from the paper or I misread it.
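The O(d) versus O(d^2) point can be checked by counting, under the simplifying assumption (mine, not the paper's) that zones form a regular grid of side k on the torus, with k large enough that wrap-around does not merge cells. The helper names one_hop and two_hop are illustrative.

```python
def one_hop(cell, k):
    """Immediate neighbors of a grid cell: one step either way along each dimension."""
    out = set()
    for i, c in enumerate(cell):
        for step in (-1, 1):
            out.add(cell[:i] + ((c + step) % k,) + cell[i + 1:])
    return out

def two_hop(cell, k):
    """Neighbors of neighbors, excluding the cell itself and its immediate neighbors."""
    near = one_hop(cell, k)
    far = set()
    for n in near:
        far |= one_hop(n, k)
    return far - near - {cell}

if __name__ == "__main__":
    k = 7
    for d in (2, 3, 4):
        cell = (0,) * d
        print(d, len(one_hop(cell, k)), len(two_hop(cell, k)))
    # prints: 2 4 8 / 3 6 18 / 4 8 32
```

On this idealized grid a node has 2d immediate neighbors but 2d^2 neighbors-of-neighbors, so holding the second set locally would indeed take O(d^2) state rather than O(d).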