In this lecture we look at a bunch of different gossip algorithms, all linked by their use of gossip "epidemics" to build up a picture of the state of a distributed system.

First we talk about a pretty trivial example, namely tracking who is in the system. The basic idea is to gossip about the processes you've heard from recently. This assumes that processes somehow put a time on their messages, so that if I say "Danny sent me a message at his time 10" you can see that this is more current than the last time you heard from Danny, at his time 5. You can use wall-clock time, but keep in mind that clocks might not be synchronized, so if you do use wall clocks, make sure you can trust them!
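A minimal sketch of such a membership table, assuming logical clocks that tick once per gossip round; the names `Member`, `gossip_round` and `live_members` are hypothetical, not taken from any particular system. Each entry keeps the newest timestamp heard from a process plus the local time at which that timestamp last advanced, so deciding who is still alive only compares readings of the local clock and never needs synchronized clocks.

```python
import random

class Member:
    """One process in a gossip-based membership protocol (hypothetical sketch)."""

    def __init__(self, name):
        self.name = name
        self.clock = 0  # this process's own logical clock (assumed: one tick per round)
        # peer -> (newest timestamp seen from that peer's clock,
        #          our local clock value when that timestamp last increased)
        self.table = {name: (0, 0)}

    def tick(self):
        # Advance local time and refresh our own heartbeat entry.
        self.clock += 1
        self.table[self.name] = (self.clock, self.clock)

    def merge(self, remote_table):
        # Keep the newer timestamp for each peer. All timestamps for a given
        # peer come from that peer's own clock, so comparing them is safe.
        for peer, (ts, _) in remote_table.items():
            if peer not in self.table or ts > self.table[peer][0]:
                self.table[peer] = (ts, self.clock)

    def gossip_with(self, partner):
        # Push-pull exchange: afterwards both sides hold the newer entries.
        snapshot = dict(self.table)
        self.merge(partner.table)
        partner.merge(snapshot)

    def live_members(self, timeout):
        # Peers whose heartbeat advanced within the last `timeout` local ticks.
        return {p for p, (_, seen) in self.table.items()
                if self.clock - seen <= timeout}


def gossip_round(members, rng):
    # Every member ticks, then gossips with one randomly chosen partner.
    for m in members:
        m.tick()
    for m in members:
        partner = rng.choice([x for x in members if x is not m])
        m.gossip_with(partner)


rng = random.Random(42)
group = [Member(f"p{i}") for i in range(8)]
for _ in range(10):
    gossip_round(group, rng)
print(sorted(group[0].live_members(timeout=3)))
```

A process that crashes stops ticking, so its entry stops advancing everywhere and eventually falls outside every other member's `timeout` window.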

Anyhow, we then generalize this to tracking other sorts of information and using it to mark up a system "map" or "topology graph", creating a live version of a Google mashup (the kind of pushpin map you get on the web when you search for a Starbucks coffee shop in Ithaca, or wherever: it shows a fixed road map and then superimposes data from a table extracted from a database or something similar). A live mashup would reflect recently collected data and would revise itself as things change.

With a gossip epidemic, the revision happens within about log(N) gossip rounds after something changes.
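To see the logarithmic spreading time, here is a minimal simulation of push gossip, assuming synchronous rounds in which every informed node sends the news to one uniformly random node (possibly itself or a node that already knows). Since each informed node reaches at most one new node per round, the informed set can at most double, so at least log2(N) rounds are needed; the simulation suggests that not many more are needed in practice. The function name is hypothetical.

```python
import math
import random

def rounds_to_infect_all(n, rng):
    """Synchronous push gossip: each round, every informed node picks one
    uniformly random target and tells it. Returns rounds until all n know."""
    informed = {0}  # node 0 learns the news first
    rounds = 0
    while len(informed) < n:
        targets = {rng.randrange(n) for _ in range(len(informed))}
        informed |= targets
        rounds += 1
    return rounds

if __name__ == "__main__":
    rng = random.Random(7)
    for n in (100, 1_000, 10_000):
        avg = sum(rounds_to_infect_all(n, rng) for _ in range(5)) / 5
        print(f"N={n:>6}  avg rounds={avg:5.1f}  log2(N)={math.log2(n):5.1f}")
```

Multiplying N by 10 adds only a few rounds, which is the log(N) behavior described above.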

In big systems not everyone will want to watch everyone else. We need to subdivide the system into subsystems, each with only some bounded number of nodes.

This leads to the idea of Astrolabe, which is a hierarchical form of database that uses gossip and aggregation precisely in this manner.
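A minimal sketch of hierarchical aggregation in the Astrolabe style, assuming a hypothetical `load` attribute and a hypothetical aggregation that reports the node count and the least-loaded machine per zone. Astrolabe expresses aggregations as SQL-style queries and each zone's summary row is computed by its members and spread by gossip; here a Python function stands in for the query and the whole tree is evaluated recursively in one place, purely to show how summaries roll up level by level.

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    name: str
    children: list = field(default_factory=list)  # child Zones; empty for a leaf (one machine)
    attrs: dict = field(default_factory=dict)     # a leaf's own row, e.g. {"load": 0.3}

def aggregate(zone):
    """Summary row a zone publishes to its parent (hypothetical aggregation).

    Stands in for an Astrolabe SQL-style query; evaluated recursively here
    rather than by gossip within each zone."""
    if not zone.children:
        return {"nodes": 1, "min_load": zone.attrs["load"], "best": zone.name}
    rows = [aggregate(child) for child in zone.children]
    best = min(rows, key=lambda row: row["min_load"])
    return {
        "nodes": sum(row["nodes"] for row in rows),
        "min_load": best["min_load"],
        "best": best["best"],
    }

root = Zone("/", [
    Zone("/ithaca", [Zone("/ithaca/a", attrs={"load": 0.7}),
                     Zone("/ithaca/b", attrs={"load": 0.2})]),
    Zone("/nyc", [Zone("/nyc/c", attrs={"load": 0.5})]),
])
print(aggregate(root))  # {'nodes': 3, 'min_load': 0.2, 'best': '/ithaca/b'}
```

The parent only ever needs its children's summary rows, never the individual machines, which is what keeps the amount of state each participant tracks bounded as the system grows.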

Finally, we look at a different use of gossip: supporting distributed lookup (the DHT problem that Chord solves). We first discuss Chord and then a weakness of Chord. Not everyone knows about this, but there is a way to attack Chord that leaves it totally screwed up, forever.

Basically, you partition the network and let Chord start to repair itself (the attacker needs control over which links break and when). But you remove the partition while the two Chord rings are not yet fully separated: they still have some cross-pointers (for example, cached lookup pointers). Next, you add new nodes while simultaneously killing off the nodes that each side knew about on the other side before the partition (Chord periodically pings "old contacts," and we don't want it to find them). On the other hand, you do need to let Chord cross those cached links and maintain persistent cross-pointers, which it will tend to do in this case. The outcome is a permanently screwed-up Chord system that simply malfunctions: inserts and lookups basically fail, and the system can't detect this and will never repair itself. The Chord papers usually have a footnote pointing out that yes, this is an issue, but it isn't likely to occur.
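For reference, a minimal sketch of ordinary (non-attacked) Chord lookup on a tiny ring, assuming 6-bit identifiers, example node IDs, and finger tables computed from a global view of the ring; real Chord nodes build and repair their fingers through stabilization, which is exactly the machinery the attack above confuses. Finger i of node n points at the successor of n + 2^i, and each hop forwards the query to the closest finger preceding the key, which is where the log(N) hop count comes from. `ChordRing` and its methods are hypothetical names.

```python
M = 6           # identifier bits (assumed small for illustration)
RING = 2 ** M   # identifiers are 0 .. 63

def in_interval(x, a, b, inclusive_right=False):
    """Is x in the circular interval (a, b), or (a, b] if inclusive_right?"""
    if a < b:
        return a < x < b or (inclusive_right and x == b)
    # Wrapped interval; a == b covers the whole ring except a (plus a if inclusive).
    return x > a or x < b or (inclusive_right and x == b)

class ChordRing:
    def __init__(self, node_ids):
        self.nodes = sorted(set(node_ids))

    def successor(self, k):
        # First node clockwise from identifier k (the node responsible for key k).
        k %= RING
        for n in self.nodes:
            if n >= k:
                return n
        return self.nodes[0]

    def fingers(self, n):
        # Finger i of node n is successor(n + 2^i).
        return [self.successor(n + 2 ** i) for i in range(M)]

    def lookup(self, start, key):
        """Return (node responsible for key, number of forwarding hops)."""
        key %= RING
        n, hops = start, 0
        while True:
            succ = self.successor(n + 1)
            if in_interval(key, n, succ, inclusive_right=True):
                return succ, hops  # n's successor owns the key
            nxt = n
            for f in reversed(self.fingers(n)):  # closest preceding finger
                if in_interval(f, n, key):
                    nxt = f
                    break
            n, hops = nxt, hops + 1

ring = ChordRing([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
print(ring.lookup(8, 54))  # (56, 2): 8 -> 42 -> 51, and 51's successor 56 owns key 54
```

Every hop here relies on successor and finger pointers being correct; if they are left pointing across two half-merged rings, the `successor` step silently returns the wrong node.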

Anyhow, with gossip we can build a much better DHT. We discuss Kelips, which uses gossip to build an emergent, convergently consistent DHT. Lookups are fast, too: 1 hop instead of the log(N) hops in Chord. Of course, we need more space per node to pull this off: sqrt(N) instead of log(N). This is a small issue, since no real system is big enough for sqrt(N) to get very large: even with N = 1,000,000 nodes, sqrt(N) is only 1,000.
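A minimal sketch of the Kelips idea, assuming SHA-1 hashing into k = floor(sqrt(N)) affinity groups; `KelipsNode`, `build`, `put` and `get` are hypothetical names. Each node knows every member of its own group, one contact in each other group, and the (key, holder) tuples for keys that hash to its group, so any lookup needs at most one hop. The tables are filled directly here; in Kelips they are built and kept fresh by gossip.

```python
import hashlib
import math

def bucket(s, k):
    """Deterministically hash a string into one of k buckets (affinity groups)."""
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % k

class KelipsNode:
    def __init__(self, name, k):
        self.name = name
        self.group = bucket(name, k)  # this node's affinity group
        self.group_view = set()       # all members of my own affinity group
        self.contacts = {}            # foreign group id -> a contact node there
        self.tuples = {}              # key -> holder, for keys homed in my group

def build(names):
    """Create about sqrt(N) groups and fill every node's tables directly
    (Kelips would do this with gossip)."""
    k = max(1, math.isqrt(len(names)))
    nodes = {n: KelipsNode(n, k) for n in names}
    for a in nodes.values():
        for b in nodes.values():
            if b.group == a.group:
                a.group_view.add(b.name)
            else:
                a.contacts.setdefault(b.group, b.name)
    return nodes, k

def put(nodes, k, key, holder):
    # The (key -> holder) tuple is replicated at every member of the key's home group.
    home = bucket(key, k)
    for n in nodes.values():
        if n.group == home:
            n.tuples[key] = holder

def get(nodes, k, start, key):
    """Return (holder or None, hops); never more than one hop."""
    home = bucket(key, k)
    me = nodes[start]
    if me.group == home:
        return me.tuples.get(key), 0
    contact = me.contacts.get(home)
    if contact is None:  # no node known in that group
        return None, 0
    return nodes[contact].tuples.get(key), 1

nodes, k = build([f"node{i}" for i in range(100)])
put(nodes, k, "song.mp3", holder="node42")
print(get(nodes, k, "node7", "song.mp3"))
```

With 100 nodes there are 10 groups, so each node tracks roughly 10 group members and up to 9 contacts instead of all 100 nodes; that per-node state is the sqrt(N) space cost traded for one-hop lookups.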

EpiChord also fixes Chord. Basically, gossip epidemics can solve these issues with Chord.