PastryIM (Peer-to-Peer
Instant Messaging System)
PastryIM (Peer-to-Peer Instant Messaging System) is an adaptive peer-to-peer Instant Messaging application that enables exchange of instant messages to geographical distant users whilst protecting the anonymity of both the sender and the receiver. There is no centralized server that may do all the housekeeping but instead each of the nodes in the PastryIM network is responsible for cooperating in routing requests to other nodes in the network. This system can be very easily extended to share files, search for files, just like other existing p2p applications.
The client for this Instant messaging system can be separated from the Routing Nodes (servers), where the client refers to the Presentation Interface for a user and can be anything like a standalone application, a web application, a PDA app, etc.
This system uses Pastry p2p protocol (see references) for lookup and exchanging messages between servers. The Pastry protocol enables a message to be sent to any connected node in maximum log2bN hops where N is the number of hops and b is a configuration parameter.
The scalability and optimal performance given by the Pastry protocol is the motivation behind the implementation of a P2P IM system, although centralized server based IM systems are pretty common.
There are a number of Instant Messaging systems available with AOL, MSN, Yahoo, etc being the most popular ones. But all these systems are inherently based on the concept of a centralized server to which each of the IM clients connects and all the messages are routed through this centralized server. Such architecture has several disadvantages:
PastryIM aims to tackle these types of problems. The system operates as a location-independent distributed system across many individual, low-cost computers, which too serve as equally participating nodes in the system. So, this architecture precludes the need of costly hardware.
Following are the primary goals of the PastryIM system:
The system is a distributed system in which all the nodes have identical capabilities and responsibilities and all communication is symmetric. PastryIM is an internet-based, p2p global IM system, which aims to provide high availability, scalability and security.
Pastry is a
peer-to-peer routing substrate that is efficient, scalable, fault resilient and
self-organizing.Given a userId, Pastry routes an associated message towards the
node whose
userId is numerically
closest to the 128 msbs of the userId to which the message is directed, among
all live nodes.
Assuming a PastryIM
network consisting of N nodes, Pastry can route to the numerically closest node
for a given userId in less than [log2bN] steps under
normal operation (b is a
configuration
parameter with typical value 4). Despite concurrent node failures, eventual
delivery is guaranteed unless l/2 nodes with adjacent userIds (nodes) fail
simultaneously (l is a configuration parameter with typical value 32).
The tables required
in each PASTRYIM node have only (2b-1) *[log2bN] +2l
entries, where each entry maps a userId to the associated node’s IP address.
Moreover, after a node failure or the arrival of a new node, the invariants can
be restored by exchanging O(log2bN )messages among the
affected nodes.
Following is a brief
sketch of the Pastry routing scheme:
For the purpose of routing, userIds are
thought of as a sequence of digits with base 2b.A node’s routing
table is organized into [log2bN] levels with 2b-1
entries each. The 2b-1 entries at level n of the routing table
each refer to a node whose userId shares the present node’s nodeId in the first
n digits, but whose (n +1)th digit has one of the 2b-1
possible values other than the (n +1)th digit in the present node’s id. Each
entry in the routing table points to one of potentially many nodes whose userId
have the appropriate prefix; in practice, a node is chosen that is close to the
present node, according to the proximity metric. If no node is known with a
suitable userId, then the routing table entry is left empty. The uniform
distribution of userIds ensures an even population of the userId space; thus,
only [log2bN] levels are populated in the routing table.
In addition to the
routing table, each node maintains IP addresses for the nodes in its leaf set
and its neighborhood set .The leaf set is the set of nodes with the l/2
numerically closest larger userIds, and the l/2 nodes with numerically closest
smaller nodeIds, relative to the present node’s userId. The neighborhood set is
a set of l nodes that are near the present node, according to the proximity
metric. It is not used in routing, but is useful during node addition/recovery.
Node addition and failure:
A key design issue in Pastry is how to
efficiently and dynamically maintain the node state, i.e., the routing table,
leaf set and neighborhood sets, in the presence of node failures, node
recoveries, and new node arrivals. Briefly, an arriving node with the newly
chosen userId X can initialize its state by contacting a “nearby ”node A (does
it need to know the IP Address of a node? Or can it be done with expanding ring
methodology????? Still need to figure out. If a node must know the IP Address
of a node before entering the system, doesn’t it mean dependence on some sort
of centralized server??) and asking A to route a special message with the
destination set to X. This message is routed to the existing node Z with userId
numerically closest to X2. X then obtains the leaf set from Z, the
neighborhood set from A, and the ith row of the routing table from the ith node
encountered along the route from A to Z. Using this information, X can
correctly initialize its state and notify all nodes that need to know of its
arrival, thereby restoring all of Pastry ’s invariants.
To handle node failures, neighboring nodes in the userId space (which
are aware of each other by virtue of being in each other’s leaf set)
periodically exchange keep-alive messages. If a node is unresponsive for a
period T, it is presumed failed. All members of the failed node ’s leaf set are
then notified and they update their leaf sets to restore the invariant. Since
the leaf sets of nodes with adjacent userIds overlap, this update is trivial. A
recovering node contacts the nodes in its last known leaf set, obtains their
current leafs sets, updates its own leaf set and then notifies the members of
its new leaf set of its presence. Routing table entries that refer to failed
nodes are repaired lazily.
Pastry, as described so far, is deterministic and thus vulnerable to malicious or failed nodes along the route that accept messages but do not correctly forward them. Repeated queries could thus fail each time, since they are likely to take the same route. To overcome this problem, the routing is actually randomized. To avoid routing loops, a message must always be forwarded to a node that shares at least as long a prefix with, but is numerically closer to the destination node in the namespace than the current node.
PastryIM
overview
Any host connected to the Internet can act as a PastryIM node by installing the proper software, i.e. if a user wants to join the Instant Messaging PastryIM system, he just needs to install and run the client PastryIM software, which immediately makes the user the member of the PastryIM community.
The PASTRYIM system provides the following set of operations to its clients:
This causes the node to join an existing network (or join a new one), initialize all relevant tables, and return the node’s userId. The credentials contain information needed to authenticate the local node.
This method is called by the sender node to send the message packet to the user “toUser” along with the credentials for security purposes. If the “toUser” is same as endUser, the packet is not propagated further; else the packet is moved further along the network.
This method returns the UserId, which is the 128-bit identifier of a node on the network, if this node exists and is reachable in the network at that particular time.
This method returns the list of all the users of the PastryIM system. This may include both the live nodes as well as the nodes, which were earlier connected to the system. (Not implemented as yet)
Each PastryIM node is assigned a 16-bit (will be changed to 128-bit) node identifier, called a UserId. The UserId indicates a node’s position in a circular namespace, which ranges from 0 to 216 – 1. The UserId assignment is quasi-random (SHA-1 hash of the node’s public key) and cannot be biased by a malicious node. This process ensures that there is no relationship between the value of the UserId and the node’s geographic location, network connectivity, ownership, or jurisdiction.
PastryIM is layered on top of Pastry, a peer-to-peer
request routing and location scheme.
Following are the method implementations for PastryIM:
Node Arrival: When a user joins the PastryIM system, a userId is computed as the cryptographic hash of the node’s IP Address and port (Security features have not been implemented as yet). Each node maintains a list of (username, IPAddress) about whom it knows. Each node on joining the network broadcasts its arrival to the closest k nodes, which store the information in a table.
SendMessage: When a user wants to send a message to User X, it routes the message via Pastry, using the ToUserName as the key and the message as another argument. The node looks up into its Routing table and finds the node closest to him. It sends the message containing the ToUsername to this node. This node looks up into its user table and if it has the information for this user, ie it knows the location of this user, it routes the message effectively to the user, else this node again looks up the node closest to him, and sends the message to that node and the process continues. If the node with the given username is found, that node receives the message and sends back the acknowledgement to the sender node. The Parent node then relays its client the Address of User X, and the client now can communicate with User X directly.
All these messages are sent via Pastry protocol, which has been discussed earlier.
LookUpUser: A user can be looked up in the same manner as a message is sent to a user. In this case, only the result message is different, which returns the actual username and its other attributes like address, interests, etc. The Client can the add that User as its buddy.
GetUserList: This returns a list of all the users, which are currently joined in the system, and also the list of users, which were there earlier in the system but are now logged out. Has not been implemented.
Message exchange with other IM systems: The system can be easily integrated with the other existing IM systems via Jabber. This system already has the functionality built-in to send messages to widely used IM systems like Yahoo, MSN, AOL and provides interfaces to integrate with this system.
The system is being developed in the Java programming language, which gives the user to flexibility to use it on any of the commercial operating systems.
There were two approaches towards building such a system:
Implementation: I am using Approach 1 for this implementation where there is a cluster of Pastry Servers, which can support multiple clients. When a client wants to join an existing Pastry network, it knows the address of at least 1 running server, with which it registers.
The current implementation assigns each of the nodes a 16-bit nodeId that which will be later on changed to 128-bit MD5 based hash. This was done to simplify the implementation and testing. The user registers a new username when he first enters the Pastry network. A nodeId is assigned to the user and it is checked at other nodes whether this username already exists or not. On startup, the Pastry Servers join the network and each has the knowledge of its numerically nearest Pastry servers. Once the state tables are constructed (these can also be stored in some fixed storage), the new Pastry server can now accept Subscription requests from the clients and can route the messages being send by clients to other users.
The system takes care of new node arrivals or node failures and state tables are updated accordingly.
Completed:
Future work
Conclusion:
This proposal for building an IM system is based on the Pastry protocol, which is a generic peer-to-peer content routing system based on a self-organizing network of nodes.
This system is totally decentralized and doesn’t require any centralized server and is highly scalable. The service routes the messages from one user to another through a distributed array of nodes.
The system routes the message to any node in O(log N) steps in the absence of node failures.
*** References:
1) Pastry: Anthony Rowstron and Peter Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems.
2) PAST: Rawstron, Druschel. PAST.
3) FreeNet: Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System.
4) Jabber: http://www.jabber.org