PastryIM (Peer-to-Peer Instant Messaging System)

 

 

Abstract

 

PastryIM (Peer-to-Peer Instant Messaging System) is an adaptive peer-to-peer Instant Messaging application that enables exchange of instant messages to geographical distant users whilst protecting the anonymity of both the sender and the receiver. There is no centralized server that may do all the housekeeping but instead each of the nodes in the PastryIM network is responsible for cooperating in routing requests to other nodes in the network. This system can be very easily extended to share files, search for files, just like other existing p2p applications.

The client for this Instant messaging system can be separated from the Routing Nodes (servers), where the client refers to the Presentation Interface for a user and can be anything like a standalone application, a web application, a PDA app, etc.

This system uses Pastry p2p protocol (see references) for lookup and exchanging messages between servers. The Pastry protocol enables a message to be sent to any connected node in maximum log2bN hops where N is the number of hops and b is a configuration parameter.

The scalability and optimal performance given by the Pastry protocol is the motivation behind the implementation of a P2P IM system, although centralized server based IM systems are pretty common.

 

Introduction

 

There are a number of Instant Messaging systems available with AOL, MSN, Yahoo, etc being the most popular ones. But all these systems are inherently based on the concept of a centralized server to which each of the IM clients connects and all the messages are routed through this centralized server. Such architecture has several disadvantages:

 

 

PastryIM aims to tackle these types of problems. The system operates as a location-independent distributed system across many individual, low-cost computers, which too serve as equally participating nodes in the system. So, this architecture precludes the need of costly hardware.

 

Following are the primary goals of the PastryIM system:

 

 

The system is a distributed system in which all the nodes have identical capabilities and responsibilities and all communication is symmetric. PastryIM is an internet-based, p2p global IM system, which aims to provide high availability, scalability and security.

 

Pastry Overview ***

 

Pastry is a peer-to-peer routing substrate that is efficient, scalable, fault resilient and self-organizing.Given a userId, Pastry routes an associated message towards the node whose

userId is numerically closest to the 128 msbs of the userId to which the message is directed, among all live nodes.

Assuming a PastryIM network consisting of N nodes, Pastry can route to the numerically closest node for a given userId in less than [log2bN] steps under normal operation (b is a

configuration parameter with typical value 4). Despite concurrent node failures, eventual delivery is guaranteed unless l/2 nodes with adjacent userIds (nodes) fail simultaneously (l is a configuration parameter with typical value 32).

The tables required in each PASTRYIM node have only (2b-1) *[log2bN] +2l entries, where each entry maps a userId to the associated node’s IP address. Moreover, after a node failure or the arrival of a new node, the invariants can be restored by exchanging O(log2bN )messages among the affected nodes.

 

Following is a brief sketch of the Pastry routing scheme:

 

For the purpose of routing, userIds are thought of as a sequence of digits with base 2b.A node’s routing table is organized into [log2bN] levels with 2b-1 entries each. The 2b-1 entries at level n of the routing table each refer to a node whose userId shares the present node’s nodeId in the first n digits, but whose (n +1)th digit has one of the 2b-1 possible values other than the (n +1)th digit in the present node’s id. Each entry in the routing table points to one of potentially many nodes whose userId have the appropriate prefix; in practice, a node is chosen that is close to the present node, according to the proximity metric. If no node is known with a suitable userId, then the routing table entry is left empty. The uniform distribution of userIds ensures an even population of the userId space; thus, only [log2bN] levels are populated in the routing table.

 

In addition to the routing table, each node maintains IP addresses for the nodes in its leaf set and its neighborhood set .The leaf set is the set of nodes with the l/2 numerically closest larger userIds, and the l/2 nodes with numerically closest smaller nodeIds, relative to the present node’s userId. The neighborhood set is a set of l nodes that are near the present node, according to the proximity metric. It is not used in routing, but is useful during node addition/recovery.

 

Node addition and failure:

 

A key design issue in Pastry is how to efficiently and dynamically maintain the node state, i.e., the routing table, leaf set and neighborhood sets, in the presence of node failures, node recoveries, and new node arrivals. Briefly, an arriving node with the newly chosen userId X can initialize its state by contacting a “nearby ”node A (does it need to know the IP Address of a node? Or can it be done with expanding ring methodology????? Still need to figure out. If a node must know the IP Address of a node before entering the system, doesn’t it mean dependence on some sort of centralized server??) and asking A to route a special message with the destination set to X. This message is routed to the existing node Z with userId numerically closest to X2. X then obtains the leaf set from Z, the neighborhood set from A, and the ith row of the routing table from the ith node encountered along the route from A to Z. Using this information, X can correctly initialize its state and notify all nodes that need to know of its arrival, thereby restoring all of Pastry ’s invariants.

To handle node failures, neighboring nodes in the userId space (which are aware of each other by virtue of being in each other’s leaf set) periodically exchange keep-alive messages. If a node is unresponsive for a period T, it is presumed failed. All members of the failed node ’s leaf set are then notified and they update their leaf sets to restore the invariant. Since the leaf sets of nodes with adjacent userIds overlap, this update is trivial. A recovering node contacts the nodes in its last known leaf set, obtains their current leafs sets, updates its own leaf set and then notifies the members of its new leaf set of its presence. Routing table entries that refer to failed nodes are repaired lazily.

Pastry, as described so far, is deterministic and thus vulnerable to malicious or failed nodes along the route that accept messages but do not correctly forward them. Repeated queries could thus fail each time, since they are likely to take the same route. To overcome this problem, the routing is actually randomized. To avoid routing loops, a message must always be forwarded to a node that shares at least as long a prefix with, but is numerically closer to the destination node in the namespace than the current node.

 

PastryIM overview

 

Any host connected to the Internet can act as a PastryIM node by installing the proper software, i.e. if a user wants to join the Instant Messaging PastryIM system, he just needs to install and run the client PastryIM software, which immediately makes the user the member of the PastryIM community.

 

The PASTRYIM system provides the following set of operations to its clients:

 

This causes the node to join an existing network (or join a new one), initialize all relevant tables, and return the node’s userId. The credentials contain information needed to authenticate the local node.

This method is called by the sender node to send the message packet to the user “toUser” along with the credentials for security purposes. If the “toUser” is same as endUser, the packet is not propagated further; else the packet is moved further along the network.

This method returns the UserId, which is the 128-bit identifier of a node on the network, if this node exists and is reachable in the network at that particular time.

This method returns the list of all the users of the PastryIM system. This may include both the live nodes as well as the nodes, which were earlier connected to the system. (Not implemented as yet)

 

Each PastryIM node is assigned a 16-bit (will be changed to 128-bit) node identifier, called a UserId. The UserId indicates a node’s position in a circular namespace, which ranges from 0 to 216 – 1. The UserId assignment is quasi-random (SHA-1 hash of the node’s public key) and cannot be biased by a malicious node. This process ensures that there is no relationship between the value of the UserId and the node’s geographic location, network connectivity, ownership, or jurisdiction. 

 

PastryIM is layered on top of Pastry, a peer-to-peer request routing and location scheme.

 

PastryIM operations

 

Following are the method implementations for PastryIM:

 

Node Arrival: When a user joins the PastryIM system, a userId is computed as the cryptographic hash of the node’s IP Address and port (Security features have not been implemented as yet). Each node maintains a list of (username, IPAddress) about whom it knows. Each node on joining the network broadcasts its arrival to the closest k nodes, which store the information in a table.

 

SendMessage: When a user wants to send a message to User X, it routes the message via Pastry, using the ToUserName as the key and the message as another argument. The node looks up into its Routing table and finds the node closest to him. It sends the message containing the ToUsername to this node. This node looks up into its user table and if it has the information for this user, ie it knows the location of this user, it routes the message effectively to the user, else this node again looks up the node closest to him, and sends the message to that node and the process continues. If the node with the given username is found, that node receives the message and sends back the acknowledgement to the sender node. The Parent node then relays its client the Address of User X, and the client now can communicate with User X directly.

 

All these messages are sent via Pastry protocol, which has been discussed earlier.

 

LookUpUser: A user can be looked up in the same manner as a message is sent to a user. In this case, only the result message is different, which returns the actual username and its other attributes like address, interests, etc. The Client can the add that User as its buddy.

 

GetUserList: This returns a list of all the users, which are currently joined in the system, and also the list of users, which were there earlier in the system but are now logged out. Has not been implemented.

 

Message exchange with other IM systems: The system can be easily integrated with the other existing IM systems via Jabber. This system already has the functionality built-in to send messages to widely used IM systems like Yahoo, MSN, AOL and provides interfaces to integrate with this system.

 

APPROACH TAKEN

 

The system is being developed in the Java programming language, which gives the user to flexibility to use it on any of the commercial operating systems.

 

There were two approaches towards building such a system:

 

  1. P2P Server based: In this technique, the clients are separated from the server and a system-defined number of servers work in Pastry based network. Each of the server is responsible for a fixed number of clients which depends upon the factors like client locality, server load, etc.
  2. P2P client based:  In this technique, all the clients both act as a server and a client. The client itself maintains the State information and is a thick client. This strategy is being pursued for the project right now, as it gives high scalability and is totally independent of any server components.

 

Implementation: I am using Approach 1 for this implementation where there is a cluster of Pastry Servers, which can support multiple clients. When a client wants to join an existing Pastry network, it knows the address of at least 1 running server, with which it registers.

The current implementation assigns each of the nodes a 16-bit nodeId that which will be later on changed to 128-bit MD5 based hash. This was done to simplify the implementation and testing. The user registers a new username when he first enters the Pastry network. A nodeId is assigned to the user and it is checked at other nodes whether this username already exists or not. On startup, the Pastry Servers join the network and each has the knowledge of its numerically nearest Pastry servers. Once the state tables are constructed (these can also be stored in some fixed storage), the new Pastry server can now accept Subscription requests from the clients and can route the messages being send by clients to other users.

The system takes care of new node arrivals or node failures and state tables are updated accordingly.

 

WORK STATUS

 

Completed:

 

 

Future work 

 

Conclusion:

 

This proposal for building an IM system is based on the Pastry protocol, which is a generic peer-to-peer content routing system based on a self-organizing network of nodes.

This system is totally decentralized and doesn’t require any centralized server and is highly scalable. The service routes the messages from one user to another through a distributed array of nodes.

The system routes the message to any node in O(log N) steps in the absence of node failures.

 

 

*** References:

1) Pastry: Anthony Rowstron and Peter Druschel. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems.

2) PAST: Rawstron, Druschel. PAST.

3) FreeNet: Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong. Freenet: A Distributed Anonymous Information Storage and Retrieval System.

4) Jabber: http://www.jabber.org