CS 514 Spring 2008: Assignment 1

Due: February 15, 2007.

Objective

An “overlay network” is a software layer that implements the same style of send/receive functionality as does the Internet, but that does so on behalf of some set of endpoint nodes. The overlay is implemented by a software layer that itself uses standard Internet functionality (TCP/IP, UDP, etc). However, applications running “over” the overlay have access to special advantages that normal Internet communication might lack, such as security, better quality of service, or other properties.

Our goal for assignment 1 is to create an overlay network for a set of nodes that will overcome a certain type of Internet failure called a transient partitioning event. When a network partitioning event occurs, certain network paths stop working, even though there may be other “indirect” paths that could carry data between the same nodes. For example, suppose that our endpoint nodes are machines A, B and C. A situation could arise in which A can’t make direct connections to B, and yet A can talk to C and C can talk to B, hence A could have relayed information through C and continued to communicate.

The problem we're asking you to solve has a very simple solution. If your first idea seems complicated, we want to urge you to pause and think more about it. You'll be amazed at how simple the solution can be. In fact, the whole thing can be coded up in at most a few hundred lines of code and many will be cut-and-pasted. Our guess is that on can solve this in perhaps twenty lines of "real" code, and without much of an "algorithm", either. So if you find yourself with a solution that doesn't sound much like this, talk to friends or ask us for help!

You’ll be building a library of procedures callable from C# on .net, using the Winsock layer. You are free to create overlay links between any two nodes that have direct network-level connectivity, but be mindful of the cost associated with each overlay link formed. Applications using your library will benefit from “better” connectivity than if they used (only) direct point-to-point communication, because your library will hide disruptions that could prevent this sort of direct communication from working.

The interface we want you to support is as follows:

· my_overlay.join(IPEndPoint contact, int my_portno); -- initializes the library. contact is a normal IP-address:port pair, supplied as an argument. The IP-address part of the contact argument can be null (all zeros), in which case the library will assume that this is the first node to join the overlay. Otherwise, if contact's IP address is non-null, the library will connect to an existing overlay of which the contact node must be an existing, reachable member. join is of type void, but it should throw an IO Exception if something goes wrong. The port number should tell the overlay library which TCP port number to use for this process – we’re letting you control this because doing so makes it easier to run multiple instances of your demo program on a single machine (be careful to pick port numbers that aren’t likely to already be in use for some other purpose).

· my_overlay.OnRcv += new EventHandler(this.GotMsg); -- registers a procedure to handle new incoming messages. Any given message should be delivered exactly once, and if multiple messages came from a single source, they should be delivered in the order they were sent. EventHandler here is any suitably defined delegate, e.g., “public delegate void EventHandler(byte[] msg);”

· my_overlay.send(IPEndPoint dest, byte[] message); -- sends the message to the designated destination node. That node’s GotMsg procedure will be invoked with the message as an argument. The message could be of arbitrary non-zero size: anything from 1 byte to many megabytes. send should always succeed in our overlay. If there is no path from the source to the destination available (because too many links have broken), send should block until a path is restored.

Although this interface poses some challenges even without the overlay functionality, your job is to implement a mechanism whereby the overlay network can route around temporary outages that partition nodes from one-another so that direct connectivity is not possible, while leaving at least one indirect path available.

To facilitate testing add an additional control interface:

· my_overlay.partition(IPEndPoint dest); -- temporarily disables direct communication from the node on which the call is done to (or from) the destination node. Any messages currently in transit will be lost, and outgoing messages (if there are any when the call is done) will similarly be dropped, with no error visible to anything in the library. Notice that this can be called from either “side” of the link, or even on both sides.

· my_overlay.repair(IPEndPoint dest); -- restores normal direct communication from the calling node to and from the destination node.

Build a demonstration application to illustrate the behavior of your overlay network. We aren’t going to tell you how to do this; the job of designing this demonstration is part of the assignment! The demo should have a control console (a Visual Studio form) with an interface that allows the user to partition or repair links, and showing the information that you need to show us to make it crystal clear how your overlay works.

Test your application with various numbers of endpoints on a single machine in the CSUG laboratory. Then test it again with more than one machine (two would be fine). Finally, design a simple test program that can evaluate the performance of your solution. Measure average delay and the variance in delay for sending 10,000 messages through the network, with a single fixed source, but multiple destinations. Now break some links (leaving at least one path from the source to each destination). Measure the delays again. (Your performance will probably depend on the “topology” of the resulting network).

Note: There are many other things one could have tested, such as patterns in which there are multiple senders. It would be wonderful if you have time to do so. However, we won’t insist upon it.

Goals

The primary goals here are:

A working correct solution: each message should be delivered (at least eventually) exactly once, and
A reasonably thorough evaluation, including a performance measurement of the solution. This could include both theoretical and practical analysis of how your solution behaves in different possible scenarios.

Finally, all things being equal, solutions that achieve very high performance and are not disrupted by link breakage and repair (e.g. the performance is unaffected) will be considered better than solutions that are sluggish or that have high costs at the moment of a breakage or a repair. On the other hand, solutions that violate other rules (for example, where messages can be lost entirely or delivered twice) will be considered wrong.

What to hand in.

Since this is the first assignment, some ground-rules apply. Hand in PDF files or Word documents. Use 12-point fonts, normal spacing, and leave at least 1 inch margins on all sides. Leave an additional two inches of space at the top of the first page – we need the 3 inches to write comments. Put your name and netid in the “footer” on each page of anything you hand in. Prefix each file with each of your netid(s), separated by the underscore character(‘_’). Give the submitted files sensible names, such as vv38_Writeup.pdf, and vv38_Code.pdf (for single person teams), or kpb3_vv38_Writeup.pdf, and kpb3_vv38_Code.pdf (for two person teams)

Please hand in a short writeup explaining the algorithm you designed and implemented in the overlay library. A reader who reads this document should be able to implement your solution, independently, and end up with more or less the same code. Figures illustrating the technique are very helpful. We expect this discussion to be short, perhaps a page of text plus any illustrations you include. Discussions that are longer than 3 pages of text are strongly discouraged, even if this forces you to struggle to find a way to describe your solution concisely. You can leave out some details if that helps.

Next, in the same document, describe the performance you obtained with your solution. Tell us what the costs are when the overlay is “intact” and how they change, if at all, when links are broken (if this depends on the topology of the network, tell us what scenarios you tested in addition to telling us what numbers you obtained). Always include the standard deviation. If there are temporary costs right at the moment you break a link (for example, if there is a mechanism that only kicks in when the breakage occurs), discuss those costs, and similarly for any costs that occur when the link is restored. Again, we expect this to be fairly short, perhaps a few pages at most. It should be possible to get full credit with a single page writeup if you are clear and to the point.

Last, attach a printed copy of your code.

Important Note: All code that's part of your Assignment 1 solution needs to be your own -- i.e., you are not allowed to reuse existing code-bases (from similar projects) for this assignment. You are of course allowed to study, adapt or implement existing ideas/algorithms (more on this in the Hints section below), but you cannot reuse code. We adopt this policy here since this assignment should be reasonably simple. (On the other hand, you are encouraged to reuse existing code-bases in the more complex Assignments 2 and 3)

How to turn it in.

Upload the two PDF files (one for the writeup and one for the code), suitably named, at the 514 CMS site prior to the deadline. There is no need to mail Professor Birman or the TA’s. As noted in the course overview, contact us ahead of time if you get into trouble on this assignment; we may be able to bail you out, and in any case we may have a preference as to whether you should hand in a partial solution or try to finish debugging your code. Late assignments will only be accepted if we gave you permission before the deadline!

How we plan to grade it.

We’ll print out your submission, read it carefully, and comment on the quality and correctness of the algorithm. We’ll look at your code for structure, clarity (comments really do help!) and we’ll compare your performance with the performance others obtained. The graders will be requesting demos from some students, more or less at random (although if they are puzzled by your algorithm or code, they are obviously more likely to ask for a demo than if it is crystal clear!).

Hints

We aren’t going to tell you how to solve this problem, but we suggest that you read the description of the Resilient Overlay Network system, from MIT. You may not base your solution on the code available from MIT – using their code at any stage of developing your solution would be considered a violation of the academic integrity policy. However, you are welcome to read the paper and can use their ideas if you like.

RON isn’t necessarily a good solution for our version of the problem. For example, suppose (just hypothetically) that a system always had two path-independent ways of routing messages from source to destination. It could even send every message twice. Then, even if one path were to temporarily stop functioning...

One final suggestion. We’ve pointed out that the Internet supports lots of ways of sending messages: TCP, UDP, even UDP multicast. Nothing about this assignment assumes anything about which of those methods would be the best choice here. You should make that choice in a way that fits well with your solution.