CS514 Spring 2008

Assignment #3

Due: May 9, 2008

Objective

In your future career, you will inevitably discover that not just building, but even designing all the details of any sufficiently large and complex application is far beyond the capabilities of a single person, and furthermore, that any large system has to evolve constantly to adapt to changing requirements. For these reasons, a component-oriented approach, modularity, and separation of concerns are the key to success. They are what enable a technical architect to divide work in a 10-person team in a way that leaves each developer a manageable and reasonably isolated piece of work to focus on. They allow team members to specialize and gain expertise in their areas, thereby freeing the technical lead from having to be an expert on everything project-related, a situation that usually ends in complete failure. They also allow the team to re-implement selected parts of a system to adapt to changing requirements without breaking the whole, and to save development time by reusing existing general-purpose software components implemented by others.

Designing distributed systems in a modular manner is particularly difficult because we lack tools and language support. For desktop applications, modularity relies heavily on object-oriented programming, which allows us to hide the implementation details of a part of the application behind a generic interface and to replace that implementation with another one that matches the same interface without breaking inter-component dependencies. But distributed systems don't even have a true object-oriented language to begin with. The closest we get to one is web services. However, web services are only useful for a limited range of applications in which scalability and performance aren't essential. For example, they are a very poor match for implementing an Internet-scale video streaming service or a massively multiplayer game. Besides being relatively slow and heavyweight, due in part to XML serialization overhead, web services are fundamentally a client-server technology, and as such, they are a poor match for the modern Internet, which is becoming predominantly peer-to-peer. In a system with hundreds of thousands of casual users spread across a large area, any technology that places too much load on a centralized client-server infrastructure is doomed to be expensive and difficult to maintain; it will suffer from bottlenecks and from competition with substantially cheaper and more efficient services that can leverage direct peer-to-peer interactions.

In real large-scale systems, the closest we get to modular design is through protocol layers. Higher-level protocols that deal with more abstract tasks, such as replication or content delivery, are designed to leverage lower-level protocols that might deal with tasks such as delivering packets from A to B, or locating A and B in the first place. For example, the Internet itself involves a routing infrastructure, on top of which we run DNS. We have a layer that allows machines on the Internet to establish point-to-point connections, on top of which one may build an overlay, which is then used as a basis for a content-distribution service. Fault-tolerant systems often involve layers that detect failures and organize nodes across the Internet into groups, which run various multicast or replication protocols that are in turn used as a basis for replicated data structures, replicated services, mechanisms for making consistent configuration changes, etc. The implementation of each layer can often change without affecting the implementation of the layers below or above it. For example, your browser can fetch content from this website through either a direct or a VPN connection. In one case, it talks to the server directly, whereas in the other, the traffic may pass through additional layers for encryption and tunneling, but the protocol used by the browser doesn't need to deal with any of these aspects. Similarly, you can run an overlay network or a BitTorrent client on top of either type of connection without modifying it at all.
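To make the layering argument concrete, here is a minimal Python sketch built around a hypothetical `Transport` interface (none of these names come from QuickSilver or from any real networking library). The upper layer, `EchoClient`, is written once against the interface and runs unchanged either over a plain loopback transport or over a "tunneled" one that adds a toy XOR scrambling step standing in for encryption. The loopback simply hands each message back to the sender so that the example runs in a single process.

```python
from abc import ABC, abstractmethod
from collections import deque


class Transport(ABC):
    """Lower layer: moves opaque bytes between endpoints (hypothetical interface)."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> bytes: ...


class LoopbackTransport(Transport):
    """Stand-in for a direct connection; delivers each message back to the sender."""

    def __init__(self) -> None:
        self.wire: deque = deque()  # what would actually travel over the network

    def send(self, payload: bytes) -> None:
        self.wire.append(payload)

    def receive(self) -> bytes:
        return self.wire.popleft()


class TunneledTransport(Transport):
    """Extra layer beneath the caller, like a VPN tunnel.

    XOR with a fixed key is a toy placeholder, NOT real encryption.
    """

    def __init__(self, inner: Transport, key: int = 0x5A) -> None:
        self.inner = inner
        self.key = key

    def _scramble(self, data: bytes) -> bytes:
        return bytes(b ^ self.key for b in data)

    def send(self, payload: bytes) -> None:
        self.inner.send(self._scramble(payload))

    def receive(self) -> bytes:
        # XOR is its own inverse, so the same step unscrambles.
        return self._scramble(self.inner.receive())


class EchoClient:
    """Upper layer: written only against Transport, unaware of what lies below."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def round_trip(self, text: str) -> str:
        self.transport.send(text.encode("utf-8"))
        return self.transport.receive().decode("utf-8")
```

Both `EchoClient(LoopbackTransport())` and `EchoClient(TunneledTransport(LoopbackTransport()))` behave identically from the client's point of view; only the bytes on the simulated wire differ.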

By now, having worked on the previous assignment, you understand that using protocol layers as a means of structuring your application is not easy. While a desktop application developer may simply create a new object and declare a new variable to represent a component, a distributed system developer often needs to maintain state in communication channels. This is not always necessary: for parts of the application that don't involve heavy traffic, such as a component that allows users to log in, fetch an update, register an address with a centralized repository, or look up a service, web services are perfectly adequate. However, in any part of the system that involves a heavy volume of traffic, such as transactions, video streams, events from sensors, or simultaneous updates to thousands of documents accessed by thousands of users, the use of peer-to-peer technologies that carry data and events directly between producers and consumers, without the bottleneck of going through a central server, is a must if the system is meant to scale.
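As an illustration of the low-traffic, centralized side of this split, the logic behind a lookup service can be as small as the sketch below. It is an in-memory Python stand-in with hypothetical names (`Directory`, and `mcast://` addresses made up for the example). In a real system it might be exposed through a web service, while the data itself flows over the channels whose addresses it hands out.

```python
from typing import Dict, List, Optional


class Directory:
    """Maps human-readable names to channel addresses (hypothetical sketch).

    Only registrations and lookups go through here, never the
    high-volume data, so a centralized service is adequate.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, str] = {}

    def register(self, name: str, address: str) -> None:
        self._entries[name] = address

    def lookup(self, name: str) -> Optional[str]:
        return self._entries.get(name)

    def browse(self, prefix: str = "") -> List[str]:
        """List registered names under a prefix, e.g. for a discovery UI."""
        return sorted(n for n in self._entries if n.startswith(prefix))
```

A client would call `lookup` once, then subscribe to the returned channel directly, so the directory never sits on the data path.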

This assignment is meant to be an exercise in modular distributed system design. We are asking you to use the QuickSilver Live Objects framework to build a complete distributed application that consists of multiple interconnected distributed components. Some of these components will be simple agents that live in a single location, but many will be distributed protocols, such as multicast, and some will be composed of, or built upon, one or more other components, much in the way your shared document in the previous assignment was built upon an underlying multicast channel.

There are two key aspects of this project, both of which must be present in your solution and will count towards the grade.

- Your system should not rely exclusively on web services. While it is perfectly OK to use web services for many parts of it, you should also leverage distributed protocols. Your system should make significant use of multicast, and may also use other protocols that you develop yourself, either from scratch or by wrapping existing tools or libraries to fit into the live objects framework. The less your system relies on web services for tasks such as event or content delivery, storage, and others that involve high throughput or large volumes of data, the better. That being said, as mentioned earlier, there are tasks for which web services are perfectly adequate, and you should use them when appropriate. Also, you only have a limited amount of time for this assignment, so going to extremes and trying to use multicast everywhere you can might not be the best idea. If unsure, consult with us early in the planning stage.
- Your system should be modular, and the more modular it is, the better. Building even the most fascinating monolithic application that breaks apart if any piece of it is touched would defeat the purpose of this assignment. Your system should involve multiple distributed components, and ideally multiple different types of distributed components. Sensors, multicast channels, mash-ups, storage objects, directories, containers, logic objects, etc. are all good examples of what we have in mind. The components do not need to be complex and do not necessarily need sophisticated logic. Indeed, since you have limited time, most components should be rather simple, and it is perfectly acceptable to leverage existing code. The best applications would be composed of reusable components that someone could take in their present form and use elsewhere, possibly for an entirely different purpose. If your system can function not just as an application, but as a toolkit for building a certain class of applications, that's even better.

Teams

You may work alone, or in a team of as many as 3 co-workers. We strongly encourage you to work as a team.

Optional for M. Eng. Credit

In addition to implementing the above, we expect you to discuss the reliability and fault-tolerance aspects of your application: what you can guarantee about your implementation and why, what you know you cannot guarantee, what the vulnerabilities are, and how one might avoid them. You don't need to implement anything for this part. We also ask you to evaluate the performance and scalability of the system and to identify its bottlenecks and apparent vulnerabilities. How many sensors, how many users, and what data rates can the system support? You don't need to be comprehensive, but you should have some argument backed by data that you can extrapolate. For example, by running the system with 1, 2, 4, and 8 users, you might show that overhead grows or performance decreases in a certain way, and speculate about when it might collapse or drop below some threshold. For this project, it doesn't matter whether the implementation is scalable, only that you understand its limitations.
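One simple way to turn a handful of measurements into an extrapolation is to assume that overhead grows roughly as a power of the number of users, overhead ≈ a * n**b, fit a and b by least squares on log-log axes, and then solve for the n at which the fitted curve crosses a threshold you care about. The Python sketch below does exactly that. The power-law model is our own assumption, not something the assignment prescribes, and the prediction is only as trustworthy as that model far outside the measured range. The fit needs at least two distinct user counts and strictly positive overhead values.

```python
import math
from typing import List, Tuple


def fit_power_law(users: List[float], overhead: List[float]) -> Tuple[float, float]:
    """Least-squares fit of overhead ~ a * users**b, done on log-log axes.

    Requires at least two distinct user counts and positive overheads.
    """
    xs = [math.log(u) for u in users]
    ys = [math.log(o) for o in overhead]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope on log-log axes = growth exponent
    a = math.exp(mean_y - b * mean_x)  # intercept mapped back to linear scale
    return a, b


def users_at_threshold(a: float, b: float, threshold: float) -> float:
    """Solve a * n**b == threshold for n; pure extrapolation, assumes b > 0."""
    return (threshold / a) ** (1.0 / b)
```

With your own measurements at 1, 2, 4, and 8 users, you would pass those pairs to `fit_power_law` and then ask `users_at_threshold` where the curve reaches, say, the capacity of your network link. A b close to 1 suggests linear growth; a b close to 2 suggests something quadratic, such as all-to-all traffic.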

Details

We suggest that you build one of the following three example applications. Customizations are possible, and we are open to other ideas, but we would like to keep the list reasonably short, so if you want to deviate from what we propose, please consult with us first.

A monitoring application for a data center.

Think of a large enterprise network with thousands of machines scattered across multiple office buildings. Companies invest a lot of resources into maintaining and monitoring their network and computing infrastructure, and the lack of customizable tools that can be easily adapted to support proprietary hardware is a headache. Your task will be to design a system that can collect information from a variety of customizable distributed agents, present it in the form of mash-ups that multiple users can access, and possibly modify, and perhaps allow the user to perform certain actions, such as running a script, changing a configuration setting, or deploying a file on one or more machines in the data center.

Internally, your application will involve several types of components:
- Agents or sensors that tap into the local resources of the machine on which they run, and either pump the information they collect into a multicast channel or invoke certain local actions based on requests received from multicast channels. For example, agents might leverage the Windows Management Instrumentation (WMI) interface and performance counters to read information such as processor usage, average network throughput, the number of transmission failures in the last minute, the temperature of a fan, or recent errors from the system event log, or to start a service, replace a library, or modify the local registry. You might also create agents that tap into databases or other local applications. Agents should be customizable: the system should not assume that it works only with a fixed set of, say, five hardcoded agent types; rather, users should be able to create and deploy new agents. Ideally, each agent would be a small live object.
- Multicast channels that carry information from agents to users.
- Objects that visualize the information obtained from agents, perhaps after processing it a bit. For example, one object might simply display a number, either in a numerical format or as some widget, while another might display a history of values over a period of time or a histogram etc. Ideally, this would include mash-up objects that can present information from multiple agents on what looks like a small webpage. For example, a page could show a graph showing database requests per second over the last hour alongside a list of events that need urgent attention, machines that are overheating or that have some key services down etc. The users should be able to modify existing mash-ups and save them for others to use. Every piece of data from agents and every mash-up should be viewable by multiple users at a time, and if it is customizable, it should correctly deal with situations where multiple users try to access and modify it.
- Means of storing agent code and the mash-ups that users create. At the very minimum, this could be some central repository, but we encourage you to leverage the work you did for the previous assignment: store the mash-ups in channels, and allow users to edit them concurrently. Ideally, you would also ensure that mash-ups that are not being viewed by anyone are somehow "saved" before the last user disconnects from the channel and the data is lost. While implementing this is not required for this assignment, we will value it extra if you do. To keep things simple, you can use web services to keep track of the clients that have a given document open, and have a dedicated server connect to the channel where the document is stored before the last client closes the document. Other solutions are possible (and, as mentioned above, the lack of a solution is also acceptable). You can assume that the clients are generally well behaved and will not crash unexpectedly.
- Means by which users can discover what viewable information is out there and access it, and by which new mash-ups created by the user can be published and discovered by other users.
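The "save before the last user leaves" idea from the storage item above can be sketched as a small piece of bookkeeping. Below is a hypothetical Python version of the logic a tracking web service might run: clients report when they open and close a document, and when the last ordinary client is about to close, the tracker first calls an `attach_archiver` callback, assumed here to block until a dedicated server has joined the document's channel and holds a copy of its state. It relies on the assignment's assumption that clients are well behaved and report their closes rather than crashing.

```python
from typing import Callable, Dict, Set


class SessionTracker:
    """Tracks which clients have each document open (hypothetical sketch)."""

    ARCHIVER = "archiver"  # reserved member name for the dedicated server

    def __init__(self, attach_archiver: Callable[[str], None]) -> None:
        # attach_archiver(doc) is assumed to return only after the archiver
        # has joined the document's channel and copied its current state.
        self._attach_archiver = attach_archiver
        self._members: Dict[str, Set[str]] = {}

    def open(self, doc: str, client: str) -> None:
        self._members.setdefault(doc, set()).add(client)

    def close(self, doc: str, client: str) -> None:
        members = self._members.get(doc)
        if members is None or client not in members:
            return  # unknown session; nothing to do
        others = members - {client, self.ARCHIVER}
        if not others and self.ARCHIVER not in members:
            # This client holds the last live replica: bring in the archiver
            # first, while there is still someone to copy the state from.
            self._attach_archiver(doc)
            members.add(self.ARCHIVER)
        members.discard(client)

    def members(self, doc: str) -> Set[str]:
        return set(self._members.get(doc, set()))
```

The key design choice is ordering: the archiver joins before the last client's close completes, so at every moment at least one member of the channel holds the document.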
Although the "wow" factor counts in this project, we do not require fancy graphics, and there's no need to implement every possible sensor or deal with every possible problem that you might encounter when building the system. What matters most is that the application has a good architecture and offers a degree of customizability (adding new types of sensors to a running system without recompiling the entire project, changing settings on existing sensors, editing mash-ups). That being said, if your system implements a virtual systems lab where the user can walk between virtual servers and virtual displays, as in a role-playing game, and look at virtual sensors that represent the information collected from the agents, it will count extra.
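One way to support new sensor types without recompiling is to have agents register themselves by name and to create them from configuration at runtime. The Python sketch below shows that pattern with an in-process `Channel` that merely imitates multicast delivery. It is not the QuickSilver Live Objects API, and `ConstantAgent` is a trivial placeholder for a real sensor (one that queried WMI, for instance). In a .NET setting, the rough analogue would be loading a new assembly at runtime and looking up the agent type in it.

```python
from typing import Callable, Dict, List


class Channel:
    """In-process stand-in for a multicast channel (not a real multicast API)."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, callback: Callable[[dict], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, message: dict) -> None:
        for deliver in self._subscribers:
            deliver(message)


AGENT_TYPES: Dict[str, type] = {}  # name -> agent class, filled in at runtime


def agent_type(name: str) -> Callable[[type], type]:
    """Class decorator registering an agent type under `name`."""
    def register(cls: type) -> type:
        AGENT_TYPES[name] = cls
        return cls
    return register


def create_agent(name: str, channel: Channel, **settings):
    """Instantiate a registered agent by name, e.g. from a saved configuration."""
    return AGENT_TYPES[name](channel, **settings)


@agent_type("constant")
class ConstantAgent:
    """Placeholder sensor that always reports the same reading."""

    def __init__(self, channel: Channel, metric: str, value: float) -> None:
        self.channel = channel
        self.metric = metric
        self.value = value

    def poll(self) -> None:
        self.channel.publish({"metric": self.metric, "value": self.value})
```

A new agent type defined later, for example in a separately loaded module, only needs the `@agent_type(...)` decorator. Neither `create_agent` nor the viewers subscribed to the channel have to change.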

A distributed multiplayer role-playing game.

This version is similar to last year's assignment, but since you will have more time, better tools, and experience in using multicast, we expect you to implement something more sophisticated. The system should maintain multiple rooms, multiple avatars, and multiple objects, and it should allow new rooms, users, and objects to be created. Also, it should not be limited to just a few predefined types of objects, but should allow new types of objects to be created and used in a running game without the need to recompile it. For example, if a panel displaying a webcam video is not supported, one should be able to implement it, publish it somehow, and let users place it in one of the rooms.

Your application might internally involve the following types of objects:
- Avatars that represent individual users. The state of an avatar might include its current position, its appearance, the objects it is carrying, the direction it is walking in, etc., and should be stored in a multicast channel accessed by all users watching the avatar. The user controlling the avatar should tunnel all actions taken by the avatar through the channel, much in the way edits to a shared document were tunneled through it.
- Objects. Some of these might be static, but some would ideally have state, such as color, text displayed, or perhaps even be connected to a video stream.
- Rooms. These should be like mash-ups, in that they would contain links to the avatars and objects that are in them, as well as links to other rooms. When in a room, the user's avatar would thus connect to the channels corresponding to the other avatars and objects that reside in the room in order to display them. You might also want to publish on the room channel what the users have just said. If you want to be fancy, you might even publish and play back audio from the users' microphones, although we don't expect you to work this hard.
- Means by which information about rooms, users, and objects is stored. Same guidelines apply as in the previous example.
- Means by which one can discover, access, modify, and publish information related to the rooms, users, and objects in the game.
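The rule that the controlling user tunnels every avatar action through the channel can be sketched as follows. This is a hypothetical in-process Python model, not QuickSilver code: `OrderedChannel` delivers messages to every subscriber in one global order and replays its full history to late joiners (a real system would more likely transfer a snapshot), and each `AvatarView` changes only when an action is delivered. Because the controller's own view is updated through the same delivery path as everyone else's, all replicas stay identical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


class OrderedChannel:
    """In-process stand-in for a totally ordered multicast channel (hypothetical)."""

    def __init__(self) -> None:
        self.history: List[dict] = []
        self._subscribers: List[Callable[[dict], None]] = []

    def subscribe(self, deliver: Callable[[dict], None]) -> None:
        for message in self.history:  # late joiner catches up by replay
            deliver(message)
        self._subscribers.append(deliver)

    def publish(self, message: dict) -> None:
        self.history.append(message)
        for deliver in self._subscribers:  # same order for every subscriber
            deliver(message)


@dataclass
class AvatarView:
    """One user's replica of an avatar; mutated only by delivered actions."""

    position: Tuple[int, int] = (0, 0)
    carrying: List[str] = field(default_factory=list)

    def apply(self, action: dict) -> None:
        if action["kind"] == "move":
            x, y = self.position
            self.position = (x + action["dx"], y + action["dy"])
        elif action["kind"] == "pick_up":
            self.carrying.append(action["item"])


class AvatarController:
    """The controlling user publishes actions instead of editing state locally."""

    def __init__(self, channel: OrderedChannel) -> None:
        self.channel = channel

    def move(self, dx: int, dy: int) -> None:
        self.channel.publish({"kind": "move", "dx": dx, "dy": dy})

    def pick_up(self, item: str) -> None:
        self.channel.publish({"kind": "pick_up", "item": item})
```

The controller never touches an `AvatarView` directly; if it did, its replica could diverge from those of other users whenever actions were reordered or lost.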
As you can see, as far as the internal architecture is concerned, this project is actually not very different from the previous one, and the same design guidelines apply. The main difference here is that you would spend more time dealing with graphics, and less with APIs for accessing databases or system resources. Still, we expect that your application provides a degree of customizability and allows users to modify the virtual world they're in.

A collaboratively-administered news portal.

This is basically a twist on the first project. Instead of sensors that read data from the system or databases, you would have "sensors" that might pull data from RSS feeds or other Internet sources, or publish video frames from webcams, media files, or video streams obtained from elsewhere, and pump these into multicast channels. Everything else would be the same: users could create mash-ups that resemble articles, or just collections of annotated content gathered in one place, and publish those for others to edit. We would want a way to create new types of "sensors" that pull information from the Internet and publish it on the channels, and a way for users to find channels with new content and add them to their mash-ups, without recompiling the entire application.
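A news "sensor" can be quite small. The Python sketch below, with hypothetical names, parses an RSS 2.0 document using the standard library's `xml.etree.ElementTree`, skips items it has already published (keyed by `guid`, falling back to `link`), and hands each new item to a `publish` callback that, in the real application, would post to a multicast channel. Fetching is left to the caller, for instance with `urllib.request.urlopen(url).read().decode("utf-8")`.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Set


class RssSensor:
    """News 'sensor' that publishes previously unseen RSS items (hypothetical).

    Fetching is left to the caller; this class only parses and de-duplicates.
    """

    def __init__(self, publish: Callable[[dict], None]) -> None:
        self._publish = publish
        self._seen: Set[str] = set()

    def ingest(self, rss_xml: str) -> int:
        """Publish new items from one RSS document; return how many were new."""
        root = ET.fromstring(rss_xml)
        published = 0
        for item in root.iter("item"):
            link = item.findtext("link", default="")
            key = item.findtext("guid", default=link)  # prefer guid as identity
            if not key or key in self._seen:
                continue
            self._seen.add(key)
            self._publish({"title": item.findtext("title", default=""), "link": link})
            published += 1
        return published
```

Polling the same feed periodically and calling `ingest` each time yields a stream of only the new items, which is what subscribers to the channel would want to see.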

What to hand in

As in the previous assignment.

How to turn in your assignment

As in the previous assignment.

How we'll grade the assignment

General guidelines are as in the previous assignment, but this assignment counts for more, because the project is more ambitious: 50% of the homework grade points will be based on assignment 3, with 25% each from assignments 1 and 2. Also, unlike in the previous assignments, every group will need to run a demo of its system in the CSUG lab. Professor Birman and the TAs will show up and will want to see your stuff in action. Be prepared to blow us away (good practice for impressing venture capital investors in June, once you’ve graduated from Cornell and are ready to start your company).

Hints

Nothing for now, check again later.