Project Argus

Project Argus is a continuation of Project Octopus, developed at Cornell in spring 1998. Our aim is to speed up discovery, extend its scope, and improve the adaptability, integration, and extensibility of the existing algorithms. We will develop our tools on Solaris and Linux. Output will be accessible via the web. You can download an alpha release of our toolkit.

Structure of the tools

The architecture is organized on two levels: local domain and backbone. The local domain manager (the brain for short) has agents, called "eyes", in the network. Their job is to collect various types of data on behalf of the brain. There are two reasons for having many eyes in the network: they can monitor links that are not visible from a single vantage point, and they can access information from routers that restrict SNMP to certain trusted hosts. In the simplest setup there is a single eye located on the same computer as the brain. The brain maintains two databases of topological information: a full one about the local domain and a partial one about the backbone. The backbone topology analyzer (the mastermind for short) asks the managers of the various domains for their partial backbone data and correlates the resulting views.
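The two-level structure above can be sketched as follows. This is a minimal illustration only; all class and method names are assumptions, not the actual Argus interfaces.

```python
# Sketch of the eye/brain/mastermind hierarchy. An eye collects data from
# its vantage point; a brain keeps a full local database and a partial
# backbone view; the mastermind merges the partial views of many brains.

class Eye:
    """Agent in the network, collecting data on behalf of a brain."""
    def __init__(self, host):
        self.host = host

    def collect(self, target):
        # A real eye would ping, traceroute, or SNMP-query the target.
        return {"target": target, "seen_from": self.host}

class Brain:
    """Local domain manager."""
    def __init__(self, eyes):
        self.eyes = eyes
        self.local_db = {}     # full topology of the local domain
        self.backbone_db = {}  # partial view of the backbone

    def discover(self, target):
        for eye in self.eyes:
            self.local_db[target] = eye.collect(target)

    def partial_backbone_view(self):
        return dict(self.backbone_db)

class Mastermind:
    """Backbone topology analyzer."""
    def correlate(self, brains):
        merged = {}
        for brain in brains:
            merged.update(brain.partial_backbone_view())
        return merged
```

In this sketch a brain with a single co-located eye corresponds to the simplest setup described above.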

Output

The desired result of the discovery is composed of the following pieces of information:
  1. The graph of IP-level topology of the network
  2. Characteristics of selected links
  3. Continuous monitoring of selected links and routers
  4. Information about the services running on servers
  5. A history of the changes in the topology of the network
  6. Geographical information about some network elements
  7. Hierarchical representation of the network
  8. Topologies as viewed by routing protocols
  9. Statistical/visual information about traffic, delay and stability of selected links, routers and paths
  10. Correlation of the collected data with daily, weekly and other cycles and time series

Input

A strong point of the toolkit is that it can use multiple sources of information, adapting itself to the sources available in the case at hand. These sources are:
  1. Efficient ping
  2. Efficient traceroute
  3. DNS (single lookups and listings)
  4. Specialized service detectors and monitors (e.g. for HTTP, FTP, SMTP, DNS)
  5. Pathchar
  6. NIC web pages
  7. Router names

Visualization

All the collected data should be viewable as HTML files, static pictures, and Java applets or applications. Zooming in and out should follow the levels of the hierarchy we define. Real-time notification of the GUI about changes in the network would be a useful feature. The characteristics of individual links or routers should be click-accessible from the graph view of the network.

Internal organization issues

The database should hold not only the individual pieces of information but also timestamps for them. These record the creation time of each item and the time of the last confirmation of its existence by each method that has been used to confirm it. When an item is removed, we place it in the history log before deleting it from the database, so that older topologies of the net can be reconstructed. There will be separate logs for the values whose behavior over time we analyze.
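The timestamping and history scheme above might look like this; the record fields and class names are illustrative, not the actual Argus schema.

```python
# Sketch of a topology database whose items carry a creation time and
# per-method confirmation times, and whose removals are archived in a
# history log so that older topologies can be reconstructed.
import time

class TopologyDB:
    def __init__(self):
        self.items = {}    # key -> record
        self.history = []  # removed records

    def add(self, key, data):
        self.items[key] = {"data": data,
                           "created": time.time(),
                           "confirmed": {}}  # method -> last confirmation

    def confirm(self, key, method):
        """Record that `method` (e.g. ping, SNMP) confirmed the item."""
        self.items[key]["confirmed"][method] = time.time()

    def remove(self, key):
        # Log before deleting, so the old topology stays reconstructible.
        record = self.items.pop(key)
        record["removed"] = time.time()
        self.history.append(record)
```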

The protocols used for communication between components have yet to be specified. Their overhead should be low. Communication between the eyes and the brain should be fast, and we must also take into account network partitions that may separate them from each other. There is a single brain associated with each eye; in future extensions we could have multiple brains associated with an eye. The protocol should let the brain query the capabilities of a particular eye. Messages might carry timestamps and can be prioritized. Communication between the mastermind and the various brains happens over WANs, so partitions can occur here too, and compressing the transmitted data might be useful. A brain should be able to communicate with multiple masterminds. The protocol used between the brain and the GUI and between the mastermind and the GUI can be the same; it might amount to generating a representation of selected parts of the respective databases in an easily parsable form. Because of possible firewalls, we must be able to establish all connections in either direction. Although using an existing protocol would be preferable, none seems to satisfy our needs.
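Since the protocol is still to be specified, the following is only a sketch of what a message with a timestamp, a priority, and a capability query could look like; the field names and the JSON encoding are assumptions, not a fixed wire format.

```python
# Sketch of an eye<->brain message format. Each message carries a type,
# a priority, and a timestamp; the encoding is an easily parsable form
# that compression could be layered on top of for WAN links.
import json
import time

def make_message(msg_type, payload, priority=1):
    return {"type": msg_type,
            "priority": priority,
            "timestamp": time.time(),
            "payload": payload}

def encode(msg):
    return json.dumps(msg).encode()

def decode(data):
    return json.loads(data.decode())

# A brain asking an eye about its capabilities, and a possible reply:
query = make_message("capabilities?", {})
reply = make_message("capabilities",
                     {"ops": ["ping", "traceroute", "snmp"]})
```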

The brain normally asks the eyes to provide specific pieces of information, but it can also ask them to continuously monitor certain network elements. In that case the eye takes the initiative in sending information to the brain. If the eye is temporarily separated from the brain, it should be able to store the messages locally (in memory or on disk), preferring to throw away the less important ones if space runs out.

There can be many definitions of what a domain means. At the highest level of the hierarchy there will be groups of ASes. One possible criterion is to lump together all the small ASes that use the same big AS as their main connection to the rest of the Internet. Manual configuration might be needed. At the sub-AS level, OSPF areas and domain names are natural boundaries for grouping routers together. The best way to represent all these groupings will be determined from practical experience.

Adaptability, extensibility, configuration

The tools adapt to various conditions by always choosing the best algorithms for the task at hand. Various parameters of the algorithms will be set to reflect the current network (e.g. the expected time for getting an answer to a ping). With multiple sources of data, inconsistencies will appear (e.g. the ARP table of a router indicates the existence of a computer, but the computer doesn't answer ping). We can deal with these by assigning preferences to data sources. In some cases the best solution might be to use the intersection of the data provided by the different sources, while in other cases their union might be what we want. Unresolved inconsistencies will be gathered into a log for further analysis by the program or by the user. This mechanism can also be used for determining the topology of the network when we have previous data: detailed discovery will happen only where the topology changed.
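The intersection/union reconciliation above can be sketched as follows; which mode is appropriate depends on the sources involved, and the example addresses are made up for illustration.

```python
# Sketch of reconciling the sets of hosts reported by different data
# sources, and of collecting unresolved inconsistencies for the log.

def reconcile(views, mode="union"):
    """views: dict mapping source name -> set of discovered hosts."""
    sets = list(views.values())
    if mode == "union":
        return set().union(*sets)
    return set.intersection(*sets)

def unresolved(views):
    """Hosts seen by some sources but not all: log these for analysis."""
    return set().union(*views.values()) - set.intersection(*views.values())

# The ARP-vs-ping inconsistency from the text:
views = {"arp":  {"10.0.0.1", "10.0.0.2"},
         "ping": {"10.0.0.1"}}
# unresolved(views) yields the host in the ARP table that does not
# answer ping.
```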

The "main loop" of the brain can be written in two ways: sequential or multithreaded. The first is simpler, while the second might improve efficiency when we are working with the backbone.

Sequential:

while (HaveTasksLeft()) {
    crt = ChooseTask();
    algs = ChooseBestAlgorithms();
    foreach alg in algs {
        Execute(alg, crt->params);
    }
}

Multithreaded:

// thread starter loop
while (HaveTasksLeft() or HaveThreadRunning()) {
    crt = ChooseTask();
    algs = ChooseBestAlgorithms();
    do_in_parallel alg in algs {
        Execute(alg, crt->params);
    }
}

// answer dispatcher thread
while (1) {
    event = GetNextAnswer();
    thread = Destination(event);
    HandOver(thread, event);
}

The granularity of the tasks will be settled by experimenting with various choices. To be able to choose the best algorithms for a certain task, we have to evaluate their cost and the usefulness of their results. This choice will also be influenced by the preferences expressed by the user (e.g. favoring a fast discovery or an exact topology). This structure makes adding extensions (new algorithms) to the toolkit easy.
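One way the cost/usefulness trade-off could work is sketched below; the algorithm names, the cost and usefulness numbers, and the speed-preference knob are all invented for illustration.

```python
# Sketch of ChooseBestAlgorithms: rank candidate algorithms by usefulness
# per unit cost, weighted by how strongly the user prefers speed, and
# select as many as fit within a cost budget.

ALGORITHMS = [
    {"name": "ping_sweep", "cost": 1.0, "usefulness": 0.5},
    {"name": "traceroute", "cost": 3.0, "usefulness": 0.8},
    {"name": "snmp_walk",  "cost": 2.0, "usefulness": 0.9},
]

def choose_best_algorithms(prefer_speed, budget):
    """prefer_speed > 0 penalizes costly algorithms more heavily;
    budget caps the total cost of the chosen set."""
    def score(alg):
        return alg["usefulness"] / (alg["cost"] ** prefer_speed)
    chosen, spent = [], 0.0
    for alg in sorted(ALGORITHMS, key=score, reverse=True):
        if spent + alg["cost"] <= budget:
            chosen.append(alg["name"])
            spent += alg["cost"]
    return chosen
```

A user looking for a fast discovery would raise `prefer_speed` or lower the budget; a user wanting an exact topology would do the opposite.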

The configuration of the eyes should specify what operations they can perform, where the brain is, and limits on the memory and disk space they are allowed to use. Adding SNMP community strings to this configuration file (instead of keeping them at the brain) might improve security. The configuration of the brain should include the locations of the eyes and the level of service it offers to the mastermind(s).
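An eye's configuration file might then look something like the sketch below; the key names, section names, and values are assumptions, not the actual Argus format.

```ini
; Illustrative eye configuration (hypothetical format and hostnames)
[eye]
operations = ping, traceroute, snmp
brain      = brain.example.edu:4000
max_memory = 8M
max_disk   = 64M

[snmp]
; keeping community strings here, rather than at the brain,
; limits their exposure
community = public
```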