Argus - an extensible toolkit for efficient and adaptable network discovery
and monitoring
There has been a significant amount of work on network topology discovery,
visualization, and monitoring. This work can be divided into two distinct
parts: work centered on the Internet and work centered on the enterprise
network.
Existing tools for Internet discovery and monitoring
The work in this category has been done mainly by academics and small
companies. Because most of the Internet is not owned by the one analyzing
it, the tools can only observe the network. Discovery and monitoring are
usually done by sending probes into the network (e.g. ping, traceroute). The
problems addressed are usually collecting the data, visualizing it, and
efficiently correlating the data gathered at different points. Most of what I
mention here is work in progress; there are few articles published on this topic.
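Since probe-based discovery recurs in most of these tools, a minimal sketch may help fix ideas: turning traceroute-style output into topology edges. The sample output and helper names below are invented for illustration and are not taken from any of the tools discussed.

```python
# Illustrative sketch: extracting topology edges from traceroute-style output.
# The sample text and function names are our own, not from an existing tool.

def parse_traceroute(output):
    """Extract the ordered list of hop IP addresses from traceroute-style text.

    Lines look like " 3  128.84.154.1  1.234 ms"; unanswered hops ("*") are
    skipped, which is one source of incomplete or inaccurate topology data.
    """
    hops = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] != "*":
            hops.append(fields[1])
    return hops

def edges_from_hops(hops):
    """Consecutive hops on a probe path are treated as adjacent routers."""
    return set(zip(hops, hops[1:]))

sample = """ 1  10.0.0.1  0.5 ms
 2  128.84.154.1  1.2 ms
 3  *
 4  128.84.223.9  2.7 ms"""

hops = parse_traceroute(sample)
edges = edges_from_hops(hops)
```

Note the heuristic nature of this inference: because the unanswered hop 3 is skipped, hops 2 and 4 are recorded as adjacent even though a router sits between them.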
Felix
The Felix project of Bellcore started
in September 1997. Their work focuses on five main components:
- Network station - UNIX workstations to be used as monitors,
- Monitor Data Exchange Protocol - the protocol the monitors use to
communicate,
- Performance database - measurements recorded by each monitor for computation
of topology and network health displays,
- Web-based GUI - for the user to access and control the database,
- Linear Decomposition Algorithms (LDA) - for topology discovery and performance
evaluation of specific network elements.
Because they measure delays and other parameters of the network by looking
at packets circulating between their own monitors, they can only cover
a very small part of the Internet. The focus of their approach is
on developing linear decomposition algorithms that allow fast
processing of the collected data into a database, but no results are
publicly available.
CAIDA
CAIDA is an organization that focuses
on Internet topology discovery and monitoring. The tools they have developed
are:
- Mapnet - a Java applet that provides geographically based macroscopic
visualization of the Internet infrastructure. The data is entered manually.
- Otter - a Java-based general-purpose topology visualization tool. It does
not structure the network hierarchically, so it can only be used for viewing
networks with a small number of nodes.
- Skitter - their discovery tool, based on traceroute and ping. Through active
probing of the network, it discovers network connectivity (topology), measures
round-trip times, and records dynamic changes in topology.
For backbone topology visualization they use graph
layout code written by Bill Cheswick from Bell Labs and Hal Burch from
CMU. The tool is based on an annealing algorithm; for a tree with 80,000
nodes, a typical layout run takes 24 CPU hours on a 400 MHz Pentium. It
is non-geographical, and they plan to develop a 3D representation.
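The Bell Labs / CMU layout code itself is not public; the following is only a generic sketch of how annealing-based graph layout works, with invented node names, energy function, and cooling parameters.

```python
import math
import random

# Generic sketch of annealing-based graph layout. All parameters (energy
# function, step size, cooling rate) are illustrative assumptions, not the
# actual Bell Labs / CMU algorithm.

def layout(nodes, edges, steps=2000, seed=0):
    rng = random.Random(seed)
    # Start from random positions in the unit square.
    pos = {n: [rng.random(), rng.random()] for n in nodes}

    def energy():
        # Penalize edge lengths that deviate from an ideal unit length.
        return sum((math.dist(pos[a], pos[b]) - 1.0) ** 2 for a, b in edges)

    temp = 1.0
    current = energy()
    for _ in range(steps):
        n = rng.choice(nodes)
        old = pos[n][:]
        pos[n][0] += rng.uniform(-0.1, 0.1)
        pos[n][1] += rng.uniform(-0.1, 0.1)
        new = energy()
        # Metropolis rule: always accept improvements, and occasionally accept
        # worsening moves so the layout can escape local minima.
        if new < current or rng.random() < math.exp((current - new) / temp):
            current = new
        else:
            pos[n] = old
        temp *= 0.999  # geometric cooling schedule
    return pos, current

nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d")]
positions, final_energy = layout(nodes, edges)
```

The long running times quoted above come from evaluating such an energy function over very large trees for many annealing steps.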
MIDS
Matrix Information and Directory Services
is a company that claims to be the oldest in Internet analysis. They have
products that provide visualization of the Internet's geography:
- MatrixIQ - a program that produces a comprehensive, consistent view of
the Internet and of specific ISPs. They do not sell the program, nor do they
intend to open its source code.
- Internet Weather Report - a free sampler of partial views of latency
statistics in geographical form, available online.
- Matrix Maps Quarterly - a research publication containing maps of the
Internet worldwide by region; back issues cost $200 (online).
They use beacons doing active monitoring. The information they collect
is at a very high level, on the scale of countries. They are mostly concerned
with backbone connectivity and its correspondence to geographical locations,
as well as related statistics such as latencies. Based on this information,
they can evaluate ISPs by latency, packet loss, overall throughput, reliability,
and speed of repair. The process by which they generate their data is proprietary.
Octopus
This is a set of tools implementing heuristics for network discovery. It
was developed at Cornell in 1998 and my work will be based on these tools.
A detailed description of this project can be found in this paper.
Enterprise level discovery and monitoring
The work in this category is done mainly by large software companies. Their
products are expensive and try to address the problem of network management;
discovery and monitoring is only a small part of their integrated solutions.
They completely control the devices on the network by using SNMP. I have never
used any of their tools, but the opinion of those who have is that they
do not work where SNMP is not deployed and are not suited for discovery
outside their own network. The significant current products addressing
this problem are:
Description of Argus project
I work on this project together with two undergraduates, Walter Chang and
Haye Chen, under the supervision of S. Keshav. Modifications to ping and
traceroute made by Haye and Walter brought dramatic improvements in
performance (from 1000 minutes to 10 minutes for discovering Cornell). The
purpose of this work is to further improve the existing implementation
(Octopus) and add new functionality. An important point is that we will
actively distribute our toolkit so that we can measure its behavior in other
settings as well. The work can be conceptually divided into two parts, local
domain discovery and backbone discovery, but many of the algorithms and
methods will be common.
In order to achieve these goals we are currently implementing an easily
extensible framework into which the current algorithms, now stand-alone
scripts, will be integrated. This will also give us the benefit
of being able to use the various algorithms interchangeably, choosing the
one best suited for the current task. Another major aim is automating tasks
that until now were done manually. An important problem that the algorithms
must address, and that will receive much attention in the framework, is
dealing with inconsistent data. The reasons for inconsistent data are:
uncorrected sources of information, fast change in the network, and the
heuristic nature of many of our algorithms.
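One simple way to reconcile inconsistent observations, sketched below purely as an illustration, is to timestamp each probe's report of a link and let the most recent observation win; the framework may well resolve conflicts differently.

```python
# Hypothetical sketch of reconciling inconsistent link observations: each
# probe run reports (timestamp, link, present) tuples, and the most recent
# observation of a link overrides earlier ones. Names are illustrative.

def reconcile(observations):
    """Return the set of links whose most recent observation says 'present'.

    Later probes override earlier ones, so stale data from a fast-changing
    network is discarded rather than contradicting fresh data.
    """
    latest = {}
    for ts, link, present in observations:
        if link not in latest or ts > latest[link][0]:
            latest[link] = (ts, present)
    return {link for link, (ts, present) in latest.items() if present}

obs = [
    (1, ("a", "b"), True),
    (2, ("a", "b"), False),   # a later probe no longer sees the link
    (1, ("b", "c"), True),
]
live = reconcile(obs)
```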
Besides the usual metrics for algorithms (memory usage and running time),
there are some very important metrics specific to topology discovery
and monitoring algorithms: accuracy of the output (topology), completeness
(the percentage of the target domain that was discovered), the amount of
network traffic generated during discovery, and the adaptability of the
algorithm to various network conditions and topologies.
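When a reference topology is available (e.g. from the network operator), completeness and accuracy can be computed by set comparison. The definitions below are one plausible reading of these metrics, not an established standard.

```python
# Sketch of completeness and accuracy computed against a known reference
# topology. The element names are invented for illustration.

def completeness(discovered, reference):
    """Fraction of the target domain's elements that were discovered."""
    return len(discovered & reference) / len(reference)

def accuracy(discovered, reference):
    """Fraction of reported elements that actually exist in the reference."""
    return len(discovered & reference) / len(discovered)

reference = {"r1", "r2", "r3", "r4"}
discovered = {"r1", "r2", "x9"}   # x9 is a spurious element

c = completeness(discovered, reference)
a = accuracy(discovered, reference)
```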
Improvements will be achieved by the following means:
- multi-threading, allowing parallelization of jobs that are not closely related
- a new structure for the topological database, allowing faster access and
easier extension
- fine-tuning of the existing algorithms and heuristics based on collected
run traces
- intelligent mechanisms for choosing the best algorithm for the current
task, based on collected run traces
- multiple probe points, improving accuracy and coverage
- new, better algorithms and heuristics
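The multi-threading item above rests on the observation that probes to unrelated targets are independent and I/O-bound. A minimal sketch of the idea, with a dummy stand-in for the real probe so the example is self-contained:

```python
from concurrent.futures import ThreadPoolExecutor

# Sketch of parallelizing independent probes with a thread pool. probe() is a
# placeholder for a real ping/traceroute call; since such calls spend most of
# their time waiting on the network, threads overlap that waiting.

def probe(target):
    # A real implementation would send a ping/traceroute here; we return a
    # dummy result to keep the sketch runnable.
    return (target, "reachable")

def probe_all(targets, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(probe, targets))

results = probe_all(["10.0.0.%d" % i for i in range(1, 5)])
```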
We will also add new functionality to the toolkit: tracking the history
of changes in topology, continuous monitoring of selected network elements,
and possibly a better visualization tool. A more detailed description of the
current status of the project is available.
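The planned change-history tracking could, for instance, keep a sequence of discovery snapshots and record what appeared and disappeared between consecutive runs. The sketch below is entirely illustrative of that idea.

```python
# Sketch of topology change history as diffs between consecutive snapshots,
# where each snapshot is the set of edges found by one discovery run.

def diff(old, new):
    return {"added": new - old, "removed": old - new}

snapshots = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "d")},   # link b-c replaced by b-d between runs
]
history = [diff(s1, s2) for s1, s2 in zip(snapshots, snapshots[1:])]
```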
Research question
The research question I want to address in the paper is the design of the
framework. I will use performance measurements to determine the relative
merits of various algorithms and features of the framework. The main metrics
for evaluation are time to completion, the amount of traffic generated,
accuracy, and completeness.