prog

REAL 5.0 Programmer's Manual S. Keshav Cornell University skeshav@cs.cornell.edu August 13^th 1997

Contents
1. Introduction
2. Tour of the files
3. Cookbook

1. Introduction

This manual provides a detailed description of REAL for users who wish to modify the code or add to it. Since REAL code (ignoring modified Nest code) spans some 15,000 lines of C, it is impossible to describe all of it in any detail. Instead, I will give a tour of some the files comprising REAL, and then some recipes for things like writing node functions and packet switches. However, you will need to read code to fully understand the system.

2. Tour of the files

The files in REAL can be divided into the following logical classes:

Node functions: These are the functions that execute protocols in nodes.
Queue management and routing: These manage buffers in nodes and gateways, and perform packet switching.
Support functions

2.1. Node Functions

Node functions implement computation at each of the nodes in the network. There are three types of node function: source, router and sink. I will discuss them in this order.

2.1.1. Source

There are thirty main source node functions, but all the flow-controlled ones implement variants of TCP-like transport layer functionality. A more detailed description of source functionality is found in the user manual. I will describe the code for generic() as an example of writing a node function with flow control.

The file starts with some standard headers. The function then declares some protocol variables (such as rtt, last_ack etc.).

/* Generic ftp source with flow control. */ 
#include "../kernel/real.h" 
#define MWS MAX_WINDOW_SIZE 
#define RECOVERY_STATE 1 /* the source is retransmitting lost packets 
*/ 
#define NORMAL_STATE 0 /* otherwise */ 
generic() 
/* no window adjustment or timeout deviation estimation */ 
{ 
/* Misc. variables */ 
PKT_PTR pkt; 
int num, node, line_busy = 0; 
ident destn, sender, sink; 
long key; 
timev now, gen, diff; 
float err, timeout; 
/* state variables */ 
timev time_sent[MWS]; 
float rtt; 
int last_sent,last_ack, seq_no, min_window; 
int ftp_state; 
/* either normal or recovery */ 
int recovery_can_send; 
/* flag used to allow only one pkt to be sent after a timeout */ 
/* once this happens, either the retx. pkt must timeout again, 
* or the retx. pkt should acked, which will reset the flag */ 
int timer_id = 1; 
int max_window;

The node starts by setting some standard variables such as `node_number', the simulation node number for that instance of the node function, and `sink' - the sink to which this node will send data. abs_advance() reschedules the source so that it starts simulation after a time specified by the start_time parameter in NetLanguage.
node_number = get_node_id ();
max_window = ftp_window;
sink = assigned_sink[node_number];
timeout = beta_rtt * rtt;
abs_advance (node_start_time[node_number]);

The node now does some initialization of its type and other state variables. ftp_state is either RECOVERY (while recovering from a timeout) or NORMAL (otherwise).

printf ("Generic source: %d ", get_node_id ()); 
printf ("sending to sink %d0, sink); 
source_node_type[node] = GENERIC_SOURCE; 
last_sent = last_ack = -1; 
rtt = 6.0 * scale_factor; 
seq_no = 0; 
alpha_rtt_ftp = (float) (1 / (float) 10.0); 
recovery_can_send = 0; 
ftp_state = NORMAL_STATE; 
/* pretend that you got an INT pkt, go to test */ 
goto test;

The label `recv:' is the point at which packets are received by the node. This is done by the `recvm()' function. Now, the packet is processed according to its type. (This processing is essentially a state machine. It is a good idea to first draw out the state diagram, and then write code around it, which is what I did.)

recv: 
sender = recvm (&destn, &key, &pkt); 
switch (pkt-type) 
/* use packet type to decide what to do */ 
{

When an acknowledgment is seen (type ACK), we first check if the ack arrived when the node was recovering from a timeout. If it is an ack for the retransmitted packet, the mode is reset to NORMAL. We discard ack duplications (if its sequence number < last_ack). Since acks can be aggregated, all the packets that are acknowledged by the ack have to be ticked off, and their contribution to the round trip

time marked. make_flt_plot(), described in the users manual, generates data for plotting. These RTT estimates are inaccurate, but generic TCP doesn't bother about that.

case ACK: 
{ 
timev now, diff; 
float tao; 
int i, last; 
now = runtime (); 

if (ftp_state is RECOVERY_STATE and pkt-seq_no is last_sent) 
        ftp_state = NORMAL_STATE; 
/* last_ack is the highest seq_no that has been acked */ 
if (pkt-seq_no last_ack) 
{ 
        /* ack not seen before */ 
        last = last_ack; 
        /* it might be an ack for more than one packet */ 
        for (i = 1; i <= pkt-seq_no - last; i++) 
        { 
                diff = time_minus (now, time_sent[(last_ack + i) % MWS]); 
                /* actual round trip time */ 
                tao = make_float (diff); 
                /* estimated round trip time */ 
                rtt = rtt + alpha_rtt_ftp * (tao - rtt); 
                /* plot data */ 
                make_flt_plot ("tao", tao); 
                make_flt_plot ("rtt", rtt); 
                make_entry (tao, &rt_time[node_number]); 
        } 
        /* update state */ 
        last_ack += (pkt-seq_no - last); 
        recovery_can_send = 1; 
} 
/* get rid of the ack packet */ 
free (pkt);

This ack packet may be an ack for a packet that timed out earlier and has been enqueued waiting to be sent out. If so, this packet has to be removed from the output queue.

/* clean up queue -- i.e. remove all acked packets */ 
num = num_in_q (node_number); 
while (num-- 0) 
{ 
        pkt = deq (node_number); 
        if (pkt-seq_no last_ack) 
                enq (node_number, pkt); 
        else 
                free (pkt); 
} 
goto test; 
}

INT packets are generated from the trunk, and are used to indicate that the data has been sent out from the output line.

case INT: 
/* line is now free */ 
line_busy = 0; 
free (pkt); 
goto test;

Nest does not provide timers. In order to implement a timeout, REAL sends out a timer packet from a source back to itself to return after some specified time. The problem with this is that once a timer has been set, it cannot be reset. Thus, we have to do extra work to detect spurious timeout and ignore them. Basically, we would like to emulate a system that has a single resettable timeout clock with a number of non-resettable TIMEOUT packets. If a retransmission timeout is received, we check if it is a valid timeout. First, the timed out packet must not already have been acked. Second, we stamp each TIMEOUT packet with a unique id, and we check that the timeout packet has an id that matches the current timer_id. If the timeout passes these tests, it is enqueued for future retransmission whenever possible.

case TIMEOUT: 
/* may have been acked */ 
if (timer_id is pkt-id and pkt-seq_no last_ack) 
{ 
        /* resend */ 
        ftp_state = RECOVERY_STATE; 
        recovery_can_send = 1; 
        flush_q (node_number); 
        pkt-type = FTP; 
        pkt-seq_no = last_ack + 1; 
        pkt-timeout = beta_rtt * rtt; 
        pkt-gen_time = runtime (); 
        num_retransmissions[node_number]++; 
        enq (node_number, pkt); 
} else 
        /* dump pkt */ 
free (pkt); 
goto test;

In the `test' state we check if a packet can be sent out on the output line. First, we check if the output queue already has some packet to send (a retransmission). If so, it gets priority, and is sent. If not, a new packet is created (since a FTP source is always ready to send) and enqueued.

test: 
if (ftp_state is NORMAL_STATE and seq_no <= last_ack + max_window) 
{ 
        num = num_in_q (node_number); 
        if (num is 0) 
        { 
                make_pkt(pkt); 
                pkt-gen_time = runtime (); 
                pkt-seq_no = (seq_no)++; 
                pkt-resent = 0; 
                enq (node, pkt); 
        } 
}

Next, we pull out whatever is there in the output queue and send it. We must ensure, however, that the output line is free, and that the flow control window allows the packet to be sent. A call to safe_send() initiates packet delivery. This routine generates timeout packets and makes sure that timeouts are generated in nondecreasing order in virtual time.

while (--num = 0) 
{ 
        pkt = deq (node_number); 
        if ((ftp_state is NORMAL_STATE and 
        pkt-seq_no <= last_ack + max_window) 
        or (ftp_state is RECOVERY_STATE and recovery_can_send) 
        and ! line_busy and pkt isnt NULL) 
        { 
                line_busy = 1; 
                time_sent[pkt-seq_no % MWS] = runtime (); 
                pkt-id = ++timer_id; 
                safe_send (pkt, pkt-timeout, 0, &last_sent); 
                recovery_can_send = 0; 
                break; 
        } else if (pkt != NULL) 
                enq (node_number, pkt); 
        } 
goto recv; 
} 
}

Controlled rate and poisson sources use a state diagram that is very similar. The only addition is the use of a `TICK' packet to initiate a rate-control timer. Read poisson.c and then controlled_rate.c for understanding the basic structure of a rate controlled source.

2.1.2. Router

The router code is fairly complex since it implements several scheduling disciplines. Here, I will explain the processing that happens if the discipline is first come first served. The other cases are extensions of this simple case.

The code opens with some standard inclusions and parameter definitions that I skip. The main code starts in the for(ever)loop. We get a packet from some input line, and set some local variables. conv_id is the conversation id for that source-destination pair. Queueing is done per conv_id. qnum is the number of the output line to which data will be sent. Note that there could be many queues, one per conv_id, at each output line. This is not really needed for a FCFS router, but allows implementations of Fair Queueing and fcfs scheduling to share code.

for (ever) 
{ 
        /* get a packet */ 
        sendr = recvm (&destn, &key, &pkt); 
        if (pkt is NULL) 
        pr_error ("received a NULL pkt in router !"); 
        so = pkt-source; 
        final_dest = pkt-dest; 
        if ((final_dest / ABSOLUTE_MAX_NODES) is realnum) /*local*/ 
                dest = route[node][pkt-dest % ABSOLUTE_MAX_NODES]; 
        else 
                dest = route[node][cc_router]; 
        /* dest is the actual next node - not the eventual destn */ 
        conv_id = hash (node, so, final_dest); 
        qnum = q_number[dest]; 
        q_num_of_conv[node][conv_id] = qnum; 
        /* qnum is the output line on which to send */ 
        now = runtime (); 
pkt-arr_time = now;

We now test the packet type. If the packet's conv_id is not yet known, it is entered in a list of conv_ids (which is later scanned to select a packet to send). If the output line is busy, then the packet has to be buffered. If there are buffers, the packet is put in one, else it is dropped.

switch (pkt-type) 
{ 
case TELNET: 
case FTP: 
case ACK: 
case SETUP: 
if (!check_conv (conv_id, qnum)) 
        add_conv (conv_id, qnum); 
if (busy[node][qnum]) 
/* line to dest is busy , so buffer */ 
{ 
        if (all_op_q_full (qnum)) 
        { 
                if (decongest_mechanism is 2) /* random */ 
                { 
                        decongest_random (qnum); 
                        if (put_in_op_q (conv_id, pkt) == ERROR) 
                                pr_error ("Output buffer full - CONGESTION "); 
                        } else 
                        { 
                                /* drop */ 
                                make_entry (1.0, &so_tossed[so % ABSOLUTE_MAX_NODES]); 
                                free (pkt); 
                        } 
                } else 
                        { /* some place still left in the buffer */ 
                        if (put_in_op_q (conv_id, pkt) == ERROR) 
                                pr_error ("Output buffer full - CONGESTION "); 
                        } /* end of else */ 
                        break; 
                }
        }

If the line is not busy, the packet is simply forwarded to the appropriate destination, and statistics are collected for it.

} else 
/* not busy */ 
{               
        busy[node][qnum] = 1; 
        /* collect stats - but not for ACK ... */ 
        if (pkt-type isnt ACK) 
                make_time_entry (time_minus (now, pkt-arr_time, &qing_delay[node][pkt-source % ABSOLUTE_MAX_NODES]);
        if (!sendm (dest, 0, pkt)) 
                pr_error ("sendm failed in router"); 
}

If we get an INT packet, the output line is now free. We check if there is a queued packet to be sent. If so, we send it, else we just unset the busy flag.

case INT: 
free (pkt); 
if (num_buffers_allocated[node][qnum] isnt 0) 
{ 
        pkt = get_fcfs_pkt (qnum); 
        /* reset useful values */ 
        so = pkt-source; 
        final_dest = pkt-dest; 
        if ((final_dest / ABSOLUTE_MAX_NODES) is realnum) /* local */ 
                dest = route[node][pkt-dest % ABSOLUTE_MAX_NODES]; 
        else 
                dest = route[node][cc_router]; 
        conv_id = hash (node, so, final_dest); 
        qnum = q_number[dest]; 
        if (pkt-type isnt ACK) 
                make_time_entry (time_minus (now, pkt-arr_time), 
                        &qing_delay[node][pkt-source % ABSOLUTE_MAX_NODES]); 
        /* send out a packet */ 
        busy[node][qnum] = 1; 
        if (num_in_op_q[node][conv_id] is 0) 
                delete_conv (conv_id, qnum); 
        if (!sendm (dest, 0, pkt)) 
                pr_error ("sendm failed in router"); 
} else 
{ 
        /* idle: nothing in the queue to be sent */ 
        busy[node][qnum] = 0; 
} 
break;

The other disciplines add various things, such as call setup and teardown, and resource reservation. However, the basic structure of the code is the same as that of fcfs_router.

2.1.3. Sink

The sink node function is a universal receiver. It acknowledges every packet it receives with an ack packet that contains the sequence number of the last in-sequence packet it has seen. If the packet is from a guaranteed service source, however (and the pkt-gs bit is set), it merely frees the packet. The sink uses a small trick - the packet it receives is converted into an acknowledgement merely by changing its parameters.

2.2. Queue Management and Routing

The queue management functions are written in an object oriented and layered style. The queue objects are manipulated by a small set of functions. Each layer provides services that the layer above uses to provide its own services.

The data structures for the queues are in queues.c. Packets are buffered in a per-conversation linked list and are accessed by two pointers: one points to the packet at the head of the queue, and another points to the tail. Each packet has a field that points to the next packet in the queue.

Conversations are indexed by a conversation identifier (CID) that uniquely identifies each source-destination pair. The CID is equivalent to a virtual circuit number, and is used to manage resources that are allocated to that conversation. The packets queued for a conversation with CID = C are pointed to at the head by conv_table[C], and at the tail by conv_table_tail[C]. q_num_of_conv[C] is the number of the output line on which C sends packets.

Given the number of an output line, we need to keep track of the set of conversations that are using it. This is managed by a linked list of conversation structures that contain the CIDs of the conversations on that output line. q_table[Q] points to the head of the linked list of conver- sations that share output line Q.

The state of the queues is summarized in two arrays - num_in_op_q[C] is the number of packets in conversation C's queue. num_buffers_allocated[Q] is the total number of buffers used by output line Q. The functions defined in queues.c allow access to ele- ments of queues, and to the queue state. init_q() initial- izes queue state. all_op_q_full() and op_q_mt() allow read access to the queue state. put_in_op_q(C) places a packet at the tail of conversation C's queue. get_from_op_q() gets packets from the head and removes them from the queue. dequeue(p) removes a packet pointed to by p from its conver- sation queue without deleting it. get_num_in_op_q() returns the number of packets in a conversation's queue. fc_fs() returns a pointer to the the packet at the head of a queue (if this packet has to be removed, it must be explicitly dequeue). get_num_active() returns the number of conversa- tions on an output line that have packets queued to be sent. dump_q(C) prints out the packets queued in the queue for conversation with CID C.

The next layer handles decongestion (packet dropping) using the functions defined above. decongest_first(Q) drops a packet from the head of the queue of the conversation that has the most packets queued up in output line Q. decongest_last(Q) acts similarly, but drops a packet from the tail of the longest queue. decongest_random(Q) drops a packet chosen at random from the set of all packets queued at that output line.

Conversation management is done using three functions add_conv(), delete_conv() and check_conv(). check_conv(Q) checks if a given CID is present in the list of conversa- tions using output line Q.

Scheduling policy management is done in layer 3 of the queueing hierarchy. get_fcfs_pkt(Q) returns a pointer to the packet that arrived first at output line Q and at the same time it dequeues this packet. Thus, to implement FCFS scheduling, the router has to make a call get_fcfs_pkt(Q), and the appropriate packet will be presented to it.

get_min_bid(Q) returns the packet that has the least bid amongst all the packets queued on output line Q. find_min_finish(Q) returns the minimum finish number amongst all the packets queued on output line Q. It is used to recompute the round number in Fair Queueing.

hrr(Q) returns a packet that is selected using the HRR scheduling discipline, implemented in hrr.c. Since HRR is non-work-conserving, if the line has to be kept idle, hrr will return a NULL_PKT type packet instead. This packet is intercepted by the channel function and is used to generate an INT packet for the next scheduling slot.

The top layer of queueing deals with packet scheduling and is done in scheduler.c. The function select_from_op_q() looks at the policy that has been selected for the simula- tion and calls get_fcfs_pkt() or get_min_bid() accordingly. The functions new_arrival() and remove_old_conv() do some processing needed for Fair Queueing.

The entire queueing hierarchy exports a single func- tion: select_from_op_q(), which is called by the router. Once the policy has been specified, a call to this function automagically returns with the appropriate packet.

node_queue.c provides queue management for retransmit- ted packets in a node. The interface is through the func- tions enq(), deq() and num_in_q() which place a packet at the tail of its queue, remove it from the head, and tell the queue length.

2.3. NetLanguage generation

NetLanguage has been extensively described in the user manual. Here, I will describe how to add features to Net- Language, and a little bit about automatic NetLanguage generation.

NetLanguage is implemented using lex and yacc, and I expect that you know how to use these tools. The files for the language are in sim/lang. lang.lex contains definitions for creating a lexical analyzer using lex. The definitions are pretty straightforward. lang.yacc is the yacc input file. Note the complications introduced by linked lists: one needs to differentiate between the first function in a list and the rest. The actions are C code that create the graph structure.

The parsing of NetLanguage poses an interesting problem: the simulator reads function names, but needs their addresses. Hence, it needs to access the symbol table and read off the function names to get addresses. I invoke the `nm' system command to extract these names. The result is filtered through an awk script (nm.awk) and read into an internal symbol table organized as a hash table keyed on function names. This is then used to translate from function names to addresses. symtab.c contains functions to create a symbol table and access it.

2.4. Miscellaneous

channels.c This file contains a functions called tx_chan(). tx_chan() is used to add transmission delay to a packet. It looks at a packet size and the transmission line speed to determine the transmission delay for a packet. It then looks at line characteristics and adds latency delay. It also sends INT packets back to the sender to tell it that the line is now free.

config.h This file contains configuration parameters and other useful definitions. If you want to do large simulations, you must make sure that you do not exceed the limits defined in config.h. If you do, reset these values and recompile the binary.

sim.c This has main() for the simulator. It reads in command line arguments, and then calls simulate(), never to return. The file declares globals and simulation parameters that are externed to the other files through parameters.i. sim.c #includes y.tab.c, which is a C file created by pro- cessing a yacc description.

hash.c This file contains generic hash table functions to translate from a source destination pair to a conversation ID. It is also used by plotting.c The file exports two functions: hash() and invert().

init.c Some initializations of queues, tables and other simulation variables.

monitor.c This file contains custom_monitor. Since the monitor is called at every pass and can access all global variables, it is used to do things like printing time ticks, printing simulation reports etc. The function also calls coord() to coordinate with other REALs to do distributed simulation. Here is a guided tour through the file.

After includes and declarations, the monitor checks to see if this is the first time it is executing. If so, it does some initializations and prints out the simulation parameters using dump_parameters(). At this point, if distributed simulation is enabled, coord() is called.

try_to_dump() tests if the time now is a multiple of the control parameter DUMP_INTERVAL (defined in config.h). If it is, a simulation report is printed out.

set_assigned_sinks(), set_plot_options() and set_line_speeds() set these values from the parameters received over the Nest socket as part of the graph struc- ture.

plotting.c This file contains two functions make_plot() and make_flt_plot() that are described in the user manual. These create a file name for the plot file, then look up a hash table to convert them to a small (<256) index. The index, time now, and data value are appended to a 'smart buffer'. When the smart buffer gets full, it empties itself into a file descriptor that is either a socket or a pointer to the 'plot' file in the result directory.

routing.c The function route_graph(g) in this file takes a pointer to a graph structure, and figures out the shortest hop-count routes between all pairs of nodes. This implements Dijkstra's all pairs shortest path algorithm, and is based on pseudocode in Bertsekas' networks book. The routing is static and centralized..

switches.c This file collects debug flags defined in many different files. The idea is that setting a flag requires recompiling a small file, and a relink, instead of recompiling a large file. Also, it is easier to set a number of flags all at once.

table.c This contains the functions necessary to manipulate and print out a report of a simulation. It exports two functions - make_entry(t) and make_time_entry(t) that stores floats and timevals respectively into the table that t points to. Tables are stored with fields for mean, number of entries, global minimum, global maximum and for variance computation. The mean stores the sum of all the entries and when it has to be printed out, this is divided by the number of entries. Global min and max are adjusted as data values are entered. The computation of the variance is explained in the user manual. One major problem is that the formatting is very sensitive to changes. If you change any fields, good luck!

exptest.c This is used to check if the random number generator on the machine you are using is up to par or not. It just builds up a histogram of exponentially distributed values, and prints it out. You can eyeball this using graph(1) or run it past your favorite randomness tester to be sure that you are not let down by the simulator's random number generator. On a Version 9 Unix, use `grap |pic|troff|lp', on 4.3BSD use `graph -g1 -l |lpr -g'.

Makefile The makefile is quite standard. Define CFLAGS, LDFLAGS and LINTFLAGS for cc, ld and lint options. `make final' will get you the binary for the simulator. clean: will remove droppings created by plotting. real_master: will create the real_master.

3. Cookbook

This section of the manual is for the not-so- adventurous programmer who wants to make modifications to the system without having to figure out how the whole thing works. To this end, here are some recipes for modifying things like NetLanguage. By necessity, this section is incomplete and suggestions are welcome.

3.1. Modifying config.h

config.h has declarations that are used to size the simulator. MAX_NODES is the number of nodes in the current simulation. ABSOLUTE_MAX_NODES is the nearest power of 10 larger than the maximum number of nodes you intend to have in your simulation. No gateway should have more than MAX_FAN_OUT outgoing trunks. MAX_CONVERSATIONS is the largest possible number of conversations in a simulation. MAX_WINDOW_SIZE is the largest possible window size.

DUMP_FILE is the name of the report file.

3.2. Modifying NetLanguage

If you want to add a new parameter to the language, or to change it, you will need to make the following changes: First, change lang.lex to recognize any new tokens. Modify lang.yacc to add the new token, the new production and the semantic actions necessary for that production. Change lang.inp to add the new construct to the sample language input. (This will allow you to check your work.) Now, type in `yacc ../lang/lang.yacc' in sim/sim. If everything works correctly, the y.tab.c file should be created. Recreate the binary file, modifying the makefile if necessary, and you are ready to run.

Here is an example of how this works. Suppose that you want to add a new parameter to the simulator, and want this parameter to be described in NetLanguage. I will assume that the parameter is a global that is declared in sim.c. The place to add this parameter is probably the real_parameters section of NetLanguage. Let us suppose that the parameter is called my_param.

First, you should decide the syntax the language addition should have. In this case, pattern matching with other parameter declarations, an appropriate syntax would be

my_param = 5.5;

Next, let us modify lang.lex to recognize the token my_param. Simply add the rule

my_param {return MY_PARAM ;}

to this file, along with the rules for the other parameters. (5.5 is a floating point number and will be recognized as F_NUMBER.)

We are now ready to modify lang.yacc. Note that real_params are declared in a paramlist. Modify the `param' non-terminal to add `my_param' to the list. Now, add the rule for `my_param'. Pattern matching from the rules already there, this should be

my_param : MY_PARAM '=' F_NUMBER

What about semantic actions ? We know that sim.c declares a global called `my_param', so the action is just

my_param = extract_float($3);

Finally, declare MY_PARAM in the %token list at the head of lang.yacc. You are done with modification to NetLanguage.

Now, change lang.inp to add your parameter to it. The y.tab.c file produced should be correctly created, and should link in with sim.c in sim/sim correctly.

3.3. Modifying statistics collection

Statistics are collected in TABLE data structures defined in table.c The functions make_entry() and make_time_entry() will allow you to add data to the tables. Every print_interval, the table values will be printed and the entries reset (except min and max). If you want to add more statistics collection, define a new TABLE in table.c. Modify the source code to collect statistics at the appro- priate time. Then, change print_report() to print out the value, and print_summary() to print out confidence intervals. You will probably need to reformat the output, and this is a messy job.