Date: December 16, 1998
Name of group: Phonedation
Team members: Vincent Ng, Kiri Wagstaff, Wes Weimer, Yu Zhang
Task: Data Exchange: Final Report


Section 1. What we set out to do

1.1 An overview of our goals

 We set out to provide a high-level interface for sending and receiving audio information over lossy and variable computer networks. In layman's terms, we set out to provide the functionality of a telephone connection, using computers with speakers and microphones. The true challenge involved is mimicking the familiar perfect reliability and low latency associated with the telephone network -- over a computer network that only provides best-effort service. Since the realtime nature of our application precludes us from using standard network protocols based on retransmission of lost packets, we researched and implemented schemes for adapting to packet loss and variable network characteristics on the receiver and sender sides separately.

Fundamental responsibilities of our component included selecting a digital audio format, sampling audio information from sources such as files and microphones, and playing out received audio information to files and speakers. Additionally, we needed to implement UDP-based connections, high-precision clocks, and a friendly, non-blocking interface that high-level applications could use. In a nutshell, we set out to create a component that would sample audio information periodically from an input source, packetize it, send it quickly over the network, reconstruct it quickly on the other side, and play it out in order with tolerable latency.

1.2 Challenge 1: Unreliable networks

 Since packets transmitted over a computer network do not always arrive in order or evenly spaced, it is necessary for the receiver to buffer received packets for a certain amount of time before playing them out in sequence. This allows packets that arrive slightly late, or out of order, to be received in time for playout -- thus creating the illusion of an in-order transmission at the cost of added latency. A larger buffering delay allows for more truant packets to be played out (rather than dropped because they arrived too late). Unfortunately, latencies beyond 200 milliseconds are intolerable to the human ear. Thus the mathematically ideal solution, which is an infinite buffering delay, cannot be used. If the exact transmission characteristics of the network were known, then it would be possible to calculate a playout delay such that any desired percentage of the packets could be played out. However, we are unable to anticipate the future behavior of the network, so we instead model it based on the past. We set out, then, to solve this problem using an adaptive playout algorithm.

1.3 Challenge 2: Packet loss

 Additionally, computer networks are plagued by packet loss. Realtime applications like ours lack the luxury of retransmitting lost packets. Since the human ear is very adept at reconstructing speech from garbled or distorted input, it is often desirable to include redundant copies of previous samples in each outgoing packet. If network losses tend to come in bursts, the receiver can use this redundant information to reconstruct what would have been contained in the lost packets. Unfortunately, this additional redundant information creates a greater load on the network, which may in turn cause even greater packet loss. Care must therefore be taken to ensure that these secondary samples add minimal overhead and that they can actually be used by the receiver. We planned to solve this problem by researching a number of audio compression methods for use in the secondary samples and by researching a method for adaptively determining the characteristics of the included redundant information. When we embarked on this project, we were unable to locate any papers on combining methods for dealing with variable packet delay and packet loss, or on strictly adaptive methods for dealing with packet loss.

1.4 Division of labor

 Since our team had four members, we had planned to divide the work of creating the low-level transport layer, implementing adaptive algorithms, and studying and analyzing measurements of actual network characteristics equally among us.

Section 2. What we accomplished

2.1 Adaptive playout

 Our solution to the problem of variable transmission times was to research and implement an adaptive playout algorithm. The object of such an algorithm is to compute the optimal receiver-side buffering delay based on realtime measurements of network loss and latency. Since there is no time for the receiver and the sender to negotiate these adjustments, our adaptive algorithm is implemented on the receiver side only, and is based on timestamp and sequence number differences from the packets that actually arrive. Our adaptive playout algorithm is an extensive modification of Algorithm 2 presented in "Packet Audio Playout Delay Adjustment: Performance Bounds and Algorithms" by Moon, Kurose and Towsley (1998).

The algorithm chooses a buffering delay based on a fixed safety factor and a measure of the network variance. It does not assume synchronized clocks, but does assume that the clocks do not drift with respect to each other. It keeps a running average of the difference between the sender timestamp in each packet and the time the receiver received that packet; if there were no variance in the network, this number would never change. We then compute the variance by comparing the measured value for each incoming packet with an exponentially decaying average of all previously measured values. The buffering delay scales linearly with a multiple of our measurement of the variance. Adjusting the buffering delay on a per-packet basis would lead to very jittery playout, so instead we adjust it on a per-talkspurt basis (we use talkspurt to mean any contiguous period of sound terminated on both sides by silence), shrinking and expanding the intervening periods of silence based on our updated measurements of the variance. This allows us to compensate for changing network characteristics.
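
For concreteness, here is a minimal sketch of the per-packet bookkeeping just described. The constants and names are illustrative rather than our tuned values, and the real algorithm adds the spike handling described below.

    #include <math.h>

    #define ALPHA 0.998002   /* decay factor for the running averages */
    #define BETA  4.0        /* safety factor: multiples of the variation */

    static double d_hat = 0.0;  /* smoothed (receive - send) timestamp offset */
    static double v_hat = 0.0;  /* smoothed variation of that offset */

    /* Call once per received packet. The absolute offset is meaningless
     * (the clocks are not synchronized); only changes in it matter. */
    void update_estimates(double send_ts, double recv_ts)
    {
        double d = recv_ts - send_ts;
        d_hat = ALPHA * d_hat + (1.0 - ALPHA) * d;
        v_hat = ALPHA * v_hat + (1.0 - ALPHA) * fabs(d - d_hat);
    }

    /* Call once per talkspurt, for its first packet; later packets in the
     * talkspurt play out at fixed intervals after this one. */
    double playout_time(double send_ts)
    {
        return send_ts + d_hat + BETA * v_hat;
    }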

Finally, as shown below in our experimental measurements of network characteristics, packet transmission times on the Internet are characterized by spikes. A spike is a sudden and dramatic increase in the latency or transmission time of a series of packets, after which the latency decays back to normal quickly, though not as abruptly as it rose. Spikes would wreak havoc with a standard adaptive playout algorithm: the playout buffering delay calculated from the normal variance would be insufficient to allow for the playout of any of the packets involved in the spike, and the measurements taken during the spike would skew the algorithm's future perception of the network variance. Our algorithm detects spikes based on a three-point secant approximation of the rate of change of the variance, and we collect statistics and calculate the playout time differently if we decide that a spike is occurring.
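
The detection itself can be sketched as follows. For simplicity this version watches the raw delay measurements rather than the variance, and the threshold is illustrative; d_hat, v_hat, and BETA are as in the sketch above.

    #define SPIKE_SLOPE 5.0    /* illustrative threshold, not our tuned value */

    static double d_hist[3];   /* three most recent delay measurements */
    static int    in_spike = 0;

    void check_for_spike(double d)
    {
        double slope;
        d_hist[2] = d_hist[1]; d_hist[1] = d_hist[0]; d_hist[0] = d;
        slope = (d_hist[0] - d_hist[2]) / 2.0;   /* secant over two packets */

        if (!in_spike && slope > SPIKE_SLOPE)
            in_spike = 1;   /* sudden rise: stop updating v_hat, track the spike */
        else if (in_spike && slope <= 0.0 && d_hist[0] <= d_hat + BETA * v_hat)
            in_spike = 0;   /* decayed back to normal: resume normal adaptation */
    }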

2.2 Adaptive FEC

 We have proposed and simulated a model for adaptive forward error correction. Our algorithm implements forward error correction by including redundant information for previous samples with every packet. Previous implementations of this sort of FEC typically use a fixed delta of 1 or 2 (that is, in the packet containing a sample for t_2, an algorithm with a delta of 1 would also include a lower-quality sample for t_1). We, however, have a model for computing the optimal usable delta on the receiver side on a per-source basis, based on per-packet measurements of network loss and delay characteristics. The receiver then sends small signaling packets containing its desired delta to the appropriate sender, which is able to calculate the optimal delta for all receivers in the multicast group and change its behavior accordingly.

Since our adaptive FEC algorithm is both complicated and original, it will be described in detail in Appendix A.
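
For concreteness, a packet with a delta of 1 might be laid out as follows. The field names and layout here are purely illustrative; the actual wire format and the adaptation protocol appear in Appendix A.

    /* Each packet carries a full-quality sample for time t and a
     * compressed, redundant copy of the sample for time t-1. */
    struct fec_packet {
        unsigned long  seq;            /* sequence number */
        unsigned long  timestamp;      /* sender clock at sampling time */
        unsigned char  primary_enc;    /* codec of the primary sample */
        unsigned char  secondary_enc;  /* codec of the redundant sample */
        unsigned short primary_len;    /* bytes of primary data */
        unsigned short secondary_len;  /* bytes of secondary data */
        unsigned char  data[1];        /* primary bytes, then secondary */
    };

If the packet carrying the primary sample for t_1 is lost but the next packet arrives, the receiver decodes the secondary copy from that later packet and plays it in place of the missing primary.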

2.3 Audio Encoding

 In order for forward error correction to be viable, the redundant information included with each packet must be small enough that it does not impose an excessive additional load on the network. The codec used for the secondary samples must therefore compress audio data tightly, it must work quickly enough to meet the demands of a realtime application, and it must provide reasonable quality.

Our actual implementation uses a variety of codecs (encoding schemes), including PCM and three varieties of ADPCM (standards G.721, G.723-24, and G.723-40). We also implemented the CELP (Code Excited Linear Prediction) version of Linear Predictive Coding, but later decided not to use it because its quality-versus-encoding-speed tradeoff was unsuitable for realtime applications. In particular, on our reference platforms it took 400,000 microseconds to encode a 480-byte sample (which corresponds to 21,768 microseconds of speech). Although it offered superior compression, we were unable to make use of an algorithm that took nearly twenty times longer to process a sample than we took to record it. In our experience, ADPCM has proved sufficient in terms of encoding speed and decoded quality, and its flexibility allows for a fine-tuned approach.
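
As a back-of-the-envelope illustration (the sample size here is arbitrary), the ADPCM variants make a delta-1 secondary sample cheap relative to repeating the primary:

    /* Payload cost of a delta-1 secondary sample, assuming an 8-bit PCM
     * primary. Sizes are illustrative. */
    void overhead_example(void)
    {
        int primary_bytes = 512;                    /* 8 bits per audio sample */
        int g721_bytes    = primary_bytes * 4 / 8;  /* G.721:    4 bits -> 256 */
        int g723_24_bytes = primary_bytes * 3 / 8;  /* G.723-24: 3 bits -> 192 */
        int g723_40_bytes = primary_bytes * 5 / 8;  /* G.723-40: 5 bits -> 320 */
        /* A G.723-24 secondary thus adds only 37.5% to the payload, versus
         * 100% for repeating the PCM primary verbatim. */
    }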

2.4 Management Responsibilities

 Midway through the semester, our group was unfortunately left without a management team, and we offered to take over in that capacity. To that end, we organized a series of at least a dozen continuing meetings with various subsets of Phonedation to hash out the details of group interaction, interfacing, and demonstration methods. Although time-consuming, these proved to be extremely valuable because we were able to eliminate various misconceptions and assumptions on the part of components that had not previously spoken with each other. We tried our best to ensure that the various components agreed upon their interfaces. Working with each component involved, we outlined the flow of control in excruciating detail for each application. We also installed software, administered accounts, worked with concerned individuals and teams, and suggested ways to tackle common problems.

As time went on, our management responsibilities mushroomed into a quagmire of mediation, interface construction, and basic coding assistance. It became clear that our group's lack of strong initial management guidance was a blow from which we would never fully recover, and in many senses our group spent the rest of the semester in a vain attempt to catch up. In particular, almost every group developed its interfaces in isolation, and before we took on the role of management, no one had taken the time to scrutinize the reports to provide some consistency between what was expected and what was actually provided. When it came time to integrate, then, every component found it necessary to implement additional, previously overlooked functionality for other dependent components. Tempers ran short as each team perceived these as last-minute impositions requiring them to change their interface in order to support other teams' needs. Aside from the expected fragmentary impact this had on group morale, it also meant that the time we had set aside for integration was used inefficiently, as teams were forced to wait longer than expected on others before they could begin testing. As an example, the late maturity of the signaling and directory service teams' servers and interfaces hamstrung many of the higher-level applications, which were forced to rewrite their code to function in the absence of those two essential teams in order to do any testing.

2.5 Signaling

 Initially hampered by poor communication between the signaling team members and a lack of firm commitment, signaling began functioning as a one-person team and was unable to meet the demands placed on it by specialized applications written in three different languages on two different operating systems. As the glue that was supposed to hold the various components together, signaling was overwhelmed and became lost in a labyrinth of RMI calls and UTF-encoded strings. About a week and a half ago, we decided to aid signaling by writing a generic piece of their code to provide an interface with all of the applications, using socket-based calls to send information about incoming calls, indicate call termination, and request outgoing calls. Additionally, since no formal interface had yet been designed, we created one to govern communications between signaling, the gateway, and all non-RMI applications. Unfortunately, this may have been too little, too late; instead of signaling being the shining beacon that gathers us all together, it is our perception that the applications are floundering on the rocks while we are still climbing the stairs to light the lighthouse.

2.6 Directory Service

 Much like signaling, directory service had made no initial plans to communicate with applications written in languages other than their own. Thus, about a week and a half ago, we discovered that no one could actually communicate with directory service and asked them to create a socket-based interface to work for all applications. Because of their lack of familiarity with other languages and platforms, we took it upon ourselves to provide a Java interface to directory service for everyone to use. We also spent some time restructuring, clarifying, and fixing some of their internal database methods. Unfortunately, after we committed to providing the Java interface, all of the applications came to view us as the responsible directory service group, and came to us for solutions to their problems in that area. We referred them to directory service, directory service referred them back to us, and the applications were sometimes lost in the middle.

2.7 Gateway and generic integration

 A large fraction of our time was spent integrating our component 1 code with the gateway code -- a task that translates loosely into fighting with the Dialogic card, puzzling over its poor documentation, and trying random parameters in search of something that worked. This took much longer than we expected, in part because functions that we had assumed the Dialogic card would be able to support, such as reading and writing from files simultaneously, turned out to be impracticable. Unfortunately, this meant that we spent a lot of time integrating with the gateway, which left little time for the gateway team to integrate with signaling. But since signaling was not fully functional at the time, this did not seem pressing.

We provided Unix and NT C implementations and an NT Java interface to our component 1 code, but despite many messages containing written instructions, source code, and compiled versions of working example programs, we found it necessary to explain how to use our seven functions to each team individually, in person, and often more than once -- a severe drain on our time. We were also often called in to explain how to use directory service's or signaling's code, or to run their servers -- tasks which were necessary for integration but spread us even thinner. This would appear to have been many other teams' first experience with large projects involving arrays of characters, socket calls, memory allocation, and asynchronous events. A nontrivial amount of time was spent reviewing these concepts and debugging code based on them -- time that we had scheduled for integration at a much higher level.

Despite all of this, many of our applications have done tremendous amounts of work, especially in the last two weeks. Unfortunately, it is our great fear that they will be unable to present the entirety of their work due to a lack of complete integration with lower-level teams. What we really wanted to do as managers was to give the applications a chance to shine in the context of the completed system. For whatever reason, but perhaps largely because of our group's late start with firm management and integration, there wasn't enough time.

Section 3. Technical Problems

Our NT port was stymied by a lack of precision timing. Our code simulates multiple multiplexed timers using a priority queue, relying heavily on cheap, accurate system calls to determine the current time to microsecond precision. NT does not directly support this, so we worked around it by accessing a high-frequency hardware performance counter for use as a clock. Since our adaptive playout algorithm allows for arbitrary clock differences, this works despite the fact that clocks generated by our algorithm on different machines have vastly different offsets. Additionally, we were initially hard pressed to find specific replacements for fork(2), pipe(2), and wait(2). We are still unable to select(2) on the NT variant of pipes. For sound recording and playout we had to implement our own buffering system under NT in order to avoid clicks and gaps caused by small sample sizes. Some implementation details involved in providing a Java interface have caused us to create a separate interface layer that uses only base types.
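
The clock workaround amounts to the following sketch (error handling omitted). The counter's epoch is arbitrary, which, as noted above, our algorithm tolerates.

    #include <windows.h>

    /* Microsecond clock built on the hardware performance counter. */
    long long current_usec(void)
    {
        LARGE_INTEGER freq, now;
        QueryPerformanceFrequency(&freq);  /* counter ticks per second */
        QueryPerformanceCounter(&now);     /* ticks since an arbitrary epoch */
        return now.QuadPart * 1000000 / freq.QuadPart;
    }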

In the last week and a half of the course, most of our technical problems centered around interfacing with the Dialogic card and getting C, C++, and Java to speak to each other through function calls and sockets. Java doesn't speak to C++ at all, Java internally uses a 16-bit character format incompatible with the usual ASCII encoding, and C can only communicate with the other two via sockets.

Section 4. What we have learned

Taking on the management responsibilities for our group was definitely an added challenge, but it has been extremely educational. None of us have ever been involved in teamwork of such magnitude, and being at the helm of the group has given us the chance to observe a wide variety of group dynamics and try our hand at getting twenty people focused on the same goal.

In a technical sense, we have learned that it is possible to provide the illusion of multiple timers for an application by using priority queues and the current time of day. We also learned a lot about shared memory and passing information uniformly between threads and forked processes.
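
The essence of that illusion is a loop like the following. The Timer type and the pq_* and sleep_usec helpers are hypothetical stand-ins for our internal routines, and current_usec is the clock sketched in Section 3.

    /* One real clock, many logical timers: keep pending expirations in a
     * priority queue ordered by deadline and sleep only until the earliest. */
    while (running) {
        Timer *next = pq_peek(timers);          /* earliest deadline */
        long long now = current_usec();

        if (next->deadline <= now) {
            pq_pop(timers);
            next->callback(next->arg);          /* fire the timer */
            if (next->period > 0) {             /* periodic: re-arm it */
                next->deadline += next->period;
                pq_push(timers, next);
            }
        } else {
            sleep_usec(next->deadline - now);   /* wait for the next one */
        }
    }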

In order to provide a Java interface for the rest of our group, we have also learned some Java and have become very familiar with JNI (the Java Native Interface). The port to NT has also taught us a lot about the importance of standards and the reliance of everyday code on functions that turn out to be system-dependent.

We have also learned that it is possible for insidious timing and arithmetic bugs to lie dormant in complicated sections of code that appear, prima facie, to be working correctly.

Section 5. What would you do differently next time?

If possible, we would have resolved the management ambiguity earlier. As noted above, it has become apparent that our team suffered from a lack of early organizational meetings.

We should have talked to the other components one-on-one earlier about what sorts of input and output they expected us to support.

A long time ago we requested that all components have their coding done by Thanksgiving, so that we could devote the rest of the semester purely to integration. This did not even come close to happening, and in retrospect we sorely underestimated the amount of time required for integration. Components developed separately in a vacuum should not be expected to come together in less than two weeks, even with Herculean efforts on the part of dedicated software engineers.

Section 6. Interface

We provide two interfaces, one for accessing our code from C, and another from Java; a short usage sketch follows the two listings.
  1. C interface

    /* What is the source of the samples we will send over the network? */
    typedef struct _Input {
        enum { Input_Silence=0, Input_File=1, Input_WAV=2, Input_Microphone=3, 
            Input_File_By_Name=4, Input_Ringbuffer=5, Input_Nonexistent_File=6 
        } type;
        FILE *file; /* if appropriate, an already-open and readable file
                     * from which to read the input. */
        char file_name[NAMESIZE];   /* on Input_File_By_Name, set this but do
                                     * not set 'file'. */
        int loop_file;      /* if we are reading from a file of some kind, should
                             * we restart when we get to the end? If not, we'll
                             * switch to silence at the end. */
        unsigned char *ringbuffer;  /* used by Input_Ringbuffer */
        int ringbuffer_size;
        int ringbuffer_offset;      /* where are we now in the ringbuffer? */
    } Input;
    
    /* Where should we put samples we receive from the network? */
    typedef struct _Output {
        enum { Output_None=0, Output_File=1, Output_WAV=2, Output_Speaker=3, 
            Output_File_By_Name=4, Output_Ringbuffer = 5, Output_Gateway=6 } type;
        char file_name[NAMESIZE];   /* on Output_File_By_Name, set this but do not
                                    * set file. */
        FILE *file; /* if appropriate, an already-open and writeable file
                     * in which to store the output. */
        unsigned char *ringbuffer;  /* used by Output_Ringbuffer */
        int ringbuffer_size;
        int ringbuffer_offset;      /* where are we now in the ringbuffer? */
    } Output;
    
    /* Where should we send samples on the network? */
    typedef struct _Destination {
        char *      name;   /* destination hostname */
        short       port;   /* internet port number (for all destinations) */
    } Destination; 
    
    /* The range of valid sample size parameters */
    #define MAX_SIZE                1024
    #define MIN_SIZE                128
    
    /* User definable parameters for a connection. */
    typedef struct _ConnectionParameters {
        Destination     dest;       /* the IP address of the destination */
        int             size;       /* desired primary sample size */
        Input           input;      /* source for sounds to send out to dest */
        Output          output;     /* where to play out received sound packets */
        int         primary_encoding;
        int         secondary_encoding;
        /* These are special debugging options that you should ignore */
        int         drop_every_x;
    } * ConnectionParameters;
    
    typedef struct _Connection * Connection;
    /* From the user's perspective, a Connection c has a single field,
     * c->param, which is of type ConnectionParameters (that is, c->param->size
     * refers to the desired sample/packet size). None of the other state
     * should be modified or examined. */
    
    /*
     * Exported Function Prototypes
     */
    
    /****************************************************************************
     *      EstablishConnection()
     * This procedure creates a connection based on the user-supplied parameters
     * found in p. It forks another process to handle sending and receiving
     * packets and sample playout. On success, it returns a newly created
     * Connection (which should be saved so you can modify or destroy it later).
     * On failure it returns NULL and sets 'error' in an explanatory fashion.
     ****************************************************************************/
    Connection EstablishConnection(ConnectionParameters p, char **error);
    
    /****************************************************************************
     *      DestroyConnection()
     * Tears down an established connection. Deallocates all associated resources
     * and closes all open network connections. We even free the connection
     * itself, so don't look at it after this.
     *
     * It does not (for example) close file descriptors that you passed in as
     * input or output types.
     *
     * This blocks until the child process dies. 
     *
     * Returns 1 on failure, 0 on success.
     ****************************************************************************/
    int DestroyConnection(Connection c);
    
    /****************************************************************************
     *      ModifyConnection()
     * Updates the Connection to reflect its current connection parameters. 
     * So change the parameters (c->param) and then call this.
     *
     * Do not attempt to change the following:
     *      the destination
     *      the input or output type to a file pointer that was not open before
     *              you called EstablishConnection()
     *
     * Nota bene: the update may not be perfectly synchronous.
     *
     * Returns 1 on failure (although we don't catch the above errors), 0 
     * on success.
     ****************************************************************************/
    int ModifyConnection(Connection c);
    
    /****************************************************************************
     *      UDP_SendMessage()
     * Sends a UDP message to all recipients of the given connection. The
     * message should be fairly small (<1500 bytes).
     *
     * Returns 1 on error, with errno set, 0 on success.
     ****************************************************************************/
    int UDP_SendMessage(Connection c, char *buf, int buf_size);
    
    /****************************************************************************
     *      UDP_RecvMessage()
     * Waits up to 'msec' milliseconds to receive a UDP message from anyone on
     * the other side of a given connection. 'msec' of 0 means wait forever.
     * Additionally, the sender's IP address is stored in 'sender_ip_addr' if
     * it is not NULL. 'buf' is assumed to point to an allocated area of
     * at least '*buf_size' bytes.
     *
     * If a message is received it is stored in 'buf' and the message length is
     * stored in 'buf_size'.
     *
     * This is implemented using select(2) and may therefore block for less
     * than the specified time.
     *
     * Returns 1 if a message was received, 0 otherwise.
     ****************************************************************************/
    long UDP_RecvMessage(Connection c, char *buf, long buf_size, long msec,
                        long *sender_ip_addr);
    
    /****************************************************************************
     *      Connection_DoneWithInput()
     * If you specified input.type == Input_File or Input_WAV and you specified
     * input.loop_file = 0, we will send an "EOF" message to the other end of
     * the connection when we finish sending the file. If you would like to
     * know when we have finished sending the file, query this function
     * periodically.
     *
     * Return: 1 if we have finished sending the file and are now not sending
     * anything else. 0 if we are still working on it or if the input type is
     * not appropriate (i.e., Input_Microphone). Returns -1 on error.
     ****************************************************************************/
    int Connection_DoneWithInput(Connection c);
    
    /****************************************************************************
     *      Connection_QueryOffsets()
     * Writes the current ringbuffer and file offsets, in bytes, to rb_off and
     * file_off.
     *
     * Returns -1 on error, 0 on success.
     ****************************************************************************/
    int
    Connection_QueryOffsets(Connection c, int *rb_off, int *file_off);
    
    
  2. Java interface

    public class C1_Connection
    {
      // Values for parameters
      // -------- Input choices ------------------- //
      public static final long Input_Silence      = 0;
      public static final long Input_File_By_Name = 4;
      public static final long Input_WAV          = 2;
      public static final long Input_Microphone   = 3;
      // -------- Output choices ------------------ //
      public static final long Output_None         = 0;
      public static final long Output_File_By_Name = 4;
      public static final long Output_WAV          = 2;
      public static final long Output_Speaker      = 3;
      public static final long MIN_SIZE         = 128;
      // --------- Encoding choices --------------- //
      //  Good defaults are primary   = Encoding_PCM
      //                    secondary = Encoding_None
      public static final long Encoding_None    = 0;
      public static final long Encoding_PCM     = 8;
      public static final long Encoding_ADPCM_3 = 3;
      public static final long Encoding_ADPCM_4 = 4;
      public static final long Encoding_ADPCM_5 = 5;
    
      // Implementations in C
      // Dest_name should be the ip address for the destination.
      // A good choice for sample_size is MIN_SIZE*4.
      // loop_file should be 0 to play once, 1 to loop forever.
      // err is actually ignored.
      native long EstablishConnection(String dest_name, short dest_port,
                                      long sample_size, long input_type, 
                                      String input_file, long output_type, 
                                      String output_file, short loop_file,
                                      long primary_enc, long secondary_enc, 
                                      String err);
    
      native long DestroyConnection(long c);
    
      native long ModifyConnection(long c, long size, long input_type,
                                   String input_file, long output_type, 
                                   String output_file, short loop_file,
                                   long primary_enc, long secondary_enc); 
    
      // lc is the long EstablishConnection returned
      native long SendMessage(long lc, String buf, long buf_size);
    
      // msec is how long to wait for a message, in milliseconds
      // 0 means (try to) wait forever (system calls will interrupt)
      native String RecvMessage(long lc, long buf_size, long msec);
    
      // if you specified Input_File_By_Name or Input_WAV, and you
      // set loop_file to 0, then you can call this function every
      // 100 ms or so to check whether it's done playing.
      // returns 1 if done, 0 if not, -1 if error 
      native long DoneWithInput(long lc);
      static
      {
        System.loadLibrary("C1_Connection_Imp");
      }
    
      public void main()
      {}
    }
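
Finally, a minimal sketch of driving the C interface, to give a sense of the intended calling sequence. The destination host, port, and file name are hypothetical, error handling is mostly elided, and "connection.h" stands in for whatever header declares the interface above; the encoding values follow the Java constants.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include "connection.h"   /* hypothetical name for the header above */

    int main(void)
    {
        struct _ConnectionParameters p;
        char *error = NULL;
        Connection c;

        memset(&p, 0, sizeof p);
        p.dest.name = "gateway.example.com";  /* hypothetical destination */
        p.dest.port = 9000;                   /* hypothetical port */
        p.size = MIN_SIZE * 4;                /* the recommended sample size */
        p.input.type = Input_File_By_Name;
        strcpy(p.input.file_name, "hello.raw");
        p.input.loop_file = 0;                /* play once, then send "EOF" */
        p.output.type = Output_Speaker;
        p.primary_encoding = 8;               /* Encoding_PCM */
        p.secondary_encoding = 0;             /* Encoding_None: no FEC */

        c = EstablishConnection(&p, &error);
        if (c == NULL) {
            fprintf(stderr, "EstablishConnection: %s\n", error);
            return 1;
        }

        while (Connection_DoneWithInput(c) == 0)   /* poll until file is sent */
            sleep(1);

        DestroyConnection(c);
        return 0;
    }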

Section 7. Advice for the course staff.

It would be helpful to make it clear from the beginning (to the students taking the course) that integration will require about half of the semester. Example timelines for the progress of the project would also help, because at the outset we were unable to anticipate fully the scope of what we were committed to doing.

Section 8. References

  1. vat (LBNL Audio Conferencing Tool), http://www-nrg.ee.lbl.gov/vat/
  2. rtptools (description: ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/ports/mbone/rtptools/pkg/DESCR, code: ftp://ftp.freebsd.org/pub/FreeBSD/FreeBSD-current/ports/mbone/rtptools.tar)
  3. Meeting with S. Keshav on goals for this component.
  4. Microsoft Visual C++ help files
  5. Ramachandran Ramjee, Jim Kurose, Don Towsley, Henning Schulzrinne, "Adaptive Playout Mechanisms for Packetized Audio Applications in Wide-Area Networks", Proceedings of IEEE INFOCOM '94, April 1994.
  6. Sue B. Moon, Jim Kurose, and Don Towsley, "Packet Audio Playout Delay Adjustment: Performance Bounds and Algorithms", to appear in ACM/Springer Multimedia Systems.
  7. Vicky Hardman, Martina Angela Sasse, Mark Handley, Anna Watson, "Reliable Audio for Use over the Internet", Proceedings of INET '95.
  8. Matthew Podolsky, Cynthia Romer, Steven McCanne, "Simulation of FEC-Based Error Control for Packet Audio on the Internet", Proceedings of IEEE INFOCOM '98.
  9. Jean-C. Bolot, Andres Vega-Garcia, "The Case for FEC-Based Error Control for Packet Audio in the Internet", ACM Multimedia Systems, 1997.
  10. Jean-C. Bolot, Andres Vega-Garcia, "Control Mechanisms for Packet Audio in the Internet", Proceedings of IEEE INFOCOM '96.

Appendix A: Simulator

Appendix B: Analysis of Results