The Implication of Increasing Ethernet Load on Applications

Yu Zhang          02/02/99

1. Introduction

While providing different levels of Internet service to different applications and users has become one of the hottest topics in current network research, most of the work focuses on architectures (e.g., differentiated services) and routing mechanisms (e.g., QoS routing) in the WAN range. Little interest has been shown in studying the implications for LANs, especially legacy Ethernet, which provides no QoS support, even though the stub networks are the first and last steps in the "Internet path" that must meet the QoS requirements. The usual argument is either that Ethernets are currently lightly loaded, or that we can easily over-dimension them to make them so. The questions that follow naturally are: "Is a lightly loaded local network really the prevalent case?" and "To what extent do we need to over-dimension Ethernets to meet the applications' requirements?"

The focus of this work (as planned so far) is to examine how the (high) load situation of an Ethernet affects the dynamics and performance of several popular network applications. In particular, the two applications on our list are web access and Internet telephony. The former is a very popular application that contributes greatly to current Internet traffic; its traffic pattern is characterized by small client requests one way and bulk data transfers as server responses the other. The latter has been proposed as a highly desirable application and is still in its experimental phase; its traffic consists of a series of evenly spaced, small audio packets. Given the different traffic characteristics and performance requirements of these two applications, we conjecture that the impacts of increasing Ethernet load on them will be quite different. We are interested in what these impacts are, especially when the degrading performance fails to meet the (basic) application requirements. We are also interested in improving the implementations of these applications to accommodate increasing load on the Ethernet.

We plan to use simulation to study the problem. As a first step, to avoid the complications introduced by WAN technology, we restrict the problem to the simple case in which all participants of the applications sit on the same LAN. The work is divided into three parts: 1. measure the Ethernet load situation, as an attempt to shed some light on the answer to the first question; 2. study the impact of Ethernet load on the web application; 3. study the impact of Ethernet load on Internet telephony. Section 2 briefly introduces our simulation setting. Sections 3, 4, and 5 address the plans for the three parts of our work one by one. Section 6 opens the lid on the second-step plan.

2. Simulation Setting

To study the impact of Ethernet load on applications, we need two components in our simulation: simulating the behavior of the participants (servers and clients in the web application, participants in Internet telephony), and simulating the Ethernet to which all the participants are connected. We decompose the simulation by having a dedicated simulator (or real entity) for each component. Here we focus on describing the second component -- LAN simulation. We address the simulation of participant behavior in the sections that follow.

We use Entrapid on top of Jia Wang's efficient Ethernet simulation [1] to simulate the LAN. We prefer Entrapid to other simulators because, from a developer's perspective, Entrapid provides the abstraction of "a network in a box". It supports multiple Virtualized Networking Kernels (VNKs). Each VNK corresponds to a machine on the Internet, and each virtualized process corresponds to a process running on that machine. A developer can instantiate new protocols either directly on a VNK or as an external process, and test their behavior when interacting with other network protocols already implemented within Entrapid. Moreover, it supports the RealNet technology, which can seamlessly connect real-world devices, such as routers and switches, to the emulated network. Using RealNet, we can connect our simulators (or real code) for the participants to the "Ethernet box" simulated by Entrapid in the same way they would connect to a real Ethernet. Also, the efficient Ethernet simulation adopted by Entrapid makes it easy to adjust the load situation of the Ethernet. The idea of efficient Ethernet simulation is, instead of running a time-consuming CSMA/CD simulation, to give an accurate estimate of the packet delay by mapping the measured carried load to an empirical delay distribution and generating a randomized delay value according to this distribution. In addition, since the background offered load and average packet size can be set directly, we get the effect of heavy cross-traffic for free, whereas in a traditional Ethernet simulation the same amount of load has to be generated packet by packet.
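The table-lookup idea behind the efficient Ethernet simulation can be sketched as follows. Note this is only an illustration of the technique, not Entrapid's actual implementation: the table values are placeholders, the function names are our own, and the Gaussian delay model is an assumption (the real simulator draws from a measured empirical distribution).

```python
import random

# Hypothetical empirical table for one average packet size (512 bytes):
# offered load (fraction of capacity) -> (mean, std) of packet delay in ms.
# The numbers below are illustrative placeholders, not measured data.
DELAY_TABLE = {
    512: [(0.1, (0.2, 0.05)), (0.5, (0.8, 0.3)), (0.9, (5.0, 2.0))],
}

def sample_delay(avg_pkt_size, offered_load):
    """Return a randomized delay drawn from the empirical distribution
    for the nearest tabulated load point, instead of simulating CSMA/CD
    contention packet by packet."""
    curve = DELAY_TABLE[avg_pkt_size]
    # pick the table entry whose load is closest to the measured load
    load, (mean, std) = min(curve, key=lambda e: abs(e[0] - offered_load))
    return max(0.0, random.gauss(mean, std))
```

The key point is that each packet costs one table lookup and one random draw, regardless of how much background cross-traffic the table entry represents.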

Figure 1 shows the simulation setting. Two machines are set up, with the user applications running on one and Entrapid running on the other. The clients and server of the web application, or the participants of the telephony application, run on the left machine. On the right machine, two virtual machines, m0 and m1, are created in Entrapid to simulate the LAN. Using the RealNet technique, we bind each virtual machine to a separate network interface (m0 to de0, m1 to de1 in Figure 1). By setting the routes and the parameters in the applications, we can ensure that all packets sent by the clients (or sender) to the server (or receiver) are directed to m0, then to m1, and finally to the server. Similarly, we can force packets from the server (or receiver) to the clients (or sender) to traverse exactly the inverse path. Thus, packets exchanged between the clients (sender) and the server (receiver) go through the simulated LAN as in a real setting. They are subject to the delay determined by the traffic characteristics on the LAN, which is a combination of the background offered traffic and the traffic generated by the application itself. By changing the load situation of the LAN in Entrapid, we can study its impact on the application by observing the corresponding change in the participants' behavior.

                                                                                Figure 1.  Simulation Setting

3. Test Ethernet Load

We are not particularly aiming to answer the question "Is it true that local networks are always lightly loaded?", since no matter what results we get for the specific LAN we experiment with, it would be hard to claim that this LAN is representative of current local networks. But it is still desirable to collect and analyze some experimental data --- perhaps to get a first approximation of the answer!

We plan to utilize some research results obtained while constructing the mapping tables for the fast Ethernet simulation. Specifically, in building the table that maps offered load to packet delay, we obtained a family of mean-packet-delay vs. total-offered-load curves for various average packet sizes as a by-product. Interestingly enough, with these curves we can do an inverse mapping: given the average packet size, we choose the corresponding curve in the family, and if the mean packet delay is also known, we can estimate the offered load. In addition, we have a family of goodput vs. total-offered-load curves for various average packet sizes. Once the mean packet size and offered load are known, we can infer the current goodput.
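The inverse mapping can be sketched as a simple interpolation over one delay-vs-load curve. The curve values below are illustrative placeholders (the real curves come from the fast Ethernet simulation tables), and the monotonicity assumption holds only where mean delay rises with load.

```python
def estimate_load(curve, mean_delay):
    """Invert a mean-delay-vs-offered-load curve by linear interpolation.
    `curve` is a list of (offered_load, mean_delay) points, assumed
    monotonically increasing in delay over the tabulated range."""
    for (l0, d0), (l1, d1) in zip(curve, curve[1:]):
        if d0 <= mean_delay <= d1:
            frac = (mean_delay - d0) / (d1 - d0)
            return l0 + frac * (l1 - l0)
    raise ValueError("measured delay outside tabulated range")

# Illustrative curve for one average packet size (placeholder numbers):
# (offered load, mean delay in ms)
curve_512 = [(0.1, 0.2), (0.3, 0.4), (0.5, 0.8), (0.7, 1.8), (0.9, 5.0)]
```

Once the offered load is estimated this way, the same lookup style applies to the goodput-vs-load curve for the matching packet size.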

The average packet size on an Ethernet during a time window can be measured by running the tcpdump utility on any of its stations, listening on the interface through which that station connects to the Ethernet. tcpdump can sniff every Ethernet packet going through this interface; due to the broadcast nature of Ethernet, these are all the packets exchanged over the Ethernet. We then simply average their sizes over the time window.
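The averaging step amounts to extracting the size field from each trace line. A minimal sketch, assuming each tcpdump line carries a "length <bytes>" field (the exact output format varies with tcpdump version and options):

```python
import re

def mean_packet_size(tcpdump_lines):
    """Average the packet sizes reported in a tcpdump trace window.
    Assumes each line contains a 'length <bytes>' field; lines without
    one are skipped."""
    sizes = [int(m.group(1))
             for line in tcpdump_lines
             if (m := re.search(r"length (\d+)", line))]
    return sum(sizes) / len(sizes) if sizes else 0.0
```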

It is not so obvious how to measure the mean packet delay. Since it is not possible to track the sending and receiving time of every packet on the Ethernet (we cannot get access to every station and turn on tcpdump there), we avoid the naive approach of measuring the delay of each packet and computing the mean. Instead, we take a "probe" approach: we generate a negligible amount of probe traffic and measure its mean packet delay. Assuming the packet delay of this probe traffic has the same distribution as that of the original traffic (can we justify this assumption?), its mean packet delay is a good estimate of that of the original traffic.

The details of our experimental setting are as follows. Pick two stations on the Ethernet that we have access to: s0 and s1. Let s0 ping s1, and at the same time turn on tcpdump on both s0 and s1. In the tcpdump trace of s0, suppose the timestamp of a ping packet p sent from s0 to s1 is t0, and the timestamp of the corresponding reply packet p_ack arriving at s0 is t3. In the tcpdump trace of s1, suppose the timestamp of p's arrival is t1, and the timestamp of the sending of p_ack is t2. Further suppose the link between s0 and s1 is symmetric (a reasonable assumption for Ethernet?), i.e., the delays of p and p_ack have the same value d, and that the clock of s1 is offset from the clock of s0 by a value delta. Then we have
               t0 + d + delta = t1
               t2 + d - delta = t3
Adding the two equations eliminates delta and gives the estimate d = [(t3-t0) - (t2-t1)] / 2 (i.e., (RTT - response time) / 2). We examine the trace for ping packets, compute the mean value of the delay, and take it as the estimate of the mean packet delay of the total traffic.
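Solving the two timestamp equations in code makes the factor of 1/2 explicit: adding them shows that (t3-t0) - (t2-t1), the RTT minus the response time, equals 2d, while subtracting them recovers the clock offset delta. The function names here are our own.

```python
def one_way_delay(t0, t1, t2, t3):
    """Estimate the one-way delay d from the four trace timestamps,
    assuming a symmetric path and a constant clock offset delta.
    From  t0 + d + delta = t1  and  t2 + d - delta = t3:
        d = ((t3 - t0) - (t2 - t1)) / 2
    i.e. (RTT - response time) / 2."""
    return ((t3 - t0) - (t2 - t1)) / 2.0

def clock_offset(t0, t1, t2, t3):
    """Estimate delta (the offset of s1's clock relative to s0's,
    with the sign convention of the equations above):
        delta = ((t1 - t0) - (t3 - t2)) / 2."""
    return ((t1 - t0) - (t3 - t2)) / 2.0
```

As a sanity check, with d = 2, delta = 0.5, and t0 = 0, the equations give t1 = 2.5, and choosing t2 = 3 gives t3 = 4.5; both estimators recover the planted values.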

After obtaining the average packet size and mean packet delay, we can infer the offered load and, further, the goodput. Note that computing a mean depends on the time window we choose. In order to capture the relatively-stable-but-varying-over-time dynamics of the traffic, we need to compute the variation of the packet size and packet delay, and to use a smaller window size when the dynamics change drastically.

4. The Impact of Ethernet Load on Web Application

We use Apache 1.3.3 as our HTTP server; it is a popular free web server that incorporates many HTTP/1.1 features, and it generates a server-side log file that we use to evaluate server performance. To generate the web traffic and simulate the web clients, we use the Scalable URL Reference Generator (SURGE) [2][3] from Boston University. SURGE generates a sequence of URL requests that exhibit representative distributions for document popularity, document size, request size, temporal locality, spatial locality, embedded document count, and off times. It supports different versions of HTTP: HTTP/1.0, HTTP/1.1 without pipelining, and HTTP/1.1 with pipelining (though I have not gotten SURGE for HTTP/1.1 to work so far). Moreover, it creates a client-side log file that facilitates analysis.

One of the key parameters for assessing the performance of the web application (including both clients and server) is latency, defined as the time between the sending of the client request and the completion of the transfer of the server response. Latency depends greatly on network conditions: as the average packet size and the Ethernet load vary, the packet delay and goodput (including packet dropping) change accordingly. If the packet delay increases, the latency will certainly go up, other conditions being the same. Latency also differs across the client designs proposed by the different versions of HTTP. HTTP/1.0 opens one connection for each file in each object (an object is defined as a base HTML file with zero or more embedded files). It requires at least two RTTs per document or inlined image and hence incurs unnecessary latency; it also incurs additional connection setup/maintenance/teardown cost, and its short connections suffer from TCP slow-start inefficiency. HTTP/1.1 without pipelining improves on version 1.0 in two ways: 1. it allows multiple connections to be opened and files to be retrieved concurrently; 2. it allows multiple files to be retrieved over one open connection, alleviating the latency, additional cost, and short-connection problems. HTTP/1.1 with pipelining further improves HTTP performance by requesting all the embedded files at once, sending the requests along the connection without waiting for the responses to any of the individual requests; this pipelining further decreases the latency [4]. We plan to investigate the impact of increasing Ethernet load on each of these client designs to compare their "resistance" to pathological network situations.
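The latency differences among the client designs can be illustrated with a back-of-envelope RTT count. This sketch is our own simplification: it ignores slow-start, transfer time, and the concurrent connections that HTTP/1.0 and HTTP/1.1 clients actually open, so it is a lower-bound model for a single sequential connection per version, not a prediction of measured latency.

```python
def min_rtts(n_embedded, version):
    """Rough lower bound on RTTs to fetch a base file plus n_embedded
    inlined files, one connection at a time (illustrative only).
    "1.0" : a fresh connection per file -> setup RTT + request/response
            RTT for each of the (1 + n_embedded) files.
    "1.1" : one setup, then one request/response RTT per file in turn.
    "1.1p": one setup, one RTT for the base file, then all embedded
            requests pipelined so their responses share one extra RTT.
    """
    files = 1 + n_embedded
    if version == "1.0":
        return 2 * files
    if version == "1.1":
        return 1 + files
    if version == "1.1p":
        return 1 + 1 + (1 if n_embedded else 0)
    raise ValueError("unknown HTTP version: " + version)
```

Even this crude model shows why the three designs should degrade differently as the per-packet delay grows: each extra millisecond of RTT is multiplied by a different count.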

The second key parameter is server throughput, measured either as the number of requests the server processes per unit time or as the number of data bytes the server transfers per unit time [5]. The choice between these two measurements depends on which resource is the bottleneck: CPU cycles at the server, or network bandwidth. Since, as we increase the network load, we expect the network bandwidth to eventually become the bottleneck, we will investigate the efficiency of the Apache server in terms of data transfer throughput under varying network load.

Finally, the fairness of the web service in a multi-client setting is an interesting aspect of the performance of the web application. We will study fairness by comparing the mean and variance of the per-client mean latency.
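The fairness comparison reduces to two summary statistics over the per-client means; a small variance suggests the clients are being served evenly. A minimal sketch (the input format is our own assumption about how the client-side logs would be aggregated):

```python
from statistics import mean, pvariance

def fairness_stats(latencies_by_client):
    """latencies_by_client: dict mapping client id -> list of request
    latencies from that client's log. Returns (mean, variance) of the
    per-client mean latencies."""
    per_client = [mean(ls) for ls in latencies_by_client.values()]
    return mean(per_client), pvariance(per_client)
```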

Aside: to put our study of a local web server and clients in context: from the perspective of a typical web client, web accesses may go to a local server, a remote server, a local proxy, or a local cache. What percentage of requests is actually served by the local server? This probably differs across environments; what is the typical value for universities? (From [5], from the server's perspective, requests from remote sites account for >= 70% of the accesses and >= 60% of the total transferred bytes.)

5. The Impact of Ethernet Load on Internet Telephony

We plan to use real sender/receiver code that implements adaptive playout algorithm 2 of [6] and an adaptive Forward Error Correction (FEC) algorithm (available from the 519 project Phonedation group, team 1 [7]). We will instrument it to dump traces suitable for performance analysis.

The important parameters for real-time applications such as Internet telephony are delay and delay jitter. Delay jitter determines the playout (buffering) delay and the size of the playout buffer at the receiver side. Though delay is not as significant as delay jitter, it is bounded by the demands of real-time interaction (voice mail is just an Internet phone call with an unusually large delay :-). As the mean packet delay increases with growing Ethernet load, we conjecture that the delay jitter will also increase.

The increase in delay and delay variance has a great impact on adaptive playout. Specifically, the playout delay p = d + beta * v (where d is an estimate of the mean delay and v is an estimate of the variation of the delay) will increase and may eventually become too large to tolerate (in which case the network connection is not suitable for interactive audio applications). We hope this pathological case occurs as late as possible. We will vary the parameters used in the algorithm, or change the algorithm itself, to see whether we can postpone the pathological case and determine the highest load the algorithm can tolerate.
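The playout formula p = d + beta * v is typically driven by exponentially weighted estimates of the delay and its variation, in the style of the algorithms analyzed in [6]. The sketch below is illustrative: the class name, the update rules, and the particular constants are our assumptions, not the exact algorithm 2 of [6].

```python
ALPHA = 0.998002  # smoothing factor in the range used in the literature
BETA = 4.0        # safety factor on the variation estimate

class PlayoutEstimator:
    """Exponentially weighted estimates of mean network delay (d) and
    its mean deviation (v); the playout delay is p = d + BETA * v."""
    def __init__(self):
        self.d = 0.0
        self.v = 0.0

    def update(self, n_i):
        """n_i: measured network delay of the latest packet."""
        self.d = ALPHA * self.d + (1 - ALPHA) * n_i
        self.v = ALPHA * self.v + (1 - ALPHA) * abs(n_i - self.d)

    def playout_delay(self):
        return self.d + BETA * self.v
```

As the Ethernet load grows, both d and v in this estimator rise, so p grows faster than the mean delay alone; this is exactly the mechanism by which a load increase can push p past the interactivity bound.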

The increase in delay and delay variance also has a great impact on adaptive FEC. [7] analyzed the impact of connection characteristics on FEC performance based on ping traces both in the WAN range and between local machines. It divided connections (or virtual Internet paths) into three classes and found that the adaptive FEC algorithm performs differently for each class:
1. On class 1 connections, lost packets are rare, and when packets are lost, the loss bursts typically have width 1. So FEC_DELTA is almost always 1, and FEC can save almost all the lost packets.
2. Class 2 connections are characterized by small delay variation and small bursts of losses; typical loss bursts have width 1~3. Since packet loss is due to a lossy connection rather than large delay variation, the FEC data mostly arrive in time for the rescue, so FEC saves many of the lost packets.
3. Class 3 connections typically have large delay variation. Even if no packets are truly lost, many packets arrive too late and are considered "missed" by the playout algorithm; typical bursts of "loss" have width greater than 6, so FEC_DELTA is constrained by the estimate of the buffering delay. For packets missed due to large delay variation, the FEC data, typically arriving after the PCM data, are too late for the rescue. So on class 3 connections, though FEC can save some truly lost packets (if there are any) and some late packets, it is generally not effective.
Two other important observations were made:
1. Real connections may behave quite differently during different time periods, displaying the characteristics of different classes. This includes local connections; e.g., the connection between wrw3.resnet.cornell.edu and www.cornell.edu sometimes switches between class 1 and class 3.
2. Other connections tend to show consistent behavior; local connections are class 1 connections most of the time.
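The class assignment above hinges on the widths of the loss bursts seen at the playout deadline. A minimal sketch of that measurement, using our own function name and assuming the trace has been reduced to the set of sequence numbers that arrived in time:

```python
def loss_burst_widths(received_seqs, first, last):
    """Widths of consecutive runs of missing sequence numbers in
    [first, last], given the set of sequence numbers that arrived in
    time for playout. Mostly width-1 bursts point at class 1 or 2
    behavior; widths above 6 point at class 3."""
    widths, run = [], 0
    for s in range(first, last + 1):
        if s in received_seqs:
            if run:
                widths.append(run)
            run = 0
        else:
            run += 1
    if run:  # trailing burst at the end of the window
        widths.append(run)
    return widths
```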

We conjecture that connection characteristics will differ between LAN and WAN. For example, since we never observed class 2 behavior on local connections, perhaps we can assume most links in a LAN are non-lossy (by non-lossy links, we mean links that may lose packets but do not exhibit consistently lossy behavior). But among LAN connections, the division between small-delay-variance and large-delay-variance connections probably still exists. We will investigate how these changing connection characteristics in the LAN affect FEC performance. We conjecture that FEC will become less effective as increasing load drives up the delay variation. On the other hand, while the delay variation stays within a small range, we hope adaptive FEC performs better than fixed FEC and saves more packets by adapting FEC_DELTA to the varying network situation --- we are interested to see at what point this advantage vanishes due to the soaring delay variation.

6. Possible Directions of the Second Step Plan

1. A typical LAN includes bridges interconnecting multiple Ethernets. Bridges can be dumb or self-learning. Should we include bridges in our picture?
2. Can we expand our setting to the WAN range? Based on some assumptions about the WAN, investigate LANs as the first and last steps of the Internet paths...
3. The multicast case of telephony.

References

[1] Jia Wang and S. Keshav, "Fast Ethernet Simulation".
[2] Paul Barford and Mark Crovella, "Generating Representative Web Workloads for Network and Server Performance Evaluation", in Proceedings of ACM SIGMETRICS '98, pp. 151-160, July 1998.
[3] Paul Barford, SURGE-HOW-TO.
[4] Jeffrey C. Mogul, "The Case for Persistent-Connection HTTP", in Proceedings of ACM SIGCOMM '95, pp. 299-313.
[5] Martin F. Arlitt and Carey L. Williamson, "Web Server Workload Characterization: The Search for Invariants", in Proceedings of ACM SIGMETRICS '96, pp. 126-137.
[6] Sue B. Moon, Jim Kurose, and Don Towsley, "Packet Audio Playout Delay Adjustment: Performance Bounds and Algorithms", to appear in ACM/Springer Multimedia Systems.
[7] Data Exchange: Final Report, http://www.cs.cornell.edu/yuzhang/519project/report/519final.htm