The focus of this work is to examine how high-load conditions on an Ethernet affect the dynamics and performance of several popular network applications. In particular, the two applications we consider are web access and internet telephony. The former is a very popular network application that contributes a large share of current internet traffic; its traffic pattern is characterized by small client requests in one direction and bulk data transfers as server responses in the other. The latter has been proposed as a highly desirable application and is still in its experimental phase; its traffic consists of a series of evenly spaced, small audio packets. Given the different traffic characteristics and performance requirements of these two applications, we conjecture that the impact of increasing Ethernet load on them will be quite different. We are interested in what these impacts are, especially when the degrading performance fails to meet basic application requirements. We are also interested in improving the implementations of these applications to accommodate increasing load on the Ethernet.
We plan to use simulation to study the problem. As a first step, to avoid the complications introduced by WAN technology, we restrict the problem to the simple case in which all participants of the applications sit on the same LAN. The work is divided into three parts: 1. measure the load situation of an Ethernet, as an attempt to shed some light on the answer to the first question; 2. study the impact of Ethernet load on the web application; 3. study the impact of Ethernet load on internet telephony. In Section 2 we briefly introduce our simulation setting. Sections 3, 4, and 5 address the plans for the three parts of our work in turn. Section 6 previews our plan for the second step.
We use Entrapid on top of Jia Wang's efficient Ethernet simulation [1] to simulate the LAN. We prefer Entrapid to other simulators because, from a developer's perspective, Entrapid provides the abstraction of "a network in a box". It supports multiple Virtualized Networking Kernels (VNKs). Each VNK corresponds to a machine on the Internet, and each virtualized process corresponds to a process running on that machine. A developer can instantiate new protocols either directly on a VNK or as an external process, and test their behavior when they interact with the other network protocols already implemented within Entrapid. Moreover, Entrapid supports the RealNet technology, which can seamlessly connect real-world devices, such as routers and switches, to the emulated network. Using RealNet, we can connect our simulators (or real code) for the participants to the "Ethernet box" simulated by Entrapid in the same way they would connect to a real Ethernet. Also, the efficient Ethernet simulation adopted by Entrapid makes it easy to adjust the load situation of the Ethernet. The idea of the efficient Ethernet simulation is, instead of running a time-consuming CSMA/CD simulation, to give an accurate estimate of the packet delay by mapping the measured carried load to an empirical delay distribution and generating a randomized delay value according to this distribution. In addition, because the background offered load and average packet size can be set directly, we get the effect of heavy cross-traffic for free, whereas in a traditional Ethernet simulation the same amount of load has to be generated packet by packet.
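The load-to-delay mapping described above can be sketched as follows. This is a minimal illustration, not the actual tables from [1]: the bucket bounds and delay sample points below are made up, and a real model would carry one table per average packet size.

```python
import random

# Hypothetical sketch of the efficient Ethernet delay model described
# above: instead of simulating CSMA/CD per packet, look up an empirical
# delay distribution for the current carried load and draw a randomized
# delay from it.  All table values are made up for illustration.

# (upper bound of offered-load bucket, inverse-CDF sample points in ms)
DELAY_TABLE_512B = [
    (0.3, [0.10, 0.12, 0.15, 0.20, 0.30]),   # light load
    (0.6, [0.15, 0.25, 0.40, 0.80, 1.50]),   # moderate load
    (1.0, [0.50, 1.00, 2.50, 6.00, 15.00]),  # heavy load
]

def packet_delay_ms(offered_load, table=DELAY_TABLE_512B):
    """Draw a randomized per-packet delay for the given offered load."""
    for bound, samples in table:          # pick the load bucket
        if offered_load <= bound:
            break
    # Inverse-transform sampling over the piecewise-linear empirical CDF.
    u = random.random() * (len(samples) - 1)
    i = int(u)
    if i >= len(samples) - 1:
        return samples[-1]
    return samples[i] + (u - i) * (samples[i + 1] - samples[i])
```

Generating a delay is then a table lookup plus one random draw, which is what makes this approach so much cheaper than per-packet CSMA/CD simulation.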
Figure 1 shows the simulation setting. Two machines are set up for the simulation, with the user applications running on one and Entrapid running on the other. The clients and server of the web application, or the participants of the telephony application, run on the left machine. On the right machine, two virtual machines, m0 and m1, are created in Entrapid to simulate the LAN. Using the RealNet technique, we bind each virtual machine to a separate network interface (m0 to de0, m1 to de1 in Figure 1). By setting the routes and the application parameters, we can ensure that all packets sent from the clients (or sender) to the server (or receiver) are directed to m0, then to m1, and finally to the server. Similarly, we can force packets from the server (or receiver) to the clients (or sender) to traverse exactly the inverse path. Thus, packets exchanged between the clients (sender) and the server (receiver) go through the simulated LAN as in a real setting. They are subject to the delay determined by the traffic characteristics on the LAN, which is a combination of the background offered traffic and the traffic generated by the application itself. By changing the load situation of the LAN in Entrapid, we can study its impact on the application by observing the corresponding change in the participants' behavior.
Figure 1. Simulation Setting
We plan to utilize some research results obtained while constructing the mapping tables for the fast Ethernet simulation. Specifically, to build the table that maps offered load to packet delay, we obtained a family of mean packet delay vs. total offered load curves for various average packet sizes as a by-product. Interestingly, these curves allow an inverse mapping: given the average packet size, we can choose the corresponding curve in the family, and if the mean packet delay is known, we can then estimate the offered load. Similarly, we have a family of goodput vs. total offered load curves for various average packet sizes. Once the mean packet size and the offered load are known, we can infer the current goodput.
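The inverse mapping can be sketched as simple interpolation over one curve of the family. The curve values below are illustrative assumptions, not the measured curves; delay must be monotone in load for the inversion to be well defined.

```python
import bisect

# Illustrative (made-up) curves for one average packet size: mean packet
# delay and goodput, both tabulated against total offered load, mirroring
# the curve families described above.
OFFERED_LOAD  = [0.1, 0.3, 0.5, 0.7, 0.9]   # fraction of capacity
MEAN_DELAY_MS = [0.1, 0.2, 0.5, 1.5, 6.0]   # monotonically increasing
GOODPUT       = [0.1, 0.29, 0.46, 0.58, 0.55]  # saturates, then degrades

def interp(x, xs, ys):
    """Piecewise-linear interpolation of y at x (clamped at the ends)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect.bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])

def load_from_delay(mean_delay_ms):
    """Inverse mapping: measured mean delay -> estimated offered load."""
    return interp(mean_delay_ms, MEAN_DELAY_MS, OFFERED_LOAD)

def goodput_from_load(load):
    """Forward mapping: offered load -> goodput, from the second family."""
    return interp(load, OFFERED_LOAD, GOODPUT)
```

Given a measured mean delay, `load_from_delay` recovers the offered load, and `goodput_from_load` then yields the goodput estimate, chaining the two curve families as described.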
The average packet size on an Ethernet during a time window can be measured by running the tcpdump utility on any of the stations, listening on the interface through which that station connects to the Ethernet. tcpdump can sniff every Ethernet packet going through this interface; due to the broadcast nature of Ethernet, these are all the packets exchanged over the Ethernet. We then simply average their sizes over the time window.
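The windowed averaging step is trivial once the trace is parsed. The sketch below assumes the tcpdump output has already been reduced to (timestamp, length) pairs; the parsing itself is omitted.

```python
# Minimal sketch: given (timestamp_seconds, length_bytes) pairs extracted
# from a tcpdump trace, average the packet sizes over one time window.
def mean_packet_size(records, t_start, window):
    sizes = [length for ts, length in records
             if t_start <= ts < t_start + window]
    return sum(sizes) / len(sizes) if sizes else 0.0

records = [(0.1, 64), (0.5, 1500), (1.2, 576)]
mean_packet_size(records, 0.0, 1.0)  # averages the first two packets -> 782.0
```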
It is not so obvious how to measure the mean packet delay. Since it is not possible to track the sending and receiving times of every packet on the Ethernet (we cannot get access to every station on the Ethernet and turn on tcpdump there), we avoid the naive approach, which would require measuring the delay of each packet and computing their mean. Instead, we take a "probe" approach: we generate a negligible amount of probe traffic and measure the mean packet delay of this additional traffic. Assuming the packet delay of this traffic has the same distribution as that of the original traffic (can we justify this assumption?), its mean packet delay is a good estimate of that of the original traffic.
The details of our experiment setting are as follows. Pick two stations on the Ethernet that we have access to, s0 and s1. Let s0 ping s1 and, at the same time, turn on tcpdump on both s0 and s1. In the tcpdump trace of s0, suppose the timestamp for a ping packet p sent from s0 to s1 is t0, and the timestamp for the corresponding ack packet p_ack arriving at s0 is t3. In the tcpdump trace of s1, suppose the timestamp for p's arrival is t1, and the timestamp for the sending of p_ack is t2. Further suppose the link between s0 and s1 is symmetric (a reasonable assumption for Ethernet?), i.e. the delays of p and p_ack are the same value d, and that the clock of s1 is offset from the clock of s0 by a value delta. Then we have
t0 + d + delta = t1
t2 + d - delta = t3
and the estimate of d is ((t3 - t0) - (t2 - t1)) / 2
(i.e. half of the RTT minus the response time). We examine the trace for ping packets, compute the mean value of the delay, and take it as the estimate of the mean packet delay of the total traffic.
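The timestamp arithmetic above can be sketched directly; the example timestamps are made up (they embed d = 3 ms and delta = 1.5 ms).

```python
def oneway_delay(t0, t1, t2, t3):
    """Estimate the one-way delay d from the four trace timestamps,
    assuming a symmetric path and a constant clock offset delta:
        t0 + d + delta = t1
        t2 + d - delta = t3
    Adding the two equations eliminates delta, giving
        d = ((t3 - t0) - (t2 - t1)) / 2.
    """
    return ((t3 - t0) - (t2 - t1)) / 2.0

# Example (ms): RTT = 8.0, response time at s1 = 2.0, so d = 3.0.
oneway_delay(0.0, 4.5, 6.5, 8.0)  # -> 3.0
```

Note that the clock offset delta cancels out, so the two stations' clocks need not be synchronized, only stable over the measurement.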
After obtaining the average packet size and the mean packet delay, we can infer the offered load and, further, the goodput. Note that computing these means depends on the time window we choose. In order to capture traffic dynamics that are relatively stable but vary over time, we need to compute the variation of the packet size and the packet delay, and use a small window size when the dynamics change drastically.
One of the key parameters for assessing the performance of a web application (including both clients and server) is latency, defined as the time between the sending of the client request and the completion of the transfer of the server response. Latency greatly depends on network conditions: as the average packet size and the Ethernet load vary, the packet delay and goodput (including packet dropping) change accordingly. If the packet delay increases, the latency will certainly go up, all other conditions being equal. Latency also differs with the different client designs proposed by different versions of HTTP. HTTP/1.0 opens one connection for each file in each object (an object is defined as a base HTML file with zero or more embedded files). It requires at least two RTTs per document or inlined image and hence incurs unnecessary latency; it also incurs additional connection setup, maintenance, and teardown costs, and its short connections suffer from TCP slow-start inefficiency. HTTP/1.1 without pipelining improves on version 1.0 in two ways: 1. allowing multiple connections to be opened and files to be retrieved concurrently; 2. allowing multiple files to be retrieved over one open connection, which alleviates the latency, extra-cost, and short-connection problems. HTTP/1.1 with pipelining further improves performance by requesting all the embedded files at once, sending the requests along the connection without waiting for the responses to any of the individual requests; this pipelining further decreases latency [4]. We plan to investigate the impact of increasing Ethernet load on each of these client designs to compare their "resistance" to pathological network situations.
The second key parameter is the server throughput, in terms of either the number of requests the server processes per unit time or the number of data bytes the server transfers per unit time [5]. The choice between these two measurements depends on which resource is the bottleneck: CPU cycles at the server, or network bandwidth. Since we expect the network bandwidth to eventually become the bottleneck as we increase the network load, we will investigate the efficiency of the Apache server in terms of data transfer throughput under varying network load.
Finally, the fairness of the web service in a multiple-client setting is an interesting aspect of the performance of the web application. We will study fairness by comparing the mean and variance of the mean latency of each client.
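The fairness comparison reduces to simple summary statistics over the per-client mean latencies; a smaller variance indicates a fairer service. A minimal sketch:

```python
import statistics

# Fairness sketch: given the mean latency observed by each client,
# report the overall mean and the (population) variance across clients.
def fairness_stats(per_client_mean_latency):
    mean = statistics.mean(per_client_mean_latency)
    var = statistics.pvariance(per_client_mean_latency)
    return mean, var

fairness_stats([1.0, 1.2, 0.8])  # -> (1.0, 0.0266...)
```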
Aside: To put our study of a local web server and clients in context: from the perspective of a typical web client, web accesses can be requests to a local server, a remote server, a local proxy, or a local cache. What percentage of requests is actually served by the local server? The value may differ across environments; what is typical for universities? (From [5], from the perspective of the server, requests from remote sites account for >= 70% of the accesses and >= 60% of the total transferred bytes.)
The important parameters for real-time applications like internet telephony are delay and delay jitter. Delay jitter determines the playout (buffering) delay and the size of the playout buffer at the receiver side. Though delay is not as significant as delay jitter, it is bounded by the demands of real-time interaction (voice mail is just an internet phone call with an unusually large delay :-). As the mean packet delay increases with growing Ethernet load, we conjecture that the delay jitter will also increase.
The increase in delay and delay variance has a great impact on adaptive playout. Specifically, the playout delay p = d + beta * v (where d is an estimate of the mean delay and v an estimate of the delay variation) will increase and possibly, eventually, become too large to tolerate (in which case the network connection is not suitable for interactive audio applications). We hope this pathological case occurs as late as possible. We will investigate different parameter settings for the algorithm, or changes to the algorithm itself, to see whether we can postpone the pathological case, and what is the highest load the algorithm can tolerate.
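The computation p = d + beta * v can be sketched with d and v maintained as exponentially weighted estimates of the observed delay and its variation; the alpha and beta values below are illustrative choices, not the parameters we will necessarily use.

```python
# Sketch of adaptive playout delay: p = d + beta * v, with d and v
# updated by exponential averaging over the per-packet delays.
class PlayoutEstimator:
    def __init__(self, alpha=0.998002, beta=4.0):
        self.alpha = alpha  # smoothing weight (illustrative)
        self.beta = beta    # safety factor on the variation estimate
        self.d = None       # smoothed delay estimate
        self.v = 0.0        # smoothed delay-variation estimate

    def update(self, delay):
        """Fold in one observed packet delay; return the playout delay p."""
        if self.d is None:
            self.d = delay
        else:
            self.d = self.alpha * self.d + (1 - self.alpha) * delay
            self.v = self.alpha * self.v + (1 - self.alpha) * abs(self.d - delay)
        return self.d + self.beta * self.v
```

As the load grows, both d and v grow, so p grows on both counts; this is exactly why a heavily loaded Ethernet can push p past the interactivity bound discussed above.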
The increase in delay and delay variance also has a great impact on adaptive FEC. [7] analyzed the impact of connection characteristics on FEC performance based on ping traces both in the WAN range and between local machines. It divided connections (or virtual internet paths) into three classes and found that the adaptive FEC algorithm performs differently for each class. For class 1 connections, lost packets are rare, and when packets are lost, the loss bursts typically have width 1; so FEC_DELTA is almost always 1 and FEC can save almost all of the lost packets. Class 2 connections are characterized by small delay variation and small bursts of losses; typical loss bursts have width 1-3. Since the packet loss is due to a lossy connection rather than to large delay variation, the FEC data mostly arrive in time for the rescue, so FEC saves many of the lost packets. Class 3 connections typically have large delay variation. Even if no packets are truly lost, many packets arrive too late and are considered "missed" by the playout algorithm; typical bursts of "loss" have width greater than 6. In this case FEC_DELTA is constrained by the estimate of the buffering delay. For packets missed due to large delay variation, the FEC data, typically arriving after the PCM data, will be too late for the rescue. So for class 3 connections, though FEC can save some of the truly lost packets (if there are any) and some late packets, FEC is generally not effective.
Two other important observations were made:
1. Real connections may behave quite differently during different time periods, displaying the characteristics of different classes. This includes local connections; e.g., the connection between wrw3.resnet.cornell.edu and www.cornell.edu sometimes switches between class 1 and class 3.
2. Some connections tend to show consistent behavior. Local connections are class 1 connections most of the time.
We conjecture that connection characteristics will differ between LAN and WAN. For example, since we never observed class 2 behavior on local connections, perhaps we can assume most links in a LAN are non-lossy (by non-lossy links, we mean links that may lose packets but do not show consistently lossy behavior). But among LAN connections, the division between small-delay-variance and large-delay-variance connections probably still exists. We will investigate how these changing connection characteristics in the LAN affect FEC performance. We conjecture that FEC will become less effective as the load increases and the delay variation increases with it. On the other hand, when the delay variation varies within a small range, we hope adaptive FEC performs better than fixed FEC, saving more packets by adapting FEC_DELTA to the varying network situation; we are interested to see at what point this advantage vanishes due to soaring delay variation.
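One plausible way to adapt FEC_DELTA, sketched below, is to cover the recently observed loss-burst widths while respecting the playout-buffer budget; the function name and policy are our own assumptions for illustration, not the algorithm of [7].

```python
# Hypothetical sketch: choose the FEC offset (FEC_DELTA) to cover the
# worst recent loss burst, but never beyond the number of packets the
# playout buffer can afford to wait for (the class 3 constraint above).
def choose_fec_delta(recent_burst_widths, max_wait_packets):
    if not recent_burst_widths:
        return 1                            # no observed losses: minimal offset
    target = max(recent_burst_widths)       # cover the worst recent burst
    return max(1, min(target, max_wait_packets))

choose_fec_delta([1, 1, 2], 6)  # -> 2 (class 1/2-like behavior)
choose_fec_delta([3, 8], 6)     # -> 6 (capped by the playout budget)
```

This makes visible why class 3 connections defeat FEC: when bursts of "loss" are wider than the playout budget allows FEC_DELTA to be, the redundant data cannot arrive in time.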