
Proposal

 

High performance networking for off-the-shelf operating systems

Thorsten von Eicken, Assistant Professor, Computer Science Department

High-performance networking is currently in a crisis: the speed of network links increases so dramatically year after year that the fraction of that performance seen at the application level decreases steadily. The core of the problem lies in operating system structures that are too inflexible to adapt to the requirements of the new networks. In particular, new structures are required to shorten the path between applications and the network wires. The U-Net project at Cornell has developed a user-level networking architecture which addresses this problem and allows applications to harness the full performance (high bandwidth and low latency) of 100Mbps-1Gbps networks. The key idea is to move the protocol stack into user space and to access the network interface directly from user space. Our research contribution has been to develop techniques that integrate into off-the-shelf operating systems and permit this direct access without compromising the standard protection boundaries between applications.
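The core mechanism can be illustrated with a toy sketch. All names below are hypothetical; the real U-Net architecture maps per-endpoint send, receive, and free queues into the application's address space so that the network interface can service them without a kernel crossing per message:

```python
from collections import deque

class UNetEndpoint:
    """Toy model of a user-level networking endpoint (illustrative only):
    an application-owned communication segment with send, receive, and
    free queues that the application manipulates directly."""

    def __init__(self):
        self.send_queue = deque()   # descriptors posted by the application
        self.recv_queue = deque()   # descriptors filled in by the "NIC"
        self.free_queue = deque(bytearray(2048) for _ in range(8))  # spare buffers

    def post_send(self, payload: bytes):
        # The application enqueues a descriptor directly; no system call.
        self.send_queue.append(payload)

    def nic_poll(self, peer: "UNetEndpoint"):
        # Stand-in for the network interface moving data between endpoints.
        while self.send_queue:
            peer.recv_queue.append(self.send_queue.popleft())

    def consume(self):
        # The application polls its receive queue directly, again without
        # entering the kernel.
        return self.recv_queue.popleft() if self.recv_queue else None

a, b = UNetEndpoint(), UNetEndpoint()
a.post_send(b"hello")
a.nic_poll(b)
print(b.consume())  # b'hello'
```

Protection in the real system comes from the operating system setting up these memory mappings once, per endpoint, so that applications can only touch their own queues.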

In addition to the performance benefits, U-Net also offers new flexibility by allowing application writers to customize the protocol stack. This enables experimentation with new protocols that are, for example, tailored towards real-time multimedia stream transmission or that provide special queuing mechanisms for guaranteeing end-to-end quality-of-service.

Currently, vendors are reacting to the networking crisis and are integrating various forms of user-level networking, from U-Net to shared memory, into the next round of products. This successful technology transfer increases the urgency of further research in user-level protocol stacks. For example, the experimental nature of U-Net has so far prevented a large number of applications from being modified to take advantage of the network performance, and therefore we have not been able to measure server systems running a large number of applications using U-Net simultaneously.

IDLInet and the native-mode ATM protocol stack

S. Keshav, Associate Professor, Computer Science Department

Over the last three years, we have created a portable, efficient, low-cost, quality-of-service-aware, PC-based native-mode ATM network called IDLInet. ATM connections can provide end-to-end guarantees on quality of service (QOS), but an application sees those guarantees only if the protocol stack at the endpoint does not multiplex traffic from many virtual circuits. In our stack, a packet received by a host adapter is never multiplexed with packets from other connections. Moreover, we carefully manage the CPU and buffer resources at endpoints to allocate differentiated qualities of service to connections. In contrast, the popular TCP/IP stack multiplexes connections at the IP layer, so it cannot give different connections different qualities of service. We also exploit the checksumming done in the AAL5 layer to dispense with data checksums at the transport layer. By eliminating this costly operation, we achieve much better performance than TCP/IP.
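The two ideas above, per-virtual-circuit delivery and reliance on the AAL5 CRC, can be sketched as follows. This is a minimal illustration, not the IDLInet implementation; the class and parameter names are assumptions:

```python
from collections import defaultdict, deque

class PerVCReceiver:
    """Illustrative sketch: each ATM virtual circuit gets its own queue, so
    a received AAL5 frame goes straight to its connection without being
    merged into a shared IP-style queue. Per-VC queues are what allow the
    endpoint to give different connections different service."""

    def __init__(self):
        self.vc_queues = defaultdict(deque)   # VCI -> that connection's queue

    def on_frame(self, vci: int, frame: bytes, crc_ok: bool):
        # AAL5 already carries a CRC-32 over each frame, so the transport
        # layer can skip its own data checksum and simply trust the
        # adapter's CRC verdict (the optimization described in the text).
        if crc_ok:
            self.vc_queues[vci].append(frame)

    def read(self, vci: int):
        q = self.vc_queues[vci]
        return q.popleft() if q else None

rx = PerVCReceiver()
rx.on_frame(32, b"video", crc_ok=True)
rx.on_frame(33, b"bulk", crc_ok=True)
print(rx.read(32))  # b'video' -- never interleaved with VC 33's traffic
```

A TCP/IP-style stack, by contrast, would funnel both frames through one shared IP queue before demultiplexing, losing the per-connection isolation.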

Our design is portable, in that it has been implemented in the Plan 9, FreeBSD, and Linux operating systems, and we are working on a port to NT. It uses cheap personal computers, and it is highly efficient. For example, we are able to saturate an OC3 line from a user-space process on a 90MHz Pentium processor, sending at a rate in excess of 135 Mbps, while still providing error control, flow control, and congestion control. This is 50% faster than TCP/IP over ATM on a Sparcstation. These features of IDLInet make it an excellent platform for research into traffic management, cluster computing, flow and congestion control, and signaling protocol design.

We plan to extend IDLInet technology to a larger user base. This will enable an entire class of applications that require end-to-end QOS guarantees. We plan to extend the work in IDLInet in two ways: to support variable bit rate connections using renegotiated constant bit rate service; and to meet industry standards for next-generation Internet protocols.

Applying IDLInet to cluster computing. The high performance of the IDLInet native mode ATM stack makes it suitable for cluster computing. A cluster consists of a set of computers connected by a high-speed LAN. The idea is that computations are distributed across one or more elements of the cluster, with active monitoring of processes to detect and recover from faults. With careful implementation, cluster computing has the potential to deliver high throughput (measured in transactions per second), scalability, and high availability at a low cost. Thus, clusters of high-end personal computers may well take on the role played by mainframes in current installations. Our goal is to use IDLInet, in synergy with other technologies available at Cornell, to synthesize a high-performance computing cluster. By providing guaranteed-performance communication among cluster elements, IDLInet would allow such a system to serve the needs of real-time applications. This has the potential to fundamentally transform the way large compute servers are built.

Reliable visual communications over heterogeneous networks

Sheila Hemami, Assistant Professor, School of Electrical Engineering

The wide range of different network connections currently in use precludes universal access to visual information: if visual information is coded for users having a particular bandwidth and above, users with lower bandwidths may find accessing that information prohibitively slow. Likewise, if a network connection cannot deliver the reliability that an information service requires, users with these network connections cannot receive the visual information. If a compression technique for visual information can be designed to provide appropriately compressed data for users with any bandwidth and any reliability, then information services can become universal: they can service all users. The objective of this research is to provide such compression techniques for still images and video by considering the special requirements of visual communications over a heterogeneous network.

The differing available bandwidths and packet handling characteristics mandate special capabilities for image and video compression algorithms: scalability and redundancy. A scalable compression algorithm allows visual communication at many rates, without requiring additional compression or processing after the data has been compressed the first time. Such an algorithm generates a single stream, from which many streams at different rates may be extracted. Extraction requires no additional processing. A compression algorithm with redundancy allows accurate reconstruction of lost visual information when packets are lost or overly delayed in transmission.
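The "single stream, many rates" property can be sketched with a toy embedded coder. This is a hypothetical illustration, not the proposed algorithm: the encoder orders coefficients by importance once, so any lower-rate stream is simply a prefix of the full stream and extraction needs no further processing:

```python
# Toy embedded (rate-scalable) coder: one encoding pass, prefix extraction.

def encode_embedded(coeffs):
    # Emit (index, value) pairs from most to least significant coefficient.
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return [(i, coeffs[i]) for i in order]

def decode_prefix(stream_prefix, n):
    # Reconstruct from any prefix; coefficients not yet received stay zero,
    # giving a coarser but valid reconstruction at the lower rate.
    out = [0.0] * n
    for i, v in stream_prefix:
        out[i] = v
    return out

stream = encode_embedded([0.1, -3.0, 0.5, 2.0])
print(decode_prefix(stream[:2], 4))  # [0.0, -3.0, 0.0, 2.0]
```

Redundancy, in this picture, means adding information that lets a decoder approximate coefficients whose packets were lost, rather than leaving them at zero.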

Multi-resolution (MR) coding techniques such as subband and wavelet coding naturally provide scalability for image compression. We are developing redundant source coding techniques for both subband and wavelet coded images. Previously developed scalable compression algorithms have been motivated solely by maximizing compression, with no provisions for packet loss. Our strategy includes not only developing the source coding but also developing a packetization strategy to exploit the added redundancy. Our first major thrust in scalable image coding involves expanding the mathematical theory of wavelet coefficient decay. We hypothesize that decay information can provide the desired redundancy in a wavelet coding technique because it contains information about all wavelet coefficients representing an image across all scales. As such, it is well suited to being efficiently included in a redundant source-coding technique. Our second major thrust in developing redundant scalable image coding involves matching the redundancy to human perception. We will develop human visual system (HVS)-matched subband and wavelet redundant image coding techniques. The motivation for incorporating HVS information is that areas of low visual importance require less redundancy, while areas of high visual importance require more redundancy to provide a visually perfect reconstruction.
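The multi-resolution structure and the coefficient decay the text relies on can be seen in a minimal one-dimensional Haar decomposition. This is a textbook illustration, not the authors' coder: a signal yields one coarse average plus detail coefficients at successive scales, and for smooth signals the fine-scale details decay toward zero:

```python
# Minimal 1-D Haar wavelet decomposition (illustrative only).

def haar_step(signal):
    # One level: pairwise averages (coarse scale) and differences (details).
    avgs = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    dets = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avgs, dets

def haar_decompose(signal):
    # Recurse on the averages until one coefficient remains, collecting the
    # detail coefficients at each scale along the way.
    levels = []
    while len(signal) > 1:
        signal, details = haar_step(signal)
        levels.append(details)
    return signal[0], levels  # coarsest average + details, finest scale first

approx, details = haar_decompose([4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0, 7.0])
print(approx)  # 5.5
# details[0] is all zeros: the smooth signal has no finest-scale detail,
# the kind of cross-scale decay that redundant coding could exploit.
```

The decomposition is invertible (each pair is recovered as average ± detail), which is why coarse coefficients plus partial detail information still yield a usable reconstruction.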

 


Last modified on: 07/30/99