Performance and Overhead

The extensive use of layering raises important performance issues in Horus. On the one hand, layering improves performance, since applications can choose the minimal stack for their requirements. For example, an application can decide whether it needs end-to-end guarantees at all and, if so, whether STABLE or PINWHEEL is the better choice. Also, because each layer is small and simple, layers can easily and effectively be optimized individually. Although the performance of Horus currently compares very favorably to that of other systems (see [15]), there is still room for improvement. The performance of the current system suffers for the following reasons:

1. An indirect procedure call is made each time a layer boundary is crossed.
2. Since Horus is thread-safe, multiple concurrent procedure calls into the same layer often have to be synchronized with a lock. To avoid deadlock, it is sometimes necessary to run an upcall in a separate thread.
3. Each layer pushes its own header onto the message. For convenience, this header is aligned to a word boundary, which adds considerable overhead in unused bits to every message that has to be transmitted. In addition, each push and pop operation has an associated cost.
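The following minimal Python sketch, which is not Horus code, illustrates the first and third sources of overhead: each layer holds a reference to the layer below and calls it indirectly, and each layer pushes its own header padded to a word boundary. The 4-byte word size and the layers TOP, SEQ, and FLAG with their header contents are hypothetical.

```python
from dataclasses import dataclass, field

WORD_BYTES = 4  # assumed word size used for header alignment


@dataclass
class Message:
    headers: list = field(default_factory=list)  # stack of (layer name, padded header)
    payload: bytes = b""

    def push(self, name: str, header: bytes) -> None:
        # Pad the header up to a multiple of the word size, as each layer does.
        pad = -len(header) % WORD_BYTES
        self.headers.append((name, header + b"\x00" * pad))

    def pop(self) -> bytes:
        return self.headers.pop()[1]

    def header_bytes(self) -> int:
        return sum(len(h) for _, h in self.headers)


class Layer:
    crossings = 0  # counts layer-boundary crossings (one indirect call each)

    def __init__(self, name: str, header: bytes, below=None):
        self.name, self.header, self.below = name, header, below

    def down(self, msg: Message) -> Message:
        Layer.crossings += 1
        msg.push(self.name, self.header)
        # Indirect call: the next layer is only known at run time.
        # The bottom layer would hand the message to the network; here it returns it.
        return self.below.down(msg) if self.below else msg


flag = Layer("FLAG", b"\x01")                         # needs one bit, stored in a byte
seq = Layer("SEQ", b"\x00\x00\x00\x2a", below=flag)   # 32-bit sequence number
top = Layer("TOP", b"\x00\x07", below=seq)            # 16-bit field
msg = top.down(Message(payload=b"hello"))
print(Layer.crossings, msg.header_bytes())  # 3 12
```

Only 7 of the 12 header bytes carry information; the rest is alignment padding, and every crossing costs a call.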

We have no detailed overhead measurements, but we can report that on a Sparc 10 the fragmentation/reassembly layer FRAG (which needs only one bit of header space) adds about 50 µs to the one-way latency, which is considerable. We believe we could reduce this somewhat through more careful coding, but we are working on more rigorous solutions to each of these problems.

For the first problem, we will avoid unnecessary invocations of layers by skipping layers that take no action on the way down or up. We also envision that it will be possible to take common substacks of protocols and generate, from the reference implementation, a single production layer. Ideally, a compiler would perform optimizations such as these.
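A minimal sketch of layer skipping, assuming (hypothetically) that a layer with no action on the downward path is represented by `None` when the stack is built. It contrasts a naive traversal, which crosses every boundary, with a path precomputed once per stack that invokes only the layers that act on the message. It does not attempt the substack-merging idea.

```python
from typing import Callable, List, Optional

Handler = Callable[[list], None]


def send_naive(handlers: List[Optional[Handler]], msg: list) -> int:
    """Visit every layer, top to bottom; return the number of crossings."""
    calls = 0
    for handler in handlers:
        calls += 1  # count one crossing per layer, even for no-op layers
        if handler is not None:
            handler(msg)
    return calls


def build_down_path(handlers: List[Optional[Handler]]) -> List[Handler]:
    """Precompute, once per stack, the layers that act on downgoing messages."""
    return [h for h in handlers if h is not None]


def send_down(path: List[Handler], msg: list) -> int:
    """Run msg through the precomputed path; return the number of layer calls."""
    for handler in path:
        handler(msg)
    return len(path)


def tagger(tag: str) -> Handler:
    # Hypothetical layer action: record that this layer handled the message.
    return lambda msg: msg.append(tag)


stack = [tagger("TOP"), None, None, tagger("BOTTOM")]  # two no-op layers
m1, m2 = [], []
naive_calls = send_naive(stack, m1)
fast_calls = send_down(build_down_path(stack), m2)
print(naive_calls, fast_calls, m1 == m2)  # 4 2 True
```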

To address the second problem, we are eliminating intra-stack threading, having discovered that concurrency within a stack does not lead to significant gains. This way we can reduce the use of locks and the frequency of thread creation, except when entering a stack from the top or bottom. Since synchronization between stacks is seldom necessary, we can still run each stack within its own thread.
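One way to picture a stack without intra-stack threading is the sketch below. It assumes Python's `queue` and `threading` modules stand in for Horus's own thread package. External callers only enqueue events, and a single thread per stack runs the layers sequentially, so the layer code itself takes no locks. The layers are hypothetical functions, and only one direction is shown.

```python
import queue
import threading


class Stack:
    """One thread per stack; layers inside it run without locks."""

    def __init__(self, layers):
        self.layers = layers          # callables, applied in order
        self.events = queue.Queue()   # the only synchronized structure
        self.log = []                 # touched only by the stack's own thread
        self.thread = threading.Thread(target=self._run)
        self.thread.start()

    def enter(self, msg):
        """Called from outside the stack (application or network): just enqueue."""
        self.events.put(msg)

    def close(self):
        self.events.put(None)  # sentinel that stops the stack's thread
        self.thread.join()

    def _run(self):
        while True:
            msg = self.events.get()
            if msg is None:
                return
            for layer in self.layers:
                msg = layer(msg)  # plain sequential calls, no per-layer lock
            self.log.append(msg)


stack = Stack([lambda m: m + 1, lambda m: m * 2])
producers = [threading.Thread(target=stack.enter, args=(i,)) for i in range(10)]
for t in producers:
    t.start()
for t in producers:
    t.join()
stack.close()
print(sorted(stack.log))
```

Locking is confined to the queue at the stack's entry point, which matches the idea that synchronization is needed only when entering a stack from the top or bottom.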

For the last problem, we are changing the protocol implementations. Instead of specifying the layout of its header, a protocol will specify the fields it needs (their size and alignment, both in bits). When building a stack, Horus will precompute a single header in which the necessary fields are compacted. This should reduce wasted space in a message to a minimum and eliminate the header push and pop operations currently used by most layers.
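The compacted-header idea can be sketched as follows. Each layer declares its fields by size and alignment in bits. A layout pass assigns bit offsets once, when the stack is built, and layers then read and write their fields in one shared header instead of pushing and popping their own. Placing the most strictly aligned fields first is a padding-reduction heuristic assumed here, not necessarily what Horus does. The SEQ and TOP fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FieldSpec:
    layer: str
    name: str
    size_bits: int
    align_bits: int = 1


def layout(specs: List[FieldSpec]) -> Tuple[Dict[Tuple[str, str], int], int]:
    """Assign a bit offset to every field; return (offsets, total header bits)."""
    offsets, pos = {}, 0
    # Assumed heuristic: strictest alignment first (sort is stable for ties).
    for f in sorted(specs, key=lambda f: -f.align_bits):
        pos += -pos % f.align_bits  # round up to the field's alignment
        offsets[(f.layer, f.name)] = pos
        pos += f.size_bits
    return offsets, pos


class Header:
    """A single shared header, stored as one integer bit string."""

    def __init__(self, specs, offsets):
        self.sizes = {(f.layer, f.name): f.size_bits for f in specs}
        self.offsets, self.bits = offsets, 0

    def set(self, layer, name, value):
        key = (layer, name)
        size, off = self.sizes[key], self.offsets[key]
        if not 0 <= value < (1 << size):
            raise ValueError(f"{layer}.{name} does not fit in {size} bits")
        mask = ((1 << size) - 1) << off
        self.bits = (self.bits & ~mask) | (value << off)

    def get(self, layer, name):
        key = (layer, name)
        return (self.bits >> self.offsets[key]) & ((1 << self.sizes[key]) - 1)


specs = [
    FieldSpec("FRAG", "more", 1),        # one bit, as for FRAG in the text
    FieldSpec("SEQ", "seqno", 32, 32),   # 32-bit, word-aligned sequence number
    FieldSpec("TOP", "kind", 3),         # 3-bit message kind
]
offsets, total = layout(specs)
hdr = Header(specs, offsets)
hdr.set("FRAG", "more", 1)
hdr.set("SEQ", "seqno", 42)
print(total, hdr.get("SEQ", "seqno"), hdr.get("FRAG", "more"))  # 36 42 1
```

With these three fields the compacted header needs 36 bits (5 bytes). Giving each field its own word-aligned 32-bit header would take 96 bits.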


Robbert VanRenesse
Mon May 15 12:16:43 EDT 1995