Masking the Overhead of Protocol Layering

Notes by Dan Dumitriu; previous notes by Li Li and Alfred Landrum.

Problem

Protocol layering degrades performance in two ways: (1) each layer adds its own header, and alignment padding inflates the header size further; (2) crossing layer boundaries adds processing overhead.

van Renesse’s Protocol Accelerator (PA) addresses both issues. It compresses headers by packing fields efficiently without regard to layer boundaries, sends connection-specific information only once, and in some cases bypasses the protocol stack altogether on sends and receives.

Reducing Header Overhead

The fields of a protocol header are divided into four classes: connection identification, protocol-specific information, message-specific information, and gossip.

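To reduce the overhead of the connection identification part of the header, the PA uses a connection cookie: a 62-bit magic number, chosen at random, that identifies the connection. Because the connection identification only needs to be sent once, later messages can carry the cookie in its place.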
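The cookie idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class names, the dictionary message format, and the rule of attaching the full identification only to the first message are assumptions made for the example, and the sketch ignores what happens if that first message is lost.

```python
import secrets

COOKIE_BITS = 62  # cookie width stated in the notes


class Sender:
    def __init__(self, conn_id: bytes):
        self.conn_id = conn_id                      # full connection identification (hypothetical format)
        self.cookie = secrets.randbits(COOKIE_BITS)  # random 62-bit cookie
        self.sent_conn_id = False

    def frame(self, payload: bytes) -> dict:
        """Attach the full identification only to the first message."""
        msg = {"cookie": self.cookie, "payload": payload}
        if not self.sent_conn_id:
            msg["conn_id"] = self.conn_id
            self.sent_conn_id = True
        return msg


class Receiver:
    def __init__(self):
        self.by_cookie = {}  # cookie -> connection identification

    def deliver(self, msg: dict):
        """Return (conn_id, payload), or None if the cookie is unknown."""
        if "conn_id" in msg:
            self.by_cookie[msg["cookie"]] = msg["conn_id"]
        conn_id = self.by_cookie.get(msg["cookie"])
        if conn_id is None:
            return None  # unknown cookie: this sketch simply drops the message
        return conn_id, msg["payload"]
```

After the first message, every frame carries only the 62-bit cookie instead of the (typically much larger) connection identification.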

The Protocol Accelerator (PA) collects the fields of every protocol layer and compiles them into four compact headers, packing them as efficiently as possible. Traditionally, each field is aligned to a 4- or 8-byte boundary, so packing the fields together removes a great deal of padding overhead.
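A small calculation shows where the savings come from. The field widths below are hypothetical, and treating every field as padded to a full alignment unit is a simplification of what a real layered stack does.

```python
def aligned_size(field_bits, align_bytes=4):
    """Bytes used when every field is padded out to an align_bytes boundary."""
    total = 0
    for bits in field_bits:
        size = (bits + 7) // 8                           # round the field up to whole bytes
        total += -(-size // align_bytes) * align_bytes   # round up to the alignment unit
    return total


def packed_size(field_bits):
    """Bytes used when fields are packed back to back at bit granularity."""
    return (sum(field_bits) + 7) // 8


# Hypothetical header fields collected from several layers, widths in bits.
fields = [1, 8, 16, 32, 3]
print(aligned_size(fields), packed_size(fields))  # prints "20 8"
```

With these made-up widths, five fields holding 60 bits of information occupy 20 bytes when each is aligned to 4 bytes, but only 8 bytes when packed.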

Eliminating Layered Protocol Processing Overhead

The PA minimizes the critical path by delaying all updates of the protocol state until after the message has actually been sent or delivered.
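The deferral can be sketched on the send side as below. The class, the sequence-number state, and the list standing in for the network are all assumptions for illustration. Where this sketch finishes pending work inline before the next send, the PA holds new messages in a backlog instead.

```python
from collections import deque


class DeferredSender:
    def __init__(self, wire):
        self.wire = wire        # list standing in for the network (hypothetical)
        self.seq = 0            # protocol state: next sequence number
        self.pending = deque()  # state updates deferred off the critical path

    def send(self, payload):
        # Pending updates from the previous message must finish first,
        # otherwise this message would be built from stale state.
        self.post_process()
        # Critical path: emit the message using the current state only.
        self.wire.append((self.seq, payload))
        # Bookkeeping is queued and runs after the send has happened.
        self.pending.append(self._advance)

    def _advance(self):
        self.seq += 1

    def post_process(self):
        """Run all deferred state updates."""
        while self.pending:
            self.pending.popleft()()
```

The message reaches the wire before any state is touched; the state update runs later, when the sender is otherwise idle or just before the next send.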

The PA predicts the protocol-specific header of the next message, so in most cases the work of creating or checking that header can be eliminated.
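On the delivery side, prediction amounts to comparing the incoming protocol-specific header with a precomputed one. In this sketch the header is assumed to be a single 32-bit sequence number, and the slow path through the layers is reduced to a placeholder; neither detail comes from the paper.

```python
import struct


def encode_header(seq: int) -> bytes:
    """Hypothetical protocol-specific header: one 32-bit sequence number."""
    return struct.pack("!I", seq)


class PredictingReceiver:
    def __init__(self):
        self.next_seq = 0
        self.predicted = encode_header(self.next_seq)
        self.fast = 0  # messages accepted by a single comparison
        self.slow = 0  # messages that would need the full stack

    def receive(self, header: bytes, payload: bytes):
        if header == self.predicted:
            self.fast += 1
            # In a real system this update would run after delivery,
            # off the critical path; it is inline here for simplicity.
            self._post_process()
            return payload
        self.slow += 1
        return None  # placeholder: a real system would hand this to the layers

    def _post_process(self):
        self.next_seq += 1
        self.predicted = encode_header(self.next_seq)
```

In the common case of in-order delivery, the check is one byte-string comparison rather than a walk through each layer's header fields.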

Packet filters, in both the send and delivery critical paths, avoid passing through the layers altogether.

To reduce the time spent waiting for the post-processing of the previous message, the PA uses message packing to deal with backlogs.
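Message packing can be illustrated with simple length-prefixed framing: messages that accumulated while post-processing was in progress are combined into one message, so the per-message processing cost is paid once for the whole backlog. The 4-byte length prefix is an assumption for this sketch; the notes do not give the paper's actual packing format.

```python
import struct


def pack_backlog(messages):
    """Concatenate messages, each prefixed with a 4-byte big-endian length."""
    return b"".join(struct.pack("!I", len(m)) + m for m in messages)


def unpack_backlog(blob):
    """Split a packed blob back into the original messages."""
    out, off = [], 0
    while off < len(blob):
        (n,) = struct.unpack_from("!I", blob, off)
        off += 4
        out.append(blob[off:off + n])
        off += n
    return out


backlog = [b"m1", b"", b"third"]
packed = pack_backlog(backlog)          # one message on the wire instead of three
assert unpack_backlog(packed) == backlog
```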

Application to Standard Protocols

The PA could be used with standard protocols such as TCP/IP to improve latency, but only if both peers implement the PA, because the compacted header format must be understood at both ends. The pre-processing and post-processing techniques, however, would still be applicable.

Implementation

Each connection has its own PA. The PA implementation is about 1500 lines of C code. For the architecture, see Figure 2 in the paper.

Problems with the PA

 

Discussion