Fbufs: A High-Bandwidth Cross-Domain Transfer Facility


Notes by Alin Dobra, April 1999 (based on notes by Yu Zhang)



 

The problem:

Problems of traditional approaches
In a monolithic kernel architecture, each layer usually has its own buffers, and data is copied from one buffer to the next. Microkernel OSes usually use one
of the following mechanisms (each with its own problems):
page remapping
- move semantics: too limited.
- copy semantics: 2 context switches, acquiring locks on VM data structures, changing VM mappings, maintaining TLB/cache consistency
shared memory
- compromises protection and security
- only reduces the number of copies; does not eliminate copying

High-level Idea
cross-domain transfer + buffer management
combine two techniques: page remapping + shared memory
associate I/O buffers with I/O data paths, not with layers.
exploit locality in network communication: the same I/O data paths tend to be used repeatedly.

Requirements on the buffer management/transfer facility
(Premise: traditional network subsystem ---- a sequence of software layers)
- support both single, contiguous buffers (sender's ADU) and non-contiguous aggregates of buffers (receiver's ADU),
  due to sender-side fragmentation and receiver-side aggregation
- a data-path-specific allocator.
  At allocation time, the I/O data path that a buffer will traverse is often known (decided by the two endpoints).
  This also means the locality in network communication can be exploited.
- use only immutable buffers, so copy semantics can be provided efficiently (without physical copying).
- either eagerly or lazily raise the protection on a buffer to guard against asynchronous writes by the originator domain
- pageable buffers to avoid memory leaks (a malicious domain could hold a buffer forever)
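The immutability requirement can be sketched as follows. This is a toy model with invented names (`Fbuf`, `transfer`), not the paper's interface: the originator may fill the buffer, but once it is transferred its protection is raised and further writes by the originator fail, which is what lets the system offer copy semantics without copying.

```python
# Toy model (hypothetical names) of an immutable fbuf: protection is
# raised eagerly on transfer; the paper also allows raising it lazily.

class Fbuf:
    def __init__(self, size):
        self.data = bytearray(size)
        self.writable = True          # originator may still fill the buffer

    def write(self, offset, payload):
        if not self.writable:
            raise PermissionError("fbuf is immutable after transfer")
        self.data[offset:offset + len(payload)] = payload

    def transfer(self):
        # Raising protection guards receivers against asynchronous
        # writes by the originator; the data is now logically immutable.
        self.writable = False
        return self.data

buf = Fbuf(16)
buf.write(0, b"hello")       # fine: buffer not yet transferred
view = buf.transfer()
# buf.write(0, b"oops")      # would now raise PermissionError
```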
 

Fbufs Design
Basic Mechanism
1. fbufs: one or more contiguous VM pages. (So not suited for small messages!)
2. aggregate objects: hierarchical structures of fbufs; provide logical operations on fbufs (join, split, clip, etc.)
3. conventional page remapping with copy semantics
4. transfer steps (see Section 3.1)
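The aggregate-object operations can be illustrated with a small sketch. This models an aggregate as an ordered list of fbuf slices (a flat simplification of the paper's hierarchical structure; the names are invented): join and clip edit the slice list instead of touching the bytes, which is what makes stripping a header cheap.

```python
# Toy aggregate object: an ordered list of (buffer, offset, length)
# slices over immutable fbufs. Logical edits manipulate the list,
# never the data itself.

class Aggregate:
    def __init__(self, slices=None):
        self.slices = list(slices or [])

    def length(self):
        return sum(n for _, _, n in self.slices)

    def join(self, other):
        # concatenate two aggregates without copying any data
        return Aggregate(self.slices + other.slices)

    def clip(self, nbytes):
        # drop nbytes from the front (e.g. strip a protocol header)
        out = []
        for buf, off, n in self.slices:
            if nbytes >= n:
                nbytes -= n                       # whole slice clipped away
            else:
                out.append((buf, off + nbytes, n - nbytes))
                nbytes = 0
        self.slices = out

hdr  = Aggregate([(b"HDR!", 0, 4)])
body = Aggregate([(b"payload", 0, 7)])
msg = hdr.join(body)        # sender prepends a header: list edit only
msg.clip(4)                 # receiver strips the header: list edit only
assert msg.length() == 7
```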
Optimizations
Restricted Dynamic Read Sharing
- fbuf region: restrict fbuf allocation to a globally shared VM region, which implies an fbuf is mapped at
  the same virtual address in the originator and all receivers. (Eliminates finding the mapping in the receiver.)
- read sharing on fbufs eliminates the need for copy-on-write.
  This is based on two typical types of data manipulation: applied to the entire data, or localized to the header/trailer,
  such that logical editing functions can be used instead.

Fbuf Caching
- put the fbuf on a free list associated with its I/O data path for reuse, instead of unmapping and clearing
  it; increases locality of reference at the level of the TLB, cache, and main memory.
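The caching idea, sketched minimally (the path identifiers and structure are hypothetical): freed fbufs go onto a free list keyed by data path, so the next allocation on the same path reuses an already-mapped buffer and skips the VM work entirely.

```python
# Sketch of fbuf caching: a free list per I/O data path. A cache hit
# reuses a buffer whose mappings are already in place in every domain
# along the path; only a miss pays for mapping fresh pages.

free_lists = {}                    # data path id -> list of cached fbufs
stats = {"hit": 0, "miss": 0}

def alloc_fbuf(path_id, size=4096):
    cached = free_lists.get(path_id)
    if cached:
        stats["hit"] += 1
        return cached.pop()        # fast path: mappings already exist
    stats["miss"] += 1
    return bytearray(size)         # slow path: map new pages

def free_fbuf(path_id, fbuf):
    # instead of unmapping and clearing, keep it cached for this path
    free_lists.setdefault(path_id, []).append(fbuf)

b = alloc_fbuf("eth0->UDP->app")   # miss: new pages mapped
free_fbuf("eth0->UDP->app", b)
b2 = alloc_fbuf("eth0->UDP->app")  # hit: the very same buffer comes back
assert b2 is b and stats == {"hit": 1, "miss": 1}
```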

Integrated Buffer manager/Transfer
- place the entire aggregate object into fbufs, so no translation to/from a list of fbufs is needed in the sender or receiver.

Volatile fbufs
- no write protection against the originator, eliminating one page table update.
- in the integrated buffer manager/transfer, must deal with potential damage to the integrity of the DAG.

Result: in the common case, no kernel involvement in cross-domain data transfer. Speed up the common case!

Performance
Microbenchmarks show that fbufs offer an order of magnitude better throughput than page remapping for a
single domain crossing. Macro experiments with UDP/IP show that when cached/volatile fbufs are used, domain
crossings have virtually no impact on end-to-end throughput for large messages.

Discussion Points
1. Basically, fbufs are designed for large messages (>256 kB) in the traditional layered network subsystem, and
U-Net is designed for small messages. Is there a way we can get the best of both?
2. Problem with fbuf reclamation: similar to rights revocation, a malicious domain may fail to deallocate fbufs.
Limiting the fbuf quota for each data path may not be a decent way to get around this. Can we do better?