Fbufs: A High-Bandwidth Cross-Domain Transfer Facility
Notes by Yu Zhang, April 7, 1998
Motivation
1. Modular OS design requires an efficient cross-domain invocation facility.
2. A cross-domain invocation facility is measured by:
- control transfer latency
- data transfer throughput
For I/O-intensive applications (e.g., real-time video, digital image
retrieval...), the latter is more important.
3. For network I/O, data crosses multiple domains: device drivers, network
protocols, and application software.
On high-bandwidth networks, cross-domain data transfer is bounded by
CPU/memory bandwidth.
High-level Idea
integrate cross-domain transfer with buffer management
combine two techniques: page remapping + shared memory
Requirements on the buffer management/transfer facility
(Premise: a traditional network subsystem, i.e., a sequence of software layers)
- support both single contiguous buffers (the sender's ADU) and non-contiguous
aggregates of buffers (the receiver's ADU),
due to sender-side fragmentation and receiver-side aggregation
- a data path-specific allocator:
at allocation time, the I/O data path that a buffer will traverse is often
known (determined by the two endpoints).
This also implies that the locality in network communication can be exploited.
- use only immutable buffers, so that copy semantics can be provided without
physically copying the data
- either eagerly or lazily raise the protection on a buffer to guard against
asynchronous writes to it by the originator domain
- pageable buffers, to avoid leaking memory (a malicious domain could hold a
buffer forever)
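A minimal sketch of what such a facility's interface might look like (all
names here are hypothetical illustrations, not the paper's actual API):

    #include <stddef.h>

    /* Hypothetical interface sketch.  An fbuf is immutable once written
       and pageable; the allocator is keyed by the I/O data path the
       buffer will traverse. */
    typedef struct fbuf fbuf_t;
    typedef int path_id_t;

    fbuf_t *fbuf_alloc(path_id_t path, size_t len);   /* path-specific alloc */
    int     fbuf_transfer(path_id_t path, fbuf_t *f); /* copy-semantics send */
    void    fbuf_dealloc(fbuf_t *f);                  /* release when done   */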
Problems of traditional approaches
page remapping
- move semantics: too limited (the originator loses access to the data).
- copy semantics: 2 context switches, acquiring locks on VM data structures,
changing VM mappings, maintaining TLB/cache consistency
shared memory
- compromises protection and security
- only reduces the number of copies; does not eliminate copying
Fbufs Design
Basic Mechanism
1. fbufs: 1 or more contiguous VM pages. (So not for small messages!)
2. aggregate objects: hierarchical structures of fbufs that provide logical
ops on fbufs (join, split, clip, etc.); see the sketch after this list
3. conventional page remapping with copy semantics
4. transfer steps (see Section 3.1)
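A rough sketch of what an aggregate object could look like; the paper
specifies only the abstraction (a DAG of fbufs with logical editing ops), so
the layout below is an assumption:

    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical layout: an aggregate is a DAG whose leaves reference
       byte ranges inside fbufs and whose interior nodes concatenate
       children.  Editing ops build nodes instead of copying data. */
    enum agg_kind { AGG_LEAF, AGG_NODE };

    struct agg {
        enum agg_kind kind;
        size_t len;                      /* total bytes below this node */
        union {
            struct { char *base; } leaf; /* slice of one fbuf           */
            struct { struct agg *left, *right; } node;
        } u;
    };

    /* join: O(1) concatenation; builds one node, copies no data */
    struct agg *agg_join(struct agg *a, struct agg *b)
    {
        struct agg *n = malloc(sizeof *n);
        if (!n) return NULL;
        n->kind = AGG_NODE;
        n->len = a->len + b->len;
        n->u.node.left = a;
        n->u.node.right = b;
        return n;
    }

Clipping a protocol header off the front works the same way: a new leaf
pointing past the header, with no data movement.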
Optimizations
Restricted Dynamic Read Sharing
- fbuf region: fbufs are allocated from a globally shared VM region, so an
fbuf is mapped at the same virtual address in the originator and all
receivers (eliminates finding a mapping in the receiver)
- read sharing on fbufs eliminates the need for copy-on-write.
This relies on two typical types of data manipulation (applied to the entire
data, or localized to the header/trailer), so that logical editing functions
on the aggregate can be used instead of in-place writes.
Fbuf Caching
- instead of unmapping and clearing an fbuf, put it on a free list associated
with its I/O data path for reuse; this increases locality of reference at
the level of the TLB, cache, and main memory
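A sketch of the fast-path allocation this enables; the structures and the
slow-path helper fbuf_map_on_path() are assumptions, not the paper's code:

    /* Hypothetical structures; fbuf_map_on_path() stands in for the slow
       first-time path that allocates pages and installs mappings. */
    struct fbuf {
        struct fbuf *next;   /* free-list link                        */
        int          path;   /* data path this fbuf is cached for     */
        void        *addr;   /* same virtual address in every domain  */
    };

    struct fbuf *fbuf_map_on_path(int path);   /* slow path (assumed) */

    #define NPATHS 64                 /* path ids in [0, NPATHS)      */
    static struct fbuf *free_list[NPATHS];

    struct fbuf *fbuf_alloc(int path)
    {
        struct fbuf *f = free_list[path];
        if (f) {                        /* fast path: reuse cached fbuf; */
            free_list[path] = f->next;  /* mappings are already in place */
            return f;
        }
        return fbuf_map_on_path(path);
    }

    void fbuf_free(struct fbuf *f)
    {
        f->next = free_list[f->path];   /* no unmap, no page clearing */
        free_list[f->path] = f;
    }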
Integrated Buffer Manager/Transfer
- place the entire aggregate object into fbufs, so no translation to/from a
list of fbufs is needed in the sender or receiver
Volatile fbufs
- no write protection against the originator, eliminating one page table
update
- with the integrated buffer manager/transfer, the receiver must deal with
potential damage to the integrity of the aggregate DAG, as sketched below
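One plausible way a receiver can guard itself (a sketch under assumed region
bounds; the paper does not prescribe this exact check):

    #include <stdint.h>
    #include <stdbool.h>

    /* With volatile fbufs the originator can still scribble on interior
       DAG nodes, so the receiver validates each pointer before following
       it.  Region bounds are hypothetical placeholders. */
    #define FBUF_REGION_BASE 0x40000000UL
    #define FBUF_REGION_END  0x60000000UL

    static bool valid_agg_ptr(const void *p)
    {
        uintptr_t a = (uintptr_t)p;
        return a >= FBUF_REGION_BASE && a < FBUF_REGION_END;
    }

With such a check, a corrupted pointer at worst makes the receiver read wrong
bytes within the shared region; it cannot violate memory protection.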
Result: in the common case, there is no kernel involvement in cross-domain data transfer. Speed up the common case!
Performance
Micro experiments show that fbufs offer an order of magnitude better
throughput than page remapping for a single domain crossing.
Macro experiments with UDP/IP show that when cached/volatile fbufs are used,
domain crossings have virtually no impact on end-to-end throughput for large
messages.
Discussion Points
1. Basically, fbufs are designed for large messages (>256kB) in the
traditional layered network subsystem, while U-Net is designed for small
messages. Is there a way that we can make the best of both?
2. Problem with fbuf reclamation: similar to rights revocation, a malicious
domain may fail to deallocate fbufs.
Limiting the quota of fbufs for each data path may not be a decent way to get
around this. Can we do better?