Notes on "Fbufs: A High-Bandwidth Cross-Domain Transfer Facility"
by Linda Wu
Summary
The authors present a new buffer management method that eliminates the extra
copies that occur in cross-domain transfers. The technique combines two
existing techniques: page remapping and shared memory.
Motivation
As we move toward the microkernel model, components such as protocols,
drivers, and applications run in different domains. Network speeds keep
getting faster, while memory speed stays relatively the same, so
network-bound I/O may soon become CPU/memory bound. Providing fast buffer
transfer across domains therefore becomes a challenge.
Buffer Management in Network Subsystem
- Support both single, contiguous buffers and non-contiguous aggregates of
buffers.
- At the time of allocation, the I/O data path that a buffer will traverse
is often known, hence a data-path-specific allocator.
- Use only immutable buffers. Consequently, only copy semantics are provided.
- Two mechanisms protect against asynchronous access to a buffer:
1. Enforce immutability by raising the protection on a buffer when the
originator transfers it.
2. Lazily raise the protection upon request by a receiver.
- Pageable buffers.
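The first and third requirements can be illustrated with a minimal Python sketch. The `Aggregate` class and its method names are hypothetical and are not taken from the paper; the sketch only shows how an immutable, non-contiguous aggregate lets a layer add a header without copying or modifying the payload.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Aggregate:
    """Hypothetical immutable aggregate of (possibly non-contiguous) buffers."""
    parts: Tuple[bytes, ...] = ()

    def prepend(self, header: bytes) -> "Aggregate":
        # Builds a new aggregate; the existing payload parts are shared, not copied.
        return Aggregate((header,) + self.parts)

    def join(self, other: "Aggregate") -> "Aggregate":
        # Concatenation also shares the underlying parts.
        return Aggregate(self.parts + other.parts)

    def __len__(self) -> int:
        return sum(len(p) for p in self.parts)

    def to_bytes(self) -> bytes:
        # Flattening is the only operation that copies payload bytes.
        return b"".join(self.parts)


payload = Aggregate((b"hello ", b"world"))
packet = payload.prepend(b"HDR|")
```

Because `payload` is never modified, a layer that still holds it sees the same data after `packet` is built, which is the copy-semantics guarantee the notes refer to.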
Problems in Using Page Remapping and Shared Memory:
Page Remapping
- Usable only on systems that support VM. There is some overhead: the time it
takes to switch to supervisor mode, acquire the necessary locks on VM data
structures, change VM mappings (possibly at several levels) for each page,
perform TLB/cache consistency actions, and return to user mode.
Shared Memory
- Globally shared memory compromises security; pairwise shared memory
requires copying when data is either not immediately consumed or is
forwarded to a third domain; and group-wise shared memory requires that the
data path of a buffer is always known at the time of allocation.
Key Design
Restricted Dynamic Read Sharing
- Sharing is limited to the fbuf region, which implies that the originator
and receivers map a buffer at the same virtual address. This eliminates the
need to find a free virtual address range in the receiver.
- A strict rule on write access eliminates the need for a COW (copy-on-write)
mechanism.
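The same-address property can be pictured with a small simulation. Everything here is an assumption made for illustration: the `FbufRegion` class, the base address, and the page size are hypothetical, and real mappings are of course managed by the kernel's VM system rather than a Python dictionary.

```python
FBUF_BASE = 0x4000_0000   # hypothetical start of the shared fbuf region
PAGE_SIZE = 4096          # hypothetical page size


class FbufRegion:
    """Toy model: every domain maps an fbuf at the same virtual address."""

    def __init__(self):
        self.next_page = 0
        self.mappings = {}  # domain name -> {virtual address: "rw" or "r"}

    def alloc(self, originator):
        # The address is chosen once, inside the globally reserved region.
        va = FBUF_BASE + self.next_page * PAGE_SIZE
        self.next_page += 1
        self.mappings.setdefault(originator, {})[va] = "rw"
        return va

    def transfer(self, va, receiver):
        # No search for a free address range: the receiver reuses the same
        # address and only ever gets read access, so no COW is needed.
        self.mappings.setdefault(receiver, {})[va] = "r"
        return va


region = FbufRegion()
va = region.alloc("driver")
va_in_app = region.transfer(va, "app")
```

Since `va_in_app` equals `va`, pointers stored inside a buffer stay meaningful in every domain that receives it.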
Caching
- Put an fbuf on a free list after use instead of unmapping and clearing
the buffer. This reduces the number of page table updates to two and
increases locality of reference at the level of the TLB, cache, and main
memory.
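A rough sketch of the caching idea, assuming one free list per data path. The `FbufCache` class and the `mapping_ops` counter are hypothetical and only simulate the cost of establishing mappings; the counts are not the paper's measurements.

```python
from collections import defaultdict


class FbufCache:
    """Toy per-data-path free list; mapping cost is only simulated."""

    def __init__(self):
        self.free = defaultdict(list)  # data path -> reusable buffers
        self.mapping_ops = 0           # simulated VM mapping changes
        self.next_id = 0

    def alloc(self, path):
        if self.free[path]:
            # Cache hit: the buffer is still mapped in every domain on the
            # path. LIFO order favors the most recently used buffer.
            return self.free[path].pop()
        # Cache miss: map a fresh buffer into each domain on the path.
        self.mapping_ops += len(path)
        self.next_id += 1
        return {"id": self.next_id, "path": path}

    def release(self, buf):
        # Keep the mappings; just remember the buffer for the same path.
        self.free[buf["path"]].append(buf)


cache = FbufCache()
path = ("driver", "protocol", "app")
for _ in range(100):
    buf = cache.alloc(path)
    cache.release(buf)
```

In this toy run the mapping work is paid once for the first allocation, and the following 99 allocations reuse the same buffer, which is also where the locality benefit comes from.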
Integrated Buffer manager/Transfer
Volatile fbufs
- The originator keeps write permission (the step that removes it is
eliminated), hence one fewer page table update.
Caution: combined with the integrated buffer manager and transfer, there
is a potential problem with the DAG that represents buffer aggregates.
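The trade-off behind volatile fbufs can be sketched as follows. The `VolatileFbuf` class, its `secure` method, and the `pt_updates` counter are hypothetical names used only to illustrate lazy protection; they do not reproduce the paper's interface.

```python
class VolatileFbuf:
    """Toy model: the originator stays writable until a receiver objects."""

    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self.originator_can_write = True
        self.pt_updates = 0  # simulated page table updates

    def write(self, offset: int, value: bytes) -> None:
        if not self.originator_can_write:
            raise PermissionError("buffer was secured by a receiver")
        self._data[offset:offset + len(value)] = value

    def secure(self) -> None:
        # Protection is raised lazily, only when a receiver asks for it.
        if self.originator_can_write:
            self.originator_can_write = False
            self.pt_updates += 1

    def read(self) -> bytes:
        return bytes(self._data)


buf = VolatileFbuf(b"abcd")
buf.write(0, b"X")          # still allowed after a transfer: buffer is volatile
updates_before = buf.pt_updates
buf.secure()                # a distrustful receiver requests immutability
try:
    buf.write(1, b"Y")
    blocked = False
except PermissionError:
    blocked = True
```

A receiver that trusts the originator never calls `secure`, so the transfer costs no protection change; one that does not trust it pays for a single update and then sees stable data.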
Performance
Experiments were done on a DECstation 5000/200 with a prototype ATM board,
Osiris, and a null modem, supporting a link speed of 622 Mbps.
Micro-experiment results show that fbufs offer an order of magnitude better
throughput than page remapping for a single domain crossing. Macro
experiments with UDP/IP show that, when cached/volatile fbufs are used,
domain crossings have virtually no impact on end-to-end throughput for
large messages.
Comments:
Things to take away:
- Analyze the memory access behavior of a network subsystem and derive
the requirements from it.
- Design the system so that locality is preserved.
Personal rating: I did not find this paper easy to read.