Notes on "Fbufs: A High-Bandwidth Cross-Domain Transfer Facility"
                        by Linda Wu
Summary
The authors present a buffer-management scheme that eliminates the extra copies
that occur in cross-domain transfers.  The technique combines two existing
techniques: page remapping and shared memory.

Motivation
As we move toward a microkernel model, components such as protocols, drivers,
and applications live in different domains.  Network speeds keep increasing,
while memory performance stays relatively the same.  Workloads that are
network-bound today may soon become CPU/memory-bound.  Providing fast buffer
transfer across domains therefore becomes a challenge.

Buffer Management in Network Subsystem
- Support both single, contiguous buffers and non-contiguous aggregates of
  buffers.
- At allocation time, the I/O data path that a buffer will traverse
  is often known, hence a data-path-specific allocator.
- Use only immutable buffers, and consequently provide only copy semantics.
- Two mechanisms to protect a buffer against asynchronous writes:
  1.  Enforce immutability by raising the protection on a buffer when the
      originator transfers it.
  2.  Lazily raise the protection upon request by a receiver.
- Pageable buffers.
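
To make the two protection mechanisms concrete, here is a minimal Python
sketch.  It is only a toy model: a boolean flag stands in for page protection,
and the names Buffer, transfer_eager, transfer_lazy, and secure are
hypothetical, not taken from the paper.

```python
class Buffer:
    """Toy buffer; a flag stands in for the page-level write permission."""

    def __init__(self, data: bytes):
        self._data = bytearray(data)
        self.originator_can_write = True

    def write(self, offset: int, payload: bytes) -> None:
        # Models a write by the originator; fails once protection is raised.
        if not self.originator_can_write:
            raise PermissionError("buffer is immutable")
        self._data[offset:offset + len(payload)] = payload

    def read(self) -> bytes:
        return bytes(self._data)


def transfer_eager(buf: Buffer) -> Buffer:
    # Mechanism 1: raise the protection at the moment of transfer.
    buf.originator_can_write = False
    return buf


def transfer_lazy(buf: Buffer) -> Buffer:
    # Mechanism 2: hand the buffer over unchanged; protection is raised
    # later, and only if a receiver asks for it via secure().
    return buf


def secure(buf: Buffer) -> None:
    # A receiver's explicit request to raise the protection.
    buf.originator_can_write = False
```

With the eager variant every transfer pays for a protection change; with the
lazy variant only receivers that call secure() pay for it.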

Problems with Page Remapping and Shared Memory:
Page Remapping
- Usable only on systems that support VM.  It incurs overhead: the time it
  takes to switch to supervisor mode, acquire the necessary locks on VM data
  structures, change VM mappings (possibly at several levels) for each page,
  perform TLB/cache consistency actions, and return to user mode.

Shared Memory
- Globally shared memory compromises security; pairwise shared memory requires
  copying when data is either not immediately consumed or is forwarded to a
  third domain; and group-wise shared memory requires that the data path of
  a buffer always be known at the time of allocation.

Key Design 

Restricted Dynamic Read Sharing
- Sharing is limited to the fbuf region, which implies that the originator and
  the receivers map an fbuf at the same virtual address.  This eliminates the
  need to find a free virtual address in the receiver.
- A strict rule on write access eliminates the need for a COW (copy-on-write)
  mechanism.
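
The same-address idea can be sketched as follows.  This is a rough Python
model, not the paper's implementation: the region bounds, the dictionary page
tables, and the names Domain, allocate, and transfer are all assumptions made
for illustration.

```python
# Hypothetical bounds for the fbuf region; the real values are not in the notes.
FBUF_REGION = range(0x4000_0000, 0x5000_0000)


class Domain:
    """A protection domain with a toy page table: va -> (frame, perms)."""

    def __init__(self, name: str):
        self.name = name
        self.page_table = {}


def allocate(originator: Domain, va: int, frame: int) -> None:
    # Fbufs may only live inside the shared region.
    if va not in FBUF_REGION:
        raise ValueError("address outside the fbuf region")
    originator.page_table[va] = (frame, "rw")


def transfer(originator: Domain, receiver: Domain, va: int) -> None:
    frame, _ = originator.page_table[va]
    # Same virtual address in the receiver, so no search for a free address.
    # Receivers get read-only access, so no copy-on-write is needed.
    receiver.page_table[va] = (frame, "r")
```

Because the region is reserved in every domain, the receiver's mapping can
reuse the originator's address directly.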

Caching
- Put an fbuf on a free list after use instead of unmapping and clearing
  it.  This reduces the number of page table updates to two and increases
  locality of reference at the level of the TLB, cache, and main memory.
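
A minimal sketch of the caching idea, assuming one free list per I/O data
path.  The class name FbufAllocator, the tuple-of-domain-names path, and the
mapping_updates counter are hypothetical; the counter only models the mapping
setup that caching avoids, not the two remaining protection updates mentioned
above.

```python
from collections import defaultdict


class FbufAllocator:
    """Toy per-path allocator that recycles fbufs instead of unmapping them."""

    def __init__(self):
        self._free = defaultdict(list)  # path -> cached, still-mapped fbufs
        self.mapping_updates = 0        # mapping setups performed so far
        self._next_id = 0

    def allocate(self, path: tuple) -> dict:
        free_list = self._free[path]
        if free_list:
            # Cache hit: mappings along the path already exist.
            # LIFO reuse favors the most recently touched buffer.
            return free_list.pop()
        # Cache miss: one mapping setup per domain on the path.
        self.mapping_updates += len(path)
        fbuf = {"id": self._next_id, "path": path}
        self._next_id += 1
        return fbuf

    def free(self, fbuf: dict) -> None:
        # Keep the fbuf mapped and park it on its path's free list.
        self._free[fbuf["path"]].append(fbuf)
```

For example, allocating, freeing, and reallocating on the path
("driver", "udp", "app") sets up mappings only on the first allocation.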

Integrated Buffer Management/Transfer

Volatile fbufs
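-  The originator keeps its write permission instead of having it removed on
   transfer, hence one less page table update.
Caution: together with integrated buffer management and transfer, there
         is a potential problem with the DAG that describes an aggregate,
         since the originator can still modify it after the transfer.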
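
One way a receiver could defend itself against a DAG that was modified after
transfer is to validate it while walking it.  The sketch below is an
assumption-laden illustration, not the paper's mechanism: memory is modeled as
a dictionary from address to node, nodes are tuples, and the function name
read_aggregate is hypothetical.  How the real system reacts to a bad pointer
is not covered in these notes; here it simply raises an error.

```python
def read_aggregate(memory: dict, root: int,
                   region_start: int, region_end: int) -> bytes:
    """Walk a DAG of ("leaf", data) / ("interior", child, ...) nodes safely."""
    out = []
    on_path = set()  # interior nodes on the current root-to-node path

    def visit(addr: int) -> None:
        # Every pointer must fall inside the fbuf region and resolve.
        if not (region_start <= addr < region_end) or addr not in memory:
            raise ValueError(f"invalid pointer {addr:#x}")
        # Revisiting a node that is still on the path means a cycle.
        if addr in on_path:
            raise ValueError(f"cycle through {addr:#x}")
        node = memory[addr]
        if node[0] == "leaf":
            out.append(node[1])
            return
        on_path.add(addr)
        for child in node[1:]:
            visit(child)
        on_path.discard(addr)

    visit(root)
    return b"".join(out)
```

The check uses the current path rather than a global visited set, so a
legitimate DAG in which two parents share a child is still accepted.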

Performance
Measured on DECstation 5000/200 machines with a prototype ATM board, Osiris,
and a null modem supporting a link speed of 622 Mbps.  Micro-experiments show
that fbufs offer an order of magnitude better throughput than page remapping
for a single domain crossing.  Macro-experiments with UDP/IP show that when
cached/volatile fbufs are used, domain crossings have virtually no impact on
end-to-end throughput for large messages.

Comments:
Things to take away:
        - Analyze the memory-access behavior of a network subsystem, and come
          up with the requirements.
        - Design the system so that it preserves locality.
Personal rating:  I don't find this paper easy to read.