Review by Kevin LoGuidice, March 1998
Introduction
• Surprisingly, most cross-address-space invocations take place between domains on the same machine, not between machines as one might expect in client-server systems. As a result, the conventional RPC communication mechanism incurs unnecessary overhead: needless scheduling, excessive run-time indirection, redundant copying, lock contention, and unnecessary access validation.
Goal
• A lightweight communication facility for cross-address-space invocation, built around optimizations to data copying and thread scheduling.
Benefits
• A safe, transparent communication alternative for small-kernel operating systems.
• Improved performance over conventional RPC.
• Simple control transfer: the client's thread executes the called procedure in the server's domain.
• Simple data transfer: the parameter-passing mechanism resembles that of an ordinary procedure call, using a shared argument stack.
• Simple stubs: the simple control and data transfer model lets the stub generator emit highly optimized stubs (a sketch follows this list).
• Concurrency support: avoids shared-data-structure bottlenecks and benefits from the speedup of a multiprocessor.
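To make the "simple stubs" and "simple data transfer" points concrete, here is a minimal sketch of what a generated client stub might look like, modeled entirely in user space. The names (Binding, kernel_trap, read_file_stub) are invented for illustration and are not from the paper's Firefly/Taos implementation; the point is only that arguments are written once onto the shared A-stack and control transfers with a single trap.

    /* Hypothetical sketch of a generated LRPC client stub, modeled in user
     * space.  The real stubs were generated directly in machine code; every
     * name here is invented for illustration. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char a_stack[256];   /* argument stack shared between client and server */
    } Binding;

    /* Stand-in for the kernel trap: it simply runs the server procedure on
     * the caller's own thread, reading arguments straight off the A-stack. */
    static int kernel_trap(Binding *b)
    {
        int fd, len;
        memcpy(&fd,  b->a_stack,             sizeof fd);
        memcpy(&len, b->a_stack + sizeof fd, sizeof len);
        printf("server: read(fd=%d, len=%d) on the client's thread\n", fd, len);
        return len;                         /* result returns by the same path */
    }

    /* Generated stub for a procedure such as: int read_file(int fd, int len) */
    static int read_file_stub(Binding *b, int fd, int len)
    {
        /* Arguments are copied once, directly onto the shared A-stack,
         * just as a local call would push them onto its call stack. */
        memcpy(b->a_stack,             &fd,  sizeof fd);
        memcpy(b->a_stack + sizeof fd, &len, sizeof len);
        return kernel_trap(b);              /* a single trap transfers control */
    }

    int main(void)
    {
        Binding b;
        printf("result = %d\n", read_file_stub(&b, 3, 128));
        return 0;
    }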
Conventional RPC Overhead
• Stubs: a general interface and execution path that supports both cross-domain and cross-machine calls, even though the cross-machine case is the infrequent one.
• Message buffers: message transfer can involve an intermediate copy through the kernel, requiring two copy operations on call and two on return (the copy chain is sketched after this list).
• Access validation: the kernel validates the sender on both call and return.
• Message transfer: messages must be enqueued and dequeued, and flow control on the message queues is often necessary.
• Scheduling: the indirection of blocking the client's thread and dispatching a distinct server thread is slow, partly because of locking.
• Context switching: a virtual-memory context switch from the client's domain to the server's, and back again on return.
• Dispatching: a single receiver thread in the server must interpret each message and dispatch it to the right procedure.
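For contrast with that list, the toy program below walks the argument copy chain a conventional cross-domain RPC pays on the call path; each memcpy stands in for one of the copies (client stub to message, message to kernel buffer, kernel buffer to a message in the server domain, message to the server stub's stack) that LRPC avoids. It is purely illustrative, not the Taos code.

    /* Illustration of the conventional RPC copy chain; not real RPC code. */
    #include <stdio.h>
    #include <string.h>

    #define ARG_BYTES 64

    int main(void)
    {
        char client_args[ARG_BYTES] = "request arguments";
        char client_msg[ARG_BYTES];    /* message built by the client stub    */
        char kernel_buf[ARG_BYTES];    /* intermediate copy inside the kernel */
        char server_msg[ARG_BYTES];    /* message delivered in server domain  */
        char server_args[ARG_BYTES];   /* arguments on the server stub stack  */

        memcpy(client_msg,  client_args, ARG_BYTES); /* copy 1: stub -> message   */
        memcpy(kernel_buf,  client_msg,  ARG_BYTES); /* copy 2: message -> kernel */
        memcpy(server_msg,  kernel_buf,  ARG_BYTES); /* copy 3: kernel -> server  */
        memcpy(server_args, server_msg,  ARG_BYTES); /* copy 4: message -> stub   */

        printf("server sees: %s (the reply retraces a similar chain)\n",
               server_args);
        return 0;
    }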
LRPC Binding
Unlike conventional RPC, where the server sets up one or more threads that listen on ports for invocation requests, an LRPC server exports a set of procedures that it is prepared to have called. The client then binds to those procedures through the kernel, which returns a Binding Object to the client and sets up the shared argument stacks (A-stacks) ahead of time.
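The sketch below gives a rough idea of the binding-time data structures the paper describes (procedure descriptors, A-stacks, linkage records, and the Binding Object). All type and field names are my own approximation, not the actual Taos definitions.

    /* Approximate shape of the binding-time structures; names are invented. */
    #include <stddef.h>
    #include <stdint.h>

    /* One Procedure Descriptor (PD) per procedure in the exported interface. */
    typedef struct {
        void   (*entry)(void *a_stack);   /* entry address in the server domain */
        size_t   a_stack_size;            /* argument-stack size for this proc  */
        unsigned num_a_stacks;            /* bound on simultaneous calls        */
    } ProcDesc;

    /* The server exports a Procedure Descriptor List (PDL) through the kernel. */
    typedef struct {
        ProcDesc *procs;
        unsigned  count;
    } ProcDescList;

    /* A linkage record saves the caller's return state for one in-flight call. */
    typedef struct {
        void *return_addr;
        void *caller_stack;
    } Linkage;

    /* The Binding Object handed back to the client: its key for calling into
     * the server, validated by the kernel on every call. */
    typedef struct {
        ProcDescList *pdl;        /* interface the client bound to           */
        void        **a_stacks;   /* A-stacks mapped into both domains       */
        Linkage      *linkages;   /* one linkage record per A-stack          */
        uint64_t      key;        /* unforgeable token checked at call time  */
    } BindingObject;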
LRPC Call
The call path involves a high level of integration between the client, the kernel, and the server:
Client: the stub takes an A-stack from the binding, pushes the arguments onto it, and traps to the kernel.
Kernel: validates the Binding Object, records the return address in a linkage record, switches the client's thread onto an execution stack (E-stack) in the server's domain, and calls the server's stub.
Server: the procedure runs on the client's own thread, reading its arguments directly from the shared A-stack; the return retraces the same path back through the kernel.
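The user-space model below walks through the kernel's part of that sequence. The prints stand in for the real stack and virtual-memory switches, and every name is invented; what it is meant to show is what does not happen: no scheduler runs and no server thread is dispatched, since the client's own thread carries the call.

    /* Rough user-space model of the kernel's LRPC call path; all names are
     * invented and the "switches" are only prints. */
    #include <stdio.h>

    struct binding { int valid; int server_domain; };

    static void server_procedure(int *a_stack)
    {
        printf("server: running on the client's thread, arg = %d\n", a_stack[0]);
        a_stack[0] *= 2;            /* result left in place on the A-stack */
    }

    static int lrpc_call_trap(struct binding *b, int *a_stack)
    {
        if (!b->valid)              /* access validation on call            */
            return -1;

        printf("kernel: save return address in a linkage record\n");
        printf("kernel: switch to server E-stack and domain %d VM context\n",
               b->server_domain);

        server_procedure(a_stack);  /* upcall; no scheduling, no new thread */

        printf("kernel: retrace path, restore client stack and VM context\n");
        return 0;
    }

    int main(void)
    {
        struct binding b = { 1, 7 };
        int a_stack[1] = { 21 };
        if (lrpc_call_trap(&b, a_stack) == 0)
            printf("client: result = %d\n", a_stack[0]);
        return 0;
    }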
Additional
• Multiple processors can be used to improve throughput and lower call latency; idle processors can cache server domain contexts to make the switch cheaper.
• Transparency is preserved: the Binding Object has a bit indicating whether the server is on a remote machine, and the stub falls back to conventional RPC in that case and uses LRPC otherwise (sketched below).
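A minimal sketch of how that dispatch might look, assuming a hypothetical remote bit in the binding: the stub tests the bit and branches to the appropriate path, so callers never see the difference.

    /* Illustrative only; the bit test and both call paths are placeholders. */
    #include <stdio.h>

    struct binding { int remote; };

    static int do_lrpc(int arg)             { printf("LRPC path\n");        return arg; }
    static int do_conventional_rpc(int arg) { printf("network RPC path\n"); return arg; }

    static int call_stub(struct binding *b, int arg)
    {
        return b->remote ? do_conventional_rpc(arg)   /* cross-machine call */
                         : do_lrpc(arg);              /* same-machine call  */
    }

    int main(void)
    {
        struct binding local = { 0 }, remote = { 1 };
        call_stub(&local, 1);
        call_stub(&remote, 2);
        return 0;
    }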
Performance
• Arguments are copied only once (onto the A-stack), as opposed to four times in conventional RPC (client stub → message → kernel buffer → message in the server domain → server stub).
• A cross-domain call via LRPC is roughly three times faster than with conventional RPC.
• TLB misses are minimized in LRPC (yet still account for much of the remaining delay).
• Call throughput scales with the number of processors; no limiting factor on calls-per-second was apparent in the multiprocessor measurements.
Questions