Review by Kevin LoGuidice, March 1998
Introduction
• Surprisingly, most cross-address-space invocations take place between domains on the same machine, not between machines as one might expect in client-server systems. As a result, the conventional RPC communication mechanism incurs unnecessary overhead: needless scheduling, excessive run-time indirection, redundant copying, lock contention, and unnecessary access validation.
Goal
• A lightweight communication facility for cross-address-space invocation, built around optimizations to data copying and thread scheduling.
Benefits
• A safe, transparent communication alternative for small-kernel operating systems.
• Improved performance over conventional RPC.
• Simple control transfer: the client's thread executes the called procedure in the server's domain.
• Simple data transfer: the parameter-passing mechanism resembles that of an ordinary procedure call, using a shared argument stack.
• Simple stubs: the simple control and data transfer model lets the stub generator emit highly optimized stubs (a sketch follows this list).
• Concurrency support: avoids shared-data-structure bottlenecks and benefits from the speedup of a multiprocessor.
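To make the "simple stubs" and "simple data transfer" points concrete, here is a minimal sketch of what a generated client stub might look like, modeled entirely in user space. The names (Binding, kernel_trap, read_file_stub) are invented for illustration and are not from the paper's Firefly/Taos implementation; the point is only that arguments are written once onto the shared A-stack and control transfers with a single trap.

    /* Hypothetical sketch of a generated LRPC client stub, modeled in user
     * space.  The real stubs were generated directly in machine code; every
     * name here is invented for illustration. */
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        char a_stack[256];   /* argument stack shared between client and server */
    } Binding;

    /* Stand-in for the kernel trap: it simply runs the server procedure on
     * the caller's own thread, reading arguments straight off the A-stack. */
    static int kernel_trap(Binding *b)
    {
        int fd, len;
        memcpy(&fd,  b->a_stack,             sizeof fd);
        memcpy(&len, b->a_stack + sizeof fd, sizeof len);
        printf("server: read(fd=%d, len=%d) on the client's thread\n", fd, len);
        return len;                         /* result returns by the same path */
    }

    /* Generated stub for a procedure such as: int read_file(int fd, int len) */
    static int read_file_stub(Binding *b, int fd, int len)
    {
        /* Arguments are copied once, directly onto the shared A-stack,
         * just as a local call would push them onto its call stack. */
        memcpy(b->a_stack,             &fd,  sizeof fd);
        memcpy(b->a_stack + sizeof fd, &len, sizeof len);
        return kernel_trap(b);              /* a single trap transfers control */
    }

    int main(void)
    {
        Binding b;
        printf("result = %d\n", read_file_stub(&b, 3, 128));
        return 0;
    }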
Conventional RPC Overhead
• Stubs: a general interface and execution path that supports both cross-domain and cross-machine calls, even though the cross-machine case is the infrequent one.
• Message buffers: message transfer can involve an intermediate copy through the kernel, requiring two copy operations on call and two on return (the copy chain is sketched after this list).
• Access validation: the kernel validates the sender on both call and return.
• Message transfer: messages must be enqueued and dequeued, and flow control on the message queues is often necessary.
• Scheduling: the indirection of blocking the client's thread and dispatching a distinct server thread is slow, partly because of locking.
• Context switching: a virtual-memory context switch from the client's domain to the server's, and back again on return.
• Dispatching: a single receiver thread in the server must interpret each message and dispatch it to the right procedure.
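For contrast with that list, the toy program below walks the argument copy chain a conventional cross-domain RPC pays on the call path; each memcpy stands in for one of the copies (client stub to message, message to kernel buffer, kernel buffer to a message in the server domain, message to the server stub's stack) that LRPC avoids. It is purely illustrative, not the Taos code.

    /* Illustration of the conventional RPC copy chain; not real RPC code. */
    #include <stdio.h>
    #include <string.h>

    #define ARG_BYTES 64

    int main(void)
    {
        char client_args[ARG_BYTES] = "request arguments";
        char client_msg[ARG_BYTES];    /* message built by the client stub    */
        char kernel_buf[ARG_BYTES];    /* intermediate copy inside the kernel */
        char server_msg[ARG_BYTES];    /* message delivered in server domain  */
        char server_args[ARG_BYTES];   /* arguments on the server stub stack  */

        memcpy(client_msg,  client_args, ARG_BYTES); /* copy 1: stub -> message   */
        memcpy(kernel_buf,  client_msg,  ARG_BYTES); /* copy 2: message -> kernel */
        memcpy(server_msg,  kernel_buf,  ARG_BYTES); /* copy 3: kernel -> server  */
        memcpy(server_args, server_msg,  ARG_BYTES); /* copy 4: message -> stub   */

        printf("server sees: %s (the reply retraces a similar chain)\n",
               server_args);
        return 0;
    }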
LRPC Binding
Unlike conventional RPC, where the server sets up one or more threads that listen on ports for invocation requests, an LRPC server exports a set of procedures that it is prepared to have called. The client then binds to those procedures through the kernel, which returns a Binding Object to the client and sets up the shared argument stacks (A-stacks) ahead of time.
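The sketch below gives a rough idea of the binding-time data structures the paper describes (procedure descriptors, A-stacks, linkage records, and the Binding Object). All type and field names are my own approximation, not the actual Taos definitions.

    /* Approximate shape of the binding-time structures; names are invented. */
    #include <stddef.h>
    #include <stdint.h>

    /* One Procedure Descriptor (PD) per procedure in the exported interface. */
    typedef struct {
        void   (*entry)(void *a_stack);   /* entry address in the server domain */
        size_t   a_stack_size;            /* argument-stack size for this proc  */
        unsigned num_a_stacks;            /* bound on simultaneous calls        */
    } ProcDesc;

    /* The server exports a Procedure Descriptor List (PDL) through the kernel. */
    typedef struct {
        ProcDesc *procs;
        unsigned  count;
    } ProcDescList;

    /* A linkage record saves the caller's return state for one in-flight call. */
    typedef struct {
        void *return_addr;
        void *caller_stack;
    } Linkage;

    /* The Binding Object handed back to the client: its key for calling into
     * the server, validated by the kernel on every call. */
    typedef struct {
        ProcDescList *pdl;        /* interface the client bound to           */
        void        **a_stacks;   /* A-stacks mapped into both domains       */
        Linkage      *linkages;   /* one linkage record per A-stack          */
        uint64_t      key;        /* unforgeable token checked at call time  */
    } BindingObject;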
LRPC Call
The call path involves a high level of integration between the client, the kernel, and the server:
Client: the stub takes an A-stack from the binding, pushes the arguments onto it, and traps to the kernel.
Kernel: validates the Binding Object, records the return address in a linkage record, switches the client's thread onto an execution stack (E-stack) in the server's domain, and calls the server's stub.
Server: the procedure runs on the client's own thread, reading its arguments directly from the shared A-stack; the return retraces the same path back through the kernel.
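The user-space model below walks through the kernel's part of that sequence. The prints stand in for the real stack and virtual-memory switches, and every name is invented; what it is meant to show is what does not happen: no scheduler runs and no server thread is dispatched, since the client's own thread carries the call.

    /* Rough user-space model of the kernel's LRPC call path; all names are
     * invented and the "switches" are only prints. */
    #include <stdio.h>

    struct binding { int valid; int server_domain; };

    static void server_procedure(int *a_stack)
    {
        printf("server: running on the client's thread, arg = %d\n", a_stack[0]);
        a_stack[0] *= 2;            /* result left in place on the A-stack */
    }

    static int lrpc_call_trap(struct binding *b, int *a_stack)
    {
        if (!b->valid)              /* access validation on call            */
            return -1;

        printf("kernel: save return address in a linkage record\n");
        printf("kernel: switch to server E-stack and domain %d VM context\n",
               b->server_domain);

        server_procedure(a_stack);  /* upcall; no scheduling, no new thread */

        printf("kernel: retrace path, restore client stack and VM context\n");
        return 0;
    }

    int main(void)
    {
        struct binding b = { 1, 7 };
        int a_stack[1] = { 21 };
        if (lrpc_call_trap(&b, a_stack) == 0)
            printf("client: result = %d\n", a_stack[0]);
        return 0;
    }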
Additional
• Multiple processors can be used to improve throughput and lower call latency; idle processors can cache server domain contexts to make the switch cheaper.
• Transparency is preserved: the Binding Object has a bit indicating whether the server is on a remote machine, and the stub falls back to conventional RPC in that case and uses LRPC otherwise (sketched below).
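A minimal sketch of how that dispatch might look, assuming a hypothetical remote bit in the binding: the stub tests the bit and branches to the appropriate path, so callers never see the difference.

    /* Illustrative only; the bit test and both call paths are placeholders. */
    #include <stdio.h>

    struct binding { int remote; };

    static int do_lrpc(int arg)             { printf("LRPC path\n");        return arg; }
    static int do_conventional_rpc(int arg) { printf("network RPC path\n"); return arg; }

    static int call_stub(struct binding *b, int arg)
    {
        return b->remote ? do_conventional_rpc(arg)   /* cross-machine call */
                         : do_lrpc(arg);              /* same-machine call  */
    }

    int main(void)
    {
        struct binding local = { 0 }, remote = { 1 };
        call_stub(&local, 1);
        call_stub(&remote, 2);
        return 0;
    }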
Performance
• Arguments are copied only once (onto the A-stack), as opposed to four times in conventional RPC (client stub → message → kernel buffer → message in the server domain → server stub).
• A cross-domain call via LRPC is roughly three times faster than with conventional RPC.
• TLB misses are minimized in LRPC (yet still account for much of the remaining delay).
• Call throughput scales with the number of processors; no limiting factor on calls-per-second was apparent in the multiprocessor measurements.
Questions