Virtual Memory Primitives for User Programs
Authors: Andrew W. Appel and Kai Li at Princeton University
Original notes by Zhiyuan Chen, Feb. 16 1998.
Notes revised by ted bonkenburg (tb@cs) and presented Feb. 23 1999.
Overview
- Virtual memory has been found to have many uses other than those for which it was originally
intended. Operating systems have exploited the virtual memory system to do such "tricks" as
sharing pages between processes, implementing copy-on-write, making code reentrant, etc.
- Why not expose certain features of an O/S's virtual memory system to user applications?
- The paper describes a set of primitives that should be exposed by operating systems to user-level
applications and attempts to back this up by presenting some neat applications making use of them.
- Although they claim to answer the question "What virtual-memory primitives should the operating
system provide to user processes, and how well do today's operating systems provide them?", they
seem to focus more on convincing O/S developers that the given primitives will be
used and should therefore be optimized.
The VM primitives
- TRAP (handle page-fault trap in user mode)
- PROT1 (protect one page, i.e., decrease the accessibility of a page)
- PROTN (protect N pages at a time, for efficiency purposes)
- UNPROT (unprotect one page, i.e., increase the accessibility of a page)
- DIRTY (return a list of pages dirtied since the previous call)
- MAP2 (map the same physical page at two different virtual addresses, at different levels
of protection)
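The first four primitives can be approximated on a POSIX system with mprotect and a fault signal handler. The following is only a sketch under that assumption, not the interface the paper proposes: the vm_* names are invented for illustration, "protect" is taken to mean "no access", and calling mprotect from a signal handler is common practice but is not guaranteed async-signal-safe by POSIX.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size;                 /* set by vm_init() */
static volatile sig_atomic_t trap_count; /* number of faults handled */

/* PROTN: make n consecutive pages inaccessible with one system call. */
static int vm_protn(void *base, size_t n) {
    return mprotect(base, n * page_size, PROT_NONE);
}

/* PROT1: the one-page special case of PROTN. */
static int vm_prot1(void *page) {
    return vm_protn(page, 1);
}

/* UNPROT: restore read/write access to a single page. */
static int vm_unprot(void *page) {
    return mprotect(page, page_size, PROT_READ | PROT_WRITE);
}

/* TRAP: user-mode fault handler. A real client (collector, checkpointer,
 * ...) would do its work here; this sketch only counts the fault and
 * unprotects the page so the faulting instruction can be restarted. */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    uintptr_t addr = (uintptr_t)info->si_addr;
    void *page = (void *)(addr & ~(uintptr_t)(page_size - 1));
    (void)sig;
    (void)ctx;
    trap_count++;
    if (vm_unprot(page) != 0)
        _exit(1); /* fault on an address we cannot unprotect */
}

/* Install the handler. Some systems report protection faults as SIGBUS. */
static int vm_init(void) {
    struct sigaction sa;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Allocate n page-aligned, zero-filled pages; NULL on failure. */
static void *vm_alloc(size_t n) {
    void *p = mmap(NULL, n * page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would run vm_init() once, obtain pages with vm_alloc(), protect them in a batch with vm_protn(), and let each first touch of a page arrive in on_fault(). The batch-protect, single-unprotect shape is the access pattern most of the applications below rely on.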
Example Applications That Use The Primitives
Concurrent garbage collection. {TRAP, PROTN, UNPROT, MAP2}
- Protect the unscanned area from the mutator. If the mutator traps, invoke the collector.
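The collector must itself read and write the pages that are protected from the mutator, which is where MAP2 comes in. One way to approximate MAP2 on a POSIX system is to map the same file-backed pages twice with different protections. This is a sketch under that assumption; struct map2, map2_create and the /tmp path template are hypothetical names chosen for the example.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Two views of the same physical pages. */
struct map2 {
    void *client;  /* restricted view, e.g. the mutator's */
    void *service; /* read/write view, e.g. the collector's */
    size_t len;
};

/* Create len bytes of memory mapped at two virtual addresses.
 * client_prot is the protection of the restricted view
 * (PROT_NONE, PROT_READ, ...). Returns 0 on success, -1 on failure. */
static int map2_create(struct map2 *m, size_t len, int client_prot) {
    char path[] = "/tmp/map2-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the backing file lives only as long as the mappings */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    /* MAP_SHARED on the same descriptor makes both views alias
     * the same physical pages. */
    m->service = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    m->client = mmap(NULL, len, client_prot, MAP_SHARED, fd, 0);
    close(fd);
    if (m->service == MAP_FAILED || m->client == MAP_FAILED) {
        if (m->service != MAP_FAILED)
            munmap(m->service, len);
        if (m->client != MAP_FAILED)
            munmap(m->client, len);
        return -1;
    }
    m->len = len;
    return 0;
}
```

With this layout the service routine works through m.service while mprotect calls on m.client control what the client may touch, so a page can be scanned or filled before the client is allowed to see it.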
Shared virtual memory. {TRAP, PROT1, UNPROT, MAP2}
- Use protection to maintain single-writer/multiple-reader coherence.
Concurrent checkpointing. {TRAP, PROT1, PROTN, UNPROT, DIRTY}
- Instead of stopping all threads and doing a memory dump, mark all of memory read-only.
The checkpointing thread copies pages and unprotects them as it goes.
- When a fault occurs, the checkpointing thread immediately copies that page.
- The DIRTY primitive is used for incremental checkpointing.
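Where the operating system offers no DIRTY call, it can be emulated with the other primitives: write-protect the region, record each page on its first write fault, and unprotect it. The sketch below assumes a POSIX system and a single tracked region; dirty_track, dirty_collect and MAX_PAGES are hypothetical names, and mprotect inside a signal handler is common practice rather than something POSIX guarantees.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024

static char *region; /* start of the tracked region */
static size_t region_pages, page_size;
static volatile sig_atomic_t dirty[MAX_PAGES]; /* 1 = written since last collect */

/* First write to a read-only page lands here: mark it dirty, unprotect it. */
static void on_write_fault(int sig, siginfo_t *info, void *ctx) {
    uintptr_t a = (uintptr_t)info->si_addr, lo = (uintptr_t)region;
    size_t idx;
    (void)sig;
    (void)ctx;
    if (a < lo || a >= lo + region_pages * page_size)
        _exit(1); /* a genuine crash, not one of our pages */
    idx = (size_t)(a - lo) / page_size;
    dirty[idx] = 1;
    if (mprotect(region + idx * page_size, page_size,
                 PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Allocate npages of read-only memory and start tracking writes to it. */
static int dirty_track(size_t npages) {
    struct sigaction sa;
    void *p;
    if (npages == 0 || npages > MAX_PAGES)
        return -1;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    p = mmap(NULL, npages * page_size, PROT_READ,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    region = p;
    region_pages = npages;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_write_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* DIRTY: store up to cap dirty page indices in out, clear their flags,
 * and write-protect the whole region again. Returns the number stored. */
static size_t dirty_collect(size_t *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < region_pages; i++) {
        if (dirty[i] && n < cap) {
            out[n++] = i;
            dirty[i] = 0;
        }
    }
    mprotect(region, region_pages * page_size, PROT_READ);
    return n;
}
```

An incremental checkpointer would call dirty_collect() at each checkpoint and save only the listed pages. The price of the emulation is one fault per page per checkpoint interval, which is the overhead a native DIRTY primitive would avoid.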
Generational garbage collection. {TRAP, PROTN, UNPROT} | {DIRTY}
- Write-protect the "older" generation. When an assignment into it faults, the fault handler
records the page so the collector can later check whether it now points into a newer
generation. No additional checking instructions are needed for assignments.
Persistent stores. {TRAP, UNPROT, file-mapping} | {TRAP, PROTN, UNPROT, MAP2}
- Use the fault handler to detect when a change is being made to the persistent store. Save
changed pages until commit time.
Extending addressability. {TRAP, PROT1/N, UNPROT, MAP2}
- When a page is brought into memory for the first time, translate its pointers from 64 bits to 32 bits.
Data-compression paging. {TRAP, PROT1, UNPROT}
- Keep less frequently used pages compressed in memory. When a fault occurs, decompress the page.
Heap overflow detection. {TRAP, PROTN, UNPROT}
- Instead of checking for overflow on each allocation, place a guard page at the end of the
heap so that an overflow causes a fault.
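The guard-page idea can be sketched with a bump allocator that performs no limit check at all. This assumes a POSIX system; heap_init, heap_alloc and the grow-by-one-page reaction are illustrative choices, whereas a real runtime would typically start a garbage collection in the handler. The sketch also assumes each object is initialized right after allocation and is smaller than a page, so the first store past the heap always hits the guard.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *heap_base, *heap_next; /* reserved range and bump pointer */
static size_t heap_reserved, page_size;
static volatile sig_atomic_t overflow_count; /* guard-page hits so far */

/* A store ran into the guard page: the heap is full. Here we simply grow
 * the heap by the faulting page; the next page becomes the new guard. */
static void on_overflow(int sig, siginfo_t *info, void *ctx) {
    uintptr_t a = (uintptr_t)info->si_addr, lo = (uintptr_t)heap_base;
    void *page = (void *)(a & ~(uintptr_t)(page_size - 1));
    (void)sig;
    (void)ctx;
    if (a < lo || a >= lo + heap_reserved)
        _exit(1); /* not a heap overflow */
    overflow_count++;
    if (mprotect(page, page_size, PROT_READ | PROT_WRITE) != 0)
        _exit(1);
}

/* Reserve reserved_pages of address space with no access, then open up
 * the first initial_pages. The first inaccessible page acts as the guard. */
static int heap_init(size_t initial_pages, size_t reserved_pages) {
    struct sigaction sa;
    void *p;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    if (initial_pages >= reserved_pages)
        return -1;
    heap_reserved = reserved_pages * page_size;
    p = mmap(NULL, heap_reserved, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    heap_base = heap_next = p;
    if (mprotect(heap_base, initial_pages * page_size,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_overflow;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Allocation is a bare pointer bump (rounded to 8 bytes): no compare,
 * no branch. Overflow is detected by the hardware on first touch. */
static void *heap_alloc(size_t nbytes) {
    void *p = heap_next;
    heap_next += (nbytes + 7) & ~(size_t)7;
    return p;
}
```

The allocation fast path shrinks to an add, at the cost of a comparatively expensive trap on the rare overflow. Whether that trade pays off depends on the TRAP cost discussed in the performance section.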
Performance comparison and Implementation Issues
- Characteristics of the above tricks/algorithms:
  - In-memory operations (cost is proportional to page size).
  - Protect pages in batches, then unprotect them individually.
  - User-mode service routines need to access pages that are protected from the user-mode client.
- Comparison of current OS implementation performance: (Page 8, Figure 2): Winner is
i386+NX/2, loser is Mach.
- What matters for performance: page size (smaller typically desired),
OS overhead for TRAP, PROT1/N and UNPROT.
- Other issues:
  - TLB consistency: when the accessibility of a page is decreased, that page must be flushed
    from the TLB (the shootdown problem; the cost can be decreased by batching the shootdowns).
  - Pipelined processors: they may only be a problem for heap overflow detection. What about in general?
Questions
- Do you think these applications are very useful? What is the advantage of using page
protection in these applications?
- Do you think reducing page size (as they often suggest) is a good idea? How would
that affect page table sizes? Related: Can MAP2 be done with an inverted page table?
- Do you think they made a good enough argument for the DIRTY primitive?
- Why is Mach consistently the worst performer of these primitives? (See other paper presented today)
- Does NT provide such primitives? In general, has use of these primitives caught on?