Lecture 14: memory management

Hashed paging

I have added hashed paging to last lecture's notes since it is part of the design of an inverted page table.

Shared memory

Since processes have their address spaces divided up into pages, it is possible for them to share some but not all of their address spaces with other processes. This gives the best of both worlds between processes and threads: the processes can communicate very quickly (just by reading and writing memory), but if their stacks and heaps are not shared, it is much harder for one process to cause another to crash or misbehave.

It is easy to accomplish this: just have the corresponding entries in both page tables map to the same frame of physical memory. For example, if the page size is 0x1000, we could have P1's logical address 0x12345678 point to the same physical address as P2's logical address 0x77771678. Note that the offsets have to be the same (since that part of the address is not translated).
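
To see the arithmetic, here is a tiny sketch (Python; the names are made up for illustration) that splits each virtual address into a page number and an offset, assuming the 0x1000-byte pages from the example:

    PAGE_SIZE = 0x1000

    def split(va):
        # (virtual page number, offset within the page)
        return va // PAGE_SIZE, va % PAGE_SIZE

    p1_page, p1_off = split(0x12345678)  # (0x12345, 0x678)
    p2_page, p2_off = split(0x77771678)  # (0x77771, 0x678)
    assert p1_off == p2_off  # the untranslated offset must agree

    # Sharing is then just two page-table entries naming the same frame,
    # e.g. (hypothetically): p1_table[0x12345] = p2_table[0x77771] = frame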

A common approach is for processes to use shared memory for large amounts of data, and message passing (by making system calls) for small communication messages (much as we use DMA for data and MMIO for control at the level of the bus).

Abusing TLB permission bits

TLB permission bits give the OS a way to be interrupted when certain pages are accessed in certain ways (by clearing the corresponding permission bit). This can be used for various things other than protecting segments, such as detecting page accesses in software, as sketched below.
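
As an illustration of the mechanism (purely a sketch; the names and structure are assumptions, not a real OS API): the OS clears a permission bit on a page it in fact wants to allow, takes the resulting fault, records the access, restores the bit, and resumes.

    class TLBEntry:
        def __init__(self):
            self.readable = False  # cleared by the OS to trap the next read
            self.accessed = False  # the information the OS wants to collect

    def load(entry, do_read):
        if not entry.readable:
            # "Fault": a real OS handler would run here, then restart
            # the faulting instruction.
            entry.accessed = True
            entry.readable = True
        return do_read()

    e = TLBEntry()
    load(e, lambda: 0xBEEF)  # first access traps; the OS notes it happened
    assert e.accessed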

Page replacement

The advantage of virtual memory is that processes can use more memory than exists in the machine; when memory is accessed that is not present (a page fault), it must be paged in (sometimes referred to as being "swapped in", although some people reserve "swapped in" to refer to bringing in an entire address space).

Swapping in pages is very expensive (it requires using the disk), so we'd like to avoid page faults as much as possible. The algorithm that we use to choose which pages to evict to make space for the new page can have a large impact on the number of page faults that occur. We discuss a number of these algorithms in this lecture.

Random

When we need to evict a page, choose one at random.

First-in, First-out (FIFO)

When we need to evict a page, choose the first one that was paged in. This can be easily implemented by treating the frames as a circular buffer and storing a single head pointer: on eviction, replace the page at the head and then advance the pointer, so that it always points to the first-in page.
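
A minimal sketch of that bookkeeping (Python; the class and method names are made up for illustration):

    class FIFOReplacer:
        def __init__(self, nframes):
            self.frames = [None] * nframes
            self.head = 0  # always points at the first-in page

        def evict_and_load(self, page):
            self.frames[self.head] = page                   # replace the head...
            self.head = (self.head + 1) % len(self.frames)  # ...then advance it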

FIFO is susceptible to "Belady's anomaly": it is possible that adding more frames can actually make performance worse! For example, consider the trace

Step:   1 2 3 4 5 6 7 8 9 10 11 12
Access: 1 2 3 4 1 2 5 1 2  3  4  5

Using FIFO with three frames, we incur 9 page faults (work this out!). With four frames, we incur 10 faults! It would be nice if buying more RAM gave us better performance.
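
This is easy to check with a short simulation (a sketch; fifo_faults is a hypothetical helper, not library code):

    def fifo_faults(trace, nframes):
        frames, head, faults = [None] * nframes, 0, 0
        for page in trace:
            if page not in frames:           # page fault
                faults += 1
                frames[head] = page          # evict the first-in page
                head = (head + 1) % nframes
        return faults

    trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
    print(fifo_faults(trace, 3))  # 9 faults
    print(fifo_faults(trace, 4))  # 10 faults: Belady's anomaly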

Belady's algorithm (OPT)

When we need to evict a page, evict the page that will be unused for the longest time in the future.

Example: on the same trace as above with three frames:

Step:   1 2 3 4 5 6 7 8 9 10 11 12
Access: 1 2 3 4 1 2 5 1 2  3  4  5

Initially, memory is empty:

frame: 0 1 2
page:

In the first three steps, we incur three page faults and load pages 1, 2, and 3:

frame: 0 1 2
page: 1 2 3

In step 4, we access page 4, incurring a page fault. Page 1 is used in step 5, page 2 is used in step 6, but page 3 is not used until step 10, so we evict page 3.

frame: 0 1 2
page: 1 2 4

Steps 5 and 6 do not incur page faults. In step 7, we need to evict a page. Page 1 is used in step 8, page 2 is used in step 9, but page 4 isn't needed until step 11, so we evict page 4.

frame: 0 1 2
page: 1 2 5

Steps 8 and 9 do not incur page faults, but step 10 does. Again, we consider the future uses of the data in memory; neither page 1 nor 2 will be used in the future, so we can evict either.

frame: 0 1 2
page: 3 2 5

Similarly in step 11, we can evict either page 3 or page 2:

frame: 0 1 2
page: 3 4 5

Finally, we execute step 12, which incurs no page fault. This gives a total of 7 page faults, which is guaranteed to be the minimum possible for this trace.
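
OPT cannot actually be implemented, since it requires knowing future accesses; it serves as a baseline that real algorithms are measured against. It is easy to simulate offline, though. Here is a sketch (opt_faults is a hypothetical helper) that reproduces the walkthrough above:

    def opt_faults(trace, nframes):
        frames, faults = [], 0
        for i, page in enumerate(trace):
            if page in frames:
                continue
            faults += 1
            if len(frames) < nframes:
                frames.append(page)
                continue
            rest = trace[i + 1:]
            # Evict the resident page whose next use is farthest in the
            # future (pages never used again sort last of all).
            def next_use(p):
                return rest.index(p) if p in rest else len(rest)
            frames.remove(max(frames, key=next_use))
            frames.append(page)
        return faults

    print(opt_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 7 faults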

LRU, MRU, LFU

Although we cannot predict the future, we can estimate it based on past behavior. Most programs exhibit both spatial and temporal locality:

- Temporal locality: addresses that have been accessed recently are likely to be accessed again soon.
- Spatial locality: addresses near recently accessed addresses are likely to be accessed soon.

Exploiting these assumptions leads to the following algorithms:

- Least frequently used (LFU) assumes that pages that have been accessed rarely are unlikely to be accessed again. Keep a count of how many times each page is accessed; evict the page with the lowest count.
- Least recently used (LRU) assumes that pages that were accessed recently are likely to be needed again soon. Keep a timestamp of the latest access; evict the page with the lowest timestamp.
- Most recently used (MRU) assumes that programs do not read the same addresses multiple times. For example, a media player will read a byte and then move on, never to read it again. As with LRU, keep a timestamp of the latest access, but evict the page with the highest timestamp.

These algorithms exploit locality to approximate OPT, and thus can often do a good job of reducing page faults. However, implementing them is very difficult:

- A count or timestamp needs to be updated on every access; this requires hardware support, and an extra register per TLB entry (expensive!).
- There is one count/timestamp per page; to find the page to evict, we have to traverse the entire frame list.
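
To make LRU concrete, here is a minimal software sketch (Python; lru_faults is a hypothetical helper, not a real kernel interface). Instead of storing explicit timestamps, it keeps resident pages in recency order, which yields the same eviction decisions; note that it does exactly the per-access bookkeeping that makes LRU impractical in hardware.

    from collections import OrderedDict

    def lru_faults(trace, nframes):
        frames = OrderedDict()  # keys kept in least- to most-recently-used order
        faults = 0
        for page in trace:
            if page in frames:
                frames.move_to_end(page)        # mark as most recently used
            else:
                faults += 1
                if len(frames) == nframes:
                    frames.popitem(last=False)  # evict the least recently used
                frames[page] = True
        return faults

    print(lru_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))  # 10 faults

On the trace above, LRU incurs 10 faults versus OPT's 7: the price of having to guess the future.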