Lecture 10: Hardware support for memory management

Managing memory

Thus far we have talked about sharing the processor between many processes. We also need to share memory between processes. There are several criteria we must consider for the design of the memory system:

- Isolation: one process should not be able to read or write another process's memory (or the OS's).
- Virtualization: a process should not need to know where in physical memory it happens to live.
- Efficiency: memory should not be wasted to fragmentation as processes come and go.
- Performance: whatever checking and translation we add happens on every memory access, so it must be fast.

First strawman: one big address space

Could just stick all processes into physical memory; treat the memory as a big heap. When a process needs a new object, allocate it some space, and when a new process comes along, allocate it space for its stack and code, etc.

Problems:

- There is no isolation: nothing stops one process from reading or writing another process's memory, or the operating system's.
- Everything shares one pool of physical memory, so it becomes fragmented as processes come and go, and nothing can be moved once its physical address has been handed out.

Second strawman: address ranges

Allocate each process a range of addresses. For example, process P1 might have access to addresses 0x1000 to 0x1200, while process P2 has access to addresses 0x1400 to 0x2000, and process P3 has access to 0x2000 through 0x4000:

Addr             Contents
0x0000 - 0x1000  (free)
0x1000 - 0x1200  P1
0x1200 - 0x1400  (free)
0x1400 - 0x2000  P2
0x2000 - 0x4000  P3
0x4000 - 0x5000  (free)

To provide isolation, we need some way to control access. If process 2 tries to access address 0x2222, it should fail. To support this, we introduce a memory management unit (MMU). An MMU is a piece of hardware that sits between the processor and the memory bus; it examines the address of every load and store and decides whether (and, later, how) the access proceeds.

In this case, the MMU stores the current range of addresses that are available to the running process. Each access to memory is compared to the range; if it is out of range, the MMU causes an exception. The address range is set by the operating system during a context switch.

For example, before scheduling process P2, the OS sets the address range registers to 0x1400 and 0x2000. If P2 accesses address 0x2222, the MMU sees that the address is out of range, and causes an exception. The OS exception handler can then terminate P2, or cause P2 to jump to an exception handling routine, for example.

Note: the values of the base and limit registers would be stored in the process's PCB.
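
A minimal sketch in C of the check the MMU makes on every access under this scheme; the struct and function names here are made up for illustration, and real hardware raises an exception rather than returning a boolean:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical MMU state: one base/limit pair, loaded by the OS from
     * the process's PCB on every context switch. */
    struct mmu {
        uint32_t base;   /* lowest address the running process may touch */
        uint32_t limit;  /* one past the highest address it may touch    */
    };

    /* The check performed on every load and store: true means the access
     * proceeds; false is where real hardware would raise an exception. */
    bool mmu_check(const struct mmu *m, uint32_t addr)
    {
        return addr >= m->base && addr < m->limit;
    }

With P2's registers set to 0x1400 and 0x2000, an access to 0x2222 fails this check and the OS's exception handler runs.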

Problems:

- The process must be told exactly which physical addresses it was given, and those may differ from run to run, so addresses cannot be hardcoded or decided at compile time.
- Each process still needs a single contiguous block of physical memory, which leads to fragmentation as processes of different sizes come and go, and makes it hard for a process to grow.

Third strawman: address translation

A small modification to the above gives us some virtualization. The addresses that the process knows about (virtual addresses) can be different from the actual addresses in physical memory (physical addresses). Continuing with the above layout, we can allow each process to number its addresses starting from 0 (so that P1 has addresses 0x000-0x200, P2 has addresses 0x000-0xC00, and P3 has addresses from 0x0000 to 0x2000). The MMU can add the base register to the virtual address to find the physical address.

For example, when context switching to P2, the OS sets the MMU's base register to 0x1400 (since that is the start of P2's physical address space). Later, P2 may load from address 0x352. The MMU adds this to the base register (getting physical address 0x1752); it then compares it to the upper bound as above; since it is in range, the load proceeds normally.
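
A sketch of the same hardware extended with translation (names again made up): the MMU adds the base register to the virtual address, then performs the bounds check before letting the access through.

    #include <stdint.h>

    struct mmu {
        uint32_t base;   /* where the process's memory starts physically */
        uint32_t limit;  /* one past where it ends                       */
    };

    /* Translate a virtual address to a physical one, or return UINT32_MAX
     * as a stand-in for raising an out-of-range exception. */
    uint32_t mmu_translate(const struct mmu *m, uint32_t vaddr)
    {
        uint32_t paddr = m->base + vaddr;   /* e.g. 0x1400 + 0x352 = 0x1752    */
        if (paddr >= m->limit)              /* compare against the upper bound */
            return UINT32_MAX;
        return paddr;
    }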

This has two benefits: first, the program can hardcode addresses (not that this is necessarily a good idea). Second, we could imagine relocating a process to solve fragmentation issues: we would simply move the data in (physical) memory, and then update the process's base and limit registers. Note that this is also a bad idea: moving an entire address space would be quite expensive. But at least it is possible.

Translating virtual addresses into physical addresses is called address translation. Address translation and access control are the two most important jobs that the MMU performs.

Paging

Paging solves many of the remaining problems discussed above. The idea behind paging is that we split a process's logical address space into many small chunks called pages. Pages are typically on the order of a few kilobytes.

In addition, we divide up the physical address space into frames of the same size as a page. We are allowed to place any page into any frame.

The terminology can be useful as a guide to understanding what's going on. Think of pages as numbered pages in a three-ring binder. A process (binder) contains a large number of pages. On your desk (physical memory), you have a small number of picture frames. In order to read a page, you must take it out of the binder and place it in a frame.

The advantages of this scheme are many:

- A process's memory no longer needs to be contiguous in physical memory, so fragmentation ceases to be a problem: any free frame will do.
- A process can grow simply by being given more pages.
- Pages that are not currently being used can be moved to disk individually (see swapping below).

To accomplish paging, we need a more complicated MMU. The MMU must do the following to translate a virtual address:

1. Split the virtual address into a page number (the high bits) and an offset within the page (the low bits).
2. Look up the page number in the current process's page table to find the frame holding that page.
3. If there is no valid mapping, raise an exception so the OS can handle it.
4. Otherwise, combine the frame number with the offset to form the physical address.

For example, suppose the page size is 4KB (0x1000 bytes), and the contents of physical memory are:

frame #   physical address   frame contents
0x0       0x0000             Process 3, page 0x85
0x1       0x1000             Process 2, page 0x02
0x2       0x2000             Process 3, page 0x14
0x3       0x3000             Process 1, page 0x01

Suppose process 3 accesses (virtual) address 0x14303. This address refers to the 0x303rd byte of the 0x14th page of process 3's address space (the offset is 0x303, and the page number is 0x14). The MMU will find that process 3's page 0x14 is mapped into frame 0x2 (which always starts at physical address 0x2000). Therefore, the physical address that it outputs is the 0x303rd byte of the 0x2nd frame, or 0x2303.
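
The same arithmetic, written out as a small C program; with a 4KB page size the low 12 bits of an address are the offset and the rest are the page number. The frame number here is simply hard-coded from the table above rather than looked up in a real page table:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT  12                      /* 4KB pages: 2^12 bytes       */
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    int main(void)
    {
        uint32_t vaddr  = 0x14303;              /* process 3's virtual address */
        uint32_t page   = vaddr >> PAGE_SHIFT;  /* 0x14                        */
        uint32_t offset = vaddr & OFFSET_MASK;  /* 0x303                       */

        uint32_t frame = 0x2;                   /* page 0x14 maps to frame 0x2,
                                                   per the table above         */

        uint32_t paddr = (frame << PAGE_SHIFT) | offset;
        printf("page 0x%x, offset 0x%x -> physical 0x%x\n", page, offset, paddr);
        return 0;
    }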

How does the MMU actually work with these fancy paging schemes? The key component is the translation lookaside buffer (TLB).

The TLB is a small hardware associative array (think tens to hundreds of entries) that maps page numbers to frame numbers.

As the program executes, the page number of each virtual address is compared with all of the entries in the TLB (this is done in hardware, so all comparisons can happen simultaneously). If an entry matches, the corresponding frame number is combined with the offset to give the physical address.

If no entry matches, there is a TLB miss. If using a software-managed TLB, this miss will cause an exception to be raised; the operating system is then responsible for traversing the page tables to find the corresponding frame; it then loads the mapping into the TLB and continues. If using a hardware-managed TLB, the TLB is responsible for traversing the page table structure; it only raises an exception if the page table has not yet been properly configured.
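
A sketch of what this lookup amounts to for a software-managed TLB; the hardware compares against every entry in parallel, the loop below is just its sequential equivalent, and walk_page_table is a hypothetical stand-in for the OS's page-table traversal:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64
    #define PAGE_SHIFT  12

    struct tlb_entry {
        bool     valid;
        uint32_t page;    /* virtual page number   */
        uint32_t frame;   /* physical frame number */
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Hypothetical OS routine: walk the current process's page table and
     * return the frame holding this page (or handle the fault if none). */
    extern uint32_t walk_page_table(uint32_t page);

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t page   = vaddr >> PAGE_SHIFT;
        uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

        /* In hardware, every comparison happens at once. */
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].page == page)
                return (tlb[i].frame << PAGE_SHIFT) | offset;

        /* TLB miss: the exception handler fills in the mapping and retries. */
        uint32_t frame = walk_page_table(page);
        tlb[page % TLB_ENTRIES] = (struct tlb_entry){ true, page, frame };
        return (frame << PAGE_SHIFT) | offset;
    }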

Recall that each process has its own address space, and thus its own page table. This means that when the OS context switches to a new process, it switches the pointer to the root of the page table to point to the new process's page table (this is the "VM info" stored in the PCB referred to in the first week). This will cause all of the TLB entries to become invalid. This is called a TLB flush. Repopulating the TLB is a large component of what makes context switching between processes expensive.

The cost can be mitigated somewhat by adding process identifiers to the TLB lines and allowing the TLB slots to be split between multiple processes. This is called a tagged TLB.
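
A sketch of what a tagged entry might look like; the field names are illustrative, and ASID (address-space identifier) is a common name for the per-process tag. A lookup hits only when both the tag and the page number match, so entries belonging to other processes can stay resident instead of being flushed on every context switch:

    #include <stdbool.h>
    #include <stdint.h>

    struct tagged_tlb_entry {
        bool     valid;
        uint16_t asid;    /* which process this mapping belongs to */
        uint32_t page;
        uint32_t frame;
    };

    /* A hit now requires matching the running process's identifier too:
     *     e.valid && e.asid == current_asid && e.page == page
     * so a context switch only has to change current_asid, not flush. */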

Swapping (Paging in and out)

It may seem like we can only run as many processes as will fit in physical memory, but in fact, if we have a backing store (i.e. a disk), we can move pages that aren't currently being used out to disk.

If a new page is needed and there is insufficient space to allocate for it, we can take an existing page and page it out: move the contents of the page to disk. We can then reuse the frame for a new page. Later, when the page is needed again, we can page it in: move it from the disk back into a region of physical memory.

Because processes do not have any information about their physical addresses, we don't have to swap a page back into the same physical location that we paged it out of. We simply need to update the page table to point to the current frame.
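
A sketch, under invented assumptions, of the bookkeeping this implies; the page-table entry layout and the helpers evict_some_frame and read_from_disk are hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    struct pte {              /* one page-table entry (illustrative layout) */
        bool     present;     /* the page currently sits in a frame         */
        uint32_t frame;       /* meaningful only if present                 */
        uint32_t disk_slot;   /* where the page lives when paged out        */
    };

    /* Hypothetical helpers: free up a frame (paging its current contents
     * out if necessary) and copy a page back in from the backing store. */
    extern uint32_t evict_some_frame(void);
    extern void     read_from_disk(uint32_t disk_slot, uint32_t frame);

    /* Page in: the page may land in a different frame than before; the only
     * thing that must be updated is the page-table entry. */
    void page_in(struct pte *p)
    {
        uint32_t frame = evict_some_frame();
        read_from_disk(p->disk_slot, frame);
        p->frame   = frame;
        p->present = true;
    }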

Note, however, that paging is extremely slow; communicating with disk is several orders of magnitude more expensive than communicating with main memory. But if the user wants to run more processes than they have RAM for, there isn't much choice.

Note on terminology: most people use the terms "paging" and "swapping" interchangeably. Originally, swapping referred to moving an entire process's address space to disk (as opposed to a single page), but this is a silly thing to do.

Segmentation

It can be useful to mark different regions of a process's address space with different read/write/execute privileges. For example, a process is typically divided into a kernel area, a heap area, a stack area, a code area, and so forth. These large areas are called segments.

It makes sense to read and write in your heap, but not to jump there; conversely it makes sense to jump into your code section, but not to write it. Any access to unallocated space is an error.

The TLB can help us enforce these conventions. Each TLB entry has additional read, write, and execute bits. While translating an address, the TLB also checks whether the type of access is valid for the corresponding page. If not, it can raise an exception, and the OS can handle it appropriately. This is the source of your favorite C error: a segmentation fault occurs whenever you access a "bad" pointer, that is, a pointer into a page that hasn't been mapped with the corresponding permissions.
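
A sketch of the permission check, with illustrative names; each entry carries read/write/execute bits, and the kind of access being made is checked against them as part of translation:

    #include <stdbool.h>
    #include <stdint.h>

    enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

    struct tlb_entry {
        bool     valid;
        uint32_t page, frame;
        bool     r, w, x;     /* per-page read/write/execute permissions */
    };

    /* False is where the hardware raises the exception that the OS
     * ultimately reports as a segmentation fault. */
    bool access_ok(const struct tlb_entry *e, enum access_kind kind)
    {
        if (!e->valid)
            return false;
        switch (kind) {
        case ACCESS_READ:  return e->r;
        case ACCESS_WRITE: return e->w;
        case ACCESS_EXEC:  return e->x;
        }
        return false;
    }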