Lecture 10: Hardware support for memory management

Managing memory

Thus far we have talked about sharing the processor between many processes. We also need to share memory between processes. There are several criteria we must consider for the design of the memory system:

- Isolation: one process should not be able to read or write another process's memory (or the OS's).
- Virtualization: a process should not need to know where in physical memory it happens to live.
- Efficiency: memory should not be wasted to fragmentation as processes come and go.
- Performance: whatever checking and translation we add happens on every memory access, so it must be fast.

First strawman: one big address space

Could just stick all processes into physical memory; treat the memory as a big heap. When a process needs a new object, allocate it some space, and when a new process comes along, allocate it space for its stack and code, etc.

Problems:

- There is no isolation: nothing stops one process from reading or writing another process's memory, or the operating system's.
- Everything shares one pool of physical memory, so it becomes fragmented as processes come and go, and nothing can be moved once its physical address has been handed out.

Second strawman: address ranges

Allocate each process a range of addresses. For example, process P1 might have access to addresses 0x1000 to 0x1200, while process P2 has access to addresses 0x1400 to 0x2000, and process P3 has access to 0x2000 through 0x4000:

Addr             Contents
0x0000 - 0x1000  (free)
0x1000 - 0x1200  P1
0x1200 - 0x1400  (free)
0x1400 - 0x2000  P2
0x2000 - 0x4000  P3
0x4000 - 0x5000  (free)

To provide isolation, we need some way to control access. If process 2 tries to access address 0x2222, it should fail. To support this, we introduce a memory management unit (MMU). An MMU is a piece of hardware that sits between the processor and the memory bus; it examines the address of every load and store and decides whether (and, later, how) the access proceeds.

In this case, the MMU stores the current range of addresses that are available to the running process. Each access to memory is compared to the range; if it is out of range, the MMU causes an exception. The address range is set by the operating system during a context switch.

For example, before scheduling process P2, the OS sets the address range registers to 0x1400 and 0x2000. If P2 accesses address 0x2222, the MMU sees that the address is out of range, and causes an exception. The OS exception handler can then terminate P2, or cause P2 to jump to an exception handling routine, for example.

Note: the values of the base and limit registers would be stored in the process's PCB.
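
A minimal sketch in C of the check the MMU makes on every access under this scheme; the struct and function names here are made up for illustration, and real hardware raises an exception rather than returning a boolean:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical MMU state: one base/limit pair, loaded by the OS from
     * the process's PCB on every context switch. */
    struct mmu {
        uint32_t base;   /* lowest address the running process may touch */
        uint32_t limit;  /* one past the highest address it may touch    */
    };

    /* The check performed on every load and store: true means the access
     * proceeds; false is where real hardware would raise an exception. */
    bool mmu_check(const struct mmu *m, uint32_t addr)
    {
        return addr >= m->base && addr < m->limit;
    }

With P2's registers set to 0x1400 and 0x2000, an access to 0x2222 fails this check and the OS's exception handler runs.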

Problems:

- The process must be told exactly which physical addresses it was given, and those may differ from run to run, so addresses cannot be hardcoded or decided at compile time.
- Each process still needs a single contiguous block of physical memory, which leads to fragmentation as processes of different sizes come and go, and makes it hard for a process to grow.

Third strawman: address translation

A small modification to the above gives us some virtualization. The addresses that the process knows about (virtual addresses) can be different from the actual addresses in physical memory (physical addresses). Continuing with the above layout, we can allow each process to number its addresses starting from 0 (so that P1 has addresses 0x000-0x200, P2 has addresses 0x000-0xC00, and P3 has addresses from 0x0000 to 0x2000). The MMU can add the base register to the virtual address to find the physical address.

For example, when context switching to P2, the OS sets the MMU's base register to 0x1400 (since that is the start of P2's physical address space). Later, P2 may load from address 0x352. The MMU adds this to the base register (getting physical address 0x1752); it then compares it to the upper bound as above; since it is in range, the load proceeds normally.
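
A sketch of the same hardware extended with translation (names again made up): the MMU adds the base register to the virtual address, then performs the bounds check before letting the access through.

    #include <stdint.h>

    struct mmu {
        uint32_t base;   /* where the process's memory starts physically */
        uint32_t limit;  /* one past where it ends                       */
    };

    /* Translate a virtual address to a physical one, or return UINT32_MAX
     * as a stand-in for raising an out-of-range exception. */
    uint32_t mmu_translate(const struct mmu *m, uint32_t vaddr)
    {
        uint32_t paddr = m->base + vaddr;   /* e.g. 0x1400 + 0x352 = 0x1752    */
        if (paddr >= m->limit)              /* compare against the upper bound */
            return UINT32_MAX;
        return paddr;
    }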

This has two benefits: first, the program can hardcode addresses (not that this is necessarily a good idea). Second, we could imagine relocating a process to solve fragmentation issues: we would simply move the data in (physical) memory, and then update the process's base and limit registers. Note that this is also a bad idea: moving an entire address space would be quite expensive. But at least it is possible.

Translating virtual addresses into physical addresses is called address translation. Address translation and access control are the two most important jobs that the MMU performs.

Paging

Paging solves many of the remaining problems discussed above. The idea behind paging is that we split a process's logical address space into many small chunks called pages. Pages are typically on the order of a few kilobytes.

In addition, we divide up the physical address space into frames of the same size as a page. We are allowed to place any page into any frame.

The terminology can be useful as a guide to understanding what's going on. Think of pages as numbered pages in a three-ring binder. A process (binder) contains a large number of pages. On your desk (physical memory), you have a small number of picture frames. In order to read a page, you must take it out of the binder and place it in a frame.

The advantages of this scheme are many:

- A process's memory no longer needs to be contiguous in physical memory, so fragmentation ceases to be a problem: any free frame will do.
- A process can grow simply by being given more pages.
- Pages that are not currently being used can be moved to disk individually (see swapping below).

To accomplish paging, we need a more complicated MMU. The MMU must do the following to translate a virtual address:

1. Split the virtual address into a page number (the high bits) and an offset within the page (the low bits).
2. Look up the page number in the current process's page table to find the frame holding that page.
3. If there is no valid mapping, raise an exception so the OS can handle it.
4. Otherwise, combine the frame number with the offset to form the physical address.

For example, suppose the page size is 4KB (0x1000 bytes), and the contents of physical memory are:

frame #   physical address   frame contents
0x0       0x0000             Process 3, page 0x85
0x1       0x1000             Process 2, page 0x02
0x2       0x2000             Process 3, page 0x14
0x3       0x3000             Process 1, page 0x01

Suppose process 3 accesses (virtual) address 0x14303. This address refers to the 0x303rd byte of the 0x14th page of process 3's address space (the offset is 0x303, and the page number is 0x14). The MMU will find that process 3's page 0x14 is mapped into frame 0x2 (which always starts at physical address 0x2000). Therefore, the physical address that it outputs is the 0x303rd byte of the 0x2nd frame, or 0x2303.
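
The same arithmetic, written out as a small C program; with a 4KB page size the low 12 bits of an address are the offset and the rest are the page number. The frame number here is simply hard-coded from the table above rather than looked up in a real page table:

    #include <stdint.h>
    #include <stdio.h>

    #define PAGE_SHIFT  12                      /* 4KB pages: 2^12 bytes       */
    #define OFFSET_MASK ((1u << PAGE_SHIFT) - 1)

    int main(void)
    {
        uint32_t vaddr  = 0x14303;              /* process 3's virtual address */
        uint32_t page   = vaddr >> PAGE_SHIFT;  /* 0x14                        */
        uint32_t offset = vaddr & OFFSET_MASK;  /* 0x303                       */

        uint32_t frame = 0x2;                   /* page 0x14 maps to frame 0x2,
                                                   per the table above         */

        uint32_t paddr = (frame << PAGE_SHIFT) | offset;
        printf("page 0x%x, offset 0x%x -> physical 0x%x\n", page, offset, paddr);
        return 0;
    }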

How does the MMU actually work with these fancy paging schemes? The key component is the translation lookaside buffer (TLB).

The TLB is a small hardware associative array (think tens to hundreds of entries) that maps page numbers to frame numbers.

As the program executes, the page number of each virtual address is compared with all of the entries in the TLB (this is done in hardware, so all comparisons can happen simultaneously). If an entry matches, the corresponding frame number is combined with the offset to give the physical address.

If no entry matches, there is a TLB miss. If using a software-managed TLB, this miss will cause an exception to be raised; the operating system is then responsible for traversing the page tables to find the corresponding frame; it then loads the mapping into the TLB and continues. If using a hardware-managed TLB, the TLB is responsible for traversing the page table structure; it only raises an exception if the page table has not yet been properly configured.
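
A sketch of what this lookup amounts to for a software-managed TLB; the hardware compares against every entry in parallel, the loop below is just its sequential equivalent, and walk_page_table is a hypothetical stand-in for the OS's page-table traversal:

    #include <stdbool.h>
    #include <stdint.h>

    #define TLB_ENTRIES 64
    #define PAGE_SHIFT  12

    struct tlb_entry {
        bool     valid;
        uint32_t page;    /* virtual page number   */
        uint32_t frame;   /* physical frame number */
    };

    static struct tlb_entry tlb[TLB_ENTRIES];

    /* Hypothetical OS routine: walk the current process's page table and
     * return the frame holding this page (or handle the fault if none). */
    extern uint32_t walk_page_table(uint32_t page);

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t page   = vaddr >> PAGE_SHIFT;
        uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);

        /* In hardware, every comparison happens at once. */
        for (int i = 0; i < TLB_ENTRIES; i++)
            if (tlb[i].valid && tlb[i].page == page)
                return (tlb[i].frame << PAGE_SHIFT) | offset;

        /* TLB miss: the exception handler fills in the mapping and retries. */
        uint32_t frame = walk_page_table(page);
        tlb[page % TLB_ENTRIES] = (struct tlb_entry){ true, page, frame };
        return (frame << PAGE_SHIFT) | offset;
    }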

Recall that each process has its own address space, and thus its own page table. This means that when the OS context switches to a new process, it switches the pointer to the root of the page table to point to the new process's page table (this is the "VM info" stored in the PCB referred to in the first week). This will cause all of the TLB entries to become invalid. This is called a TLB flush. Repopulating the TLB is a large component of what makes context switching between processes expensive.

The cost can be mitigated somewhat by adding process identifiers to the TLB lines and allowing the TLB slots to be split between multiple processes. This is called a tagged TLB.
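
A sketch of what a tagged entry might look like; the field names are illustrative, and ASID (address-space identifier) is a common name for the per-process tag. A lookup hits only when both the tag and the page number match, so entries belonging to other processes can stay resident instead of being flushed on every context switch:

    #include <stdbool.h>
    #include <stdint.h>

    struct tagged_tlb_entry {
        bool     valid;
        uint16_t asid;    /* which process this mapping belongs to */
        uint32_t page;
        uint32_t frame;
    };

    /* A hit now requires matching the running process's identifier too:
     *     e.valid && e.asid == current_asid && e.page == page
     * so a context switch only has to change current_asid, not flush. */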

Swapping (Paging in and out)

It may seem like we can only run as many processes as will fit in physical memory, but in fact, if we have a backing store (i.e. a disk), we can move pages that aren't currently being used out to disk.

If a new page is needed and there is insufficient space to allocate for it, we can take an existing page and page it out: move the contents of the page to disk. We can then reuse the frame for a new page. Later, when the page is needed again, we can page it in: move it from the disk back into a region of physical memory.

Because processes do not have any information about their physical addresses, we don't have to swap a page back into the same physical location that we paged it out of. We simply need to update the page table to point to the current frame.
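
A sketch, under invented assumptions, of the bookkeeping this implies; the page-table entry layout and the helpers evict_some_frame and read_from_disk are hypothetical:

    #include <stdbool.h>
    #include <stdint.h>

    struct pte {              /* one page-table entry (illustrative layout) */
        bool     present;     /* the page currently sits in a frame         */
        uint32_t frame;       /* meaningful only if present                 */
        uint32_t disk_slot;   /* where the page lives when paged out        */
    };

    /* Hypothetical helpers: free up a frame (paging its current contents
     * out if necessary) and copy a page back in from the backing store. */
    extern uint32_t evict_some_frame(void);
    extern void     read_from_disk(uint32_t disk_slot, uint32_t frame);

    /* Page in: the page may land in a different frame than before; the only
     * thing that must be updated is the page-table entry. */
    void page_in(struct pte *p)
    {
        uint32_t frame = evict_some_frame();
        read_from_disk(p->disk_slot, frame);
        p->frame   = frame;
        p->present = true;
    }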

Note, however, that paging is extremely slow; communicating with disk is several orders of magnitude more expensive than communicating with main memory. But if the user wants to run more processes than they have RAM for, there isn't much choice.

Note on terminology: most people use the terms "paging" and "swapping" interchangeably. Originally, swapping referred to moving an entire process's address space to disk (as opposed to a single page), but this is a silly thing to do.

Segmentation

It can be useful to mark different regions of a process's address space with different read/write/execute privileges. For example, a process is typically divided into a kernel area, a heap area, a stack area, a code area, and so forth. These large areas are called segments.

It makes sense to read and write in your heap, but not to jump there; conversely it makes sense to jump into your code section, but not to write it. Any access to unallocated space is an error.

The TLB can help us enforce these conventions. Each TLB entry has additional read, write, and execute bits. While translating an address, the TLB also checks whether the type of access is valid for the corresponding page. If not, it can raise an exception, and the OS can handle it appropriately. This is the source of your favorite C error: a segmentation fault occurs whenever you access a "bad" pointer, that is, a pointer into a page that hasn't been mapped with the corresponding permissions.
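
A sketch of the permission check, with illustrative names; each entry carries read/write/execute bits, and the kind of access being made is checked against them as part of translation:

    #include <stdbool.h>
    #include <stdint.h>

    enum access_kind { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC };

    struct tlb_entry {
        bool     valid;
        uint32_t page, frame;
        bool     r, w, x;     /* per-page read/write/execute permissions */
    };

    /* False is where the hardware raises the exception that the OS
     * ultimately reports as a segmentation fault. */
    bool access_ok(const struct tlb_entry *e, enum access_kind kind)
    {
        if (!e->valid)
            return false;
        switch (kind) {
        case ACCESS_READ:  return e->r;
        case ACCESS_WRITE: return e->w;
        case ACCESS_EXEC:  return e->x;
        }
        return false;
    }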