# **Review: important OS concepts**

- Time-sharing, context, context-switch
- Interprocess communication
- Exception control flow
- Priority and scheduling
- Cache and memory hierarchy (this lecture)

#### **Cache & Memory Hierarchy**

### **Recall our abstraction: CPU + memory**

#### **CPU** has a number of registers.

- Register1
- Register2
- (More registers)

#### Memory is a two-column table.

| Address   | Content |
|-----------|---------|
| #ffffffff | 8 bits  |
| •••       |         |
| #00000002 | 8 bits  |
| #00000001 | 8 bits  |
| #00000000 | 8 bits  |
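The two-column table can be modeled as a byte array indexed by address. The sketch below is illustrative only: `addr_t`, `MEM_SIZE`, and the exact signatures of `load` and `store` are assumptions made for this example, and the memory is shrunk to 256 bytes so it fits in a small program.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 256              /* hypothetical tiny memory, not a full 32-bit space */

typedef uint32_t addr_t;          /* assumed address type */

static uint8_t memory[MEM_SIZE];  /* the "Content" column; the index is the "Address" column */

/* load: return the byte stored at addr. */
uint8_t load(addr_t addr) {
    assert(addr < MEM_SIZE);
    return memory[addr];
}

/* store: overwrite the byte stored at addr. */
void store(addr_t addr, uint8_t value) {
    assert(addr < MEM_SIZE);
    memory[addr] = value;
}
```

Each row of the table corresponds to one array element: `store(2, 0xAB)` fills in the content of address 2, and `load(2)` reads it back.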




### Why cache in the middle?

#### Cache

- Cache is faster than memory.
- Memory has larger capacity than cache.
- Memory is cheaper than cache in terms of \$/byte.

| Address   | Content |
|-----------|---------|
| #ffffffff | byte    |
| •••       |         |
| #00000002 | byte    |
| #00000001 | byte    |
| #00000000 | byte    |



#### What to learn about cache?






### Write-through and Write-back cache

- Write-through and write-back are two types of cache.
  - You are going to implement both in P3.
  - The general structure is a 3-column table.

Cache:

| Address | Content | In use? |
|---------|---------|---------|
| Addr1   | 8 bits  | Y       |
| ????    | ????    | N       |
| •••     |         |         |



### **Read** in Write-back and Write-through



- `read(addr_t)`: read the local structure, or read memory using `load`.

Figure: the cache table (Address, Content, In use?) next to the memory table (Address, Content; addresses #00000000 to #ffffffff, 8 bits each), with a `load` arrow between the two.
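A read first looks in the local structure and only falls back to `load` on a miss. The following is a minimal sketch under stated assumptions, not the P3 interface: `cache_read`, `cache_entry`, `CACHE_SIZE`, `MEM_SIZE`, and `load_count` are hypothetical names, and eviction is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE   256   /* hypothetical tiny memory */
#define CACHE_SIZE 4     /* hypothetical number of cache entries */

typedef uint32_t addr_t;

static uint8_t memory[MEM_SIZE];
static int load_count;   /* counts slow memory accesses, for illustration only */

/* load: the slow path that reads main memory. */
uint8_t load(addr_t addr) {
    assert(addr < MEM_SIZE);
    load_count++;
    return memory[addr];
}

/* One row of the 3-column cache table. */
struct cache_entry {
    addr_t  addr;
    uint8_t content;
    bool    in_use;
};

static struct cache_entry cache[CACHE_SIZE];

/* Read through the cache: hit -> local structure, miss -> load from memory. */
uint8_t cache_read(addr_t addr) {
    int free_slot = -1;
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (cache[i].in_use && cache[i].addr == addr)
            return cache[i].content;          /* hit: no memory access */
        if (!cache[i].in_use && free_slot < 0)
            free_slot = i;                    /* remember the first free row */
    }
    uint8_t value = load(addr);               /* miss: go to memory */
    if (free_slot >= 0)
        cache[free_slot] = (struct cache_entry){ .addr = addr, .content = value, .in_use = true };
    /* If no row is free, this sketch simply does not cache the value. */
    return value;
}
```

Reading the same address twice calls `load` only once; the second read is served from the cache table.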



#### Write in both Write-back and Write-through



- `write(addr_t, …)`: write the local structure and maybe write memory using `store`.

Figure: the cache table (Address, Content, In use?) next to the memory table (Address, Content; addresses #00000000 to #ffffffff, 8 bits each), with a `store` arrow between the two.
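The "maybe" is where the two cache types differ: a write-through cache stores to memory on every write, while a write-back cache updates only the local structure and defers the store. The sketch below illustrates this with a single cache entry; `cache_write`, `enum policy`, and `cache_entry` are hypothetical names chosen for this example, not the P3 interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE 256     /* hypothetical tiny memory */

typedef uint32_t addr_t;

static uint8_t memory[MEM_SIZE];

/* store: the slow path that writes main memory. */
void store(addr_t addr, uint8_t value) {
    assert(addr < MEM_SIZE);
    memory[addr] = value;
}

/* One row of the 3-column cache table. */
struct cache_entry {
    addr_t  addr;
    uint8_t content;
    bool    in_use;
};

enum policy { WRITE_THROUGH, WRITE_BACK };

/* Write one byte through a single cache entry.
 * Assumes the entry is free or already holds addr (no eviction here). */
void cache_write(struct cache_entry *e, enum policy p, addr_t addr, uint8_t value) {
    assert(!e->in_use || e->addr == addr);
    e->addr = addr;                 /* always write the local structure */
    e->content = value;
    e->in_use = true;
    if (p == WRITE_THROUGH)
        store(addr, value);         /* write-through: memory is updated right away */
    /* write-back: memory is left stale until a later sync or eviction */
}
```

After a write-through write, the cache and memory agree; after a write-back write, only the cache holds the new value.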



# Sync (or flush) in Write-back cache



### Write-through and Write-back cache

- Write-through and write-back are two types of cache.
  - You are going to implement both in P3.
  - The general structure is a 3-column table.
  - Write-through cache:
    - read + write using load + store
  - Write-back cache:
    - read + write + sync using load + store

Cache:

| Address | Content | In use? |
|---------|---------|---------|
| Addr1   | 8 bits  | Yes     |
| ????    | ????    | No      |
| •••     |         |         |
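The extra `sync` operation of a write-back cache can be sketched as a loop that stores every in-use entry back to memory. This is a simplified illustration under assumed names (`wb_write`, `wb_sync`, `cache_entry`, `CACHE_SIZE`), not the P3 interface; it stores every in-use entry, including ones that were never modified, so a per-entry flag recording modifications would avoid redundant stores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE   256   /* hypothetical tiny memory */
#define CACHE_SIZE 4     /* hypothetical number of cache entries */

typedef uint32_t addr_t;

static uint8_t memory[MEM_SIZE];

/* store: the slow path that writes main memory. */
void store(addr_t addr, uint8_t value) {
    assert(addr < MEM_SIZE);
    memory[addr] = value;
}

/* One row of the 3-column cache table. */
struct cache_entry {
    addr_t  addr;
    uint8_t content;
    bool    in_use;
};

static struct cache_entry cache[CACHE_SIZE];

/* Write-back write: update only the local structure.
 * Returns false when the cache is full (eviction is not shown here). */
bool wb_write(addr_t addr, uint8_t value) {
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (cache[i].in_use && cache[i].addr == addr) {
            cache[i].content = value;
            return true;
        }
    }
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (!cache[i].in_use) {
            cache[i] = (struct cache_entry){ .addr = addr, .content = value, .in_use = true };
            return true;
        }
    }
    return false;
}

/* sync: store every in-use entry back to memory; returns how many were stored. */
int wb_sync(void) {
    int stored = 0;
    for (int i = 0; i < CACHE_SIZE; i++) {
        if (cache[i].in_use) {
            store(cache[i].addr, cache[i].content);
            stored++;
        }
    }
    return stored;
}
```

Until `wb_sync` runs, a value written with `wb_write` exists only in the cache; afterwards memory holds it too.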



# Question: what about the dirty bit? When is a dirty bit useful?

You may recall the dirty bit that you learned about in 3410.

#### Cache eviction

- When the cache is full and a new entry needs to be added, the cache evicts an existing entry to make room.
  - In a write-through cache, the evicted entry does NOT need to be stored back to memory, because memory is already up to date.
  - In a write-back cache, the evicted entry, if dirty, needs to be stored back to memory.
- In P3, you will implement the CLOCK algorithm for cache eviction, which will be taught in 4410 (Oct 27).
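As a rough preview, eviction in a write-back cache can be sketched with the common textbook form of CLOCK: a hand sweeps over the entries, gives each recently referenced entry a second chance by clearing its reference bit, and evicts the first entry whose bit is already clear. This is an assumption-laden sketch, not the P3 specification: `clock_evict`, `hand`, and the `dirty` and `referenced` fields are hypothetical names, and the lecture's version of CLOCK may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEM_SIZE   256   /* hypothetical tiny memory */
#define CACHE_SIZE 4     /* hypothetical number of cache entries */

typedef uint32_t addr_t;

static uint8_t memory[MEM_SIZE];

/* store: the slow path that writes main memory. */
void store(addr_t addr, uint8_t value) {
    assert(addr < MEM_SIZE);
    memory[addr] = value;
}

struct cache_entry {
    addr_t  addr;
    uint8_t content;
    bool    in_use;
    bool    dirty;       /* assumed flag: modified since it was loaded */
    bool    referenced;  /* assumed flag: accessed since the hand last passed */
};

static struct cache_entry cache[CACHE_SIZE];
static int hand;         /* the clock hand: next slot to examine */

/* Free one slot using CLOCK and return its index.
 * A dirty victim is stored back to memory before the slot is freed. */
int clock_evict(void) {
    for (;;) {
        int slot = hand;
        struct cache_entry *e = &cache[slot];
        hand = (hand + 1) % CACHE_SIZE;

        if (e->in_use && e->referenced) {
            e->referenced = false;            /* second chance: skip this time */
            continue;
        }
        if (e->in_use && e->dirty)
            store(e->addr, e->content);       /* write-back: save modified data */
        e->in_use = false;
        e->dirty = false;
        return slot;
    }
}
```

The loop always terminates: after at most one full sweep every reference bit has been cleared, so the next entry examined becomes the victim. In a write-through cache the `dirty` check would be unnecessary.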

### Cache & Memory Hierarchy





Picture source: https://link.springer.com/article/10.1007/s00778-019-00546-z

### Example: internals of an Intel i7 CPU





### **CPU cache hierarchy**




From Figure 6.39 of **Computer Systems: A Programmer's Perspective**



#### Memory hierarchy performance and capacity



| Access time | Capacity |
|-------------|----------|
| … cycles    | 32KB     |
| …0 cycles   | 256KB    |
| …75 cycles  | 8MB      |
| …00 cycles  | 4-16GB   |
| …M cycles   | >1TB     |

- Cache makes memory access faster, but cache has smaller capacity and costs more per byte.
- Different levels of cache form a memory hierarchy.
  - CPU caches hold KBs to a few MBs and cost tens of CPU cycles
  - Main memory holds GBs and costs hundreds of CPU cycles
  - Disks hold TBs and cost millions of CPU cycles

## Take-aways

# Homework

- P3 is released today and is due on Nov 6. Implement write-back and write-through caches with the CLOCK algorithm.
- Read page 241 of Intel's IA-32 manual, Volume 2 (https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-instruction-set-reference-manual-325383.pdf), about the CLFLUSH instruction.



# Just for fun

#### Main memory internal structure and row-hammer attack



- Further reading: section 6.1 of Computer Systems: A Programmer's Perspective.





Figure 6.4: Reading the contents of a DRAM supercell. (a) Select row 2 (RAS request). (b) Select column 1 (CAS request).