In the last project you wrote a simulator for the MIPS CPU. At the time, it was assumed that the CPU could access memory directly, and that every lw or sw instruction resulted in an actual memory read or write. In reality, accessing memory is a fairly slow operation, and one or more levels of high-speed cache are used to improve performance. In this project you will implement a simple one-level cache for your simulator. The purpose is not to make your simulator run faster (actually, it'll run slower due to the increased overhead of simulating the cache), but to gain insight into how caching affects the performance of a real processor.
As a starting point, download the cache framework. After unpacking the archive, you'll need to copy print.c and run.c from your previous projects into the directory. After that you will only need to modify cache.c.
You will implement a 4-way set associative cache. This means that any address in memory can be mapped to one of four possible slots in the cache. To take advantage of spatial as well as temporal locality, the cache will hold blocks of memory rather than individual words. Thus, when a particular word is accessed for the first time, a block of several words will be loaded into the cache. If the CPU then needs to access the word at the address immediately following the one accessed previously, chances are that it is in the same block and already in the cache. To determine which block a word is in, split its address into three parts:
     31                            7   6     5   4             0
    +------------------------------+---------+-----------------+
    |             tag              |   set   |     offset      |
    +------------------------------+---------+-----------------+
The high order bits are the tag, a unique identifier for any block in memory. The middle bits are the set identifier, which specify which slots in the cache the block may be placed in. The low order bits are the offset within the block. For each block in the cache, the tag is stored. A special tag may be reserved to indicate that a cache slot is empty. To check whether a given address is in the cache, it is necessary to search the cache for a matching tag. Recall that a two-way set-associative cache is laid out as follows (you should modify this design to be a four-way set associative cache):
    Set | Tag | Data | Tag | Data
    ----+-----+------+-----+------
     00 |     |      |     |
     01 |     |      |     |
     10 |     |      |     |
     11 |     |      |     |
As shown above, use 25 bits for the tag, 2 bits for the set, and 5 bits for the offset. This means the block size is 32 bytes, or 8 words. Your cache will have 4 sets. All in all, this makes for a very small cache compared to those on a real system, but it is in proportion to the simple programs that you will run on the simulator.
At some point a set will fill up, and one block will need to be evicted to make room for another. You should implement a least recently used (LRU) policy, always choosing to evict the line from the set that has not been accessed for the longest time.
Memory writes should use write-through caching. This means that whenever some memory is written, the write updates both memory and the cache simultaneously. Note that the presence of the cache does not speed up write operations, but it does speed up any subsequent reads of the address that was just written. Write-through caches are the simplest to implement, and offer good performance when read operations significantly outnumber write operations.
Since the purpose of this project is to investigate how caching affects performance, you will need to keep track of how the cache affects memory access. Your cache simulator should count the total number of memory accesses. Keep separate counters for reads and writes. Also count how many reads are fulfilled by the cache (hits) versus how many are not (misses).
When the application terminates, calculate the hit rate and print out the statistics you have collected.
Keep your implementation simple. You do not need to simulate every aspect of a real cache's behaviour, such as keeping a local copy of memory values. It is enough to keep track of which memory blocks are presently in the cache and to log cache hits and misses, while storing the actual memory content only once, as in the previous project.
The file cache.c contains the regular memory access functions (like mem_read_word()) that you have been using previously. This will allow you to use your existing run.c without modification on the cache simulator. Inside of cache.c, you will need to access memory by prefixing the functions with an underscore (see the examples in cache.c).
You may notice that the cache framework takes some command line parameters for printing the stack and disassembling the instructions it executes (features you used on homework 5, and possibly project 5). If you wish to use these features, you'll need to add something like the following to your run.c:
    // Dump registers
    if (opt_dumpreg)
        print_registers();

    // Print the stack
    if (opt_printstack)
        print_stack();

    // Print the instruction
    if (opt_disassemble)
        print(sim_to_real_addr(pc), 1, pc);
In case you have not previously compiled C programs on Linux, you will find the following commands helpful.
Extract the project from the tarball: tar -xvzf cache.tar.gz
Compile the project: make
Delete the executable and object files, leaving your source code intact: make clobber (this is normally not necessary, as make will selectively rebuild the portions of your project that have changed)
Run the project: ./simulate <filename>
Note: These suggestions for an extra challenge will be examined (and commented on, if your project works well) but not graded. They will have no impact on the class grades. They are here to provide some direction to those who finish their assignments early and are looking for a way to impress friends and family.
Ask the TAs for help. We expect to see most students in office hours during the course of the project. Extra hours will be scheduled as needed.
If you suspect a bug in the cache framework... ask Michael for help. There could be bugs, although none were encountered while creating the example solution to this project.