A9: Cache Simulator
Instructions:
Remember, all assignments in CS 3410 are individual.
You must submit work that is 100% your own.
Remember to ask for help from the CS 3410 staff in office hours or on Ed!
If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.
The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.
Submission Requirements
You will submit your completed solution to this assignment to Gradescope. You must submit:
cache.c, which will be modified with your cache implementationcache_stats.c, which will be modified with yourcalculate_stat_ratesimplementation
Restrictions
- You may not modify any files other than
cache.candcache_stats.c. - You may not include any libraries beyond what is already included for you.
Provided Files
The provided release code contains several files, of which you will modify two:
cache.c&cache.h—Data structures and functions for creating and running a simulated cache. You will do most of your programming work for this assignment incache.c.cache_stats.c&cache_stats.h—Data structures and functions for collecting data about simulated cache execution. You will need to make some modifications tocache_stats.c.cachesim.c—The main driver code for this simulator. You won’t modify this.print_helpers.c,print_helpers.h—Various functions to help in formatting the output of your simulation runs. You don’t need to modify these.simulator.c&simulator.h—Code for execution of the simulator itself. You won’t modify these, either.Makefile—A basic make file, with familiarallandcleantasks.trace/trace.[long|short|tiny].txt—The trace files.
Getting Started
To get started, obtain the release code by cloning the cachesim repository from GitHub:
git clone git@github.coecis.cornell.edu:cs3410-2025fa-student/<YOUR NET ID>_cachesim.git
- Note: Please replace the
<YOUR_NET_ID>with your NetID. For example, if you NetID iszw669, then this clone statement would begit clone git@github.coecis.cornell.edu:cs3410-2025fa-student/zw669_cachesim.git
Overview
In this assignment, you will write a performance simulator: a C program that simulates the behavior of a cache. Software performance simulators are used to explore cache designs because they are faster to implement and modify than circuit-level simulators (like Logisim, for example). The simulator will be configurable so that it can model the behavior of a variety of cache configurations.
Your cache simulator will take as input a memory trace: a list of load and store instructions that were executed during a single run of a single program. We refer to each load and store instruction as a memory reference. You will use a single-core memory trace to test your cache.
Trace Files
The memory traces you will be using were generated by running a program and recording all of the loads and stores issued by the processor during that run. What they are actually doing doesn’t really matter for the purposes of this assignment. The traces have names of the form “trace.size.txt”, where size is one of tiny, short, or long.
Each line of the trace file consists of two fields: a single letter (r or w) and an 8-digit hexadecimal value. For example:
r 1ce026e0
r 7d300000
w 026d7680
r 1ce024d0
r f6b3fca0
r f6b3fc98
r 2db23e10
r f6b3fca8
r f6b33178
w f7706ab8
The first field is always r or w. An r indicates a load (read) and a w indicates a store (write).
The second field is the address of the memory operation (in hex). Taking a look at the variable action (in simulator.c) will help you further understand the format of the trace file.
Implementation
All of the code you write for Tasks 1–4 will be done in the source file cache.c. The header comments in this file’s functions contain important information about the implementation requirements. In Tasks 5–6, you will modify code in the cache_stats.c. You should also study (but don’t modify) the cache.h and cache_stats.h header files.
Task 1: Initializing the Cache
Complete the make_cache function in cache.c. You need to set correct values for the variables for n_offset_bit, n_set, n_cache_line, n_index_bit, and n_tag_bit.
Notice there is a log2 function in the included <math.h> library.
Because we’re not really manipulating the actual data values (as a real processor would), but only calculating hits and misses for each memory reference, there is no need to store data values in the cache (in fact, the trace doesn’t even give you the data values).
The cache can be represented by a 2-dimensional array of lines containing only the metadata: tags, dirty bits, and is_valid. (The second dimension only becomes relevant when the associativity of the cache is higher than 1.) In the make_cache function, this is the variable cache->lines. Complete the make_cache function by creating the array of tags that will be used to determine whether a cache hit or miss has occurred.
Initialize all tag values to zero, dirty bit flags to false, and the valid flag to false.
Note that you need to call make_cache_stats() from inside make_cache() in cache.c. This is already done for you, but think about why this call is important.
For the time being, you can ignore the lru_way field. You will revisit this in Task 5.
Task 2: Extracting Bits from Addresses
Continue coding up your cache.c by completing the three methods used to extract the tag bits, index bits, and block address from any given address. The method names are as follows:
get_cache_tag(): returns the tag portion of a given address.get_cache_index(): returns the index portion of a given address.get_cache_block_addr(): returns the given address with the offset bits zeroed out.
For example, if your address is 0b111101010001 (3921 in decimal) and there are 4 bits each in the tag, index, and offset, then
get_cache_tag(0b111101010001)returns0b1111(15),get_cache_index(0b111101010001)returns0b0101(5), andget_cache_block_addr(0b111101010001)returns0b111101010000(3920).
Do not convert the integer values to character strings (which is slow and unnecessarily verbose). Integer bit manipulation instructions (shifts, bitwise and, or, not, etc.) can be used to easily and efficiently accomplish such bit extraction operations.
The autograder will call these three methods twice, beginning with the address 0b111101010001 and print out the values for these variables. Note that they will not match the above example, because the tag, offset, and index bits will not each be 4 bits long (the addresses in the simulator are all 32 bits).
Task 3: Implementing a Direct-Mapped Cache
Complete the access_cache function for a direct-mapped cache by following the steps described in the function’s header comment (you can ignore lru_wayanddirty_ffor now). This function performs the cache lookup for each load or store; it returns true if the access is a hit,false if the access results in a miss. Writing this function is the most challenging part of the assignment. Read the rest of the assignment and the code you have been given to correctly implement the access, and do not try to implement the entire function at once. Instead, build on the function as each new concept is introduced below. Every time you change your code, re-run a test that used to work and make sure that as you create new functionality, you don’t break something that used to work.
Your simulator can now simulate a direct-mapped cache. Let’s test it out!
For now, handle loads and stores in the same fashion. In the next task you will set dirty_f depending on whether or not it is a store. Additionally, if there is a tag match, then it is a hit, regardless of whether it was a load or a store.
Task 4: Implementing Set Associativity
Before moving on: create a copy of your working cache up to this point. Run the following command
cp cache.c cache_direct_mapped.c
to copy your cache.c into a new file cache_direct_mapped.c. Ultimately, your final cache.c will be a super-set of cache_direct_mapped.c, but you may find it helpful to have this working snapshot to refer to as you evolve your cache simulator.
Now modify your simulator to support n-way set associative caches. Each set in the cache will now need an array of cache lines (one for each way), and a single “LRU” (least recently used) field used for cache replacement. Note: you will not be implementing true LRU (which would require a counter per way and is not practical). Instead, you will implement an approximation of LRU (you might want to think of it as NMRU “not most recently used”).
- direct mapped: LRU field is always 0
- 2 way set associative: LRU field always identifies to the way you didn’t just touch
- > 2 way set associative: LRU field always identifies the way 1 higher (with wrap-around) than the way you just touched. Examples:
- If there are 4 ways, and you touch way 0, then the new LRU way should be way 1
- If there are 8 ways, and you touch way 7, the new LRU way should be 0
When replacing a block, replace the LRU entry (and don’t forget to update the LRU bit to point to the other entry). You should also update the LRU entry after a hit because a cache hit counts as a “use” of that cache line.
Task 5: Implementing Write-Back Caches
Have your simulator produce the 3 statistics associated with memory traffic in cache_stats.c: B_bus_to_cache, B_cache_to_bus_wb, and B_total_traffic_wb. We will also model write-through caches in Task 7!
Traffic into the cache (B_bus_to_cache) is easy to calculate: it is just the number of misses multiplied by the block size, because for each miss you must bring into the cache that entire block of memory. Traffic out of the cache is write traffic, and it depends on whether the cache is write-back or write-through. Your simulator can calculate the traffic for both write-back caches in a simulation as follows:
Write-back: For the write-back cache system, the traffic out of the cache (B_cache_to_bus_wb) is the number of writebacks multiplied by the block size. Because at most one writeback happens per miss, this write-back traffic can at most double the overall traffic (in the case in which all evicted blocks were dirty). To calculate the number of writebacks, modify your simulator to update the “dirty” bit for each block in the cache. Set the dirty bit whenever a store operation writes to the block. This dirty bit is not per word or per byte; it is one dirty bit for the entire cache line.
Lastly, compute and store the total traffic for your write-back cache in the cache_stats.c variable B_total_traffic_wb.
Use a write-allocate policy, meaning if a write misses in the cache, that cache line should be brought into the cache.
Task 6: Write-Through Statistics
Have your simulator produce 2 more statistics - B_cache_to_bus_wt and B_total_traffic_wt. These are hypothetical statistics if the cache was a write-through cache instead of a write-back cache.
In a write-through cache, dirty bits are not used - any write operations pull that word (and therefore block) to the cache, in addition, that word is immediately written back to memory.
Since each word is 4 bytes, the number of bytes written cache to bus in write-through is 4 times the number of store operations encountered in the trace. Note that this therefore has no dependency on cache parameters - block size, associativity.
Running and Testing
To make it easier to compile and run your code, we’ve provided a Makefile. To
build your program, simply type rv make.
Testing
To help you debug your cache simulator, we’ve provided some basic tests from the shorter trace.short.txt (1000-3000 lines) trace. The way to use it is to run the sample input command and compare your result with the sample output command. Comparing your results to these sample outputs should help ensure your code is compiling properly and satisfies basic functionality. These debug runs are a completely optional portion of the assignment, but it is one way to make sure you are off to a good start.
Here are the three sample commands and their outputs:
Test 1
rv qemu cachesim -t trace.short.txt -cache 9 5 1
Simulation Printout for CS 3410
----------------------------------
Trace trace.short.txt
Instruction Limit none
*** Cache Configuration ***
capacity 512 B
block_size 32 B
associativity 1-way
n_set 16
n_cache_line 16
tag: 23, index: 4, offset: 5
Processing trace...
Processed 1000 lines.
n_cpu_accesses 1000
n_loads 790
n_stores 210
n_hits 703
n_misses 297
hit_rate 70.30
miss_rate 29.70
n_writebacks 69
Memory Traffic:
B_written_bus_to_cache 9504
B_written_cache_to_bus_wb 2208
B_written_cache_to_bus_wt 840
B_total_traffic_wb 11712
B_total_traffic_wt 10344
Test 2
rv qemu cachesim -t trace.short.txt -cache 12 6 2
Simulation Printout for CS 3410
----------------------------------
Trace trace.short.txt
Instruction Limit none
*** Cache Configuration ***
capacity 4096 B
block_size 64 B
associativity 2-way
n_set 32
n_cache_line 64
tag: 21, index: 5, offset: 6
Processing trace...
Processed 1000 lines.
n_cpu_accesses 1000
n_loads 790
n_stores 210
n_hits 843
n_misses 157
hit_rate 84.30
miss_rate 15.70
n_writebacks 11
Memory Traffic:
B_written_bus_to_cache 10048
B_written_cache_to_bus_wb 704
B_written_cache_to_bus_wt 840
B_total_traffic_wb 10752
B_total_traffic_wt 10888
Test 3
rv qemu cachesim -t trace.short.txt -cache 16 4 2
Simulation Printout for CS 3410
----------------------------------
Trace trace.short.txt
Instruction Limit none
*** Cache Configuration ***
capacity 65536 B
block_size 16 B
associativity 2-way
n_set 2048
n_cache_line 4096
tag: 17, index: 11, offset: 4
Processing trace...
Processed 1000 lines.
n_cpu_accesses 1000
n_loads 790
n_stores 210
n_hits 838
n_misses 162
hit_rate 83.80
miss_rate 16.20
n_writebacks 0
Memory Traffic:
B_written_bus_to_cache 2592
B_written_cache_to_bus_wb 0
B_written_cache_to_bus_wt 840
B_total_traffic_wb 2592
B_total_traffic_wt 3432
You can check your output against the sample output for basic, incomplete testing purposes. To compare your output with ours systematically, we recommend using the command diff in the terminal. diff file1 file2 will output contents that are different between file1 and file2. Run man diff on the command line to see how to get it to ignore whitespace, or to display the two files next to each other.
Statistics
All of the statistics that are printed out should contain correctly populated data. Initially, some of these data will be 0, but as your simulator grows in functionality, each statistic should be meaningful.
Verbose Mode
Verbose mode will print out each line of the trace and what its result was. If you want to use verbose mode, you must supply the correct set and way using the methods log_set() and log_way(); it is only within the method access_cache (which you write) that these numbers are known.
For verbose mode, it is recommended that you use a tiny trace (e.g., trace.tiny.txt, which we have provided), or else your screen will be overwhelmed with print statements.
Submission
Submit all the files listed in Submission Requirements to Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.
You are also required to fill in the AI survey form to mark your submission as completed. You can find the survey as a separate assignment on Gradescope.
Rubric
- smoketest: 5 points
- cache trace output & task 2 correctness: 95 points
Acknowledgments
A version of this assignment was originally written by Milo Martin at University of Pennsylvania and subsequently modified to include the coding infrastructure and an autograder by Anne Bracy at Washington University in St. Louis. The assignment code base was converted from Java to C by CS 3410 course staff at Cornell University to meet the needs of the class.