A9: Cache Simulator

Instructions: Remember, all assignments in CS 3410 are individual. You must submit work that is 100% your own. Remember to ask for help from the CS 3410 staff in office hours or on Ed! If you discuss the assignment with anyone else, be careful not to share your actual work, and include an acknowledgment of the discussion in a collaboration.txt file along with your submission.

The assignment is due via Gradescope at 11:59pm on the due date indicated on the schedule.

Submission Requirements

You will submit your completed solution to this assignment to Gradescope. You must submit:

  • cache.c, which will be modified with your cache implementation
  • cache_stats.c, which will be modified with your calculate_stat_rates implementation

Restrictions

  • You may not modify any files other than cache.c and cache_stats.c.
  • You may not include any libraries beyond what is already included for you.

Provided Files

The provided release code contains several files, of which you will modify two:

  • cache.c & cache.h—Data structures and functions for creating and running a simulated cache. You will do most of your programming work for this assignment in cache.c.
  • cache_stats.c & cache_stats.h—Data structures and functions for collecting data about simulated cache execution. You will need to make some modifications to cache_stats.c.
  • cachesim.c—The main driver code for this simulator. You won’t modify this.
  • print_helpers.c, print_helpers.h—Various functions to help in formatting the output of your simulation runs. You don’t need to modify these.
  • simulator.c & simulator.h—Code for execution of the simulator itself. You won’t modify these, either.
  • Makefile—A basic make file, with familiar all and clean tasks.
  • trace/trace.[long|short|tiny].txtThe trace files.

Getting Started

To get started, obtain the release code by cloning the cachesim repository from GitHub:

git clone git@github.coecis.cornell.edu:cs3410-2025fa-student/<YOUR NET ID>_cachesim.git
  • Note: Please replace the <YOUR_NET_ID> with your NetID. For example, if you NetID is zw669, then this clone statement would be git clone git@github.coecis.cornell.edu:cs3410-2025fa-student/zw669_cachesim.git

Overview

In this assignment, you will write a performance simulator: a C program that simulates the behavior of a cache. Software performance simulators are used to explore cache designs because they are faster to implement and modify than circuit-level simulators (like Logisim, for example). The simulator will be configurable so that it can model the behavior of a variety of cache configurations.

Your cache simulator will take as input a memory trace: a list of load and store instructions that were executed during a single run of a single program. We refer to each load and store instruction as a memory reference. You will use a single-core memory trace to test your cache.

Trace Files

The memory traces you will be using were generated by running a program and recording all of the loads and stores issued by the processor during that run. What they are actually doing doesn’t really matter for the purposes of this assignment. The traces have names of the form “trace.size.txt”, where size is one of tiny, short, or long.

Each line of the trace file consists of two fields: a single letter (r or w) and an 8-digit hexadecimal value. For example:

r 1ce026e0
r 7d300000
w 026d7680
r 1ce024d0
r f6b3fca0
r f6b3fc98
r 2db23e10
r f6b3fca8
r f6b33178
w f7706ab8

The first field is always r or w. An r indicates a load (read) and a w indicates a store (write).

The second field is the address of the memory operation (in hex). Taking a look at the variable action (in simulator.c) will help you further understand the format of the trace file.

Address Size

Assume that addresses are 32 bits for this entire project.

Implementation

All of the code you write for Tasks 1–4 will be done in the source file cache.c. The header comments in this file’s functions contain important information about the implementation requirements. In Tasks 5–6, you will modify code in the cache_stats.c. You should also study (but don’t modify) the cache.h and cache_stats.h header files.

Task 1: Initializing the Cache

Complete the make_cache function in cache.c. You need to set correct values for the variables for n_offset_bit, n_set, n_cache_line, n_index_bit, and n_tag_bit.

Notice there is a log2 function in the included <math.h> library.

Because we’re not really manipulating the actual data values (as a real processor would), but only calculating hits and misses for each memory reference, there is no need to store data values in the cache (in fact, the trace doesn’t even give you the data values).

The cache can be represented by a 2-dimensional array of lines containing only the metadata: tags, dirty bits, and is_valid. (The second dimension only becomes relevant when the associativity of the cache is higher than 1.) In the make_cache function, this is the variable cache->lines. Complete the make_cache function by creating the array of tags that will be used to determine whether a cache hit or miss has occurred.

Initialize all tag values to zero, dirty bit flags to false, and the valid flag to false.

Note that you need to call make_cache_stats() from inside make_cache() in cache.c. This is already done for you, but think about why this call is important.

LRU Way Field

For the time being, you can ignore the lru_way field. You will revisit this in Task 5.

Task 2: Extracting Bits from Addresses

Continue coding up your cache.c by completing the three methods used to extract the tag bits, index bits, and block address from any given address. The method names are as follows:

  • get_cache_tag(): returns the tag portion of a given address.
  • get_cache_index(): returns the index portion of a given address.
  • get_cache_block_addr(): returns the given address with the offset bits zeroed out.

For example, if your address is 0b111101010001 (3921 in decimal) and there are 4 bits each in the tag, index, and offset, then

  • get_cache_tag(0b111101010001) returns 0b1111 (15),
  • get_cache_index(0b111101010001) returns 0b0101 (5), and
  • get_cache_block_addr(0b111101010001) returns 0b111101010000 (3920).

Do not convert the integer values to character strings (which is slow and unnecessarily verbose). Integer bit manipulation instructions (shifts, bitwise and, or, not, etc.) can be used to easily and efficiently accomplish such bit extraction operations.

The autograder will call these three methods twice, beginning with the address 0b111101010001 and print out the values for these variables. Note that they will not match the above example, because the tag, offset, and index bits will not each be 4 bits long (the addresses in the simulator are all 32 bits).

Task 3: Implementing a Direct-Mapped Cache

Complete the access_cache function for a direct-mapped cache by following the steps described in the function’s header comment (you can ignore lru_wayanddirty_ffor now). This function performs the cache lookup for each load or store; it returns true if the access is a hit,false if the access results in a miss. Writing this function is the most challenging part of the assignment. Read the rest of the assignment and the code you have been given to correctly implement the access, and do not try to implement the entire function at once. Instead, build on the function as each new concept is introduced below. Every time you change your code, re-run a test that used to work and make sure that as you create new functionality, you don’t break something that used to work.

Your simulator can now simulate a direct-mapped cache. Let’s test it out!

For now, handle loads and stores in the same fashion. In the next task you will set dirty_f depending on whether or not it is a store. Additionally, if there is a tag match, then it is a hit, regardless of whether it was a load or a store.

Task 4: Implementing Set Associativity

Before moving on: create a copy of your working cache up to this point. Run the following command

cp cache.c cache_direct_mapped.c

to copy your cache.c into a new file cache_direct_mapped.c. Ultimately, your final cache.c will be a super-set of cache_direct_mapped.c, but you may find it helpful to have this working snapshot to refer to as you evolve your cache simulator.

Now modify your simulator to support n-way set associative caches. Each set in the cache will now need an array of cache lines (one for each way), and a single “LRU” (least recently used) field used for cache replacement. Note: you will not be implementing true LRU (which would require a counter per way and is not practical). Instead, you will implement an approximation of LRU (you might want to think of it as NMRU “not most recently used”).

  • direct mapped: LRU field is always 0
  • 2 way set associative: LRU field always identifies to the way you didn’t just touch
  • > 2 way set associative: LRU field always identifies the way 1 higher (with wrap-around) than the way you just touched. Examples:
    • If there are 4 ways, and you touch way 0, then the new LRU way should be way 1
    • If there are 8 ways, and you touch way 7, the new LRU way should be 0

When replacing a block, replace the LRU entry (and don’t forget to update the LRU bit to point to the other entry). You should also update the LRU entry after a hit because a cache hit counts as a “use” of that cache line.

Task 5: Implementing Write-Back Caches

Have your simulator produce the 3 statistics associated with memory traffic in cache_stats.c: B_bus_to_cache, B_cache_to_bus_wb, and B_total_traffic_wb. We will also model write-through caches in Task 7!

Traffic into the cache (B_bus_to_cache) is easy to calculate: it is just the number of misses multiplied by the block size, because for each miss you must bring into the cache that entire block of memory. Traffic out of the cache is write traffic, and it depends on whether the cache is write-back or write-through. Your simulator can calculate the traffic for both write-back caches in a simulation as follows:

Write-back: For the write-back cache system, the traffic out of the cache (B_cache_to_bus_wb) is the number of writebacks multiplied by the block size. Because at most one writeback happens per miss, this write-back traffic can at most double the overall traffic (in the case in which all evicted blocks were dirty). To calculate the number of writebacks, modify your simulator to update the “dirty” bit for each block in the cache. Set the dirty bit whenever a store operation writes to the block. This dirty bit is not per word or per byte; it is one dirty bit for the entire cache line.

Lastly, compute and store the total traffic for your write-back cache in the cache_stats.c variable B_total_traffic_wb.

Use a write-allocate policy, meaning if a write misses in the cache, that cache line should be brought into the cache.

Task 6: Write-Through Statistics

Have your simulator produce 2 more statistics - B_cache_to_bus_wt and B_total_traffic_wt. These are hypothetical statistics if the cache was a write-through cache instead of a write-back cache.

In a write-through cache, dirty bits are not used - any write operations pull that word (and therefore block) to the cache, in addition, that word is immediately written back to memory.

Since each word is 4 bytes, the number of bytes written cache to bus in write-through is 4 times the number of store operations encountered in the trace. Note that this therefore has no dependency on cache parameters - block size, associativity.

Running and Testing

To make it easier to compile and run your code, we’ve provided a Makefile. To build your program, simply type rv make.

Testing

To help you debug your cache simulator, we’ve provided some basic tests from the shorter trace.short.txt (1000-3000 lines) trace. The way to use it is to run the sample input command and compare your result with the sample output command. Comparing your results to these sample outputs should help ensure your code is compiling properly and satisfies basic functionality. These debug runs are a completely optional portion of the assignment, but it is one way to make sure you are off to a good start.

Here are the three sample commands and their outputs:

Test 1

rv qemu cachesim -t trace.short.txt -cache 9 5 1

Simulation Printout for CS 3410
----------------------------------
Trace           trace.short.txt
Instruction Limit       none
 *** Cache Configuration ***
capacity                  512 B
block_size                 32 B
associativity           1-way
n_set                   16
n_cache_line    16
tag: 23, index: 4, offset: 5
Processing trace...
Processed 1000 lines.
n_cpu_accesses  1000
n_loads                 790
n_stores                210
n_hits          703
n_misses                297
hit_rate                70.30
miss_rate               29.70
n_writebacks    69
Memory Traffic:
B_written_bus_to_cache  9504
B_written_cache_to_bus_wb       2208
B_written_cache_to_bus_wt       840
B_total_traffic_wb      11712
B_total_traffic_wt      10344

Test 2

rv qemu cachesim -t trace.short.txt -cache 12 6 2

Simulation Printout for CS 3410
----------------------------------
Trace           trace.short.txt
Instruction Limit       none
 *** Cache Configuration ***
capacity                 4096 B
block_size                 64 B
associativity           2-way
n_set                   32
n_cache_line    64
tag: 21, index: 5, offset: 6
Processing trace...
Processed 1000 lines.
n_cpu_accesses  1000
n_loads                 790
n_stores                210
n_hits          843
n_misses                157
hit_rate                84.30
miss_rate               15.70
n_writebacks    11
Memory Traffic:
B_written_bus_to_cache  10048
B_written_cache_to_bus_wb       704
B_written_cache_to_bus_wt       840
B_total_traffic_wb      10752
B_total_traffic_wt      10888

Test 3

rv qemu cachesim -t trace.short.txt -cache 16 4 2

Simulation Printout for CS 3410
----------------------------------
Trace           trace.short.txt
Instruction Limit       none
 *** Cache Configuration ***
capacity                65536 B
block_size                 16 B
associativity           2-way
n_set                   2048
n_cache_line    4096
tag: 17, index: 11, offset: 4
Processing trace...
Processed 1000 lines.
n_cpu_accesses  1000
n_loads                 790
n_stores                210
n_hits          838
n_misses                162
hit_rate                83.80
miss_rate               16.20
n_writebacks    0
Memory Traffic:
B_written_bus_to_cache  2592
B_written_cache_to_bus_wb       0
B_written_cache_to_bus_wt       840
B_total_traffic_wb      2592
B_total_traffic_wt      3432

You can check your output against the sample output for basic, incomplete testing purposes. To compare your output with ours systematically, we recommend using the command diff in the terminal. diff file1 file2 will output contents that are different between file1 and file2. Run man diff on the command line to see how to get it to ignore whitespace, or to display the two files next to each other.

Statistics

All of the statistics that are printed out should contain correctly populated data. Initially, some of these data will be 0, but as your simulator grows in functionality, each statistic should be meaningful.

Verbose Mode

Verbose mode will print out each line of the trace and what its result was. If you want to use verbose mode, you must supply the correct set and way using the methods log_set() and log_way(); it is only within the method access_cache (which you write) that these numbers are known.

For verbose mode, it is recommended that you use a tiny trace (e.g., trace.tiny.txt, which we have provided), or else your screen will be overwhelmed with print statements.

Submission

Submit all the files listed in Submission Requirements to Gradescope. Upon submission, we will provide a smoke test to ensure your code compiles and passes the public test cases.

You are also required to fill in the AI survey form to mark your submission as completed. You can find the survey as a separate assignment on Gradescope.

Rubric

  • smoketest: 5 points
  • cache trace output & task 2 correctness: 95 points

Acknowledgments

A version of this assignment was originally written by Milo Martin at University of Pennsylvania and subsequently modified to include the coding infrastructure and an autograder by Anne Bracy at Washington University in St. Louis. The assignment code base was converted from Java to C by CS 3410 course staff at Cornell University to meet the needs of the class.