CS 3410 Spring 2016
Due: 11:59pm, Friday, April 29th
Reminder: you must work in a group of two for this project. You don't need to have the same partner as the earlier projects.
This assignment will help you and your partner further hone your skills as elite programmers. Your goal is to program a multicore bot (called a droid) who will race against an opponent to fill memory. This game simultaneously tests your knowledge of computer systems, your programming skills, and your ability to protect programs from being hacked. Writing a top notch droid will require all the skills you have been developing so far in the course and, hopefully, be a fun and interesting challenge. At the end of the assignment, we will hold a tournament pitting all your droids against each other and the winner will receive not only bragging rights but actual bonus points on the assignment! Plus we will be holding a banquet (read: pizza, cookies, and soda) for all while we watch the tournament!
It is a dark time for the galaxy. The former Jedi knight Calloc Ren has turned to the dark side after his hunger for power led him to attempt to access memory outside of his control. Now, he leads the sinister First Order in their hunt for the last Jedi Master, Lucas, reknowned throughout the galaxy for his vast wisdom on all matters of Vagrant and Logisim.
As the JALactic Republic bracys itself for war, the First Order is preparing to unleash its greatest weapon: Stackkiller Base, a planet-sized battle station armed with a devastating super-laser. Only Master Lucas can save the Republic, but the map to his hiding place has been corrupted and can be restored only by a droid running the galaxy's fastest memory writing algorithm.
Across the galaxy, Resistance fighters and First Order agents alike begin a desperate race against time to find Lucas and restore balance to the Force....
The game is played by two droids with four cores each. All eight cores are placed in a single shared memory space, running on a simulated 8-core MIPS machine with a very small partitioned, direct-mapped data memory cache. The are executed by the same simulator we have used previously, but modified to simulate a simple data cache, and to simulate 8 MIPS CPUs simultaneously: one instruction of core 0 then one instruction of core 1, and so forth.
Each droid is assigned a payload
, a particular byte
of data. The goal of the contest is to fill memory with their
team's payload
as much as possible (capturing the location in
memory), while simultaneously trying to impede the progress of the
other team. All access to data memory goes through a very small cache,
and there is a large penalty for missing the cache, so a clever droid
will need to carefully optimize its algorithms for filling
memory. Each core can normally only read and write its own team's half
of the memory space using its own half of the data cache. However, a
core can enter Mind_Trick
mode, which enables the core to
both read or write the opponent's half of memory space and to take
advantage of the opponent's half of the data cache.
The droids have several dangerous tools at their disposal to aid in their mission. Each droid can use any of its four cores can call in a Laser Strike from Stackkiller Base, which can stun the opposing droid and stall its cores for a short period of time (You might be wondering why the droid is only stunned - normally, the Stackkiller laser is powerful enough to destroy a planet, however it isn't operating at full capacity right now because one of the base's radar technicians made a critical error while rewiring a calcinator. The incident is documented here). Each droid's cores can also plant Rathtars - these are dangerous alien creatures which, if encountered by a core, will stall it for a long period of time. The operation of these attacks is described further below.
Each team supplies a single MIPS executable program. At the beginning of the game, the simulator starts each droid by calling the function
void __start(int core_id, int num_crashed, unsigned char payload);
Each of the four cores devoted to one team will invoke the function
when it first boots, passing its core ID (a number from 0 to 3), and
the team's payload
(from 0 to 239). If a core ever crashes, it
will simply reset and invoke the function again.
The num_crashes
parameter is 0 when a core first boots, and
is incremented each time the core crashes.
A core can enter Mind_Trick
mode by calling
void Mind_Trick();
and can leave Mind_Trick
mode by calling
void retreat();
Mind_Trick
mode can potentially let a droid fill memory
more quickly, because it can take advantage of the opponent's memory
cache, and can also be used to impede the progress of the opponent by
interfering with the opponent's use of that same cache. However, these
come with risks. Whenever a core is in Mind_Trick
mode, its core
ID is written into a taunt[]
array that is available to its
opponent. Additionally, it is possible to detect the presence of an
invader by looking at cache usage statistics and timing
information. A core can accuse one of its opponent's cores
of Mind_Trick
mode by calling:
int Call_In_Tie_Fighters(int opponent_core_id);
If an opponent's core is caught by the Tie Fighters while
in Mind_Trick
mode, that core is captured by the opponent
and shut down - essentially locking up the core for
the rest of the game. If a core attempts to use Call_In_Tie_Fighters on a core
that is not in Mind_Trick
mode, the accusing core stalls for
100,000 (STALL_ON_NOT_THE_DROIDS_YOURE_LOOKING_FOR
) cycles, because it takes time
to call the Tie Fighters.
A core can set up a dangerous Rathtar by writing 0xF0 to a
location in memory. Rathtars are set up immediately once 0xF0 is
written to a memory address but due to the intense work to smuggle in Rathtars from distant planets, the core will rest for 100
(STALL_TO_SET_RATHTAR
) cycles before executing the next
instruction. Rathtars are triggered when any core attempts to
write to a memory location that currently holds the trap byte. Such an
attempt causes the write to fail and the writing core to be stalled
for 1,000,000 (STALL_ON_RATHTAR
) cycles. Since the write
fails, multiple cores can be Rathtar-ed at the same memory
location. Rathtars are not triggered by reading memory. Due to
the effort required to set up a Rathtar, each team's droid can set
up at most 15 (RATHTARS_PER_TEAM
) Rathtars in Cache
Wars.
To prevent accidental self-"Rathtar-ing" from functions that
happen to contain the address 0xF0, the upper 4 MB (4
* STACK_SIZE
) of memory is protected from Rathtars.
A core can trap and neutralize a Rathtars by closing the blast doors at its location. It does so by writing 0xF2 to the Rathtar memory location. However, it takes some time to access the blast door control panel, and it requires 100
(STALL_TO_CLOSE_BLAST_DOORS
) cycles. This will not
retroactively un-stall any core that has already been Rathtar-ed
at that location, but will of course save you from being Rathtar-ed there in the future.
Droids can also temporarily stun their opponents by using
Laser
. This is done by writing the byte 0xF1 into a
location in memory. When the Stackkiller laser is fired upon one half of
memory, any enemy cores who try to write to that half of memory
during the next 100 (LASER_DURATION
) cycles will be
pinned down for 300 (STALL_WHEN_LASERED
) cycles. Since the Stackkiller laser can only be fired a limited number of times before needing to be recharged,
each team's cores can perform at most 30 (LASER_SHOTS_PER_TEAM
)
laser blasts.
The game ends when one team manages to fill the equivalent of half
of its data memory with its payload
, or after a fixed number of
cycles, whichever comes first. The team having the most memory filled
with its payload
wins the match and the rights to proceed in
the Cache Wars tournament. But beware...for if
you do win the tournament, a great challenge awaits.
Cache Wars is interesting because the droid cores on each team must cooperate to make effective use of the limited data memory cache, and be resilient to interference from the opponent's attacks. A successful team will use clever strategies to fill memory quickly and try to slow the opponent's progress, while also ensuring that it is not vulnerable to the same types of interference.
The simulator makes it seem like all four of your team's cores live in the lower half of the memory address space (addresses below 0x80000000). The first 1 MB of this memory is read-only, and is where the text segment of your MIPS executable is loaded. The next 31 MB is read-write memory, where global variables and stacks are placed. Global variables are typically placed near the bottom of this segment by the compiler, and the stacks are initialized to near the top of the segment and grow down. All four of the cores on a team share this same space, and share the same global variables. Your opponent is mapped into the upper half of the memory address space (addresses starting at 0x80000000), with an identical layout.
Program stacks: The four program stacks for the four cores on a team are initially spaced evenly near the top of the 31 MB read-write segment (though your code can set $sp to anything you like once it starts executing). The stacks are spaced 1 MB apart; this is plenty of space for most programs, but if you invoke a deeply recursive function, the call stacks can overwrite each other.
Status and cores: The simulator keeps track of the
information about each team and each core, and maps this data into
part of the 1 MB read-only memory region, where your droids can access
it, and where your opponent can access it when it is
in Mind_Trick
mode. This includes information about your
current score (how many bytes of memory currently contain your
team's payload
), how many cycles are left remaining in the
game, the contents of all four of your core's register files, and
statistics about cache misses and stalls for each of your four
cores. Some of this information is available in other places as
well: your team's payload
is passed to __start()
, for
example; and cache statistics are also accessible via a system
call.
Physical addresses: Both droids view the lower half of memory as their own, and both access their own code, data, and stacks using addresses below 0x80000000. These, of course, are virtual addresses. In actuality, the first player's virtual addresses correspond directly to physical memory addresses, while the second player's virtual addresses are mapped in reverse: virtual address 0x00001234, for example, is mapped to physical addresses 0x80001234, and virtual address 0x80001234 is mapped to physical address 0x00001234. This virtual to physical mapping is completely transparent to the program.
Caches: Instruction fetch is assumed to use an infinitely
large cache with no miss penalty. All loads and stores to memory done
by the program use a data cache. Cache hits are serviced in a single
cycle, with no processor stalls. Cache misses cause roughly 100 cycles
of processor stalling. The data cache is direct-mapped and physically
tagged, with 4 cache lines, and a 256 byte block size, for a total of
1024 bytes of cache. However, the data cache is
partitioned
, with the first 2 lines used to cache the lower half of
physical memory (the first player's half), and the last 2 lines used to cache
the upper half of physical memory (the second player's half).
Cache Management: The version of MIPS we are using (MIPS-I) lacks instructions to manage the cache, but we have provided system calls instead. A droid can call
void invalidate(void *ptr);
to remove from the cache any block containing the data pointed to
by ptr
. This only works if the droid has access to the
memory in question at the time the call is made. Similarly, when a
droid calls
void prefetch(void *ptr);
the cache will start to fetch the block containing the given address into the cache. The call returns immediately with no stalling, and roughly 100 cycles later the data will arrive in the cache and be available for use. This call only works if the droid actually has access to the specified block of memory. For each cache line, there can be at most one outstanding fetch in progress. Prefetch requests are ignored if a fetch is already in progress for that line (in this context, "fetch" means either an earlier prefetch, or a core waiting on the cache line to fill because of a cache miss), and actual data fetch requests (e.g. from load/store instructions, rather than prefetch requests) supersede any prefetch requests that are in progress. There are additional system calls to read the current cache tags and the tags for lines that are being fetched. This same information is also available in the memory mapped data-structures.
We have provided a header file (described below) containing various useful macros for accessing your own code and data segments, and those of your opponent. The header also describes in detail the memory layout, cache organization, and available system calls.
Pointer arithmetic in C: When performing pointer arithmetic in C, the compiler implicitly multiplies by the size of the data type that is pointed to. For instance:
char *byteptr = (char *)0x1000; byteptr[3] = 0xFF; // store one byte at memory three BYTES after ptr, so address 0x1000+3 = 0x1003 byteptr += 5; // increments pointer by five BYTES, to address 0x1005 int *wordptr = (int *)0x1000; wordptr[3] = 0xFFFFFFFF; // store one word at memory three WORDS after ptr, so address 0x1000+12 = 0x100c wordptr += 5; // increments pointer by five WORDS, to address 0x1000+20 = 0x1014
From this, we can see that the following code is almost certainly a bug (two bugs, in fact), since the first statement advances by 12 cache lines instead of 3 cache lines, and the second statement advances by 8 cache lines instead of 2 cache lines:
int *ptr = (int *)HOME_DATA_START + 3*CACHE_LINE; pre += 2*CACHE_LINE;
To advance by 3 cache lines (first statement) or 2 cache lines (second statement) when using integer pointers, one could do:
int *ptr = (int *)(HOME_DATA_START + 3*CACHE_LINE); pre = (int *)((int)pre + 2*CACHE_LINE);
Or:
int *ptr = (int *)HOME_DATA_START + (3*CACHE_LINE)/4; pre += (2*CACHE_LINE)/4;
Design code that makes efficient use of the data cache to fill
memory quickly, and find clever ways to beat your opponents, stop them
from posting their payload
, and interfere with your opponent's
cache usage. And although you cannot directly modify your opponent's
registers, you can modify their call stack and global
variables. Coupled with what you know about exploiting programs, you
may even be able to get your opponent to work for you. At the same
time, you need to be on the lookout for cores from your opponent's
team. Because all of these tasks require processing time and access to
memory, you will need to carefully plan how to coordinate use of these
scarce resources.
To get started, we will supply several example teams that employ some basic strategies. Their source code can be found in the same directory as the Cache Wars simulator. The example teams are:
Note: these teams do not do a particularly good job making effective use of the cache. For instance, most write a single byte at a time to memory, and take only a little care to ensure that their droids don't compete with each other for cache lines.
Perform the environment setup from Lab 3 if you haven't already. Once your environment is setup, download pa3 with the following commands.
$ cd $ wget http://www.cs.cornell.edu/courses/cs3410/2016sp/projects/pa3/pa3.zip $ unzip pa3.zip
Inside the pa3 directory is two executables, cachewars111
and cachevall
,
and a directory droids/
. Within droids should be a Makefile
and the folder src/
. Within src/
, you will
find sample course bots that are described above. You will code
your own droid within this folder.
Each team should create a single program, written in C and compiled
into a MIPS binary executable. The four cores on a team all share the
same code (though your program can obviously do different things
depending on the core_id
parameter it receives from the
simulator). Going up from the src/
directory
(cd ..
), and typing make
will compile all
the bots within the src/
directory.
**Important!** By default, your new droid will not be optimized
by the compiler. You can (and should) tell the Makefile to compile your
droid with optimizations enabled. We recommend using the -O3 optimization
flag (the letter "O", not the number "0"). For example, if my droid is
named BB8
, I would add the following line to the Makefile.
FLAGS_BB8 = -O3
We have already included these optimizations in the Makefile for the course bots, so you can look at those to make sure you're doing it correctly
In the main directory you will also find cachewars111
, the
simulator used to play the game. The simulator cachewars111
takes two arguments, the names of the teams in the match, for
example:
$ ./cachewars111 droids/bin/greedy droids/bin/fbi
You will also find simple shell script cachevall
which
takes the name of a single team and pits it against every other team
in turn. This lets you compare how one team does in comparison to all
the others:
$ ./cachevall droids/bin/greedy droids/bin
Compilers don't always produce the code you expect them to, so you will likely want to read the compiled assembly. As you have seen before, you can decompile compiled code using objdump:
$ mipsel-linux-objdump -xdl droids/bin/greedy
Finally you have two tools at your disposition. First, you will
want to read the cachewars111.h
header file in
the droids/src
directory. It is the only header file you
should import into your droid programs, besides any you might write
yourself, but it is designed to be all you will need. It contains the
basic functions Mind_Trick
, Call_In_Tie_Fighters
,
retreat
and so on, along with various other system calls
like printf
and rand()
. But more importantly, its
comment describes in detail how your program is laid out in memory and
how simulator statistics and other data is mapped into your address
space.
Your second major tool is the cachewars111
simulator which
has several useful command line options. Type ./cachewars111
with no arguments to see a list.
The cache is write-allocate: If you want to write to an address, it must be in the cache first (and therefore you suffer a cache miss if it wasn't in there already).
The cache is write-through: Once a line is in the cache, any writes to it immediately appear in memory (and therefore immediately increase your score).
Finally, the cache automatically updates from memory: In particular, if you have the taunt array in cache, and a new bot enters Sneak_Attack mode, the taunt array in cache updates immediately.
It's possible to prefetch too early, such that the line you prefetch arrives in the cache while you're still working on a different line. In this case, you'll doubly penalize yourself, as you'll suffer a miss for the line you're currently accessing, and then miss on the line you meant to prefetch in the first place. Oops! Don't do that.
If you prefetch slightly late, you only lose the difference. If address
0x01000000 is not in the cache, but you prefetch it, wait 50 cycles, then try
to write to/read from it, you'll still miss but only stall 50 cycles instead of
the usual 100, since 50 of the cycles were alerady taken up by the prefetch.
This is why prefetching is beneficial. If you run the simulator in "very
verbose mode" via simulate -vvvv
, you'll see messages like "upgrades existing
prefetch", and this is the meaning of that message.
On the other hand, if address 0x01000400 is in the cache, you prefetch 0x01000200, then try to read/write 0x01000000, the prefetch is canceled, You miss 0x01000000, and stall the full 100 cycles, no matter how far along the prefetch was.
prefetch(ptr)
does not automatically call invalidate(ptr)
; that's why they're separate function calls. The line currently in the cache is still valid until the prefetch finishes.The earlier one sticks. The writeup states "prefetch requests are ignored if a fetch is already in progress for that line", and in this context "fetch" means either an earlier prefetch or a core waiting on the cache line to fill due to a cache miss.
Traps are triggered when any core attempts to write to a memory location that currently holds the trap byte.
Such an attempt causes the write to fail and the writing core to be stalled for 1,000,000 (STALL_WHEN_WALKERED
) cycles.
Since the write fails, multiple cores can be trapped at the same memory location.
Traps are not triggered by reading memory.
Your grade for this assignment will have three parts.
The source code for your droid
, with comments,
and a Makefile so we can build it.
Good luck!