In this project you will write a disassembler for MIPS. A disassembler takes an executable or object file that has been produced by a compiler and prints out a listing of assembly code instructions. This is useful in determining exactly what goes on, at the hardware/software boundary, when a program compiled from a higher-level language is run.
MIPS is a reduced instruction set computer (RISC) architecture. This greatly simplifies the the process of writing a disassembler, because there are only a few dozen instructions (as opposed to the hundreds on some other architectures). Furthermore, all MIPS instructions are of the same size (32 bits), making it easy to tell where each instruction begins and ends.
As a starting point, download the disassembler framework. This contains the necessary functionality to read an executable file and create an array of instructions. You will write the code to iterate through this array and disassemble each instruction, printing it in a human-readable format. The only file you will modify is print.c
. Initially this file contains an empty print function:
void print(unsigned int* text, int num_instructions, unsigned int start_addr) { }
Your job is to fill in the contents of this print function. The framework will call the print function and pass in an array of instructions (the program text), the number of instructions in the array, and the virtual address of the first instruction. Do not change the signature of the print function or anything outside of print.c
. You are free to add functions inside print.c
if you wish.
Your program should output the virtual address and the hexadecimal representation of each instruction, followed by the disassembled instruction. The disassembled instruction should look much like the "Assembly Language Format" column in the list of instructions to implement below, except that symbols such as Rt and I will be replaced by the actual numeric value. Absolute addresses should be printed in hex; all other numbers should be in decimal. Run ./disassemble-example tests/test1
(and similarly the test2 through test4) to see how the output should look. Please follow the exact same format, including the colons and square brackets, to simplify the grading (don't worry about slight differences in whitespace).
As an example, let's disassemble the instruction 0x24040011
. We first look at the opcode, which is in the six most significant bits. The opcode can be isolated by a bitwise AND operation against a mask where only the bits we care about are set to 1, namely 0xfc00000
. (You may wish to write out the binary representation of the instruction and the mask to visualize how this works.) The result of 0x24040011 & 0xfc000000
is 0x24000000
, which we can look up to be the addiu
(add immediate) instruction. Next, we must extract the values for Rt, Rs, and I from the instruction. Since I occupies the 16 low order bits, extracting it is a simple matter of a bitwise AND operation against a mask of 0x0000ffff
. Rt and Rs must additionally be shifted into the low order bits (so that printf will interpret them correctly). For Rt, the operation looks like (instruction >> 16) & 0x0000001f
and for Rs it'll be (instruction >> 21) & 0x0000001f
.
MIPS instructions come in one of three formats, as illustrated in the table below. Notice that the opcode is always in the most significant 6 bits; this allows you to look at the opcode first and then determine how to interpret the rest of the instruction. You will need to use the Instruction Set Reference to determine which format is used for a particular instruction (or infer this from the list below).
Format | 6 bits | 5 bits | 5 bits | 5 bits | 5 bits | 6 bits | Comments |
---|---|---|---|---|---|---|---|
R | op | Rs | Rt | Rd | shamt | funct | Arithmetic |
I | op | Rs | Rt | address/immediate | Transfer, branch,immediate | ||
J | op | target address | Jump |
Key | |
---|---|
op | Operation code |
Rs | First source register operand |
Rt | Second source register operand |
Rd | Destination register operand |
shamt | Shift amount (for shift instructions) |
funct | Function (for arithmetic instructions) |
In case you have not previously compiled C programs on Linux, you will find the following commands helpful.
Extract the project from the tarball: tar -xvzf disassembler.tar.gz
Compile the project: make
Compile and link the test programs: cd tests; make; cd ..
Run the project: ./disassemble <filename>
, for instance, ./disassemble tests/test1
. Initially, this will not generate any output because your print routine is empty.
Run the sample binary we provided, so you can see what kind of output your project should be generating: ./disassemble-example <filename>
, for instance, ./disassemble tests/test1
.
Delete just the object files, leaving your executable program and source code intact: make clean
(this is normally not necessary, as make
will selectively rebuild the portions of your project that have changed).
Delete the executable and object files, leaving your source code intact: make clobber
(this is normally not required unless you want to start from a clean slate).
Note: These suggestions for an extra challenge will be examined (and commented on, if your project works well) but not graded. They will have no impact on the class grades. They are here to provide some direction to those who finish their assignments early and are looking for a way to impress friends and family.
Ask the TAs for help. We expect to see most students in office hours during the course of the project. Extra hours will be scheduled as needed.
If you suspect a bug in the disassembler framework... ask Michael for help. There could be bugs, although none were encountered while creating the example solution to this project.
Implement all of the instructions in the following tables. If your disassembler encounters any instructions not listed here, it should print "Unknown instruction" and continue. (As always, you're free to implement additional instructions if you would like to.) For more details see the manuals linked on the class web site, especially Volume 2: Instruction Set Reference.