Lab Assignment 3 - Buffer Overflows
CS3410 Spring 2014
Due: 11:59pm, Wednesday, April 16, 2014.
Late Policy: Don't be late.
Reminder: you must work alone for this and other labs.
Setting Up Your Linux Environment
In this lab, you should use the course VM. (You may instead use the csuglab machines, but then you will have less control of the environment, including any potential breakdowns.) Please see this page for instructions on setting up your environment.
In addition to setting up the environment, please copy the browser executable from /courses/cs3410/lab3/ on the CSUG machines to a local location of your choosing. You will run this executable using the simulator that you set up above.
The goal of this lab is to get intimately familiar with the layout and use of call stacks, as well as MIPS machine language, assembly and disassembly, debugging, and reverse engineering. As a side benefit, we hope to raise your awareness of computer security issues. To this end, you will write a buffer overrun exploit to break a program that we provide to you.
WARNING: These kinds of friendly hacking challenges have a long history, and hacking skills are priceless, as they reflect a deep understanding of the operation of a computer system. But you must be responsible and use your skills wisely. Taking over machines or hacking the Internet carries stiff penalties, is a sure-fire way to get expelled from Cornell, interferes with other people's lives, and is a waste of your talent. It is also plain wrong.
In this homework, you will "0wn" a binary program called browser that we will provide to you. We will not be providing the source code for this program. All that you know about this program is what is documented here, and what you can figure out for yourself by running or examining the binary. The browser is a simplified web browser. The normal operation of browser is very simple. When executed, it prompts you for a URL, and then prints a simple message (the '$' shown here is the linux shell prompt):
$ simulate browser
Where to connect? www.google.com
Connected to www.google.com!
I can also send input to browser from another program using the linux shell '|' operator, with the same results:
$ echo "www.google.com" | simulate browser
Where to connect? Connected to www.google.com!
However, this browser was supplied to you by Google, which has decided it is finally time for them to abandon their long-time motto of "Don't be evil"!!! The browser only lets you connect to www.google.com. All other URLs will be rejected — try it and see!
The rumor is that browser suffers from a buffer overflow vulnerability. Since the program only takes two inputs, it's not difficult to guess where the problem might lie. You are a typical college student wanting to use social networking tools to connect with your friends. Thus, you would like to get this browser to let you connect to Facebook, even though Google originally designed it to only allow access to its own website.
0wning browser. Your job is to craft some input to browser that will cause it to print out a different message, specifically: "LOL 0wn3d! <netid> is on Facebook!" (substitute your own NetID). The fact that the normal "Only www.google.com is allowed" message is missing constitutes proof that you have completely subverted the browser, and have gotten it to do something that it could not do before.
$ cat exploit | simulate browser
Where to connect? LOL 0wn3d! hw342 is on Facebook!
To do this, you will need to inject new code into the browser program as it is running. You are not allowed to modify or replace the browser program on disk. The only way you get to interact with browser is to feed it some carefully crafted input.
The simulator. The browser program is compiled to run on a MIPS CPU. Since most of you don't have access to a real MIPS CPU (neither do we), you will not be able to natively execute the program. Instead, you can run a program which takes browser and simulates the execution of the code. The simulate program we provide is just such a simulator. (Another popular simulator is spim, but it won't work for this assignment: spim only deals with assembly code, rather than actual binary machine code as our in-house simulator does.)
To figure out how to attack browser, you'll need to step through its code as it is executing and reverse engineer the parts that matter, namely, where (i.e., at which memory location) the input buffer is stored, what values lie near it in memory, and what precise instruction sequence is vulnerable to a buffer overflow attack. Since you have the MIPS binary, you can use various mipsel-linux tools, like mipsel-linux-objdump and mipsel-linux-readelf, to disassemble the browser binary and learn about its layout and code.
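To make "the values that lie near the buffer" concrete, here is a toy model in Python. It is not browser's real layout: the 16-byte buffer, the saved return address placed directly after it, the addresses, and all function names are assumptions invented for illustration. The real sizes and offsets are what you need to discover with the disassembler and debugger.

```python
import struct

BUF_SIZE = 16  # hypothetical buffer size; browser's real size is for you to find


def make_frame(return_addr):
    """Toy stack frame: a zeroed buffer followed by a 4-byte saved return address."""
    return bytearray(BUF_SIZE) + struct.pack("<I", return_addr)


def unchecked_copy(frame, data):
    """Copy data to the start of the buffer without checking BUF_SIZE."""
    frame[0:len(data)] = data
    return frame


def saved_return(frame):
    """Read back the little-endian word stored just past the buffer."""
    return struct.unpack_from("<I", frame, BUF_SIZE)[0]


frame = make_frame(0x00400120)           # made-up return address
unchecked_copy(frame, b"www.google.com")
print(hex(saved_return(frame)))          # short input: return address untouched

overlong = b"A" * BUF_SIZE + struct.pack("<I", 0x7FFF0000)  # made-up target
unchecked_copy(frame, overlong)
print(hex(saved_return(frame)))          # long input: return address replaced
```

The point of the model is only that a copy with no length check lets input bytes land on whatever is stored after the buffer.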
You can also use the "-d" option to the simulator, which starts an interactive debugger for the simulated program execution. This lets you step through the execution one instruction at a time, examine memory and the stack contents, and so on. See the README file in /courses/cs3410/mips-sim/ for help using the simulator and its built-in debugger.
Stack Randomization. Note that in a feeble effort to thwart just such attacks, the simulator, like many real machines, implements stack randomization, a limited kind of program layout randomization. When the simulator starts, it initializes the stack to a variable address, rather than the standard 0x7FFFFFFC. The starting location of the stack is derived from the $USER or $NETID environment variable. If you are using the csuglab machines, you don't need to do anything special, since $USER matches your NetID. If you're using another machine, however, and your username doesn't match your NetID, then it is very important that you set the environment variable $NETID properly before running the simulator. (See "Setting Up Your Linux Environment" above.)
Executing the Attack. Once you have figured out the program and stack layout, you need to come up with a carefully crafted input that will take over browser. This input will likely contain some binary data (the attack payload) that corresponds to MIPS instructions you want to have executed. There are several tools you might want to use to create the payload and inject it into the running browser: a MIPS assembler (e.g. mipsel-linux-gcc, Logisim's "MIPS Rom" assembler, web-based MIPS assemblers, or yourself with the book's green reference card) to convert from MIPS assembly into MIPS machine language; xxd for converting text files containing hex digits to (or from) raw binary files; and cat for sending raw binary input to browser.
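As a sketch of what converting MIPS assembly into machine language means at the byte level, the Python snippet below hand-encodes a single I-type instruction, much as you would with the reference card. The helper names encode_i_type and to_hex_text are hypothetical, and the little-endian byte order is an assumption based on the "mipsel" tool names; check it against the objdump listing of browser. The instruction itself is arbitrary and is not a hint about the payload.

```python
import struct


def encode_i_type(opcode, rs, rt, imm):
    """Pack a MIPS I-type instruction: opcode(6) rs(5) rt(5) immediate(16)."""
    word = (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)
    return struct.pack("<I", word)  # "<I" = 32-bit little-endian (assumed for mipsel)


def to_hex_text(raw):
    """Format bytes as space-separated hex pairs, suitable for xxd -r -p."""
    return " ".join(f"{b:02x}" for b in raw)


# addiu $v0, $zero, 10: opcode 0x09, rs = 0 ($zero), rt = 2 ($v0)
insn = encode_i_type(0x09, 0, 2, 10)
print(to_hex_text(insn))  # prints "0a 00 02 24" (word 0x2402000a, low byte first)
```

Text produced this way can be pasted into a hex text file and converted to raw binary with xxd.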
Once your attack causes browser to print the "LOL 0wn3d! <netid> is on Facebook!" message, it can just crash, loop forever, or exit gracefully. It is trivial to make it loop forever. A clean exit only takes a few extra instructions to invoke the normal exit() routine.
Standard convention is for main(int argc, char **argv) to be invoked with the name of the program itself as argv[0]. In order to support this, the simulator pushes the name of the MIPS program file onto the stack before invoking main. You may have noticed that this file name appears near the top of the stack.
Important: An unfortunate consequence is that the stack alignment of main and all other functions depends on how you invoke the program. There are ways to avoid this.
Option 1: Always be sure to invoke the simulator using the exact same file name argument for the browser file. In other words, whatever you are using as the last argument to the simulate program, you should always use as the last argument.
So if you invoke the simulator like this:
$ simulate browser
Then any of these is acceptable as well, and will not change the stack alignment:
$ /courses/cs3410/mips-sim/bin/simulate browser
$ simulate -d browser
$ /courses/cs3410/mips-sim/bin/simulate -d -i exploit.binary browser
$ cat exploit.txt | xxd -r | simulate browser
But any of these will cause the stack to have a different alignment:
$ simulate ./browser
$ simulate /courses/cs3410/lab3/browser
$ simulate ~/browser
$ simulate lab3/browser
Option 2: Use the "-noargs" option of the simulator which will always invoke main with some constant-length arguments regardless of what you write on the command line. Using the "-noargs" option, you can invoke simulator any way you like and it will always use the same stack layout:
$ simulate -noargs browser
$ simulate -noargs ./browser
$ simulate -noargs ~/browser
$ simulate -noargs /courses/cs3410/lab3/browser
Although this is how real programs do work, this extra wrinkle in the assignment isn't intended to add extra difficulty to it. You may specify in your documentation what path to browser is appropriate for your solution, and we will honor it when testing.
Without any specification, we will assume that "browser" was used, or that the "-noargs" flag was used. You should prefer one of these two options unless you have already tested extensively with another option before reading this paragraph. This should never happen, since every student reads this handout carefully before starting the assignment, right? :)
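To see why the file name matters, consider the following rough Python model. It assumes, purely for illustration, that the name is pushed as a NUL-terminated string and padded to a 4-byte word; the simulator's actual rules are not documented here and may differ, and pushed_size is a made-up helper. The only point is that names of different lengths occupy different amounts of stack, which shifts everything pushed afterwards.

```python
def pushed_size(file_name, word=4):
    """Bytes used by a NUL-terminated name, rounded up to a whole word.

    Assumed model only; the simulator's actual rules may differ.
    """
    raw = len(file_name.encode("ascii")) + 1  # +1 for the terminating NUL
    return (raw + word - 1) // word * word


for name in ["browser", "./browser", "lab3/browser", "/courses/cs3410/lab3/browser"]:
    print(f"{name!r}: {pushed_size(name)} bytes")
```

Under this model each of the four names takes a different number of bytes, so an address that works for one invocation is off by that difference for another.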
What to submit
Submit your raw binary exploit file containing the specially crafted input. We will try it out on our own copy of browser to see if it successfully breaks it.
Also submit a text document that briefly explains the exploit file. A text listing from xxd of the bytes in your exploit file, annotated with a few comments to explain what it is doing (or trying to do) is sufficient documentation. We will use this document for partial credit and for understanding your exploit.
Here are a few tools you might find useful for this homework.
xxd is a tool for converting back and forth between raw binary files and text representations of the binary data. For example, if I create a file exploit.txt (using a regular text editor) specifying twenty-eight consecutive "bytes" in hex:
68 77 33 34 32 20 00 00 00 00 00 00 00 00 00 00 01 02 03 04 aa bb cc dd 11 22 33 44
then I can convert this into raw binary using xxd in "reverse plain" mode:
$ xxd -r -p exploit.txt > exploit
$ ls -l exploit*
-rw-r--r-- 1 hw342 hw342 28 2011-02-25 12:06 exploit
-rw-r--r-- 1 hw342 hw342 84 2011-02-25 12:06 exploit.txt
You can see that the text version is 84 bytes (includes spaces and 2 digits of text per "byte"), and the raw binary version is exactly 28 bytes (one byte per byte, of course). xxd is extremely picky about the format of the input file in "reverse" mode (spaces at the ends of lines silently mess things up, for example). So you may want to convert the raw file back to text and compare to your desired bytes to make sure nothing went wrong:
$ xxd exploit
0000000: 6877 3334 3220 0000 0000 0000 0000 0000  hw342 ..........
0000010: 0102 0304 aabb ccdd 1122 3344            ........."3D
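If xxd's strictness about input formatting becomes a nuisance, the same conversion can be cross-checked in Python. This is a minimal sketch using the standard bytes.fromhex method, which ignores the spaces between hex pairs; the variable names are made up for the example, and the hex string is the twenty-eight bytes from exploit.txt.

```python
hex_text = (
    "68 77 33 34 32 20 00 00 00 00 00 00 00 00 00 00 "
    "01 02 03 04 aa bb cc dd 11 22 33 44"
)

raw = bytes.fromhex(hex_text)  # same job as: xxd -r -p exploit.txt
print(len(raw))                # 28, one byte per hex pair
print(raw[:6])                 # b'hw342 '

# Round trip: turn the bytes back into hex text and compare with the input.
round_trip = " ".join(f"{b:02x}" for b in raw)
assert round_trip == hex_text
```

A failed round trip is a quick sign that a stray character crept into the hex text.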
mipsel-linux-objdump can give you a listing of the assembly code for browser:
$ mipsel-linux-objdump -xdl browser
Pipes and redirection, you may recall, are shell command line operators that let you connect the output of one program (say cat or xxd) to the input of another program or to a file. So you can, for example, concatenate two text files using cat, send the resulting text as input to xxd -r -p, send the resulting raw binary to the simulated browser, then send the resulting output to a file output.txt, all using a single command:
$ cat exploit_part1.txt exploit_part2.txt | xxd -r -p | simulate browser > output.txt
We're here to help. Take advantage of our office hours if you are stuck.
For an entertaining (and somewhat dated) read on buffer overflow attacks, check out:
Aleph One. Smashing the Stack for Fun and Profit. Phrack Magazine, 7(49), November 1996.
And finally, to reiterate: a friendly hacking challenge can be fun, and hacking skills are invaluable for working with real systems. But you must be responsible for your own behavior. We are not giving you free rein to launch attacks on CMS, the csuglab, fellow students' machines, or anything else. Such behavior is unethical and most likely illegal as well.