Project 4 - Buffer Overflow

CS 3410 Spring 2017


Due: 11:59pm, Friday, April 14, 2017

Late Policy: Up to 2 slip days can be used for this project. If you are out of slip days, submissions after the due date will incur a 25% deduction per day late.

Grace Period Policy: Do not rely on the grace period to submit on time. Everything should be uploaded BEFORE the due date above.

Reminder: You must work alone for this project.

Warning: Read the ENTIRE writeup before you begin. Regrades will not be honored for submissions that do not follow the writeup.


Setting up your Environment

For this project you should either be SSH'd into a UGCLINUX machine or be using the course VM found on the course webpage.

Files: The files you will need for this assignment are in your personal GitHub repository.

If you are SSH'd into a UGCLINUX machine:

We assume that you cloned your GitHub repo into your home directory. If you placed it elsewhere, you will have to modify the commands below.

Setting up $PATH: Adding the course tool directories to your $PATH lets your shell find the simulate program and all of the mipsel-linux tools, so you can just type simulate instead of /home/<<NETID>>/<<REPONAME>>/p4/mips-sim/bin/simulate every time you want to run something. The first cd in the commands below puts you in your $HOME directory, where your .bashrc file lives.

  • To avoid having to reset your $PATH every time you log in, you can do the following just once:
    Note: You should replace <<NETID>> with your own NetID and <<REPONAME>> with the name of your GitHub repository (which is most likely also your NetID).
  • $ cd
    $ echo 'export PATH=${PATH}:/home/<<NETID>>/<<REPONAME>>/p4/mipsel-linux/bin:/home/<<NETID>>/<<REPONAME>>/p4/mips-sim/bin' >> .bashrc
    
  • Then log out and log back in.

Setting up $NETID: You must follow these steps to set up your environment correctly before attempting this assignment.

  • In order to set your $NETID variable, do the following just once:
    Once again, note that you should replace <<NETID>> with your own NetID.
  • $ cd
    $ echo 'export NETID=<<NETID>>' >> .bashrc
    
  • Then log out and log back in.
  • You should now be able to check that your $NETID variable is set correctly:
  • $ echo $NETID
  • If you see your NetID printed to your terminal, you are all set.

If you are using the course VM:

We assume that you cloned your GitHub repo into your home directory, /home/vm. If you placed it elsewhere, you will have to modify the commands below.

Setting up $PATH: Adding the course tool directories to your $PATH lets your shell find the simulate program and all of the mipsel-linux tools, so you can just type simulate instead of /home/vm/<<REPONAME>>/p4/mips-sim/bin/simulate every time you want to run something. The first cd in the commands below puts you in your $HOME directory, where your .bashrc file lives.

  • To avoid having to reset your $PATH every time you log in, you can do the following just once:
    Note: You should replace <<REPONAME>> with the name of your GitHub repository (which is most likely also your NetID).
  • $ cd
    $ echo 'export PATH=${PATH}:/home/vm/<<REPONAME>>/p4/mipsel-linux/bin:/home/vm/<<REPONAME>>/p4/mips-sim/bin' >> .bashrc
    
  • Then log out and log back in.

Setting up $NETID: You must follow these steps to set up your environment correctly before attempting this assignment.

  • In order to set your $NETID variable, do the following just once:
    Note that you should replace <<NETID>> with your own NetID.
  • $ cd
    $ echo 'export NETID=<<NETID>>' >> .bashrc
    
  • Then log out and log back in.
  • You should now be able to check that your $NETID variable is set correctly:
  • $ echo $NETID
  • If you see your NetID printed to your terminal, you are all set.

Overview

The goal of this project is to get intimately familiar with the layout and use of call stacks, as well as MIPS machine language, assembly and disassembly, debugging, and reverse engineering. As a side benefit, we hope to raise your awareness of computer security issues. To this end, you will write a buffer overrun exploit to break a program that we provide to you.

WARNING: These kinds of friendly hacking challenges have a long history, and hacking skills are priceless, as they reflect a deep understanding of the operation of a computer system. But you must be responsible and use your skills wisely. Taking over machines or hacking the Internet carries stiff penalties, is a sure-fire way to get expelled from Cornell, interferes with other people's lives, and is a waste of your talent. It is also plain wrong.

What to Submit

Submit your raw binary exploit file containing the specially crafted input. We will try it out on our own copy of browser to see if it successfully breaks it.

Also submit a text document that briefly explains the exploit file. A text listing from xxd of the bytes in your exploit file, annotated with a few comments to explain what it is doing (or trying to do) is sufficient documentation.

The Story

In this project, you will "0wn" a binary program called browser that we will provide to you. We will not be providing the source code for this program. All that you know about this program is what is documented here, and what you can figure out for yourself by running or examining the binary. The browser is a simplified web browser. The normal operation of browser is very simple. When executed, it prompts you for a URL, and then prints a simple message (the '$' shown here is the linux shell prompt):

$ simulate browser
Where to connect? www.google.com
Connected to www.google.com!

I can also send input to browser from another program using the linux shell '|' operator, with the same results:

$ echo "www.google.com" | simulate browser
Where to connect?
Connected to www.google.com!

However, this browser was supplied to you by Google, which has decided it is finally time for them to abandon their long-time motto of "Don't be evil"!!! The browser only lets you connect to www.google.com. All other URLs will be rejected — try it and see!

The rumor is that browser suffers from a buffer overflow vulnerability. Since the program only takes one input, it's not difficult to guess where the problem might lie. You are a typical CS 3410 student who wants to use an interpreter to test your MIPS code while paying tribute to our glorious leader, inventor of MIPS, Danny Qiu. Thus, you would like to get this browser to let you connect to dannyqiu.me, even though Google originally designed it to allow access only to its own website.

0wning browser: Your job is to craft some input to browser that will cause it to print out a different message, specifically: "LOL 0wn3d! <netid> is on dannyqiu.me!" (substitute your own NetID). The fact that the normal "Only www.google.com is allowed" message is missing constitutes proof that you have completely subverted the browser, and have gotten it to do something that it could not do before.

$ cat exploit | simulate browser
Where to connect?
LOL 0wn3d! hw342 is on dannyqiu.me!

To do this, you will need to inject new code into the browser program as it is running. You are not allowed to modify or replace the browser program on disk. The only way you get to interact with browser is to feed it some carefully crafted input.

The simulator: The browser program is compiled to run on a MIPS CPU. Since most of you don't have access to a real MIPS CPU (neither do we), you will not be able to natively execute the program. Instead, you can run a program which takes browser and simulates the execution of the code. The simulate program we provide is just such a simulator. (Another popular simulator is spim, but it won't work for this assignment: spim only deals with assembly code, rather than actual binary machine code as our in-house simulator does.)

To figure out how to attack browser, you'll need to step through its code as it is executing and reverse engineer the parts that matter: where (i.e., at which memory location) the input buffer is stored, what values lie near it in memory, and what precise instruction sequence is vulnerable to a buffer overflow attack. Since you have the MIPS binary, you can use various mipsel-linux tools, like mipsel-linux-objdump and mipsel-linux-readelf, to disassemble the browser binary and learn about its layout and code.

You can also use the -d option of the simulator, which starts an interactive debugger for the simulated program execution. This lets you step through the execution one instruction at a time, examine memory and the stack contents, and so on. See the README file in your repo for help using the simulator and its built-in debugger.

Stack Randomization: Note that in a feeble effort to thwart just such attacks, the simulator, like many real machines, implements stack randomization, a limited kind of program layout randomization. When the simulator starts, it initializes the stack to a variable address, rather than the standard 0x7FFFFFFC. The starting location of the stack is derived from the $NETID environment variable.

Executing the Attack: Once you have figured out the program and stack layout, you need to come up with a carefully crafted input that will take over browser. This input will likely contain some binary data (the attack payload) that corresponds to MIPS instructions you want to have executed. There are several tools you might want to use to create the payload and inject it into the running browser: a MIPS assembler (e.g. mipsel-linux-gcc, Logisim's "MIPS Rom" assembler, web-based MIPS assemblers, or yourself with the book's green reference card) to convert from MIPS assembly into MIPS machine language; xxd for converting text files containing hex digits to (or from) raw binary files; and cat for sending raw binary input to browser.
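If you assemble by hand with the green card, a small helper can take care of the bit packing and the byte order for you. The sketch below is one way to do it, under stated assumptions: encode_rtype, encode_itype and word_to_le_hex are hypothetical helper names (not part of the course tools), only the R-type and I-type layouts are covered, and out-of-range fields are silently masked rather than reported. Because mipsel is the little-endian flavor of MIPS, word_to_le_hex emits the least significant byte first, producing hex text that xxd -r -p can turn into raw bytes.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper. R-type layout: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6).
   Each field is masked to its width; nothing is range-checked. */
uint32_t encode_rtype(unsigned op, unsigned rs, unsigned rt,
                      unsigned rd, unsigned shamt, unsigned funct)
{
    return ((uint32_t)(op & 0x3f) << 26) | ((uint32_t)(rs & 0x1f) << 21) |
           ((uint32_t)(rt & 0x1f) << 16) | ((uint32_t)(rd & 0x1f) << 11) |
           ((uint32_t)(shamt & 0x1f) << 6) | (uint32_t)(funct & 0x3f);
}

/* Hypothetical helper. I-type layout: op(6) rs(5) rt(5) imm(16).
   A negative immediate keeps only its low 16 bits (two's complement). */
uint32_t encode_itype(unsigned op, unsigned rs, unsigned rt, int32_t imm)
{
    return ((uint32_t)(op & 0x3f) << 26) | ((uint32_t)(rs & 0x1f) << 21) |
           ((uint32_t)(rt & 0x1f) << 16) | ((uint32_t)imm & 0xffffu);
}

/* Writes the word as 8 lowercase hex digits, least significant byte first
   (little-endian, as mipsel stores it), followed by a NUL terminator. */
void word_to_le_hex(uint32_t word, char out[9])
{
    for (int i = 0; i < 4; i++) {
        snprintf(out + 2 * i, 3, "%02x", (unsigned)((word >> (8 * i)) & 0xffu));
    }
}
```

For example, addiu $t0, $t1, -1 (opcode 0x09, rs = $t1 = 9, rt = $t0 = 8) packs to 0x2528ffff, which word_to_le_hex writes as ffff2825. Checking a few words against mipsel-linux-objdump output is a cheap way to catch a mistake in either the fields or the byte order.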

Once your attack causes browser to print the "LOL 0wn3d! <netid> is on dannyqiu.me!" message, the browser program should exit gracefully (that is, exit with status 0). Making it loop forever would be trivial, but a clean exit only takes a few extra instructions to invoke the normal exit() routine.

Command-line Arguments

Standard convention is for main(int argc, char **argv) to be invoked with the name of the program itself as argv[0]. In order to support this, the simulator pushes the name of the MIPS program file onto the stack before invoking main. You may have noticed this file name appears near the top of the stack.
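To see why the exact file name matters, the sketch below estimates how many stack bytes an argument string occupies: its characters, the terminating NUL, and padding up to an alignment boundary. The padding rule and the align value are assumptions you pass in (this handout does not say how the simulator pads argv[0]), and stack_bytes_for_arg is a hypothetical helper; confirm the real layout with the debugger.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bytes a string would occupy if it were copied onto the
   stack with its NUL terminator and padded to a multiple of `align`
   (align must be a power of two). The padding rule is an assumption,
   not documented simulator behavior. */
size_t stack_bytes_for_arg(const char *arg, size_t align)
{
    size_t raw = strlen(arg) + 1;            /* characters plus terminating NUL */
    return (raw + align - 1) & ~(align - 1); /* round up to a multiple of align */
}
```

Under a 4-byte assumption, "browser" needs 8 bytes while "./browser" needs 12, so anything placed below the name would move by 4 bytes.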

Important: An unfortunate consequence is that the stack alignment of main and all other functions depends on how you invoke the program. There are two ways to avoid this:

Option 1: Always invoke the simulator with exactly the same file name argument for the browser file. In other words, whatever you use as the last argument to the simulate program, always use that same last argument.

So if you invoke the simulator like this:

$ simulate browser

Then any of these is acceptable as well, and will not change the stack alignment:

$ /home/<<USERNAME>>/<<REPONAME>>/p4/mips-sim/bin/simulate browser
$ simulate -d browser
$ /home/<<USERNAME>>/<<REPONAME>>/p4/mips-sim/bin/simulate -d -i exploit.binary browser
$ cat exploit.txt | xxd -r -p | simulate browser

where <<USERNAME>> is your NetID if you are SSH'd into a UGCLINUX machine or vm if you are using the VM, and <<REPONAME>> is the name of your repository.

But any of these will cause the stack to have a different alignment:

$ simulate ./browser
$ simulate /home/<<USERNAME>>/<<REPONAME>>/p4/browser
$ simulate ~/browser
$ simulate p4/browser

Option 2: Use the -noargs option of the simulator, which always invokes main with some constant-length arguments regardless of what you write on the command line. Using the -noargs option, you can invoke the simulator any way you like and it will always use the same stack layout:

$ simulate -noargs browser
$ simulate -noargs ./browser
$ simulate -noargs ~/browser
$ simulate -noargs /home/<<USERNAME>>/<<REPONAME>>/p4/browser

Although this is how real programs work, this extra wrinkle is not intended to make the assignment more difficult. You may specify in your documentation which path to browser is appropriate for your solution, and we will honor it when testing.

Without any specification, we will assume that "browser" was used, or that the -noargs flag was used. You should prefer one of these two options unless you have already tested extensively with another option before reading this paragraph. This should never happen, since every student reads this handout carefully before starting the assignment, right? :)

Tools

Here are a few tools you might find useful for this homework.

xxd is a tool for converting back and forth between raw binary files and text representations of the binary data. For example, if I create a file exploit.txt (using a regular text editor) specifying twenty-eight consecutive "bytes" in hex:

68 77 33 34 32 20
00 00 00 00 00 00 00 00 00
00
01 02 03 04
aa bb cc dd
11 22 33 44

then I can convert this into raw binary using xxd in "reverse plain" mode:

$ xxd -r -p exploit.txt > exploit
$ ls -l exploit*
-rw-r--r-- 1 hw342 hw342 28 2011-02-25 12:06 exploit
-rw-r--r-- 1 hw342 hw342 84 2011-02-25 12:06 exploit.txt

You can see that the text version is 84 bytes (it includes spaces, newlines, and 2 digits of text per "byte"), while the raw version is 28 bytes. Be aware that xxd can be picky about the format of the input file in "reverse" mode (spaces at the ends of lines can silently mess things up, for example). So you may want to convert the raw file back to text and compare it to your desired bytes to make sure nothing went wrong:

$ xxd exploit
0000000: 6877 3334 3220 0000 0000 0000 0000 0000  hw342 ..........
0000010: 0102 0304 aabb ccdd 1122 3344            ........."3D
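To make the conversion step concrete, and to show one way of catching bad input loudly instead of silently, here is a sketch of a plain-hex to binary converter in the spirit of xxd -r -p. It is not xxd's implementation: hex_to_bytes is a hypothetical name, it skips all whitespace, and it rejects any non-hex character or an odd number of digits, which is stricter than you should assume xxd to be.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Hypothetical helper: converts hex text such as "68 77 33\n34" into raw bytes.
   Whitespace anywhere is skipped, so a digit pair may even straddle a space.
   Returns the number of bytes written, or -1 on a non-hex character,
   an odd number of digits, or if more than cap bytes would be produced. */
long hex_to_bytes(const char *text, unsigned char *out, size_t cap)
{
    size_t n = 0;
    int high = -1; /* pending high nibble, or -1 if none */
    for (const char *p = text; *p != '\0'; p++) {
        if (isspace((unsigned char)*p)) continue;
        int v = hex_value((unsigned char)*p);
        if (v < 0) return -1;
        if (high < 0) {
            high = v;
        } else {
            if (n == cap) return -1;
            out[n++] = (unsigned char)((high << 4) | v);
            high = -1;
        }
    }
    return high < 0 ? (long)n : -1; /* a leftover nibble means an odd digit count */
}
```

For example, the first line of exploit.txt, 68 77 33 34 32 20, converts to the six bytes of "hw342 ".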

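mipsel-linux-objdump can give you a listing of the assembly code for browser (-d disassembles the code, -x also prints all headers, including the symbol table, and -l labels the output with source line numbers when debugging information is available):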

$ mipsel-linux-objdump -xdl browser 

Pipes and redirection, you may recall, are shell command line operators that let you connect the output of one program (say cat or xxd) to the input of another program or to a file. So you can, for example, concatenate two text files using cat, send the resulting text as input to xxd -r -p, send the resulting raw binary to the simulated browser, then send the resulting output to a file output.txt, all using a single command:

$ cat exploit_part1.txt exploit_part2.txt | xxd -r -p | simulate browser > output.txt

Epilogue

We're here to help. Take advantage of our office hours if you are stuck.

For an entertaining (if somewhat dated) read on buffer overflow attacks, check out:

Aleph One. Smashing the Stack for Fun and Profit. Phrack Magazine, 7(49), November 1996.
http://www.phrack.org/issues.html?issue=49&id=14

And finally, to reiterate: a friendly hacking challenge can be fun, and hacking skills are invaluable for working with real systems. But you must be responsible for your own behavior. We are not giving you free rein to launch attacks on CMS, fellow students' machines, or anything else. Such behavior is unethical and most likely illegal as well.

FAQ

Syscalls

As you should be able to tell from the disassembled code for browser, the SYSCALL instruction works pretty much just like a regular function call. The only difference is that before invoking, $v0 must contain a numeric code for which operating system service should be invoked: 1 for "printi", 2 for "prints", 3 for "putc", and so on. These numeric codes are not standard, but are specific to the cs3410 simulator. Other simulators you might find on the web will use their own numeric codes.

You need the newlines!

Yes, you need the newlines both before and after the "LOL 0wn3d!" message. Of course getting the message in the first place is worth the most points, but the newlines will get you those final few points.

So, an exploit that looks like this:

$ simulate browser < pht24-soln
Where to connect?
LOL 0wn3d! pht24 is on dannyqiu.me!
MIPS program exits with status 0 (approx. 8105 instructions in 249071 nsec at 32.32540 MHz)

... is preferable to an exploit that looks like this:

$ simulate browser < pht24-bad
Where to connect?  LOL 0wn3d! pht24 is on dannyqiu.me! MIPS program exits with status 0 (approx. 8105 instructions in 252959 nsec at 32.32040 MHz)

As you may have discovered, you can't simply embed a newline or carriage return in the message, because the browser stops reading when it encounters these characters. Something more clever is called for.
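Because these bytes can also appear by accident inside encoded instructions or addresses, not just in the message text, it can help to scan the raw payload before testing it. A minimal sketch, with a hypothetical helper name: it checks only the two bytes named above (newline 0x0a and carriage return 0x0d), and you would run it over the bytes that xxd -r -p produced.

```c
#include <stddef.h>

/* Hypothetical helper: returns the offset of the first newline (0x0a) or
   carriage return (0x0d) in the payload, or -1 if neither appears.
   Any such byte before the end of your payload would cut the input short. */
long first_terminator(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0x0a || buf[i] == 0x0d) return (long)i;
    }
    return -1;
}
```

Note that a vertical tab (0x0b) passes this check, which is exactly why it is tempting and, as the next question explains, not a valid substitute.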

Aha! I found this handy vertical tab (0x0b) character! I can just use that instead of a newline, right?

No, a vertical tab is not a newline. You must embed a newline into the message.

Why doesn't my exploit work when I try to use it with the debugger?

The short answer: Use the command simulate -i exploit -d browser

The long answer: You are likely trying to use one of these two commands:

simulate -d browser < exploit
cat exploit | simulate -d browser

This works just fine without the debugger (simulate browser), but what happens when we run it with the debugger? It takes the contents of the file named exploit and pipes that to the standard input of simulate -d browser ... but what input does simulate -d browser expect? The debugger is expecting a command from the user (step, continue, info, etc.). Your exploit code is being used as a debugger command, which confuses the debugger and causes it to exit without executing the browser. The -i flag to simulate will solve this problem.

Don't forget that you can find out about the -i flag and other potentially useful flags by running simulate without any arguments.

Is there a way to get around the command-line arguments limitation?

There are ways to make your exploit resistant to changes in stack layout. These clever exploits work when the stack starts anywhere within some small region, instead of only at one fixed location. If you implement such an exploit, feel free to brag about it in your documentation for extra credit!

Finally, there is a way to make your exploit work with any arbitrary stack layout. We'll leave this one for the adventurous. If you find this exploit, again, specify clearly in your documentation what we need to do to see it in action, and you will be awarded more extra credit.