Project 3: Write an LLVM Pass

Overview

The point of Project 3 is to learn to use the LLVM compiler infrastructure. The scope is similar to Project 1, i.e., small: you don't need to make anything too ambitious; just build something that works and evaluate it rigorously.

The end result of your project should be an LLVM pass: a piece of code that analyzes or transforms LLVM IR programs. Unless you have a very good reason not to, you will implement your pass in C++ so you have access to the full power of the LLVM API. Unlike in Project 1, your pass needs to work on full-scale, real-world programs rather than just on small tests.

To understand the scope of Project 3, think back to Project 1, where you built something cool for Bril. Project 3 is like that, with some important differences:

Finally, for simplicity, we're requiring that projects be implemented as passes on LLVM IR. Your pass need not actually transform the IR; implementing an analysis that generates warnings or errors, for example, is in scope. But building a new frontend that emits LLVM IR is out of scope.

Ideas

Here are some general categories of LLVM passes you might implement. This is not meant as an exhaustive list; please consider getting creative with your proposal:

Doing the Project

Start with my LLVM pass skeleton repository. (I recommend you use the noauto branch, which avoids a problem with recent LLVM versions.) Before you do anything else, make sure you can build and run the SkeletonPass, which just prints out the names of the functions in a program. Then you can steal my skeleton code, copy it to your own repository, and start hacking from there.

For a walkthrough of some LLVM basics, see my example-driven blog post. Inevitably, you will want to rely on the official documentation and the auto-generated Doxygen pages for the LLVM API while you're working. You might also find the "Kaleidoscope" tutorial useful, although it is meant for people building frontends rather than passes.

To evaluate your pass, use real code written in a language that compiles to LLVM. That probably means C and C++ code via Clang, but it might include code in other languages that compile to LLVM. Here are some benchmark suites you can consider:

Because time is short, you don't need to get a complete benchmark suite running if that's too hard. But you should not rely exclusively on hand-written tests.
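
One way to make the evaluation rigorous is to run each benchmark several times with and without your pass, then report the variation alongside the averages. Here is a minimal sketch of that bookkeeping in standard C++ (no LLVM needed). The Summary struct and the summarize, speedup, and geomean functions are hypothetical helpers invented for illustration, not part of any course infrastructure, and collecting the timings themselves is left to you.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper type: a summary of repeated timing measurements
// (for example, wall-clock seconds for one benchmark over several runs).
struct Summary {
    double mean;
    double stddev;  // sample standard deviation (n - 1 in the denominator)
};

// Summarize repeated measurements of a single benchmark configuration.
Summary summarize(const std::vector<double> &samples) {
    assert(samples.size() >= 2 && "need at least two runs to estimate variation");
    double sum = 0.0;
    for (double s : samples) sum += s;
    double mean = sum / samples.size();

    double squaredError = 0.0;
    for (double s : samples) squaredError += (s - mean) * (s - mean);
    return {mean, std::sqrt(squaredError / (samples.size() - 1))};
}

// Ratio of mean baseline time to mean time with the pass enabled.
// A value above 1 means the build with the pass ran faster.
double speedup(const std::vector<double> &baseline,
               const std::vector<double> &withPass) {
    return summarize(baseline).mean / summarize(withPass).mean;
}

// Geometric mean of per-benchmark speedups, a common way to aggregate
// ratios across a suite without letting one benchmark dominate.
double geomean(const std::vector<double> &ratios) {
    assert(!ratios.empty() && "need at least one ratio");
    double logSum = 0.0;
    for (double r : ratios) {
        assert(r > 0.0 && "ratios must be positive");
        logSum += std::log(r);
    }
    return std::exp(logSum / ratios.size());
}
```

For instance, summarize({2.0, 4.0, 6.0}) gives a mean of 4.0 and a standard deviation of 2.0. If the standard deviation is large relative to the difference you are claiming, that is a sign you need more runs or a quieter machine.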