Lesson 8: Interprocedural Analysis

Gist

Recap: we have done local (within a basic block) and global (within a function) analyses & optimizations. But what about optimizations that have to cross function boundaries? Those are interprocedural.

Like, what if we want to do LICM to this code?

main() {
    let x = ...;
    for (...) {
        g(f(x));
    }
}

f(x) {
    return x * 42;
}

The multiplication in f is clearly loop invariant, but no global optimization can tell that. And worse still, we can't very well move the multiplication "out of" the body of f because there might be other places that call f that expect it to work as written.

The Call Graph

In interprocedural analysis, you often want to know which functions call which other functions. Like a CFG but for functions, in a way.
In a call graph, every function is a vertex and every edge represents a call from one function to another.

Open vs. Closed World

Practically speaking, most interprocedural analyses need to assume they work in an open world, i.e., they don't get to see all the code that will eventually turn into the final program. Some code is "hidden," and the analysis must assume it can do anything.
In more exotic scenarios, you get to make a closed-world assumption, i.e., that your analysis gets to see everything. That's often called a whole-program analysis.
Assuming a closed world, for example, lets you delete a function if you see that it is never called. In an open world, you can't be sure the function is actually dead. (You don't get to see the whole call graph.)
While whole-program analysis is strictly more powerful, there are many reasons why realistic analyses need to assume an open world:
- Separate compilation. People want the ability to compile *.c files to *.o files independently—for many good reasons.
- Speed: it can be impractical to analyze whole programs if they're really big.
- In many more dynamic languages, including Java even, it's possible for the program to load more code at run time. So assumptions you make about the "whole program" may be invalidated when the application loads a plugin.
Practically speaking, whole-program analysis generally happens in these scenarios:
- Link-time optimization (LTO) is an extra optimization phase that happens after you independently compile all those *.c files to *.o files, when you want to link them together into an executable, so you do get to see all the code.
- Just-in-time (JIT) compilers get to see a snapshot of the code for the entire program right before it runs. They can temporarily apply closed-world optimizations and then invalidate them later on if the program loads more code later on.

Inlining

Inlining is a pretty simple idea: take a function call, and replace it with a copy of the called function's body. You eliminate the call and just do the computation right there.
Inlining is the "ur-optimization" because it gives interprocedural superpowers to local & global optimizations. If you can do a good job with inlining, you unlock many more downstream global optimizations.
Of course, you can't inline everything.
- Inlining the entire call graph into main would make the code exponentially large. And it's impossible when there's recursion, of course.
- In general, inlining has a cost (code size increases, worse instruction locality) and a benefit (remove the cost of the call, enable more optimization). Inliners need to decide when the benefit outweighs the cost.
- Inevitably, you need some kind of heuristic. For example, an easy one is to only inline functions that are small enough, i.e., below a fixed instruction-count threshold.

Devirtualization

Lots of languages (but not Bril—yet!) have virtual function calls.
- Every method call in Java is a virtual call, for example: the actual code you invoke for o.m() depends on the run-time type of o and whether the subclass overrides the m method.
- In assembly, these show up as indirect jumps.
It would be really nice to inline virtual function calls, but doing it in a straightforward way is impossible because we don't know which function is being called!
Devirtualization is an optimization that turns virtual (indirect) calls into direct calls, when possible. Then inlining and other interprocedural optimizations can work on those direct calls.
To use Java as an example again, an easy case is when you initialize an object with Foo o = new Baz() and then immediately, like on the next line of code, call o.m(). You know that o will be a Baz object at that call site, you so you know it must call Baz's version of the m method.
In general, you want to do a data flow analysis to propagate information about the dynamic types of objects from assignments to method calls, and then use that information to decide whether there is exactly one possibility for the function you need to invoke directly.
I recommend this blog post about devirtualization in LLVM.

Context Sensitivity

Sometimes the answer to a question about a function depends on another question: which call are we talking about?

For example, imagine we are trying to optimize this somewhat funky program that uses lots of evil mutable global state:

bool b;  // global variables
int i = 0;

main() {
    g1();
    print(i):

    g2()
    print(i);
}

g1() {
    b = true;
    f();
}

g2() {
    b = false;
    f();
}

f() {
    if (b) {
        i++;
    }
}

The call to f in g1 matters, but the one in g2 is "dead code" that can't affect anything the program will do. Inlining could reveal this fact, of course, but we know it's not always practical to inline everything. And any self-respecting (i.e., sound) interprocedural analysis that asks does f modify i? must say yes, it might!

So is there a way to tell that the second f can be eliminated? For that, we need a context-sensitive analysis. The idea is to use some kind of contextual information to distinguish different invocations of the same code. For example, one common kind of context is the call stack: a context-sensitive analysis could draw different conclusions about calls to f from g1 versus calls to f from g2.

Context-sensitive analyses can get expensive quickly. For example, in our imaginary does the function modify i? analysis, a context-insensitive analysis would need to answer n queries where n is the number of functions in the program. But a context-sensitive analysis that uses the calling function as the context needs n² queries. And you can even use deeper call stacks as context to get even more precise answers—if you use the i most recent calls as context, you now need to answer nⁱ queries. So in general, context sensitivity represents a pretty steep trade-off between analysis precision and cost.

Tasks

There are no tasks to turn in for this lesson. For "fun," you can consider implementing inlining for Bril function calls!