Lesson 9: Alias Analysis

Gist

Motivation

Lots of languages have pointers! Whether you call them that or not.

Tricky quiz to emphasize the point that pointers make semantics hard: what does this C program return?

bool foo(char *a, char *b) {
    *a = 'a';
    *b = 'b';
    return *a == 'a';
}

The answer is that it depends on aliasing, i.e., whether a and b point to the same location in memory. Really, doing anything with pointers depends on aliasing! Basically anything that you can normally do with non-aliasable local (stack) variables is extra hard to do on heap variables (i.e., pointers) without aliasing information.

An example especially close to my heart is parallelization. Aliasing is a major impediment (the major impediment?) to automatic parallelization of code—you need to be sure that two things running in parallel can't be interfering with each others' data, which requires knowing that they don't touch pointers that alias.

Stating the Alias Analysis Problem

The problem: "For every program point, and for every pair of pointer-typed variables p and q, do p and q point to the same memory location at that point in time?"

Alias Analysis with Data Flow

Let's try to concoct a simple alias analysis using the data flow framework!

Heap Models

Any alias needs a definition of what a "memory location" is. A common answer is that there is one location per static allocation site. In Bril, for example, every alloc instruction becomes a memory location.

For realistic languages, it often helps to disambiguate memory locations further: for example, to give every offset in an array a different location, or to give every field in an object a different location.

Context-Sensitive Alias Analysis

See last time's discussion about context sensitivity in general. Context sensitivity is a big topic in alias analysis: it's common to use some limited calling-context context to disambiguate memory locations and alias information.

Of course, there is a sharp trade-off between cost and precision. Scalable & precise alias analysis remains an open problem. Seriously, it's its own world of ongoing research. For much more, see Smaragdakis and Balatsouras.

Tasks

There are no implementation tasks for this lesson. If alias analysis is your bag, you can start with using your data flow implementation to implement a straightforward may-alias analysis for Bril pointers, then proceed on to the literature to find and implement more and more interesting pointer analyses.