Type-Based Alias Analysis
Type-Based Alias Analysis by Diwan et al. describes a set of efficient yet precise alias analysis algorithms using the types of type-safe programming languages, called type-based alias analysis (TBAA). Three techniques are introduced: (1) TypeDecl, a very conservative analysis which decides that two memory references may alias if they have the same type; (2) FieldTypeDecl, which uses type declarations of fields and other high-level information of the program to improve TypeDecl; and (3) SMTypeRefs, which examines the effects of assignments to more accurately determine the types that a memory reference may access. The "final version", referred as SMFieldTypeRef in the paper, combines the three techniques mentioned above.
One highlight of the paper is that the authors evaluated their proposed approaches in a pretty rigorous way. Aside from the traditional static evaluation where only the sizes of may-alias and point-to sets are examined, the authors evaluated the effect of TBAA on potential further optimizations by applying Redundant Load Elimination (RLE) to programs analyzed by TBAA. The authors further show a limit analysis, which demonstrates that at least RLE would not benefit much from an alias analysis that is more accurate that TBAA.
Since we have not covered alias analysis in class when I read the paper (and may not finish when I present the paper), I think it would be useful to briefly describe what alias analysis does in this post. Alias analysis tries to statically disambiguate memory references in the program so that the compiler can get a knowledge on which instructions might access the same memory location. On a traditional computer architecture, such information can be useful when the compiler tries to reorder memory loads and stores; if we want to map the code to hardware, alias analysis provides information about how to actually arange the memory in hardware, and how to design the control FSMs of the hardware. Despite the importance of alias analysis, a very precise alias analysis can be prohibitively slow due to the complexity of analyzing each pair of memory references in the program.
Modula-3 Programming Language
The authors describe their TBAA techniques and evaluate these
techniques using programs written in Modula-3. Modula-3 is a statically-typed, type-safe
programming language. Since the language is type-safe, it does not allow arbitrary pointer casting
like C and C++. The language seems fairly old and limited information can be found online. I found
this site which might be useful if you are interested in the details of the language.
Modula-3 only allows three types of memory references:
p.f to access a field of an object,
dereference a pointer, and
p[i] to access the i-th element of an array
p. Pointer type casting
is only allowed between a type and its subtypes.
Section 2 of the paper describes the three TBAA techniques mentioned above. Aside
from assuming that the language is type-safe, the authors further assume that the compiler has
access to the whole program except for standard libraries. This assumption is later abandoned in
Section 4, where the authors evaluate the effectiveness of TBAA when only part of the program is
available. It is also assumed that all references of a type
T might access all fields of
- Access Path (AP): An access path is a combination of the three types of allowed memory references,
a.b^.c[i]. Distinct object fields are assumed to have different names.
Type(p)is the static type of
pis an AP.
Subtypes(T)is a set of all subtypes of
Titself. For subtyping, if
T1is a subtype of
T, then all objects of type
T1are also of type
The first technique proposed in the paper is surprisingly simple. Assuming that an
AP can reference any object with the same type or subtypes, TypeDecl sees two APs
aliasing if SubTypes(Type(p)) $\cap$ SubTypes(Type(q)) $\neq \phi$. This is apparently very,
very conservative, since the condition is clearly too strict and TypeDecl does not take any
program syntax information into account.
FieldTypeDecl solves part of the problem by considering the field names and
the different types of memory accesses. For example,
t.g do not alias even if they are
of the same type, because these two APs are accessing different fields of objects. As another
q[i] do not alias because Modula-3 does not allow that. In more general cases,
FieldTypeDecl will check whether the program has ever taken address of an object of the target
type. If not, the involved instruction cannot alias with any other references. If this access check
fails, or if the two APs are just "raw" references to objects, FieldTypeDecl reverts back to
TypeDecl to make a final decision.
SMTypeRefs solves the other part of the problem by actually examining the assignments in the instructions. While TypeDecl makes a very conservative assumption that all references to the same type and subtypes might alias, SMTypeRefs goes through the program and tries to "merge" the types together only when there is a pointer assignment between the types. This subtle change makes SMTypeRefs seemingly much more powerful than TypeDecl.
The final version of TBAA combines FieldTypeDecl with SMTypeRefs, where SMTypeRefs is used in the place of TypeDecl in FieldTypeDecl. The output of TBAA is a type-based table indicating whether accesses to certain types may alias with each other, rather than a table recording whether each pair of memory references alias or not. As a result, the time complexity of constructing this table is linear with respect to the number of instructions in the program and the number of types in the language. However, obtaining the alias status of each memory reference pair can require $O(e^2)$ time where $e$ is the number of memory references in the program.
The authors thoroughly evaluate TBAA using static evaluation, dynamic evaluation, and limit analysis:
- Static evaluation focuses on examining the sizes of may-alias and point-to sets, where smaller sets are better. While it seems very convicing and straightforward, static evaluation has two major drawbacks. Firstly, static evaluation cannot reflect how effective the analysis is to the optimization passes that use it. Secondly, static evaluation uses only the set sizes as metric, which cannot reflect the strengths and weaknesses of different alias analysis methods.
- Dynamic evaluation actually examines how the alias analysis affects the runtime of the optimized program when used together with other optimization passes. It actually reflects how effective the alias analysis assists the optimizations. On the downside, results of dynamic evaluation depend on the benchmarks, inputs, and the "client optimization" that uses alias analysis. In this paper, Redundant Load Elimination (RLE) is implemented as this "client".
- Limit analysis evaluates how much improvement we can possibly get compared with a "perfect" alias analysis. It can be performed together with dynamic evaluation, where the oracle alias pairs can be found by profiling.
The authors assembled their own benchmark suite to evaluate the performance of TBAA. One possible reason why the authors did not use a standard benchmark like SPEC (which came out in 1992) is that the standard benchmark suites were not written in Modula-3. The authors' experiments also got impeded by GCC bugs. All the optimizations and analysis are implemented in the middle-end of the author's compiler toolchain.
The authors evaluated TypeDecl, FieldTypeDecl, and SMFieldTypeRefs on their benchmark suite. Clearly, the latter two versions of TBAA are much more powerful than the simple TypeDecl, both in identifying intra-procedural and inter-procedural aliases. In general TBAA is much less effective in eliminating inter-procedual aliases. My guess for this behavior is that since TBAA is almost a pure type-based analysis, it should be inherently more conservative when analyzing programs with a large amount of instructions, where pointer assignments and memory references appear for more times. For inter-procedural alias analysis, the compiler gets to see more assignments, and TBAA might end up believing that all types and subtypes alias with each other.
It is surprising to me that SMFieldTypeRefs offers very limited improvement over FieldTypeDecl.
While the authors did not explain the reason, my guess is that the process of checking whether the
address has been taken or not in FieldTypeDecl implicitly provides a lot of information about
assignments. Unfortunately the authors did not provide a study that reveals more insights. In
addition, personally I am interested in seeing how effective SMTypeRefs is when it's used alone.
Perhaps SMTypeRefs is actually not that much more powerful than TypeDecl on real programs, which
can explain why FieldTypeDecl and SMFieldTypeRefs have very similar performance. The last
interesting thing I found from this section is that for
m2tom3, SMFieldTypeRefs generates a
larger local alias set than FieldTypeDecl.
Case Study on Redundant Load Elimination
Redundant Load Elimination (RLE)
RLE simplifies redundant memory expressions with variable references, and tries to move the memory references out of the loop if there is no alias inside the loop. Similar optimizations also apply to branches.
By statically examining the number of removed redundant loads, we can draw the same conclusion as in the static evaluation section: FieldTypeDecl and SMFieldTypeRefs are strictly more powerful than TypeDecl. However, when comparing the performance of optimized programs, all three techniques offer similar speedup when used together with RLE. This result demonstrates that at least for RLE, a more precise alias analysis may not provide much benefit over TBAA. The limit analysis confirms this statement, because almost all redundant loads can be removed by using RLE together with TBAA for most benchmarks. For the redundant loads that cannot be removed, the authors studied the cause and found that they are mostly caused by problems other than alias analysis.
Performance on Incomplete Programs
The assumption of the compiler having access to the whole program is often violated in cases such as separate compilation. As a result, the authors performed additional experiments to evaluate how TBAA performs when this assumption does not hold. The authors use "open world" to refer to the cases where the assumption does not hold, and use "close world" to refer to situations where the whole program is available to the compiler. To ensure that TBAA on incomplete programs still yield correct results in the "open world" scenario, the authors made some changes to make it more conservative:
- When evaluating whether the address of a memory access has been taken elsewhere, the modified version also checks the function arguments. If two function arguments are both references and have the same type, then the instructions accessing these two function arguments might alias, because the code that calls the function might assign aliasing objects to the function arguments.
- When merging types in SMFieldTypeRefs, any two types with a subtype relationship are merged together, since the unavailable code might assign them. To my understanding, this modification makes SMFieldTypeRefs almost equivalent to FieldTypeDecl.
While these changes seem very conservative, actually they do not affect the performance of TBAA that much. The authors show that the "open world" assumption has negligible effect on the execution time of the compiled program when RLE is applied. The result is not surprising, because even in the "close world" scenario, SMFieldTypeRefs does not have clear advantage over FieldTypeDecl. The modified version of SMFieldTypeRefs is very close to FieldTypeDecl, with just slightly more restrictions. When I first read the paper, I felt that there should be an additional experiment showing how TBAA performs when different portions of the code are available. Thanks to Adrian's suggestions, now I think that the critical point of this experiment is to show that the effectiveness of TBAA is not greatly affected by the two seemingly conservative assumptions. In the "open world" scenario, no matter how much code the compiler gets to see, it always needs to make the same assumptions.
Conclusion and Discussions
This paper introduces Type-Based Alias Analysis (TBAA), a simple but powerful technique for disambiguating memory references. The technique is efficient, because building the data structure for the analysis requires only linear time with respect to both the number of instructions in the program and the number of types in the language. The time complexity for querying a pair of instructions is linear with respect to the number of types in the language. Using a self-implemented optimization pass as the "client" to alias analysis, the authors show that TBAA can offer close-to-optimal performance improvement to the compiled programs.
The authors tried their best to thoroughly evaluate TBAA and performed some evaluations that previous work never did. Unfortunately, they did not further dig into some points that I am personally interested in, and some experimental details are either missing or not clearly explained. This echos the importance of doing thorough, complete experimental evaluations.
TBAA is definitely a very useful type of alias analysis, since it achieves very good trade-off between complexity and accuracy. LLVM has implemented TBAA as an analysis pass. Tools that are built on top of LLVM also leverage the results of TBAA as hints to optimizations. One family of such tools, named High-Level Synthesis tools, try to enable designers to describe their hardware in pure software languages by performing automatic analysis, optimizations, and hardware generation inside the compiler. With such tools becoming popular, alias analysis will have a completely different group of "clients" compared with what the authors had twenty years ago.