Lesson 5: Global Analysis
- discussion thread
- global analysis & optimization
- tasks due September 21
Gist
Dominators
Lots of definitions!
- Reminders: Successors & predecessors. Paths in CFGs.
- A dominates B iff all paths from the entry to B include A.
- The dominator tree is a convenient data structure for storing the dominance relationships in an entire function. The recursive children of a given node in a tree are the nodes that that node dominates.
- A strictly dominates B iff A dominates B and A ≠ B. (Dominance is reflexive, so “strict” dominance just takes that part away.)
- A immediately dominates B iff A dominates B but A does not strictly dominate any other node that strictly dominates B. (In which case A is B’s direct parent in the dominator tree.)
- A dominance frontier is the set of nodes that are just “one edge away” from being dominated by a given node. Put differently, A’s dominance frontier contains B iff A does not strictly dominate B, but A does dominate some predecessor of B.
- Post-dominance is the reverse of dominance. A post-dominates B iff all paths from B to the exit include A. (You can extend the strict version, the immediate version, trees, etc. to post-dominance.)
An algorithm for finding dominators:
dom = {every block -> all blocks}
dom[entry] = {entry}
while dom is still changing:
for vertex in CFG except entry:
dom[vertex] = {vertex} ∪ ⋂(dom[p] for p in vertex.preds}
The dom
relation will, in the end, map each block to its set of dominators.
We initialize it as the “complete” relation, i.e., mapping every block to the set of all blocks.
The exception is the entry block, which we ensure always only has itself as a dominator.
(This keeps the algorithm from being confused by blocks that jump back to the entry node.)
The loop pares down the sets by iterating to convergence.
The running time is O(n²) in the worst case. But there’s a trick: if you iterate over the CFG in reverse post-order, and the CFG is well behaved (reducible), it runs in linear time—the outer loop runs a constant number of times.
Natural Loops
Some things about loops:
- Natural loops are strongly connected components in the CFG with a single entry.
- Natural loops are formed around backedges, which are edges from A to B where B dominates A.
- (Side note: There are actually two common definitions of backedges: this one, and one that relies on a depth-first search (DFS). By the other definition, a backedge is any edge that takes you to an already-visited node during DFS. The relationship between these two definitions is not 100% clear to me, although they are certainly not equivalent, at least for irreducible CFGs.)
- A natural loop is the smallest set of vertices L including A and B such that, for every v in L, either all the predecessors of v are in L or v=B.
- A CFG is reducible iff every backedge has a natural loop.
- A language that only has
for
,while
,if
,break
,continue
, etc. can only generate reducible CFGs. You needgoto
or something to generate irreducible CFGs.
- A language that only has
Loop-Invariant Code Motion
Here’s a preview of what we’ll do with natural loops. The loop-invariant code motion (LICM) optimization transforms code like this:
let a = ...;
let b = ...;
for (let i = 0; i < 100; ++i) {
f(a * b);
}
Into this, by moving code that does the same thing on every iteration to the loop’s pre-header block:
let a = ...;
let b = ...;
let c = a * b;
for (let i = 0; i < 100; ++i) {
f(c);
}
That is, we want to move code from inside the loop to before it—when that computation always results in the same value. We’ll return to LICM in Lesson 8.
Tasks
- Implement some dominance utilities:
- Find dominators for a function.
- Construct the dominance tree.
- Compute the dominance frontier.
- Devise a way to test your implementations. For example, is there a way you can algorithmically confirm that a block A dominates a block B? While computing these sets should be cheap, checking their output could use slow, naive algorithms.