Register allocation assigns registers to variables. But sometimes allocating just one register per variable is unnecessarily restrictive. For example, consider the following code:
int i = 1
...
i = i + 1
...
a[i] = 0
There are two definitions (defs) of i in this code, and two uses. It is defined at the first and second statements shown, and used at the second and third. If we refer to these defs and uses as def1 and def2, and use1 and use2 respectively, and then draw a graph in which each definition is connected to each use that it can affect, we get a disjoint graph: def1 connects only to use1, and def2 only to use2.
This disjointness means we can use two different registers to hold i, since the two uses of the variable don't communicate. Each of these uses has a smaller live range than the whole variable, which may help us do a better job of register allocation by removing unnecessary constraints.
Register allocation is one motivation for the analysis known as reaching definitions, though other optimizations also need this analysis. Reaching definitions attempts to determine which definitions may reach each node in the CFG. A definition reaches a node if there is a path from the definition to the node, along which the defined variable is never redefined.
We can set up reaching definitions as a dataflow analysis. Since each node contains at most one definition, we can represent the set of definitions reaching a node as a set of nodes. A definition reaches a node if it may reach any incoming edge to the node:
\[ \IN{n} = \bigcup_{n'≺n}{\OUT{n'}} \]
A definition reaches the exiting edges of a node if it reaches the incoming edges and is not overwritten by the node, or if it is defined by the node:
\[ \OUT{n} = \GEN{n} ∪ (\IN{n} - \KILL{n}) \]
With \(\DEFS{x}\) denoting the set of nodes that define variable \(x\), \(\GEN{n}\) and \(\KILL{n}\) are defined very straightforwardly:
\[ \begin{array}{c|c|c} n & \GEN{n} & \KILL{n} \\ \hline x ← e & \{n\} & \DEFS{x} \\ \hline \text{everything else} & ∅ & ∅ \\ \hline \end{array} \]
Viewing this analysis through the lens of dataflow analysis frameworks, we can see that it works correctly and finds the meet-over-paths solution. It is a forward analysis where the meet operation is \(∪\), so the ordering on lattice values is \(⊇\) and the top value is \(∅\). The height of the lattice is the number of definitions in the CFG, which is bounded by the number of nodes. So we have a finite-height lower semilattice with a top element. The transfer functions have the standard form we have already analyzed, which is monotonic and distributive, so the analysis is guaranteed to converge on the meet-over-paths solution.
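To make the fixpoint computation concrete, here is a minimal worklist sketch in Python. The CFG representation (node identifiers, a predecessor map preds, and the gen and kill sets from the table above) is assumed purely for illustration:

# A minimal sketch of reaching definitions as a worklist algorithm.
# Assumed representation: preds[n] lists the predecessors of node n;
# gen[n] and kill[n] are the sets defined in the table above.
from collections import deque

def reaching_definitions(nodes, preds, gen, kill):
    out = {n: set() for n in nodes}   # initialize to top, the empty set
    succs = {n: [s for s in nodes if n in preds[s]] for n in nodes}
    work = deque(nodes)
    while work:
        n = work.popleft()
        in_n = set()
        for p in preds[n]:            # IN[n] = union of OUT over predecessors
            in_n |= out[p]
        new_out = gen[n] | (in_n - kill[n])
        if new_out != out[n]:         # OUT[n] only grows, so this terminates
            out[n] = new_out
            work.extend(succs[n])     # successors must be revisited
    return out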
Using the reaching definitions for a CFG, we can analyze how defs and uses relate for a given variable. Suppose that for a given variable, we construct an undirected graph we will call the DU-graph, in which there is a node for each def of that variable and a node for each distinct use (if a CFG node both uses and defines a variable, the def and the use are represented as distinct nodes in the DU-graph). In this graph there is an edge from a def to a use if the def reaches the node containing the use. The graph above is an instance of the DU-graph for the variable i.
Any connected component of the DU-graph represents a set of defs and uses that ought to agree on where the variable will be stored. For example, we showed that the DU-graph for variable i had two connected components. We refer to these connected components as webs. Webs are the natural unit of register allocation; in general, their live ranges are smaller than those of whole variables, so using them avoids creating spurious edges in the interference graph. Therefore the graph coloring problem is less constrained and the compiler can do a better job of allocating registers.
A standard way to think about the construction of webs is in terms of DU-chains and UD-chains. A DU-chain is a subgraph of the DU-graph connecting just one def to all of the uses it reaches. A UD-chain is a subgraph connecting a use to each of the defs that reach it. If a UD-chain and a DU-chain intersect, they must be part of the same web, so a web is also a maximal intersecting set of DU-chains and UD-chains.
Once reaching definitions have been determined, webs can be computed efficiently using a disjoint-set-union data structure (union-find with path compression). If a def reaches a use, then both must be in the same web. The algorithm works by finding a representative node in each web. Each node keeps track of its representative node during the algorithm; initially every node is its own representative. For each def–use edge, one node's representative node is changed to point to the other's, with path compression used to flatten the chain of representative-node pointers. The running time of this algorithm is nearly linear in the number of def–use edges.
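As a concrete illustration, here is a small Python sketch of this approach; the list of def-use pairs for one variable is assumed to have been extracted from the reaching-definitions results:

# A minimal union-find sketch for grouping defs and uses into webs.
# du_edges is assumed to be a list of (def, use) pairs for one variable.

def find(rep, x):
    while rep[x] != x:
        rep[x] = rep[rep[x]]          # path compression (path halving)
        x = rep[x]
    return x

def webs(du_edges):
    rep = {}
    for d, u in du_edges:
        rep.setdefault(d, d)
        rep.setdefault(u, u)
        rep[find(rep, d)] = find(rep, u)   # merge the two webs
    groups = {}
    for x in rep:                          # collect nodes by representative
        groups.setdefault(find(rep, x), set()).add(x)
    return list(groups.values())

On the example at the start of these notes, webs([("def1", "use1"), ("def2", "use2")]) yields the two webs for i.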
Different compiler optimizations can enable each other, and in general we want the compiler to run multiple optimizations repeatedly until no further improvement is possible. However, optimizations can also invalidate analyses needed by other optimizations. Rerunning these analyses repeatedly makes the compiler more complex and slower. Reaching definitions is a good example of an analysis that ends up being run repeatedly.
Modern compilers typically use a slightly different CFG representation than the one we have been studying. It is called static single assignment form, or SSA for short. The idea is to avoid nontrivial UD-chains: each variable in SSA has exactly one def, and therefore each use is reached by exactly one def too.
For example, consider the following code:
x = 0
while (x < 10) {
    x = x + 1
}
y = x
As a control-flow graph, it looks like the following, which is not in SSA form because there are two defs of x:
This code has two definitions of x, so in SSA form it must have at least two distinct variables representing the original x. However, we cannot simply renumber those two defs, because the use of x in the if-node is reached by both defs. The solution is to add a new construct: a fictitious function \(φ\) (phi) that picks among its arguments according to the edge along which control reached the current CFG node.
The resulting SSA CFG would be something like the following. A new φ node has been added to the CFG at the point where the two definitions of x meet. The behavior of this node is to do the assignment x3←x1 if control enters from the top, but the assignment x3←x2 if control enters from the left. In effect, a definition using φ happens on the edge, and that is how we go about doing code generation from SSA form: the φ definition is split back into ordinary definitions.
The variable x has become three variables x1, x2, and x3 in SSA form. Variables x1 and x2 correspond to the two definitions in the original program. Variable x3 arises because the use x < 10 has a non-trivial UD-chain: both of the other definitions reach it.
After introducing appropriate additional definitions that use the φ-function, every use has exactly one def, and the webs are all disjoint DU-chains that can be (and are) given distinct names.
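To make the edge-splitting concrete, here is one plausible rendering of the example after SSA conversion and code generation, written as runnable Python in which each φ argument becomes an ordinary assignment on its incoming edge (the numbering of x1, x2, and x3 follows the discussion above):

x1 = 0
x3 = x1            # phi definition, on the edge entering from the top
while x3 < 10:
    x2 = x3 + 1
    x3 = x2        # phi definition, on the back edge from the loop body
y = x3             # the use of x now sees the single definition x3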
In the SSA literature it is standard to work with basic blocks as the CFG nodes rather than with individual statements as we have just done. One reason that basic blocks are natural is that at a given point in the control-flow graph, it may be necessary to have φ definitions for multiple variables. Alternatively we can consider that a φ node is in general an assignment to multiple variables.
The advantage of SSA is that it simplifies analysis and optimization. For example, consider the following four optimizations, which all become simpler in SSA form.
Dead code elimination: An assignment x←e is dead iff there are no uses of x. We assume that for each def in the program, we keep track of the set of corresponding uses. If that set is empty, the definition is dead and can be removed. The use sets of all variables appearing in the expression e can then be updated correspondingly to remove the deleted uses.
Constant propagation: An assignment x←c, where c is a constant, can be propagated by replacing each use of x with c. This works because there is only one definition of x. The assignment is then dead code and can be removed as described above.
Copy propagation: An assignment x←y can similarly be propagated, just like the constant propagation case.
Constant folding: Once constants have been propagated, an operation whose operands are all constants can be evaluated at compile time and replaced with its result, possibly enabling further constant propagation.
In fact, given code in SSA form and use sets for each variable, all four of these optimizations can be performed in an interleaved fashion without further analysis. By contrast, performing interleaved optimizations in the original non-SSA form would require redoing dataflow analyses between optimization passes.
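To illustrate, here is a toy Python sketch of such interleaving over a table of SSA instructions. The instruction format, the absence of φ-nodes and side effects, and the convention that an escaping result variable is protected by a synthetic use are all simplifying assumptions for this sketch, not part of the notes above:

# A toy sketch of interleaved optimization on SSA code. Each instruction is
# (dest, op, args); args are variable names or integer constants. uses[v] has
# an entry (possibly empty) for every variable v, holding the indices of the
# instructions that mention v; a synthetic entry that is not a valid index
# (e.g., "ret") keeps an escaping variable alive. Every variable used is
# assumed to be defined by some instruction.

def optimize(instrs, uses):
    defsite = {ins[0]: i for i, ins in instrs.items()}
    work = list(instrs)
    while work:
        i = work.pop()
        if i not in instrs:
            continue                          # already removed as dead code
        dest, op, args = instrs[i]
        if op == "add" and all(isinstance(a, int) for a in args):
            op, args = "const", [args[0] + args[1]]       # constant folding
            instrs[i] = (dest, op, args)
        if op in ("const", "copy"):           # constant/copy propagation
            val = args[0]
            for j in [j for j in uses[dest] if j in instrs]:
                d, o, a = instrs[j]
                instrs[j] = (d, o, [val if x == dest else x for x in a])
                uses[dest].discard(j)
                if isinstance(val, str):
                    uses[val].add(j)          # the copy source gains this use
                work.append(j)                # j may now fold or propagate
        if not uses[dest]:                    # dead code elimination
            for a in args:
                if isinstance(a, str):
                    uses[a].discard(i)
                    work.append(defsite[a])   # its def may now be dead too
            del instrs[i]
    return instrs

Each transformation updates exactly the use sets it invalidates, so no dataflow analysis ever needs to be rerun between the optimizations.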
This improvement does not come for free, however. We have to convert our CFG to SSA form, which is tricky to do efficiently. The challenge is where to insert φ-functions so that every use has exactly one def. Once this property has been achieved, the resulting webs can be renamed (e.g., by adding subscripts to their variable name) accordingly.
A simple-minded approach is just to insert φ-functions for every variable at the head of every basic block with more than one predecessor. But this creates a more complex CFG than necessary, with extra variables that slow down optimization.
When does a basic block \(m\) truly need to contain a φ-definition for a variable \(a\) at its beginning; that is, an assignment \(a ← φ(a, a)\), with appropriate numbering of the different occurrences of \(a\)? This question can be answered using the path convergence criterion: two paths converge at a node \(m\) if they start at two distinct nodes \(n_1\) and \(n_2\), both end at \(m\), and share no node other than \(m\) itself.
This criterion implies that \(m\) must be a node with multiple predecessors, because otherwise the two paths would have to share \(m\)'s single predecessor and therefore would not be disjoint. It similarly implies that \(m\) might appear in the middle of one of the paths from \(n_1\) or \(n_2\) to \(m\), but cannot be in the middle of both.
A definition \(a_i ← φ(a_j,a_k,...)\) is needed for variable \(a\) at each path convergence point \(m\) of \(a\)'s definitions where \(a\) is live-in.
Note that for evaluating the path convergence criterion, we consider the start node of the CFG to implicitly define every variable, representing its initial value (initialized or uninitialized) on entry.
Although path convergence gives us a clear criterion for when to insert a φ-function, it is expensive to evaluate directly. SSA conversion is therefore usually done using a dominator analysis.
The key concept is domination. A node \(A\) dominates another node \(B\) (written \(A\dominates B\)) if every path from the start node of the CFG to node \(B\) includes \(A\). An edge from \(A\) to \(B\) is called a forward edge if \(A\dominates B\) and a back edge if \(B\dominates A\). Every loop must contain at least one back edge.
The domination relation has some interesting properties. It is reflexive, because every node dominates itself, and it is clearly transitive. Finally, it is antisymmetric: if \(A \dominates B\) and \(B \dominates A\), they must be the same node. To see why, suppose they were different, and consider an acyclic path from the start node to \(A\). If this path passed through \(B\), its prefix ending at \(B\) would avoid \(A\), contradicting \(A \dominates B\); so \(A\) can be reached without going through \(B\), and therefore \(B\) cannot dominate \(A\). These three properties mean that domination is a partial order.
In addition, two nodes cannot both dominate a third node without there being some domination relationship between the two dominating nodes. Suppose, for the sake of contradiction, that \(A\dominates C\) and \(B \dominates C\), but neither \(A \dominates B\) nor \(B \dominates A\). Getting to \(C\) requires going through both \(A\) and \(B\) in some order; suppose some path reaches \(A\) first and then \(B\). Since \(A\) does not dominate \(B\), there must be a path from the start node to \(B\), and from there to \(C\), that does not go through \(A\). So \(A\) could not dominate \(C\) after all, a contradiction.
These properties of the domination relationship imply that domination is essentially tree-like. In particular, the Hasse diagram of the domination relation is always a tree rooted at the start node. For example, the figure below shows a control-flow graph on the left and its corresponding dominator tree on the right.
A control-flow graph (left) and its dominator tree (right). Back edges in the control-flow graph, indicating loops, are dashed.
The domination relation can be computed efficiently using a dataflow analysis. Define \(\OUT{n}\) to be the nodes that dominate \(n\). Since domination is reflexive, \(\OUT{n}\) includes \(n\) itself. Other nodes that dominate \(n\) must also dominate all predecessors of \(n\), since otherwise there would be a path to \(n\) that misses them. From this reasoning, we obtain the following dataflow equations: \begin{align*} \OUT{n} &= \{n\} ∪ ⋂_{n'≺n} \OUT{n'} & \text{(for nodes other than the start node)}\\ \OUT{\START} &= \{ \START \} \end{align*}
We can solve this as a forward analysis starting with all variables initialized to the set of all nodes. This initial value is the top element of the lattice in which \(∩\) is the meet operator. Note that the transfer function is monotonic and distributive.
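Concretely, the analysis might be implemented as the following Python sketch, which simply iterates the equations to a fixpoint (it assumes every non-start node is reachable, so each has at least one predecessor):

# A minimal sketch of the dominator dataflow analysis.
# preds[n] lists the predecessors of node n; start is the CFG start node.
# Assumes every node other than start has at least one predecessor.

def dominators(nodes, preds, start):
    out = {n: set(nodes) for n in nodes}    # top: tentatively, all nodes
    out[start] = {start}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == start:
                continue
            new = {n} | set.intersection(*(out[p] for p in preds[n]))
            if new != out[n]:
                out[n] = new
                changed = True
    return out

The dominator tree can then be read off from these sets: because the dominators of \(n\) are totally ordered by domination, the tree parent of \(n\) (its immediate dominator) is the closest element of \(\OUT{n} - \{n\}\).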
We say that node \(A\) postdominates node \(B\) if all paths from \(B\) to an exit node go through node \(A\). Postdominance happens exactly when \(A\) dominates \(B\) in the transposed (dual) CFG, in which all edge directions are reversed and start and exit nodes are interchanged.
We have seen that SSA form is a convenient form for optimization and analysis of code. However, converting code to SSA form is itself not trivial. Conversion can be broken down into two steps. In Step 1, φ-definitions are inserted for each variable at the nodes where they are needed; in Step 2, the defs of each variable (and, correspondingly, its uses) are renamed so that every def is distinct.
In Step 1 we don't want to use φ more often than necessary, because this will create unnecessary variable names and impede optimization.
Intuitively, φ is needed at a node when it is the earliest place that two paths from different definitions converge. The path convergence criterion identifies such earliest convergence points, but it does not naturally lead to an efficient algorithm for finding these locations.
Dominance frontier
Instead, dominators can be used to efficiently insert φ exactly where the path convergence criterion says it is needed in order to select the right reaching definition. The intuition is that if a node \(n\) defines a variable \(x\), the path convergence criterion will not demand that φ be used for \(x\) at any node dominated by \(n\) that the definition reaches, since the definition is already on all paths reaching such a node. As illustrated in the figure, the nodes such as \(n'\) inside the colored boundary are all dominated by node \(n\), so \(φ\) is not needed to make sure the definition at \(n\) reaches them. On the other hand, node \(m\) does need a φ-function, because it has a predecessor dominated by \(n\), yet it is not itself dominated by \(n\).
An edge crossing from a node dominated by \(n\) to a node not strictly dominated by \(n\) is said to lie on the dominance frontier of \(n\), and we consider the destination node of that edge (such as \(m\)) to lie on the dominance frontier as well. (Strict domination excludes the node itself; the distinction matters because a back edge into \(n\) can put \(n\) on its own dominance frontier.) The nodes lying on the dominance frontier of some definition of \(x\) are exactly the nodes that need a \(φ\) definition added.
Notice that adding such a φ-definition introduces a new definition into the control-flow graph, and this new definition has its own dominance frontier that may induce additional definitions using \(φ\). However, this iterated dominance frontier process eventually converges on a set of φ-definitions such that every node on the dominance frontier of every definition of a variable \(x\) starts with a corresponding definition \(x = φ(x,x)\).
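A Python sketch of this iteration appears below; the dominance frontier map df and the per-variable definition sites defs_of are assumed to have been computed already, and pruning based on liveness is omitted:

# Phi insertion by iterating dominance frontiers to a fixed point.
# df[n] is the dominance frontier of node n; defs_of[x] is the set of
# nodes originally defining variable x. For brevity this version inserts
# a phi even at nodes where x is not live-in (no pruning).

def insert_phis(df, defs_of):
    phis = {}                                # node -> variables needing a phi
    for x, def_nodes in defs_of.items():
        work = list(def_nodes)
        has_phi = set()
        while work:
            n = work.pop()
            for m in df[n]:
                if m not in has_phi:
                    has_phi.add(m)
                    phis.setdefault(m, set()).add(x)
                    work.append(m)           # the new phi is itself a def of x
    return phis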
The picture below shows a small example of this process. We start with the code on the upper left, which has two defs of x and is therefore not in SSA form. Each of these defs has a dominated region indicated by the dashed bubble of the corresponding color. The edge from node x←x+1 to node if x < 10 crosses the boundary of the (blue) region dominated by x←x+1, so node if x < 10 is on the dominance frontier of this assignment. Therefore it acquires a new (green) def using φ, as shown in the middle. This new def has its own dominated region, and we again look for nodes on its dominance frontier. There are none, so we can number the different defs of x and rename all the uses accordingly to arrive at the SSA code on the right.
SSA conversion using iterated dominance frontiers
Let \(\DF{n}\) denote the dominance frontier of node \(n\): the set of nodes not strictly dominated by \(n\) but having a predecessor dominated by \(n\). Assuming we have computed the dominance relation, we can easily check whether any given node lies on the dominance frontier of node \(n\). This observation leads to an obvious quadratic algorithm.
However, we can make the computation of dominance frontiers more efficient by observing that every node on the dominance frontier of \(n\) is either a direct successor of \(n\) in the CFG or a member of the dominance frontier of one of \(n\)'s children in the dominator tree. Thus, to compute the dominance frontier of \(n\), we recursively compute the dominance frontiers of \(n\)'s children in the dominator tree, then iterate over the nodes in the children's dominance frontiers and over \(n\)'s direct successors, checking whether each of these candidates is on the dominance frontier of \(n\) itself.
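Here is a Python sketch of that recursion. It assumes children and idom describe the dominator tree computed earlier; for the two kinds of candidate nodes considered, the test "not strictly dominated by \(n\)" reduces to a comparison of immediate dominators, which is what makes the algorithm efficient:

# Dominance frontiers computed bottom-up over the dominator tree.
# succ[n]: CFG successors of n; children[n]: children of n in the dominator
# tree; idom[n]: immediate dominator (tree parent) of n.

def dominance_frontiers(root, succ, children, idom):
    df = {}
    def walk(n):
        df[n] = set()
        for s in succ[n]:               # candidates among direct successors
            if idom[s] != n:            # n does not strictly dominate s
                df[n].add(s)
        for c in children[n]:
            walk(c)
            for m in df[c]:             # candidates from children's frontiers
                if idom[m] != n:        # n does not strictly dominate m
                    df[n].add(m)
    walk(root)
    return df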