\( \newcommand{\bigsqcap}{{\Large ⨅}} \) \( \newcommand{\MOVE}{\mathit{MOVE}} \newcommand{\TEMP}{\mathit{TEMP}} \newcommand{\CALL}{\mathit{CALL}} \newcommand{\MEM}{\mathit{MEM}} \newcommand{\ADD}{\mathit{ADD}} \newcommand{\CONST}{\mathit{CONST}} \newcommand{\SEQ}{\mathit{SEQ}} \newcommand{\ESEQ}{\mathit{ESEQ}} \newcommand{\JUMP}{\mathit{JUMP}} \newcommand{\CJUMP}{\mathit{CJUMP}} \newcommand{\LABEL}{\mathit{LABEL}} \newcommand{\dest}{\mathit{dest}} \newcommand{\RV}{\mathit{RV}} \newcommand{\NAME}{\mathit{NAME}} \newcommand{\OP}{\mathit{OP}} \newcommand{\EXP}{\mathit{EXP}} \newcommand{\IF}{\mathtt{if}} \) \( \newcommand{\START}{\mathtt{start}} \newcommand{\RETNODE}{\mathtt{return}} \newcommand{\IN}[1]{\mathit{in}[#1]} \newcommand{\OUT}[1]{\mathit{out}[#1]} \newcommand{\USE}[1]{\mathit{use}[#1]} \newcommand{\VARS}[1]{\mathit{vars}(#1)} \newcommand{\DEF}[1]{\mathit{def}\hspace{1pt}[#1]} \newcommand{\GEN}[1]{\mathit{gen}[#1]} \newcommand{\KILL}[1]{\mathit{kill}[#1]} \)

Iterative solving

Recall that a dataflow analysis can be characterized as a four-tuple \((D, L, ⊓, F)\): the direction of analysis \(D\), the space of values \(L\), the meet operator \(⊓\), and the transfer functions \(F_n\). However, we have not yet established that an iterative analysis such as the worklist algorithm is guaranteed to work.

Let's consider a simpler algorithm that computes the answer to a dataflow analysis. If the dataflow analysis framework satisfies certain properties to be identified, this algorithm will compute the same thing as the worklist algorithm, but less efficiently. We can think of the worklist algorithm as an optimized version of this iterative analysis algorithm, which avoids recomputing \(\OUT{n}\) for nodes \(n\) whose value couldn't have changed (because it hasn't changed for any predecessors of \(n\)).

Iterative solving (for forward analyses):
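As a concrete sketch (a minimal rendering of this schedule, not a transcription of any particular implementation), the loop can be written as follows, where the `Lattice` interface and the `transfer` argument are hypothetical stand-ins for the value space \(L\) and the transfer functions \(F_n\):

import java.util.*;
import java.util.function.BiFunction;

// Hypothetical stand-in for the value space L: all the algorithm needs is ⊤, ⊓, and equality.
interface Lattice<V> {
    V top();
    V meet(V a, V b);
}

class IterativeSolver {
    // nodes: the CFG nodes; preds: the predecessors of each node; transfer: the F_n.
    static <N, V> Map<N, V> solve(List<N> nodes, Map<N, List<N>> preds,
                                  Lattice<V> lat, BiFunction<N, V, V> transfer) {
        Map<N, V> out = new HashMap<>();
        for (N n : nodes) out.put(n, lat.top());               // start at (⊤, ⊤, ..., ⊤)
        boolean changed = true;
        while (changed) {                                      // one pass = one application of F
            changed = false;
            for (N n : nodes) {
                V in = lat.top();                              // meet over all predecessors
                for (N p : preds.getOrDefault(n, List.of()))
                    in = lat.meet(in, out.get(p));
                V next = transfer.apply(n, in);                // out[n] = F_n(in)
                if (!next.equals(out.get(n))) { out.put(n, next); changed = true; }
            }
        }
        return out;                                            // a full pass changed nothing: a fixed point
    }
}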

The algorithm updates \(\OUT{n}\) for all \(n\) on each iteration. If we imagine each of the nodes \(n\) as having one of the distinct indices \(1,\ldots,N\), we can think of all the values \(\OUT{n}\) as forming an \(N\)-tuple \((\OUT{n_1}, \ldots, \OUT{n_N})\), which is an element of the set \(L^N\).

We can think of the action of each iteration of the loop as mapping an element of \(L^N\) to a new element of \(L^N\); that is, it is a function \(F: L^N→L^N\). The action of the algorithm produces a series of tuples until the same tuple happens on two consecutive iterations:

\begin{align*} ~& (⊤,⊤,\ldots,⊤) \\ \longrightarrow ~& (l^1_1, l^1_2, \ldots, l^1_N) \\ \longrightarrow ~& (l^2_1, l^2_2, \ldots, l^2_N) \\ ~& \vdots \\ \longrightarrow ~& (l^k_1, l^k_2, \ldots, l^k_N) \\ \longrightarrow ~& (l^k_1, l^k_2, \ldots, l^k_N) \\ \end{align*}

This raises some questions: Does the iteration always terminate? If it does, is the final tuple really a solution to the dataflow equations, and how good a solution is it? To answer these questions, we need to understand the theory of partial orders, because we will want the space of dataflow values \(L\) to be a partial order.

Partial orders

A partial order (or partially ordered set, or poset) is a set of elements (called the carrier of the partial order) along with a relation \(⊑\) that is reflexive (\(x⊑x\)), transitive (\(x⊑y\) and \(y⊑z\) imply \(x⊑z\)), and antisymmetric (\(x⊑y\) and \(y⊑x\) imply \(x=y\)).

The key thing that makes this a partial order is that it is possible for two elements to be incomparable; they are not related in either direction.

For dataflow analysis, we interpret the ordering \(l_1⊑l_2\) to mean that \(l_2\) is a better or more informative result.

Some examples of partial orders are the integers ordered by ≤ (i.e., \((\mathbb{Z}, ≤)\)), types ordered by the subtyping relation ≤ (in many languages), sets ordered by ⊆ (or ⊇), and booleans ordered by ⇒. If \((L,⊑)\) is a partial order, the dual partial order \((L,⊒)\) is too. Some examples of non-partial orders are the reals ordered by \(\lt\) (which is not reflexive) and pairs of integers ordered by their sums (which is not antisymmetric: distinct pairs such as \((1,2)\) and \((2,1)\) are each ordered below the other).

Hasse diagram


A useful way to visualize a partial order is through a Hasse diagram, as shown in the figure on the right, which depicts the subsets of \(\{a,b,c\}\) ordered by set inclusion (⊆). In the diagram, two ordered elements are connected by a line if no intermediate element lies between them in the ordering, and the greater of the two is drawn higher. Therefore, any two related elements are connected by a path that goes consistently upward or downward in the diagram.

The height of a partial order is the number of edges \(n\) on the longest chain of distinct elements \(l_0 ⊑ l_1 ⊑ l_2 ⊑ \ldots ⊑ l_n\). Therefore, the height of the partial order in the figure is 3.

Lattices

A lower bound of two elements \(x\) and \(y\) is an element \(z\) that is ordered below both of them: \(z⊑x\) and \(z⊑y\). Some partial orders have the property that every two elements have a greatest lower bound, or GLB, or meet. It is written \(x⊓y\), and pronounced as “x meet y”.

The meet of two elements is above all other lower bounds in the ordering: \(z ⊑ x ∧ z ⊑ y ⇒ z ⊑ x⊓y\).

Dually, for some partial orders, every two elements have a least upper bound (LUB), written \(x⊔y\) and pronounced “x join y”.

If a partial order has both a meet and a join for every pair of elements, it is called a lattice. If it has a meet for every pair of elements, it is a lower semilattice. If it has a join for every pair of elements, it is an upper semilattice. We will be interested only in meets, so we will be working with lower semilattices, which we may simply abbreviate to “lattice” (and most of the partial orders we care about are, in fact, full lattices).

Tuples

Suppose that \(L\) is a partial order. Then the set of tuples \(L^N\) is also a partial order under the componentwise ordering:

\[ (l_1, l_2, \ldots, l_N) ⊑ (l_1', l_2', \ldots, l_N') \iff ∀_{i∈1..N}~l_i⊑l_i' \]
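For instance, taking \(L\) to be the subsets of \(\{a,b,c\}\) ordered by ⊆ (as in the Hasse diagram above) and \(N=2\), the first pair below is ordered because containment holds in each component, while the second pair is incomparable because containment fails in one component in each direction:

\[ (\{a\}, \{a,b\}) ⊑ (\{a,c\}, \{a,b,c\}), \qquad \text{while } (\{a\}, \{b\}) \text{ and } (\{b\}, \{a\}) \text{ are incomparable}. \]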

You can check for yourself that if \(L\) is a partial order, this ordering on \(L^N\) is also reflexive, transitive, and antisymmetric.

If \(L\) is a lattice, then \(L^N\) is also a lattice, with the meet (or join) taken componentwise: \begin{align*} (l_1, \ldots, l_N) ⊓ (l_1', \ldots, l_N') &= (l_1⊓l_1', \ldots, l_N⊓l_N') \\ (l_1, \ldots, l_N) ⊔ (l_1', \ldots, l_N') &= (l_1⊔l_1', \ldots, l_N⊔l_N') \end{align*}

To see that this works for meets, we need to show that \((l_1⊓l_1', \ldots, l_N⊓l_N')\) is a lower bound of the two tuples (which follows componentwise) and that it is above any other lower bound for \((l_1, \ldots, l_N)\) and \((l_1', \ldots, l_N')\). Suppose we have such a lower bound \((l_1'',\ldots,l_N'')\). Since it is a lower bound, for all \(i\), \(l_i''⊑l_i\) and also \(l_i''⊑l_i'\). But that implies that \(l_i''⊑l_i⊓l_i'\). Therefore, according to the componentwise ordering on \(L^N\), \((l_1'',\ldots,l_N'')⊑(l_1⊓l_1', \ldots, l_N⊓l_N')\).

Monotonicity

The iterative analysis algorithm starts from the top of the lattice \(L^N\), \((⊤,⊤,\ldots,⊤)\), and repeatedly applies a function \(F:L^N→L^N\) to it, until a fixed point of the function is reached: a tuple \(X = (l^k_1,\ldots,l^k_N)\) such that \(F(X) = X\). As the algorithm executes, a series of tuples \(X_0, X_1, X_2, \ldots, X_k\) is produced, where \(X_0 = (⊤,⊤,\ldots,⊤)\) and \(X_k\) is the fixed point of \(F\).

Given that a fixed point is reached, all the dataflow equations must be satisfied; otherwise, a different tuple would have resulted from the last iteration of the loop. So if the algorithm terminates, it does find a solution. But how do we know that it terminates?

The key is to observe that the transfer functions \(F_n\) are normally monotonic, and therefore the function \(F\) is too. A function on a partial order is monotonic if it preserves ordering:

Monotonicity:
A function \(f : L→L\) is monotonic if \(x⊑y ⇒ f(x) ⊑ f(y)\).

In the context of dataflow analysis, monotonicity makes sense. We can think about the transfer functions \(F_n\) as describing what we know after a node executes, given what we know beforehand. Having more information before the node executes should not cause us to have less information afterward; it should only help or at worst have no benefit.

The function \(F\) is constructed out of the transfer functions \(F_n\) and the meet operator. If the transfer functions are monotonic on \(L\), the function \(F\) is monotonic on \(L^N\). To see why, let us first check that the meet operator is monotonic.

Theorem: The meet operator is monotonic on its arguments
\[ x⊑y ⇒ x⊓z ⊑ y⊓z \]
Monotonicity of meet

This proposition is depicted in the figure on the right. By the definition of meet, \(x⊓z ⊑ x\) and \(x⊓z ⊑ z\). Since \(x⊑y\) and the ordering ⊑ is transitive, we also know that \(x⊓z ⊑ y\). This means \(x⊓z\) is a lower bound for both \(y\) and \(z\), and therefore, it is bounded above by the greatest lower bound for \(y\) and \(z\), which is \(y⊓z\).

The function \(F\) is formed by the composition of monotonic transfer functions and the monotonic meet operator, as depicted in the figure below, so it is also monotonic.

Dataflow analysis components

Termination

Iterative solving starts with the top element \(X_0\) and applies \(F\) to it. The result, which we called \(X_1\), must be ordered with respect to \(X_0\): since \(X_0\) is the top element of \(L^N\), we have \(X_1⊑X_0\). Because \(F\) is monotonic, \(F(X_1)⊑F(X_0)\); that is, \(X_2⊑X_1\). This pattern must continue: for all \(n\), \(X_{n+1}⊑X_n\), which we can see by induction. If we assume that \(X_n⊑X_{n-1}\), then by monotonicity of \(F\), we have \(X_{n+1}⊑X_n\). Therefore the successive dataflow values produced by the algorithm form a descending chain of distinct elements:

\[ X_k ⊑ X_{k-1} ⊑ \ldots ⊑ X_2 ⊑ X_1 ⊑ X_0 \]

If the lattice \(L^N\) has infinite height, there is no guarantee that this chain won't continue indefinitely. But for most of the problems we care about, the lattice \(L\) has some finite height (call it \(h\)). Therefore, the lattice of tuples \(L^N\) has height at most \(Nh\), since the longest downward chain we can make in \(L^N\) involves moving downward independently on each of the \(N\) dimensions for \(h\) steps. Once the iterative analysis algorithm has run \(Nh\) iterations, it must have arrived at the bottom of the chain: convergence is achieved in \(k\) iterations where \(k≤Nh\). Thus, the value of \(F^{Nh}(\vec{⊤})\) is necessarily a solution to the equations. This is the shortest description of iterative solving yet!

Solution quality

When it terminates, the solution \(F^{Nh}(\vec{⊤})\) is not only a solution to the equations, but the best solution. Recall that greater elements in the partial order \(L\) correspond to more precise information about the program. The value \(F^{Nh}(\vec{⊤})\) is not only a fixed point of \(F\), but the greatest fixed point of \(F\).

We can see why with a simple inductive argument. Suppose there were some other solution \(X\) to the equations. To be a solution, \(X\) must be a fixed point of \(F\): that is, \(X = F(X)\). Now, we know that the initial iterative value \(X_0\) is at least \(X\) since it is the top element \(\vec{⊤}\): \(X ⊑ \vec{⊤}\). Since \(F\) is monotonic, applying it to both sides of this inequality derives the inequality \(F(X) ⊑ X_1\). But since \(X\) is a fixed point, we immediately get \(X ⊑ X_1\). Applying \(F\) to both sides again, we have \(X ⊑ X_2\), and so on. Thus by induction, \(X ⊑ X_k = F^{Nh}(\vec{⊤})\). No solution \(X\) can be better than \(F^{Nh}(\vec{⊤})\).

Example: live variable analysis

In live variable analysis, the dataflow values are sets of live variables. We want to find as few variables live as possible to enable the most optimization, so the ordering \(⊑\) is \(⊇\), the top element \(⊤\) is \(∅\), and the meet operator \(⊓\) is \(∪\).

Are the transfer functions monotonic? Recall that:

\[F_n(l) = \USE{n} ∪ (l - \DEF{n})\]

So if \(l⊑l'\), then \(l ⊇ l'\), and we must show that \(F_n(l)⊑F_n(l')\), that is, \(F_n(l) ⊇ F_n(l')\). Suppose we have an element \(x∈F_n(l') = \USE{n} ∪ (l' - \DEF{n})\). Then either \(x∈\USE{n}\), or else \(x∈l'-\DEF{n}\), in which case \(x∈l-\DEF{n}\). In either case \(x∈F_n(l)\). Since this is true for arbitrary \(x\), \(F_n(l)⊇F_n(l')\), as required.
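As a small illustration (a sketch, not tied to any particular intermediate representation), the transfer function can be written directly over sets; the per-node use and def sets and the example node "x = y + z" are hypothetical:

import java.util.*;

class LiveVariables {
    // F_n(l) = use[n] ∪ (l − def[n]), where l is the set of variables live after node n.
    static Set<String> transfer(Set<String> use, Set<String> def, Set<String> l) {
        Set<String> result = new HashSet<>(l);
        result.removeAll(def);       // l − def[n]
        result.addAll(use);          // use[n] ∪ ...
        return result;
    }

    public static void main(String[] args) {
        // A node like "x = y + z" would have use = {y, z} and def = {x}.
        Set<String> use = Set.of("y", "z"), def = Set.of("x");
        System.out.println(transfer(use, def, Set.of("x", "w")));  // y, z, w
        // Monotonicity (⊑ is ⊇ here): a smaller input set can only produce a smaller output set.
        System.out.println(transfer(use, def, Set.of("x")));       // y, z
    }
}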

Generalizing the worklist algorithm

The version of iterative analysis that we've used for proving the properties of the analysis result is not as efficient as we might like for typical control flow graphs.

In fact, the versions of the worklist algorithm seen so far are just instances of a more general algorithm that computes the same result as the algorithm we used for proving correctness. We are trying to solve for a dataflow value for each node \(n\). Without loss of generality, we consider forward analysis (for backward analysis, just turn all the arrows around). At the start of the worklist algorithm, each value \(\OUT{n}\) is initialized to ⊤. The equation that is applied to update \(\OUT{n}\) at each iteration is \(\OUT{n} = F_n(⊓_{n'≺n}~\OUT{n'})\).

We can view the problem more generally as a set of equations over a set of variables \(x_i\) where each variable is the dataflow value of the corresponding node. The equations express the value of each \(x_i\) as a monotonic function \(f_i\) over some subset of the variables. This description is general enough to encompass some of the non-dataflow problems we saw earlier, such as the computation of FIRST and FOLLOW sets, or the algorithm for minimizing DFAs by computing the set of distinguishable DFA states.

The worklist algorithm in general form is, then:

  1. Initialize all \(x_i\) to \(⊤\).
  2. Initialize the worklist \(w ← \{i \mid i \in 1..n\}\).
  3. While \(w\) is nonempty, pick some \(i∈w\) and do the following:
    1. \(w ← w - \{i\}\)
    2. \(x_i ← f_i(x_1,\ldots,x_n)\)
    3. If \(x_i\) changed in the previous step, \(w ← w ∪ \{ j \mid f_j \text{ depends on } x_i\}\)

The worklist is a set of variable identifiers (e.g., node indices). Usually the set acts as a FIFO queue, so that newly added elements go to the end of the queue. It works well to have the initial ordering of nodes be reverse postorder, so that in the case of a forward analysis, information starts from the \(\START\) node and propagates efficiently forward through the control flow graph.
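Here is a sketch of this general worklist algorithm in Java. The encoding is a hypothetical one chosen for the sketch: variables are indices, each entry of `equations` recomputes \(f_i\) from the whole assignment, and `dependents` lists, for each \(x_i\), the indices \(j\) whose \(f_j\) mention \(x_i\):

import java.util.*;
import java.util.function.Function;

class WorklistSolver {
    // top.get(i) is ⊤ for variable x_i; equations.get(i) computes f_i from the whole
    // assignment; dependents.get(i) lists the j such that f_j mentions x_i.
    static <V> List<V> solve(List<V> top,
                             List<Function<List<V>, V>> equations,
                             List<List<Integer>> dependents) {
        int n = top.size();
        List<V> vals = new ArrayList<>(top);                          // 1. initialize all x_i to ⊤
        Deque<Integer> w = new ArrayDeque<>();
        boolean[] queued = new boolean[n];
        for (int i = 0; i < n; i++) { w.add(i); queued[i] = true; }   // 2. worklist = {1..n}
        while (!w.isEmpty()) {                                        // 3.
            int i = w.remove();                                       // 3a. w ← w − {i}  (FIFO order)
            queued[i] = false;
            V updated = equations.get(i).apply(vals);                 // 3b. x_i ← f_i(x_1,...,x_n)
            if (!updated.equals(vals.get(i))) {                       // 3c. if x_i changed,
                vals.set(i, updated);                                 //     requeue everything that reads x_i
                for (int j : dependents.get(i))
                    if (!queued[j]) { w.add(j); queued[j] = true; }
            }
        }
        return vals;
    }
}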

Example: DFA minimization

As we observed when introducing DFA minimization, to minimize a DFA we compute the set of pairs of DFA states that can be distinguished. Two DFA states \(q_i\) and \(q_j\) are distinguishable if either one is an accepting state and the other is not, or if they can both transition on the same symbol \(a\) to distinguishable states \(δ(q_i, a)\) and \(δ(q_j, a)\), respectively. We can write this rule as a logical equation about pairs of states \(q_i, q_j\): \[ q_i \not≈ q_j \iff (q_i ∈ F \Leftrightarrow q_j ∉ F) ∨ ∃a. δ(q_i, a) \not≈ δ(q_j, a) \]

DFA minimization on an \(n\)-state DFA amounts to computing the value of a tuple of \({n \choose 2}\) boolean variables \(x_{ij}\) such that \(1 ≤ i \lt j ≤ n\), to avoid redundant variables. The meaning of variable \(x_{ij}\) is that it is false when states \(q_i\) and \(q_j\) are known to be distinguishable. Rewriting the same equation in terms of \(x_{ij}\), we have: \[ x_{ij} = (q_i ∈ F \Leftrightarrow q_j ∈ F) ∧ \bigwedge_{a} \{x_{kl} \mid δ(q_i, a) = q_{k'} ∧ δ(q_j, a) = q_{l'} ∧ k\lt l ∧ \{k,l\} = \{k', l'\} \} \] When interpreted as a function that derives a new value of \(x_{ij}\) from existing values, this equation describes a monotonic function. Further, the lattice of values has finite height \({n \choose 2}\), and a top element in which all variables are true. Therefore, the conditions for finding a greatest fixed point are satisfied.

Algorithmically, we simply initialize all \({n \choose 2}\) variables to true, push all of them onto the worklist, and then apply the worklist algorithm using the equation above to update values \(x_{ij}\) until the worklist is empty and all variables \(x_{ij}\) satisfy the equation. We are guaranteed to find the greatest solution, one that determines as many DFA states as possible to be equivalent, therefore producing a minimal DFA when equivalent states are merged. This algorithm is essentially the same as the one we saw earlier except that the variables \(x_{ij}\) record whether states are indistinguishable rather than whether they are distinguishable, a change that amounts to using the boolean lattice rather than its dual.
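A sketch of this greatest-fixed-point computation in Java is shown below. The encoding is hypothetical (states numbered \(0..n-1\), a transition table `delta`, and an `accepting` array), and for brevity it sweeps all pairs until nothing changes rather than maintaining an explicit worklist:

class DfaEquivalence {
    // x[i][j] (for i < j) is the variable x_ij: true while q_i and q_j are still
    // presumed indistinguishable. delta[q][a] is the state reached from q on symbol a.
    static boolean[][] indistinguishable(int n, int numSymbols, int[][] delta, boolean[] accepting) {
        boolean[][] x = new boolean[n][n];
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                x[i][j] = true;                              // top element: all pairs presumed equivalent
        boolean changed = true;
        while (changed) {                                    // iterate down to the greatest fixed point
            changed = false;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    if (!x[i][j]) continue;
                    boolean v = (accepting[i] == accepting[j]);          // q_i ∈ F ⇔ q_j ∈ F
                    for (int a = 0; v && a < numSymbols; a++) {
                        int k = delta[i][a], l = delta[j][a];
                        if (k != l) v = x[Math.min(k, l)][Math.max(k, l)];   // ⋀_a x_kl
                    }
                    if (!v) { x[i][j] = false; changed = true; }
                }
            }
        }
        return x;
    }
}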

A dependency graph with cycles

Priority-SCC Iteration

The worklist algorithm can be made more efficient in practice by exploiting more knowledge about the dependencies among variables. Suppose that the equation for variable \(x_i\) takes the form \(x_i = f_i(x_{j_1}, x_{j_2}, \ldots, x_{j_m})\). Then an update to any of the variables on the right side may require an update to \(x_i\) to make the equation hold: in other words, variable \(x_i\) depends on the \(m\) variables \(x_{j_1}, x_{j_2}\), etc. We can view the dependencies among variables as a directed graph, in which there is an edge from each variable to the variables that depend on it. Note that for dataflow analysis, \(m\) is usually small, so the graph is not dense.

The figure on the right shows a dependency graph where it makes sense to be more intelligent about the order in which we use equations. There are 9 variables, \(A\)–\(I\), each of which depends on 0–3 other variables. Clearly, solving should roughly proceed in the direction of the arrows, but how should we handle cycles in the dependency graph?

Strongly-connected components

The key insight is to organize the iteration around strongly connected components (SCCs), which may span multiple nodes, as in the graph above. A strongly connected component is a maximal subgraph such that every node can reach every other node. Every cycle in a graph is part of an SCC. In fact, every directed graph can be reduced to a directed acyclic graph (DAG) whose nodes are strongly connected components. For example, in the graph above, the sets \(\{B, C\}\) and \(\{E, F, G\}\) are SCCs containing multiple nodes. Other nodes, such as \(A\), each form a singleton SCC.

It makes sense to propagate information through this DAG, allowing each SCC to converge before propagating its information into the rest of the DAG. That is, the SCCs should be solved in topologically sorted order.

Strongly connected components can be found in linear time using either Kosaraju's algorithm or Tarjan's algorithm. Kosaraju's algorithm uses two depth-first traversals:

  1. Do a postorder traversal of the graph.
  2. Do a traversal of the transposed graph (follow edges backward), but pick the nodes to start from in reverse postorder. All nodes reached from a starting node (and not already assigned to an SCC) are part of the same SCC as that node. Further, the SCCs will be found in topologically sorted order.

For example, in the graph depicted on the right, postorder traversal of the graph starting from node 1 (it doesn't matter much which node we start from) visits the nodes in this order: 9,8,3,2,7,6,5,4,1. Now we start from the end with node 1. No other nodes are reachable from it in the transposed graph, so it is its own SCC. Node 4 is the same. From node 5 we can reach nodes 6 and 7, so (5,6,7) is an SCC. From node 2 we reach node 3, so (2,3) is an SCC. And finally, (8) and (9) are SCCs. Thus the SCCs are found in the order (1), (4), (5,6,7), (2,3), (8), (9), which is a topological ordering of the DAG of SCCs.
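A sketch of Kosaraju's algorithm in Java follows, on a hypothetical adjacency-list representation in which nodes are numbered from 0 and `succ.get(v)` lists the successors of node v:

import java.util.*;

class Kosaraju {
    static List<List<Integer>> sccs(List<List<Integer>> succ) {
        int n = succ.size();
        // Pass 1: DFS on the graph, recording nodes in postorder (latest finisher on top).
        boolean[] seen = new boolean[n];
        Deque<Integer> postorder = new ArrayDeque<>();
        for (int v = 0; v < n; v++) dfs1(v, succ, seen, postorder);
        // Build the transposed graph (every edge reversed).
        List<List<Integer>> pred = new ArrayList<>();
        for (int v = 0; v < n; v++) pred.add(new ArrayList<>());
        for (int v = 0; v < n; v++) for (int w : succ.get(v)) pred.get(w).add(v);
        // Pass 2: DFS on the transpose, picking start nodes in reverse postorder.
        Arrays.fill(seen, false);
        List<List<Integer>> sccs = new ArrayList<>();
        while (!postorder.isEmpty()) {
            int v = postorder.pop();
            if (seen[v]) continue;
            List<Integer> scc = new ArrayList<>();
            dfs2(v, pred, seen, scc);        // everything newly reached is in v's SCC
            sccs.add(scc);                   // SCCs emerge in topologically sorted order
        }
        return sccs;
    }

    private static void dfs1(int v, List<List<Integer>> succ, boolean[] seen, Deque<Integer> post) {
        if (seen[v]) return;
        seen[v] = true;
        for (int w : succ.get(v)) dfs1(w, succ, seen, post);
        post.push(v);                        // push on finish
    }

    private static void dfs2(int v, List<List<Integer>> pred, boolean[] seen, List<Integer> scc) {
        if (seen[v]) return;
        seen[v] = true;
        scc.add(v);
        for (int w : pred.get(v)) dfs2(w, pred, seen, scc);
    }
}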

Tarjan's algorithm

Although Kosaraju's algorithm takes linear time, it requires DFS on both the graph and its transpose, requiring construction or maintenance of the transposed graph. However, Tarjan's linear-time algorithm performs just one DFS on the graph and is only slightly more complicated than the DFS algorithm itself.

Each node has a variable dfs that keeps track of the DFS number of the node, which records the order in which the nodes were first visited. It is initially set to a special value (in the code below, ∞). A second variable low keeps track of the lowest DFS number of any node that is part of the same strongly connected component. It can be computed recursively during the DFS traversal.

// Assumes a global DFS counter dfs, an initially empty stack s, and a list SCCs,
// shared across recursive calls; every vertex starts with v.dfs == ∞ (unvisited).
scc(Vertex v) {
    v.dfs = v.low = dfs++;                 // number v in order of first visit
    s.push(v);
    for (w successor of v) {
        if (w.dfs == ∞) {                  // tree edge: w not yet visited
            scc(w);
            v.low = min(v.low, w.low);
        } else if (w is on the stack s) {  // edge back into the still-open part of the stack
            v.low = min(v.low, w.dfs);
        }
    }
    if (v.low == v.dfs) {                  // v is the root of a strongly connected component
        // pop everything up to v and make a strong component from it
        SCC nodes = new SCC();
        do {
            w = s.pop();
            nodes.add(w);
        } while (w != v);
        SCCs.add(nodes);
    }
}

When traversal from a node finishes, its low variable contains the index of the earliest stack node that is part of the same strong component. The algorithm returns from the recursive calls until that node is reached, at which point all nodes pushed on the stack after that node are part of the same SCC. Its nodes are popped off the stack to form the new SCC, which is added to the end of the list of SCCs. (Prepending each new SCC to the head of the list instead would produce the SCCs in topologically sorted order rather than the reverse.)

The state of the stack and the SCCs as the algorithm executes on the above example is as follows. Node names are followed by the values of dfs and low at that point during execution, and a dot marks the current node v, which is not necessarily at the top of the stack.

Stack s →                                      SCCs
A:0,0  B:1,1•
A:0,0  B:1,1  C:2,2•
A:0,0  B:1,1  C:2,1•
A:0,0  B:1,1  C:2,1  H:3,3•
A:0,0  B:1,1  C:2,1  H:3,3  I:4,4•             {I}
A:0,0  B:1,1  C:2,1  H:3,3•                    {I}, {H}
A:0,0  B:1,1  C:2,1•                           {I}, {H}
A:0,0  B:1,1• C:2,1                            {I}, {H}, {B, C}
A:0,0•                                         {I}, {H}, {B, C}
A:0,0  D:5,5•                                  {I}, {H}, {B, C}
A:0,0  D:5,5  E:6,6•                           {I}, {H}, {B, C}
A:0,0  D:5,5  E:6,6  F:7,7•                    {I}, {H}, {B, C}
A:0,0  D:5,5  E:6,6  F:7,6•                    {I}, {H}, {B, C}
A:0,0  D:5,5  E:6,6  F:7,6  G:8,8•             {I}, {H}, {B, C}
A:0,0  D:5,5  E:6,6  F:7,6  G:8,6•             {I}, {H}, {B, C}
A:0,0  D:5,5  E:6,6  F:7,6• G:8,6              {I}, {H}, {B, C}
A:0,0  D:5,5  E:6,6• F:7,6  G:8,6              {I}, {H}, {B, C}, {E, F, G}
A:0,0  D:5,5•                                  {I}, {H}, {B, C}, {E, F, G}, {D}
A:0,0•                                         {I}, {H}, {B, C}, {E, F, G}, {D}, {A}

The output is a (reverse) topologically ordered list of SCCs, and ignoring back edges, each SCC is also topologically ordered.

See Cormen and Leiserson [1] for more details.

Solving with SCCs

Priority-SCC iteration works by propagating information through the DAG of SCCs. Essentially, we solve the SCCs in a reverse postorder (topological) sequence, so that each SCC is fully solved before any work is done on an SCC that depends on it. Fortunately, the SCC algorithms above generate the SCCs in this order (or its exact reverse, which is just as easy to use). In the example above, we have the following SCCs: (1), (4), (5,6,7), (2,3), (8), (9). Each SCC is then iterated over repeatedly until it converges; the analysis then proceeds to the next SCC in the list. To make each SCC converge quickly, we use a worklist initially containing the nodes of the SCC in reverse postorder.
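Sketched in Java, reusing the hypothetical encoding from the worklist sketch above (an assignment `vals` pre-initialized to ⊤, the `equations`, and the `dependents` lists), together with a list of SCCs assumed to already be in topological order of the dependency DAG:

import java.util.*;
import java.util.function.Function;

class PrioritySccSolver {
    // vals is pre-initialized to ⊤ for every variable; each inner list of sccsInTopoOrder
    // holds the variable indices of one SCC (ideally listed in reverse postorder).
    static <V> void solve(List<V> vals,
                          List<Function<List<V>, V>> equations,
                          List<List<Integer>> dependents,
                          List<List<Integer>> sccsInTopoOrder) {
        for (List<Integer> scc : sccsInTopoOrder) {          // solve each SCC completely...
            Set<Integer> members = new HashSet<>(scc);
            Deque<Integer> worklist = new ArrayDeque<>(scc); // ...starting from its nodes in order
            while (!worklist.isEmpty()) {
                int i = worklist.remove();
                V updated = equations.get(i).apply(vals);
                if (!updated.equals(vals.get(i))) {
                    vals.set(i, updated);
                    // Only requeue dependents inside this SCC; later SCCs get their turn afterward.
                    for (int j : dependents.get(i))
                        if (members.contains(j) && !worklist.contains(j)) worklist.add(j);
                }
            }
        }
    }
}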

In the worst case where the whole graph forms one SCC, this algorithm is equivalent to reverse postorder iteration.

The payoff of the Priority-SCC algorithm when compared to a simple worklist is usually small, but it can yield speedups of orders of magnitude for some complex control-flow graphs.

Further reading