Algorithms for Control Dependence Analysis
and Static Single Assignment Form Transformation

Lidong Zhou and Nikolay Mateev,
Department of Computer Science,
Cornell University, Ithaca, NY 14853

May 14, 1996

  1. Introduction
  2. Implementation
  3. Source Code Pointers
  4. Experimental Results
  5. References

Introduction

Control Dependence

Control dependence is a key concept in program optimization and parallelization. Intuitively, a node n is control dependent on a node c if c determines whether n is executed. For example, in a conditional statement, statements on the true and false sides of the conditional are control dependent on the predicate.

We can also say that a node is control dependent on an edge; for example, we can say that statements on the true side of a conditional statement are control dependent on the true edge from the predicate.

A control flow graph (CFG) G=(V,E) is a directed graph in which nodes represent statements, and edges represent possible flow of control. There are distinguished nodes START and END with the obvious meaning. It is a standard practice to assume that there is an edge from START directly to END in the CFG.

A node w postdominates a node v if every path from v to END contains w. Postdominance is a transitive relation, and its transitive reduction is a tree-structured relation called the postdominator tree.

Intuitively, a node w is control dependent on edge u->v if u is a `decision-point' that determines whether w will be executed, if control flows from node u to node v along edge u->v, it will eventually reach node w; however, it is possible to reach END from u without passing through w.

The set of all pairs (e,w) such that node w is control dependent on edge e is the control dependence relation of the control flow graph.

See [2] for the precise definitions.

Control dependence is used in many phases of modern compilers, such as dataflow analysis, loop transformations and code scheduling. The size of the relation can grow quadratically with program size (e.g. for programs consisting of n nested repeat-until loops). The standard representation of the control dependence relation is the control dependence graph (CDG), which is best viewed as a bipartite graph in which the two sets of nodes in the bipartite graph are V and E, and in which there is an undirected edge between node v and edge e if v is control dependent on e. Since this is a straight-forward representation of the full relation, the size of the CDG is Omega(|E||V|).

For practical applications, however, we do not need the full relation. We must be able to answer the following queries:

cd(e)
the set of nodes that are control dependent on edge e;
conds(w)
the set of control dependences of node w;
cdequiv(w)
the nodes that have the same control dependences as node w.
We have implemented the Augmented Postdominator Tree (APT) data structure described in [2]. The APT is a minor augmentation of the postdominator tree. There is information (routes) cached at some nodes of the postdominator tree, this caching allows queries to be answered optimally. The key idea is dividing the postdominator tree into zones of limited size and storing information at the boundary nodes. The queries have to search only within a single zone. The zones are determined from the following inequalities: for all nodes q, This guarantees that for Alpha a constant, the query time will be proportional to the size of the output. The storage required for caching is proportional to the size of the program, as shown in [2].

Alpha is a design parameter which embodies a tradeoff between query time (increasing with Alpha) and preprocessing time (decreasing with Alpha). For Alpha < 1/size(output), we obtain single-node zones (essentially the CDG, full caching), and for Alpha > |V|, we obtain a single zone (no caching). Small constant values such as Alpha = 1 yield a reasonable compromise.

Static Single Assignment Form

One application of the APT data structure is an optimal algorithm for finding the dominance frontier of a node, which is the key step in conversion to Static Single Assignment (SSA) form. In conversion to SSA form, dummy assignments called Phi-functions are introduced into the program in such a way that every use of a variable is reached by at most one real or dummy assignment.

The APT data structure can be used to convert a program to SSA form in O(|E|) time per variable. We have implemented the Phi-placement algorithm described in [2].

Weak Control Dependence

Weak control dependence (or loop control dependence [3]) is useful for proving total correctness of programs.

Assume that END->END. A node w loop postdominates a node v if every infinite path starting at v contains w. If, in addition, w != v, then w is said to strictly loop postdominate v. In other words, if control reaches a node v, and w loop postdominates v, then control will reach w in a finite number of steps, whether or not the program terminates.

A node w is loop control dependent on edge (u->v) if

Intuitively, this means that See [3] for the precise definitions.

The APT data structure and the Phi-placement for SSA allow us to compute the weak control dependence optimally. We have implemented the algorithm described in [3].


Implementation

Overview:

The heart of the implementation is the realization of the APT data structure. Based on this data structure, we implemented queries: cd, conds, and cdequiv. APT also serves as the basis for the further implementation of Phi-placement in SSA and Weak Control Dependence.

We modified the Polaris compiler (from UIUC) in order to developed a procedure that produces Control Flow Graph at basic block level. This procedure was used to produce CFGs for the SPEC and Perfect benchmark suites for the evaluation of the APT data structure. We also implemented a program that generates not only CFGs but also lists of blocks for assignments. This program was used to generate test files for SSA construction from SPEC and Perfect benchmarks. Also for evaluation purpose, we have programs that generate CFGs for repeat-until loops and stair-like structures.

APT Data Structure

Basically, APT data structure is just postdominant tree with some additional information, the fields and explanations are as follows: The first four fields define the postdominant tree structure: as we need to traverse the tree in both top-down and bottom-up directions, we have to keep both parent and child. To support the operation of disconnecting a node from its parent in constant time (needed in Weak Control Dependence Construction), we introduced backSibling to form a doubly linked list.

Postdominant Tree Construction (pdom.c)

Postdominant tree construction is based on the algorithm described in [1]. We modified the given program and integrated it into our package. In our implementation, the procedure 'postdom' can return either postdominant tree or dominant tree (postdominant tree of reversed CFG) according to the given tag.

The interface is as follows:

Input:
CFG tree (cfg), allocated node array for the tree (apt_tree), reverseTag denoting if we want postdominant tree or dominant tree.
Output:
parent, child, sibling and backsibling fields of the tree array have been set to indicate the tree structure. If reverseTag == TRUE, we get the tree structure for dominant tree, otherwise for postdominant tree.
Note: For the tree array, index 0 is unused. We use apt_tree[1:N] to represent a tree with N nodes.

APT Construction (buildAPT.c, buildAPT.h)

APT construction consists mainly of two parts: preprocessing for 'conds' computations and preprocessing for 'cdequiv' computations. The algorithms are well specified in [2] and [4]. The construction is essentially done by two traversals of the postdominant (or dominant) tree, one bottom-up and the other top-down.

The interface is as follows:

Input:
Number of nodes (size), root of the tree (root), node array with tree structure information (tree), lists of routes (routes), alpha, which controlling the size of the zone.
Output:
augmented information for each node: boundTag, level, class and routes.

Phi-placement for SSA (computeSSA.c, computeSSA.h)

As an application of APT, we implemented the optimal algorithm for finding dominant frontier of a list of nodes, which is the key step in conversion to SSA form. The algorithm is described in [2].

The interface is as follows:

Input:
APT data on reversed CFG (tree), number of nodes in the APT (size), maximal level of the APT (maxLevel) and a list of nodes that contains real assignments to some variable (N).
Output:
The list of nodes that should contain real and dummy assignments to the variable.
Note: The APT that serves as the input for 'phi_place' is for reversed CFG tree, that is also why we have a tag for postdom.

Weak Control Dependence Computation (weakCD.c, weakCD.h)

Based on APT construction and Phi-placement for SSA, we can easily implement the algorithm that does the weak control dependence computation. A detailed explanation and description of the algorithm can be found in [3].

The interface is as follows:

Input:
CFG (G), Alpha that controls the size of zone (ALPHA).
Output:
APT for weak control dependence (*tree), list of routes for weak control dependence.
Note: As the APT for weak control dependency has an extra node, we also add a dummy node to CFG to make the procedures uniform.

Construction of CFG at Block Level (Package of codes)

To do experiments on SPEC and Perfect benchmark, we implemented a program on the top of Polaris that reads the FORTRAN file, finds all the basic blocks and print out CFG at block level in the following format: That is also the format of input file for weak control dependence construction. We also implemented a program that generates not only CFGs but also the lists of blocks that contain assignment for scalar variables, which has been used to generate test files for SSA construction.


Source Code Pointers

Source Codes for Weak Control Dependence Computation:

  1. Makefile
  2. main.c

  3. Read from files the description of CFGs, invoke the routine that builds Weak Control Dependence for the CFGs.
  4. weakCD.c, weakCD.h

  5. Contains the major routines that build the APT for Weak Control Dependence, including algorithms described in Figure 2 and Figure 5 of [3].
  6. computeSSA.c, computeSSA.h

  7. Implementation of Phi-placement algorithm specified in Figure 10 of [2].
  8. buildAPT.c, buildAPT.h

  9. Implementation of APT data structure: algorithms specified in [4]. For flexibility, the procedure can produce APT structure for postdominant tree or dominant tree (which is simply the postdominant tree for reversed CFG) according to the specified tag.
  10. query.c, query.h

  11. Implementation of queries: cd(), conds() and cdequiv()
  12. pdom.c

  13. Implementation of the algorithm described in [1], which construct dominant or postdominant tree for the given CFG.
  14. cfg.c, cfg.h

  15. Construct CFG data structure from the description of CFG.
  16. set.c, set.h

  17. Implementation of integer set, for use in pdom.c

Source Codes for Phi-Placement for SSA:

  1. Makefile
  2. main.c

  3. Read from files the description of CFGs, and also lists of node where assignments locate, compute the list of nodes for Phi-placement.
  4. computeSSA.c, computeSSA.h

  5. Implementation of Phi-placement algorithm specified in Figure 10 of [2].
  6. buildAPT.c, buildAPT.h

  7. Implementation of APT data structure: algorithms specified in [4]. For flexibility, the procedure can produce APT structure for postdominant tree or dominant tree (which is simply the postdominant tree for reversed CFG) according to the specified tag.
  8. query.c, query.h

  9. Implementation of queries: cd(), conds() and cdequiv()
  10. pdom.c

  11. Implementation of the algorithm described in [1], which construct dominant or postdominant tree for the given CFG.
  12. cfg.c, cfg.h

  13. Construct CFG data structure from the description of CFG.
  14. set.c, set.h

  15. Implementation of integer set, for use in pdom.c

Source Codes for CFG Construction at Block Level:

  1. Makefile for CFG only
  2. interface.cc, interface.h

  3. Read from files the fortran programs and get the inner representation using Polaris
  4. cfg.cc, cfg.h

  5. Find basic blocks for the Fortran program, and print out CFGs at block level.
  6. Makefile for both CFG and lists of assignments
  7. cfgAndSSA.cc, cfg.h

  8. Find basic blocks for the Fortran program, and print out CFGs at block level, besides, it also goes through the symbol table, find and print out the lists of blocks that contain assignments to each scalar variable on the symbol table.

Experimental Results

A Stair-like CFG is a standard model problem for Control Dependence investigations. We compare the storage requirements for Standard Control Dependence (SCD) and Weak Control Dependence (WCD).

Figures 1(a) and (b) show storage requirements for SCD as problem size is varied. The storage axis measures the total number of routes stored at all nodes of the tree. The storage required for the CDG grows quadratically with problem size. The storage needed for APT is between the storage needed for CDG (full caching) and the storage needed if there is no caching. As expected, the storage requirements for APT grow linearly, and increase as Alpha decreases. Note that the storage requirement remains constant for Alpha>=1. The reason is that all chariot routes are of the form [b, root). That leads to a single zone for Alpha>=1. Also observe that we have a single-node zones (full caching) for n<(1/Alpha)-1.

Figures 2(a) and (b) show storage requirements for WCD. There are no loops in the Stair-like CFG, so the SCD and the WCD are the same. Our experiments show that the APT storage requirements for SCD and WCD are the same, as expected.

Another standard model problem is a nest of repeat-until loops, where the problem size is the number of nested loops, n. We confirm the results about Standard Control Dependence reported in [2], and compare them with the new results about Weak Control Dependence:

Figures 3(a) and (b) show storage requirements for the SCD as problem size is varied. The storage required for the CDG is n(n+3) which grows quadratically as expected. Again, the storage needed for APT grows linearly and is between the storage needed for CDG (full caching) and the storage needed if there is no caching.

Figures 4(a) and (b) show storage requirements for the WCD. For Alpha<1, the actual storage requirement is smaller than that of the SCD but the pattern of growth is the same. Note that the storage requirement remains constant for Alpha>=1. The reason is that all chariot routes are of the form [b, root), as for the stair-like CFG, so there is a single zone for Alpha>=1.

Figures 5(a) and (b) show the preprocessing and Phi-placement times for 200 nested loops. (All times are measured on a SUN-4.) Figure 5(a) shows how preprocessing time varies with Alpha. Figure 5(b) shows how Phi-placement time varies with Alpha and the nesting depth of the assignment. The Node Number axis denotes the position of a variable definition. Observe that a well-chosen Alpha (e.g. about 1/4) can lead to much better results than a no caching (big Alpha) or full caching (small Alpha) algorithm. The best choice for Alpha depends on the nesting depth of the assignment.

'Real' programs, such as the SPEC and Perfect benchmarks, are less challenging than the model problems. For the stair-like CFG and the nested repeat-until loops, the ratio of storage required for full caching to the storage required for no caching grows with the problem size. For procedures in the SPEC and Perfect benchmarks, this number is fairly independent of problem size, and is close to 3. The APT data structure can reduce storage requirements by about a factor of half without sacrificing performance.

The above conclusion holds for both the SCD and the WCD. Our experiments show that the storage requirements for the WCD are about the same as for the SCD. The preprocessing time for the WCD (after building the APT for SCD) is about 2.5 times greater than that for the SCD.

Figures 6(a) and (b) show the storage requirements for the FORTRAN procedures in SPEC, for the SCD and the WCD respectively.

Figures 7(a) and (b) show the preprocessing times for the FORTRAN procedures in SPEC, for the SCD and the WCD respectively. The preprocessing time for the WCD includes building the Sibling Connectivity Graph and finding the crowns, but not building the APT for the SCD.

Figures 8(a) and (b) show the storage requirements for the procedures in Perfect, for the SCD and the WCD respectively.

Figures 9(a) and (b) show the preprocessing times for the procedures in Perfect, for the SCD and the WCD respectively. The preprocessing time for the WCD includes building the Sibling Connectivity Graph and finding the crowns, but not building the APT for the SCD.

We tested the Phi-placement for SSA on the FORTRAN procedures in SPEC and on the Perfect benchmarks. We measured the space requirements and the time needed for preprocessing (building the APT) and the Phi-placement. The storage requirements, as expected, decrease as Alpha increases. We have full caching for small Alpha (less than 2^(-6)), and no caching for big Alpha (greater than 2^6). The preprocessing time also decreases with Alpha, corresponding to the space requirements. The time for Phi-placement increases with Alpha. It is almost constant for small Alpha (less than 1) and for big Alpha (greater than 2^10). Any Alpha less than 1 is a good choice as far as time is concerned. If we take into account both space and time, the best choice for Alpha is about 1. Thus the APT-based algorithm with Alpha=1 is as fast as the standard (full caching) algorithm, but requires less space. It is about 2 times faster than a no-caching algorithm.

Figure 10(a) shows space requirements as Alpha varies. The space axis shows the total number of routes cached for all FORTRAN procedures in SPEC. Figure 10(b) shows the preprocessing, Phi-placement, and total time as Alpha varies. The time measured is the total for all FORTRAN procedures in SPEC.

Figure 11(a) shows space requirements as Alpha varies. The space axis shows the total number of routes cached for all procedures in Perfect. Figure 11(b) shows the preprocessing, Phi-placement, and total time as Alpha varies. The time measured is the total for all procedures in Perfect.


References

  1. Thomas Lengauer and Robert Endre Tarjan.

  2. A Fast Algorithm for Finding Dominators in a Flowgraph.
    ACM Transactions on Programming Languages and Systems, 1(1): 121 -141, July, 1979
  3. Keshav Pingali and Gianfranco Bilardi.

  4. APT: A Data Structure for Optimal Control Dependence Computation.
    In Programming Language Design and Implementation. SIGPLAN, 1995.
  5. Gianfranco Bilardi and Keshav Pingali.

  6. A Framework for Generalized Control Dependence.
    In Programming Language Design and Implementation. SIGPLAN, June 1996.
  7. Keshav Pingali and Gianfranco Bilardi.

  8. Optimal Control Dependence and the Roman Chariots Problem.
    ACM Transactions on Programming Languages and Systems (TOPLAS), 19(3), May 1997.