After the lowering transformation, the IR is in a canonical form in which the code is a top-level sequence of statements \(s_1; \dots; s_n\). In this lowered form, the IR is closer to assembly code, but one of its statements is still unrealistic: \(\textit{CJUMP}(e, l_1, l_2)\), which jumps to one of two different labels. Conditional jumps at the assembly level (often called branches) typically jump to a specified code location when a condition is true, but simply fall through to the next statement in sequence when the condition is false.
A simple-minded way to make one of the two labels a fall-through is to always follow a \(\textit{CJUMP}\) with a \(\textit{JUMP}\), so that \(\textit{CJUMP}(e, l_1, l_2)\) is rewritten as \(\textit{CJUMP}(e, l_1, l_2'); \textit{LABEL}(l_2'); \textit{JUMP}(\textit{NAME}(l_2))\). However, this would roughly double the number of jumps in the code, increasing code size and also likely hurting performance, since processors run fastest on straight-line code sequences.
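To make the cost concrete, here is a minimal sketch of this naive rewrite in Python. The tuple encoding of IR statements (e.g. ("CJUMP", e, l1, l2)) and the helper name are illustrative assumptions, not the actual IR representation used in these notes.

```python
# Naive CJUMP expansion: every two-target CJUMP gains a fresh false label
# and an unconditional JUMP, roughly doubling the number of jumps.
# Statements are encoded as tuples, e.g. ("CJUMP", "e", "L2", "L3");
# this encoding is illustrative only.

def expand_cjumps(stmts):
    out, fresh = [], 0
    for s in stmts:
        if s[0] == "CJUMP":
            _, e, l_true, l_false = s
            fresh += 1
            l_fall = f"_F{fresh}"                   # fresh fall-through label
            out.append(("CJUMP", e, l_true, l_fall))
            out.append(("LABEL", l_fall))
            out.append(("JUMP", l_false))           # the extra jump this approach pays for
        else:
            out.append(s)
    return out
```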
Instead, we solve this problem by trying to rearrange the code so that the “false” label of each conditional jump lies immediately after the jump itself.
When reordering code, it is helpful to notice that there are roughly three kinds of statements in our IR. Given a statement \(s\) that is followed in the top-level sequence by some statement \(s'\), it is one of the following:
- An ordinary statement, such as \(\textit{MOVE}\) or \(\textit{CALL}\): control always flows from \(s\) to \(s'\).
- A jump or return (\(\textit{JUMP}\), \(\textit{CJUMP}\), or \(\textit{RETURN}\)): control never falls through from \(s\) to \(s'\).
- A label (\(\textit{LABEL}\)): control may arrive at \(s\) from statements other than the one immediately preceding it.
A basic block is a sequence of statements with the property that if any part of the basic block is executed, the entire basic block must be executed. Therefore, when reordering code, basic blocks are the natural unit of code to move: there is almost certainly no point in splitting up a basic block.
Given that the entire code is a sequence \(s_1; \dots; s_n\), a basic block is a subsequence \(s_i, \dots, s_j\) in which all statements other than \(s_i\) and \(s_j\) are ordinary; statement \(s_i\) may be a label, and \(s_j\) may be a jump or return. In other words, a basic block consists of an optional label, followed by zero or more ordinary statements, followed by an optional jump or return. Note that even a single statement of any kind may be treated as a basic block by this definition, since a basic block does not have to begin with a label or end with a jump.
For example, consider the following sequence of code. One way to break it up into basic blocks is as shown below. These basic blocks are all maximal: they can't be made any larger.
```
Block 0:   L0:  CJUMP(e, L2, L3)
Block 1:   L1:  MOVE(x, y)
Block 2:   L2:  MOVE(x, y + z)
                JUMP(L1)
Block 3:   L3:  CALL(f, x)
                RETURN
```
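The definition of maximal basic blocks suggests a direct partitioning pass: start a new block at every label and end the current block after every jump or return. Below is a minimal sketch in Python under the same illustrative tuple encoding of statements used above; neither the encoding nor the function name comes from the notes.

```python
# Minimal sketch: split a lowered statement sequence into maximal basic blocks.
# Statements are assumed to be tuples such as ("LABEL", "L0") or ("JUMP", "L1");
# the encoding is illustrative only.

BLOCK_ENDERS = {"JUMP", "CJUMP", "RETURN"}

def basic_blocks(stmts):
    blocks, current = [], []
    for s in stmts:
        if s[0] == "LABEL" and current:
            blocks.append(current)      # a label begins a new block
            current = []
        current.append(s)
        if s[0] in BLOCK_ENDERS:
            blocks.append(current)      # a jump or return ends the block
            current = []
    if current:
        blocks.append(current)
    return blocks

# The example code above splits into the four maximal blocks shown:
code = [
    ("LABEL", "L0"), ("CJUMP", "e", "L2", "L3"),
    ("LABEL", "L1"), ("MOVE", "x", "y"),
    ("LABEL", "L2"), ("MOVE", "x", "y + z"), ("JUMP", "L1"),
    ("LABEL", "L3"), ("CALL", "f", "x"), ("RETURN",),
]
for i, block in enumerate(basic_blocks(code)):
    print(f"Block {i}: {block}")
```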
Control-flow graphs (sometimes called flowgraphs) are graphs in which the nodes are basic blocks and the edges describe possible control flow between them. Once the code is in control-flow graph (CFG) form, labels and unconditional jumps no longer add new information, and can be elided. With this elision, the previous example produces a CFG with four nodes: block 0 has edges to blocks 2 and 3 (the two targets of its \(\textit{CJUMP}\)), block 2 has an edge to block 1 (the target of its \(\textit{JUMP}\)), block 1 has an edge to block 2 (its fall-through successor), and block 3 has no outgoing edges because it returns.
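A minimal sketch of how these edges might be computed from the blocks, again under the illustrative tuple encoding: a block's successors are the targets of its final jump, its fall-through neighbor if it ends with an ordinary statement, or nothing if it ends with a return.

```python
# Minimal sketch: compute CFG successor lists for a sequence of basic blocks.
# Blocks are lists of tuple-encoded statements, as in the previous sketch.

def cfg_successors(blocks):
    # Map each leading label to the index of its block.
    label_to_block = {b[0][1]: i for i, b in enumerate(blocks) if b[0][0] == "LABEL"}
    succs = {}
    for i, b in enumerate(blocks):
        last = b[-1]
        if last[0] == "JUMP":
            succs[i] = [label_to_block[last[1]]]
        elif last[0] == "CJUMP":
            _, _, l_true, l_false = last
            succs[i] = [label_to_block[l_true], label_to_block[l_false]]
        elif last[0] == "RETURN":
            succs[i] = []                                       # control leaves this code
        else:
            succs[i] = [i + 1] if i + 1 < len(blocks) else []   # fall through
    return succs

# For the four blocks of the running example this yields
# {0: [2, 3], 1: [2], 2: [1], 3: []}.
```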
A control-flow graph of basic blocks captures most of the interesting information needed to do a good job of reordering code. The usual technique for finding a good ordering of basic blocks is to construct traces.
In the context of compilation, a trace is just a simple path through the control-flow graph: a sequence of one or more basic blocks in which each block has a control-flow edge to the next one in the sequence, and no block appears more than once. For example, in the control-flow graph above, the following sequences are all traces: (0, 2, 1), (1, 2), (0, 3), and (1). But the following are not: (2, 1, 2), which repeats a block, and (3, 2), because there is no edge from block 3 to block 2.
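This definition is easy to check mechanically. The sketch below does so against the successor map of the running example; the function name and representation are only illustrative.

```python
# Minimal sketch: test whether a sequence of block indices is a trace,
# i.e. a simple path through the CFG.

def is_trace(seq, succs):
    if len(set(seq)) != len(seq):                       # no block may repeat
        return False
    return all(b in succs[a] for a, b in zip(seq, seq[1:]))

succs = {0: [2, 3], 1: [2], 2: [1], 3: []}              # CFG of the running example
print(is_trace([0, 2, 1], succs))                       # True
print(is_trace([2, 1, 2], succs))                       # False: block 2 repeats
print(is_trace([3, 2], succs))                          # False: no edge from 3 to 2
```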
Ideally, the ordering of basic blocks should create large traces. For good instruction-cache performance, it also helps to create traces containing frequently executed code.
A simple algorithm that works well is to choose traces greedily. As the algorithm proceeds, it marks basic blocks that have already been chosen.
When choosing an unmarked block to begin a new trace, a useful heuristic is to choose a block that has no unmarked predecessors, because if some predecessor were still unmarked, a longer trace could be built by starting from that predecessor instead. Another useful heuristic is that whenever a block is chosen, either as the beginning of a trace or as the next block in a trace, it is good to choose a “hot” block that is expected to be executed frequently. Such a choice can be made based on data from profiling or based on program structure: for example, basic blocks in inner loops can be expected to be hot.
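Below is a minimal sketch of the greedy trace construction, working over the successor map from above. It includes the no-unmarked-predecessors heuristic but leaves out the “hot block” heuristic, since that needs profile information; all names here are illustrative.

```python
# Minimal sketch: greedily cover the CFG with traces.
# succs maps each block index to the list of its successor block indices.

def build_traces(succs):
    preds = {b: set() for b in succs}
    for b, ss in succs.items():
        for s in ss:
            preds[s].add(b)

    marked, traces = set(), []
    while len(marked) < len(succs):
        unmarked = [b for b in succs if b not in marked]
        # Heuristic: prefer a block with no unmarked predecessors.
        start = next((b for b in unmarked if not (preds[b] - marked)), unmarked[0])
        trace = [start]
        marked.add(start)
        # Extend the trace as long as some successor is still unmarked.
        while True:
            nxt = next((s for s in succs[trace[-1]] if s not in marked), None)
            if nxt is None:
                break
            trace.append(nxt)
            marked.add(nxt)
        traces.append(trace)
    return traces

print(build_traces({0: [2, 3], 1: [2], 2: [1], 3: []}))   # [[0, 2, 1], [3]]
```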
In our example code, we might start the greedy reordering algorithm by choosing the trace (0, 2, 1) and then appending a singleton trace (3) to obtain the ordering (0, 2, 1, 3). Alternatively, we might start with (0, 3) and then append (1, 2) to obtain (0, 3, 1, 2).
By concatenating the traces, we obtain an ordering of the basic blocks. Now the jumps between blocks need to be repaired to restore the desired program behavior:
1. A \(\textit{JUMP}\) to a label that begins the next block in the ordering is redundant and can be deleted; control simply falls through.
2. A block that ends in an ordinary statement used to fall through to its successor; if that successor is no longer the next block in the ordering, a \(\textit{JUMP}\) to its label must be appended.
3. A \(\textit{CJUMP}\) whose false label begins the next block can be replaced by a single-target conditional jump that branches to the true label and otherwise falls through.
4. A \(\textit{CJUMP}\) whose true label begins the next block is handled by negating the condition and swapping the two labels, which reduces it to the previous case. (A \(\textit{CJUMP}\) followed by neither of its labels keeps its conditional branch to the true label and is followed by an unconditional \(\textit{JUMP}\) to the false label.)
If we choose the ordering (0, 2, 1, 3) for our previous code example, these four steps update the code as shown:
```
Block 0:   L0:  CJUMP(!e, L3)
Block 2:   L2:  MOVE(x, y + z)
Block 1:   L1:  MOVE(x, y)
                JUMP(L2)
Block 3:   L3:  CALL(f, x)
                RETURN
```
Block 0's conditional jump now has a single target and a negated condition (step 4), block 2's jump to L1 has been deleted (step 1), and block 1, which used to fall through to L2, now ends with an explicit jump there (step 2).
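A minimal sketch of this repair pass, placing the blocks in a chosen order and applying the four steps; the tuple encoding and the ("NOT", e) wrapper for a negated condition are illustrative assumptions, not the notes' actual IR.

```python
# Minimal sketch: emit the blocks in a new order and repair the jumps.
# blocks: the original list of basic blocks (tuple-encoded statements);
# order:  a permutation of block indices, e.g. the concatenated traces.

def fix_jumps(blocks, order):
    def label_of(b):
        return b[0][1] if b[0][0] == "LABEL" else None

    # Fall-through successor label of each block in the ORIGINAL layout.
    fallthrough = {}
    for i, b in enumerate(blocks):
        if b[-1][0] not in ("JUMP", "CJUMP", "RETURN") and i + 1 < len(blocks):
            fallthrough[i] = label_of(blocks[i + 1])

    out = []
    for pos, i in enumerate(order):
        block = list(blocks[i])
        next_i = order[pos + 1] if pos + 1 < len(order) else None
        next_label = label_of(blocks[next_i]) if next_i is not None else None
        last = block[-1]
        if last[0] == "JUMP":
            if last[1] == next_label:
                block.pop()                                    # step 1: fall through instead
        elif last[0] == "CJUMP":
            _, e, l_true, l_false = last
            if l_false == next_label:
                block[-1] = ("CJUMP", e, l_true)               # step 3: false label falls through
            elif l_true == next_label:
                block[-1] = ("CJUMP", ("NOT", e), l_false)     # step 4: negate, then fall through
            else:
                block[-1] = ("CJUMP", e, l_true)
                block.append(("JUMP", l_false))                # neither label follows
        elif last[0] != "RETURN":
            if fallthrough.get(i) is not None and fallthrough[i] != next_label:
                block.append(("JUMP", fallthrough[i]))         # step 2: restore lost fall-through
        out.extend(block)
    return out

blocks = [
    [("LABEL", "L0"), ("CJUMP", "e", "L2", "L3")],
    [("LABEL", "L1"), ("MOVE", "x", "y")],
    [("LABEL", "L2"), ("MOVE", "x", "y + z"), ("JUMP", "L1")],
    [("LABEL", "L3"), ("CALL", "f", "x"), ("RETURN",)],
]
for s in fix_jumps(blocks, [0, 2, 1, 3]):
    print(s)       # reproduces the repaired code shown above
```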