Lesson 6: Static Single Assignment
- discussion thread
- static single assignment
- SSA slides from Todd Mowry at CMU: another presentation of the pseudocode for various algorithms herein
- “Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency” by Boissinot et al.: more sophisticated ways to translate out of SSA form
- “SSA is Functional Programming”: among other fun things, this is usually cited as the origin of the “basic block arguments” idea
- “How I implement SSA form”: Filip Pizlo’s notes on how to implement the data structures for an SSA IR
- “A catalog of ways to generate SSA”: many more links to papers about the myriad ways to generate SSA code
- tasks due March 6
Gist
You have undoubtedly noticed by now that many of the annoying problems in implementing analyses & optimizations stem from variable name conflicts. Wouldn’t it be nice if every assignment in a program used a unique variable name? Of course, people don’t write programs that way, so we’re out of luck. Right?
Wrong! Many compilers convert programs into static single assignment (SSA) form, which does exactly what it says: it ensures that, globally, every variable has exactly one static assignment location. (Of course, that statement might be executed multiple times, which is why it’s not dynamic single assignment.) In Bril terms, we convert a program like this:
@main {
a: int = const 4;
b: int = const 2;
a: int = add a b;
b: int = add a b;
print b;
}
Into a program like this, by renaming all the variables:
@main {
a.1: int = const 4;
b.1: int = const 2;
a.2: int = add a.1 b.1;
b.2: int = add a.2 b.1;
print b.2;
}
Of course, things will get a little more complicated when there is control flow. And because real machines are not SSA, using separate variables (i.e., memory locations and registers) for everything is bound to be inefficient. The idea in SSA is to convert general programs into SSA form, do all our optimization there, and then convert back to a standard mutating form before we generate backend code.
ϕ-Nodes
Just renaming assignments willy-nilly will quickly run into problems. Consider this program:
@main(cond: bool) {
.entry:
a: int = const 47;
br cond .left .right;
.left:
a: int = add a a;
jmp .exit;
.right:
a: int = mul a a;
jmp .exit;
.exit:
print a;
}
If we start renaming all the occurrences of a, everything goes fine until we try to write that last print a. Which “version” of a should it use?
To match the expressiveness of unrestricted programs, SSA adds a new kind of instruction: a ϕ-node. ϕ-nodes are flow-sensitive copy instructions: they get a value from one of several variables, depending on which incoming CFG edge was most recently taken to get to them.
Using Bril syntax, we could write a ϕ-node as an instruction with opcode phi:
a.4: int = phi .left a.2 .right a.3;
This phi instruction chooses between any number of variables, and it picks between them based on labels. If the program most recently executed a basic block with the given label, then the phi instruction takes its value from the corresponding variable.
You can write the above program in SSA like this:
@main(cond: bool) {
.entry:
a.1: int = const 47;
br cond .left .right;
.left:
a.2: int = add a.1 a.1;
jmp .exit;
.right:
a.3: int = mul a.1 a.1;
jmp .exit;
.exit:
a.4: int = phi .left a.2 .right a.3;
print a.4;
}
It can also be useful to see how ϕ-nodes crop up in loops.
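For a quick taste, here is one way a simple counting loop could look in SSA form (a sketch of my own, not part of the examples above): the ϕ-node at the top of the loop merges the initial value from .entry with the incremented value arriving along the back edge from .body.
@main {
.entry:
i.1: int = const 0;
one: int = const 1;
max: int = const 10;
jmp .loop;
.loop:
# On the first iteration we arrive from .entry, so i.2 gets i.1;
# on later iterations we arrive from .body, so i.2 gets i.3.
i.2: int = phi .entry i.1 .body i.3;
cond: bool = lt i.2 max;
br cond .body .done;
.body:
i.3: int = add i.2 one;
jmp .loop;
.done:
print i.2;
}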
The SSA Philosophy
In addition to a language form, SSA is also a philosophy! It can fundamentally change the way you think about programs. In the SSA philosophy:
- definitions == variables
- instructions == values
- arguments == data flow graph edges
In LLVM, for example, instructions do not refer to argument variables by name—an argument is a pointer to the defining instruction.
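As a rough sketch of the idea (not LLVM’s actual classes), here is what that looks like as a data structure in Python, with a hypothetical Instruction type whose arguments are references to other instructions rather than names:
from dataclasses import dataclass, field

@dataclass
class Instruction:
    # An instruction *is* the value it defines, so it needs no
    # destination name; its args point straight at other instructions,
    # forming the edges of the data flow graph.
    opcode: str
    args: list["Instruction"] = field(default_factory=list)
    value: int | None = None  # only used by const

# The straight-line example from the top of the lesson:
a1 = Instruction("const", value=4)      # a.1: int = const 4
b1 = Instruction("const", value=2)      # b.1: int = const 2
a2 = Instruction("add", args=[a1, b1])  # a.2: int = add a.1 b.1
b2 = Instruction("add", args=[a2, b1])  # b.2: int = add a.2 b.1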
Converting to SSA: Basic Version
To convert to SSA, we need to rename variables and insert ϕ-nodes to reconcile these renamed variables.
Within a basic block, the strategy is pretty simple: on every write, just introduce a new “version” of the variable and make sure that all subsequent uses refer to that name.
Across basic blocks, here’s a really inefficient strategy that creates too many names and too many ϕ-nodes. If your function uses n variables and has m blocks, create n × m new variable names so each block has its own, private copy of the variable on entry. Then, at the top of every block, insert n ϕ-nodes to pick the right incoming value among the predecessors’ names.
This naive strategy would produce code like this:
@main(cond: bool) {
.entry:
entry.a: int = const 47;
br cond .left .right;
.left:
left.a: int = phi .entry entry.a;
left.a.2: int = add left.a left.a;
jmp .exit;
.right:
right.a: int = phi .entry entry.a;
right.a.2: int = mul right.a right.a;
jmp .exit;
.exit:
exit.a: int = phi .left left.a.2 .right right.a.2;
print exit.a;
}
It seems like we can probably do a lot better: i.e., produce many fewer variable “versions” and many fewer ϕ-nodes.
Converting to SSA: For Real
To convert to SSA less wastefully, we want to insert ϕ-nodes only where distinct paths carrying distinct definitions of a variable converge. We don’t need ϕ-nodes in places that are dominated by a definition of the variable. So what’s a way to know when control reachable from a definition is not dominated by that definition? The dominance frontier! In the branching example above, for instance, the definitions of a in .left and .right each have .exit in their dominance frontier, which is exactly where the ϕ-node belongs.
We do it in two steps. First, insert ϕ-nodes:
for v in vars:
    for d in Defs[v]:  # Blocks where v is assigned.
        for block in DF[d]:  # Dominance frontier.
            Add a ϕ-node to block,
              unless we have done so already.
            Add block to Defs[v] (because it now writes to v!),
              unless it's already in there.
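As a more concrete (and hedged) sketch, here is roughly what that step might look like in runnable Python, assuming blocks maps block names to instruction lists, defs maps each variable to the set of blocks that assign it, and df maps each block to its dominance frontier:
def insert_phis(blocks, defs, df):
    # Decide where ϕ-nodes go: returns a map from block name to the set
    # of variables that need a ϕ-node at the top of that block.
    phis = {b: set() for b in blocks}
    for v in defs:
        worklist = list(defs[v])
        while worklist:
            d = worklist.pop()
            for block in df[d]:
                if v not in phis[block]:      # unless we did so already
                    phis[block].add(v)
                    if block not in defs[v]:  # block now writes to v!
                        defs[v].add(block)
                        worklist.append(block)
    return phis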
Then, rename variables:
stack[v] is a stack of variable names (for every variable v)

def rename(block):
    for instr in block:
        replace each argument to instr with the top of stack[old name]
        replace instr's destination with a new name
        push that new name onto stack[old name]
    for s in block's successors:
        for p in s's ϕ-nodes:
            Assuming p is for a variable v, make its argument for
              the edge from block read from the top of stack[v].
    for b in blocks immediately dominated by block:
        # That is, children in the dominance tree.
        rename(b)
    pop all the names we just pushed onto the stacks

rename(entry)
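For the renaming pass, here is a similarly hedged Python sketch. It assumes Bril-JSON-style instruction dicts, a succs map of CFG successors, a dom_tree map from each block to the blocks it immediately dominates, ϕ-nodes that carry a labels list (one entry per predecessor) with a matching args slot for each, and a hypothetical "original" field recorded on each ϕ-node at insertion time so we can still tell which variable it merges after renaming:
from collections import defaultdict

def rename_variables(blocks, entry, succs, dom_tree, func_args=()):
    stack = defaultdict(list)  # original name -> stack of fresh names
    counter = defaultdict(int)
    for arg in func_args:      # function parameters keep their own names
        stack[arg].append(arg)

    def fresh(v):
        counter[v] += 1
        return f"{v}.{counter[v]}"

    def visit(block):
        pushed = []            # record pushes so we can undo them below
        for instr in blocks[block]:
            # ϕ-node arguments get filled in from the predecessor side
            # below, so leave them alone here.
            if instr.get("op") != "phi" and "args" in instr:
                instr["args"] = [stack[a][-1] for a in instr["args"]]
            if "dest" in instr:
                old = instr["dest"]
                instr["dest"] = fresh(old)
                stack[old].append(instr["dest"])
                pushed.append(old)
        # Point each successor's ϕ-nodes at the names live on this edge.
        for s in succs[block]:
            for instr in blocks[s]:
                if instr.get("op") == "phi":
                    v = instr["original"]  # hypothetical bookkeeping field
                    slot = instr["labels"].index(block)
                    if stack[v]:
                        instr["args"][slot] = stack[v][-1]
                    # If stack[v] is empty, v is undefined along this
                    # edge; Bril's undef instruction handles that case.
        for child in dom_tree[block]:  # children in the dominance tree
            visit(child)
        for old in pushed:
            stack[old].pop()

    visit(entry)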
Converting from SSA
Eventually, we need to convert out of SSA form to generate efficient code for real machines that don’t have phi-nodes and do have finite space for variable storage.
The basic algorithm is pretty straightforward. If you have a ϕ-node:
v = phi .l1 x .l2 y;
Then there must be assignments to x and y (recursively) preceding this statement in the CFG. The paths from the assignment to x and from the assignment to y must “converge” at the phi-containing block. So insert code into that block’s immediate predecessors along each of those two paths: one that does v = id x and one that does v = id y. Then you can delete the phi instruction.
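Applied to the branching example from earlier, that procedure yields something like this:
@main(cond: bool) {
.entry:
a.1: int = const 47;
br cond .left .right;
.left:
a.2: int = add a.1 a.1;
a.4: int = id a.2;
jmp .exit;
.right:
a.3: int = mul a.1 a.1;
a.4: int = id a.3;
jmp .exit;
.exit:
print a.4;
}
Note that the result is deliberately not in SSA form anymore: a.4 now has two assignment locations, one in each predecessor of .exit.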
This basic approach can introduce some redundant copying. (Take a look at the code it generates after you implement it!) Non-SSA copy propagation optimization can work well as a post-processing step. For a more extensive take on how to translate out of SSA efficiently, see “Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency” by Boissinot et al.
Alternative SSA Representations
There are some tricky problems with ϕ-nodes that make them a little more subtle than they may appear at first. You can either deal with them directly or pick a “twist” on the classic SSA formula that can help avoid these subtleties altogether.
Subtleties in Classic SSA
While ϕ-nodes behave like “normal” instructions in some ways, they are weird in other ways. The biggest way in which ϕ-nodes are weird is that their semantics depends on the control-flow history of the program. Returning to our first example above:
a.4: int = phi .left a.2 .right a.3;
The result we produce in a.4 depends on something that happened in the past: whether we just “came from” the .left block or the .right block. No other instruction depends on this strange, hidden state about the history of the program’s execution.
A second subtlety is that, if you have multiple phi instructions at the beginning of a basic block, you want them to execute “simultaneously.” Consider a loop in SSA form like this:
@swappy {
i.1: int = const 5;
one: int = const 1;
zero: int = const 0;
.l0:
x0: int = const 0;
y0: int = const 1;
jmp .l1;
.l1:
x1: int = phi .l0 x0 .l1 y1;
y1: int = phi .l0 y0 .l1 x1;
i.2: int = phi .l0 i.1 .l1 i.3;
print x1 y1;
cond: bool = gt i.2 zero;
i.3: int = sub i.2 one;
br cond .l1 .end;
.end:
}
Roughly speaking, here’s what this code is supposed to do:
let x = 0;
let y = 1;
for (let i = 5; i > 0; --i) {
swap(x, y);
print(x, y);
}
However, what happens if you execute those phi instructions in the loop in order, one at a time?
x1: int = phi .l0 x0 .l1 y1;
y1: int = phi .l0 y0 .l1 x1;
If we came from a previous iteration of the loop (the .l1 label), then the first instruction would set x1 to y1, and then the second would set y1 to the same value. That’s not what we want!
This pitfall was named the “swap problem” in a 1997 paper by Briggs et al.
The resolution is to define the semantics of ϕ-nodes so that, when you have several of them next to each other, they run “simultaneously”: they both read their arguments, and then they both write to their destinations without clobbering each other’s arguments. And that is clearly different from any other kind of instruction.
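Simultaneity also matters when translating out of SSA: a group of ϕ-nodes implements a parallel copy, and inserting the copies one at a time (as in the basic out-of-SSA algorithm above) reproduces exactly the buggy sequential behavior. A correct translation has to break the cycle with a temporary; for these two ϕ-nodes, the copies inserted along the back edge would look something like this (the name tmp is my own):
tmp: int = id y1;
y1: int = id x1;
x1: int = id tmp;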
To make the simultaneous semantics work, it is common for compilers with SSA ILs to require ϕ-nodes to appear together at the beginnings of basic blocks. (This is true in LLVM, for example.) This restriction makes sense because ϕ-nodes are for “inter-block” communication, but it can become a practical annoyance to maintain the invariant. For example, imagine you have an optimization that can combine two basic blocks when it can prove that a branch is unnecessary. That pass needs to carefully move around the two blocks’ ϕ-nodes to get them all up at the beginning of the resulting, combined block.
Basic Block Arguments
One way of looking at these inconveniences is that ϕ-nodes should stop pretending that they are “just normal instructions.” If their semantics depends on control flow, if groups of them have to be handled together so they work simultaneously, and if they have to appear in a special position within basic blocks… who are we fooling by trying to treat them as instructions inside of blocks? They would be better off embracing their true identity as constructs intertwined with the CFG itself.
Some compilers, therefore, eschew ϕ-nodes altogether and replace them with basic block arguments. (This design has appeared prominently in MLIR and Swift’s IR, and the idea is often attributed to Andrew Appel’s short and fun 1998 paper titled “SSA is Functional Programming”.) These IRs give their basic blocks named parameters; whenever you jump to a block, you must provide arguments for those parameters.
Here’s a rough impression of what that might look like in Bril-like syntax:
@main(cond: bool) {
.entry():
a.1: int = const 47;
br cond .left() .right();
.left():
a.2: int = add a.1 a.1;
jmp .exit(a.2);
.right():
a.3: int = mul a.1 a.1;
jmp .exit(a.3);
.exit(a.4):
print a.4;
}
Notice that the jmp instructions here pass different arguments to the .exit block’s parameter, just like a function call.
I think it’s interesting that, compared to ϕ-nodes, basic block arguments push the information to the “other end” of each CFG edge. With ϕ-nodes, a basic block enumerates all the possible sources for a given variable, one for each in-edge in the CFG; with basic block arguments, the sources are distributed to the predecessor blocks.
The upshot is that basic block arguments fulfill the same essential role as ϕ-nodes while being a little more transparent about the complexity that they introduce into the IR and the invariants that the compiler must enforce.
Pizlo’s Upsilon/Phi Variant
Here’s another twist on classic SSA with a similar motivation to basic block arguments, which Filip Pizlo recently sketched and named after himself. As with basic block arguments, the idea is to split ϕ-nodes into two parts: the “sending side” (the predecessor blocks) and the “receiving side” (the successor block). But let’s return to using normal-ish instructions instead of changing the way labels, branches, and jumps work.
Instead of just one phi instruction, let’s introduce two new instructions: set and get. (These are called upsilon and phi in Pizlo’s sketch, but I’ve renamed them for clarity.) The semantics of these instructions will read and write from shadow variables that exist in a separate environment from the “normal” variables in a function.
Critically, these shadow variables do not obey the SSA restriction that applies to regular variables: they can have multiple writers, in the same way that classic ϕ-nodes can get their values from multiple predecessors.
The set instruction looks like this:
set x y;
and it sets the shadow variable x to the value in the normal variable y. Then, the get instruction looks kinda funky:
x: type = get;
It reads the value of the shadow variable x and copies it into the regular variable with the same name, x.
These instructions obey an important restriction that comes automatically from the fact that the program is in SSA form, i.e., there is only one instruction in a given function that writes to a given variable name. There is therefore only one get per shadow variable, because the normal-variable destination of a get also determines its shadow-variable source. This means that shadow variables are static single use (SSU): there may be multiple definitions, but there is only one use, which has a kind of satisfying symmetry with the SSA constraint on normal variables.
Writing programs in this version of SSA entails “splitting” ϕ-nodes across the predecessor and successor blocks. Here’s an example:
@main(cond: bool) {
.top:
a: int = const 5;
set c a;
br cond .here .there;
.here:
b: int = const 7;
set c b;
.there:
c: int = get;
print c;
}
The big advantage of this form is that the instructions behave more like other instructions. There is no need for set or get instructions to have “simultaneous” semantics because they already split the read side from the write side. There is therefore no danger of the “swap problem.”
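To see why, here is one way the swap loop from above might look in this form (my own translation; other placements of the set instructions are possible, as long as every path to a get passes a matching set):
@swappy {
i.1: int = const 5;
one: int = const 1;
zero: int = const 0;
.l0:
x0: int = const 0;
y0: int = const 1;
set x1 x0;
set y1 y0;
set i.2 i.1;
jmp .l1;
.l1:
x1: int = get;
y1: int = get;
i.2: int = get;
print x1 y1;
cond: bool = gt i.2 zero;
i.3: int = sub i.2 one;
set x1 y1;
set y1 x1;
set i.2 i.3;
br cond .l1 .end;
.end:
}
The two swapping sets near the bottom read the regular variables x1 and y1, which the gets at the top already loaded, and write only the shadow variables, so neither write clobbers the other’s input and no simultaneity is needed.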
There is certainly one big difference between set and get and every other instruction: they interact with the “shadow environment.” However, unlike the deeper weirdness of the phi instruction (simultaneous semantics; sensitivity to control-flow history), their weirdness is a little more conventional: you can treat it much like any other side effect. Your compiler already needs a way to restrict reordering things around prints or memory loads and stores; it can use the same mechanism to remain careful about reordering things around changes to the shadow environment. Finally, there is no need to require the instructions to appear at a particular place within a basic block. An optimization that stitches together two blocks, for instance, can leave gets and sets sprinkled willy-nilly throughout the block and defer any code motion to a separate pass.
Bril in SSA
Bril has an SSA extension that follows the “upsilon/phi form” representation.
It adds support for set and get instructions, which correspond to Pizlo’s upsilon and phi operations. Beyond that, SSA form is just a restriction on the normal expressiveness of Bril—if you solemnly promise never to assign statically to the same variable twice, you are writing “SSA Bril.” The reference interpreter has built-in support for set and get, so you can execute your SSA-form Bril programs without fuss.
Tasks
- Implement the “into SSA” and “out of SSA” transformations on Bril functions.
  - For both directions, you can pick which version of the transformation to implement. For the “into SSA” transformation, for example, you can try the basic version (which should be pretty straightforward), the dominance-frontier-based version (which is a bit harder), or any other technique you find in the literature.
  - One thing to watch out for: a tricky part of the translation from the pseudocode to the real world is dealing with variables that are undefined along some paths. The Bril SSA extension has an undef instruction you can use to handle this case.
  - Previous 6120 adventurers have found that the “full” version of the “into SSA” transformation can be tricky to get right. Leave yourself plenty of time, and test thoroughly.
- Convince yourself that your implementation actually works!
  - Check that the output of your “to SSA” pass is actually in SSA form. There’s a really simple is_ssa.py script that can check that for you.
  - You also must check that programs do the same thing when converted to SSA form and back again. Fortunately, the reference interpreter supports the set and get instructions, so you can interpret your SSA-form programs if you want to check the midpoint of that round trip.
- Measure the overhead. If you take a program on a round trip through SSA form and back, how many more instructions (static or dynamic) does the final program have than the original? Report this overhead so you can compare your implementation against the rest of the class. If you ended up implementing more than one version of the transformations, you can compare them against each other.