# Lesson 4: Data Flow

## Gist

### The Data Flow Framework

• Reaching definitions are an example of a global property that require you to look at an entire CFG.
• Terminology:
• Use: An instruction uses all of its arguments. (So any instruction with arguments is a "use.")
• Definition: An instruction defines the variable it writes to. (You can call every instruction that writes to a variable a "definition.")
• Available: Definitions that reach a given point in the program are available at that point.
• Kills: Any definition kills all of the currently available definitions.
• The data flow framework. Here's how you solve a global analysis problem by imbuing a local analysis with a dose of data-flow magic:
1. Figure out the thing you want to know at the entry and exit of each block.
2. Write an equation for every block relating that thing at the entry to that thing at the exit. (In general, this is a local analysis for the block.)
3. Generate equalities according to the edges in the CFG.
4. Solve the system of equations! (Using a general solver algorithm that you don't need to adapt to every problem.)
• Instantiating the data flow framework for reaching definitions.
• Initial value: the empty set.
• Transfer function: `out(in) = def(b) ∪ (in - kills(b))`
• Merge function: union.
• The worklist algorithm for solving data flow.

Here's the pseudocode for solving a forward data flow problem with a worklist algorithm:

``````in[entry] = init
out[*] = init

worklist = all blocks
while worklist is not empty:
b = pick any block from worklist
in[b] = merge(out[p] for every predecessor p of b)
out[b] = transfer(b, in[b])
if out[b] changed:
worklist += successors of b
``````

For the backward version, flip predecessors & successors, and flip `in` and `out`.

### Instantiating Data Flow

• Theory interlude: the requirements for a general data flow analysis. How do you know the worklist algorithm terminates and gives you the right answer?
• The domain of values you're trying compute needs to form a partial order with a unique lower bound. The rough idea is that the worklist algorithm should only "move" values monotonically in the order, so it's guaranteed to eventually terminate.
• In terms of a partial order ⊑, the merge function is the meet (greatest lower bound) operator ⊓; the initial value is the top value ⊤; and the transfer function must be a monotonic function, so `x ⊑ y` implies `f(x) ⊑ f(y)`.
• The usual definition of a "correct" solution to a data-flow problem is the meet-over-all-paths solution: the meet of chained applications of the transfer functions for every path in the CFG from the entry block to any given block.
• For more on the theory, I recommend Chapter 5 of Static Program Analysis by Møller and Schwartzbach.
• More examples of things you can do with the data flow framework.
• Reaching definitions.
• Live variables: which variables are both defined and going to be used at some point in the future?
• Constant propagation: which variables have statically knowable constant values?
• Available expressions: which expressions have already been computed computed in the past? (Useful in CSE.)
• Initialized variables: like in Java, which variables have had something written to them?
• Interval analysis: what is the numerical range of values that a given variable might hold?
• Demonstrating a simple generic implementation.
• If you want, you can use the `{args}` feature of Turnt and its `-a` command-line flag to quickly switch between different analyses.