CS
4120: Introduction to Compilers
Fall 2011
Problem Set 5: Program Analysis and Advanced Language Features
due: Monday, November 21, 11:59PM
- Bounding intervals
We want to design a dataflow analysis that computes conservative intervals bounding the values of all integer variables. An analysis along these lines could be used for eliminating bounds checks, for example. We extend the set Z of integers with plus and minus infinity: Z∗ = Z ∪ {+∞, −∞}, such that −∞ < n, and n < +∞ for any integer n. We then define a lattice over the set L = {[a, b] | a, b ∈ Z∗ ∧ a ≤ b} ∪ {⊤}.
- Explain what the element ⊤ represents and why we need it for this analysis. Define the partial order and the meet operator for elements in this lattice (including ⊤).
- Using this lattice to compute ranges of variables will fail in general. Explain why.
- To solve the problems from part (b), we define a lattice
L′ = {[a, b] | a, b
∈ {−∞, -k, -2, −1, 0, 1, 2,...,k, +∞} ∧ a ≤ b } ∪ {⊤}
(with the same partial order as before, and for some constant k,
and build a dataflow analysis that computes ranges in L′ . Show the transfer
functions for the following nodes:
- assignment to a constant
x = n
- addition
x = y + z
- multiplication
x = y * z
- assignment to a constant
- This is an example of an analysis where it makes sense to propagate different
information along different out edges from a node. Give a suitable transfer function
for each of the out edges of a comparison
x < n
, and show that with this analysis we can derive a bound [10, 10] for variablei
at the exit from the following example:i = 0 while (i < 10) { i = i + 1 }
Defending against zombies with dataflow analysis
Let us define "undead" code as code which depends on a variable that is always uninitialized. When such undead code is removed, additional program regions may become undead due to the disappearance of variable declarations. The goal of this exercise is to remove all undead code from a function using only a single analysis pass.
-
Design a dataflow analysis that can be used for cascading undead code removal.
Describe its domain, the meet operator, the meet of the empty set, as well as
the transfer function. Where necessary, be conservative. You only need to
specify the transfer function for the following cases:
- start
- x=c
- if x
- x=y+z
- x=f(y)
- x=[y]
- [x]=y
- Show that the transfer functions you defined are monotonic, and either show that they are distributive or construct a counterexample.
-
Show that one run of your analysis leads to the removal of the following
grayed-out undead code:
a = 1
if (f(a) > 0) {
c = c+1
d = f(c)
}
[d] = a+c
g = a+d
-
Design a dataflow analysis that can be used for cascading undead code removal.
Describe its domain, the meet operator, the meet of the empty set, as well as
the transfer function. Where necessary, be conservative. You only need to
specify the transfer function for the following cases:
- Object layout
Consider the following class declarations in a Java-like language with subtyping, inheritance, and method overloading, but with a simple type hierarchy in which no class has two parents:
class A { int a; int f() { code 1 } } class B extends A { int b; int f() { code 2 } int g(int x) { code 3 } int h() { code 4 } } class C extends A { int c; void f(float x) { code 5 } int j() { code 6 } int g() { code 7 } }
Assuming the simple dispatch table scheme with sequentially assigned method indices, draw the memory layout of three objects of classes A, B, and C respectively, including the objects themselves, their dispatch tables and the code for all their methods.
Now, suppose instead that the language supported multiple inheritance and there was a class D defined as follows.
class D extends B, C { int d; int f() { code 8 } }
Assuming the language implementation is based on sparse dispatch tables, draw what objects of all four classes might look like, and give a table showing the indices of each method.
-
Suppose Xi were extended with higher-order lexically-scoped functions.
Then we could write the following code:
f(n:int, g1:()→int, g2:()→int) = { x: int = n + 12; g(): int = { return x; } if (n == 1) return g() + g1() + g2(); // X else return f(n-1, g, g1); }
Note that the function call
f(3, g0, g0)
returns 42 regardless of what functiong0
is passed in. Assuming heap-allocated activation records, draw what the stack and heap would look like at the point where the statement marked X is executed. Identify any assumptions you are making about the implementation.