CS 4120: Introduction to Compilers
Fall 2009

Problem Set 4: Program Analysis and Advanced Language Features

due: Monday, November 30, 11:59PM

Bounding intervals
We want to design a dataﬂow analysis that computes conservative intervals bounding the values of all integer variables. An analysis along these lines could be used for eliminating bounds checks, for example. We extend the set Z of integers with plus and minus infinity: Z^∗ = Z ∪ {+∞, −∞}, such that −∞ < n, and n < +∞ for any integer n. We then define a lattice over the set L = {[a, b] | a, b ∈ Z^∗ ∧ a ≤ b} ∪ {⊤}.
1. Explain what the element ⊤ represents and why we need it for this analysis. Define the partial order and the meet operator for elements in this lattice (including ⊤).
2. Using this lattice to compute ranges of variables will fail in general. Explain why.
3. To solve the problems from part (b), we deﬁne a lattice L′ = {[a, b] | a, b ∈ {−∞, -k, -2, −1, 0, 1, 2,...,k, +∞} ∧ a ≤ b } ∪ {⊤} (with the same partial order as before, and for some constant k, and build a dataflow analysis that computes ranges in L′ . Show the transfer functions for the following nodes:
  - assignment to a constant x = n
  - addition x = y + z
  - multiplication x = y * z
4. This is an example of an analysis where it makes sense to propagate different information along different out edges from a node. Give a suitable transfer function for each of the out edges of a comparison x < n, and show that with this analysis we can derive bound [1, 11] for variable i in the following example:
```
i = 0;
while (i < 10) {
    i = i + 1;
}
```
Cascading dead code removal
Live variable analysis can be used to identify dead code. Any variable that is not live is dead. However, removing side-effect-free assignments to dead variables can cause other variables not to be live. For example, in the following code the variable e is dead. But once the assignment to e is removed, variable d is no longer live. It would be better if we didn't have to rerun live variable analysis every time dead code is removed.
1. a = b + c;
2. if (a) {
3. b = c + 1;
4. c = c + 1;
5. }
6. d = c;
7. e = d + b;
8. a = f(b);
1. Define an analysis that identifies the variables that are marked dead after an arbitrarily long sequence of alternating live variable analysis (as defined earlier) and removal of dead code. The domain of dataflow values is the same as for live variable analysis. All that is needed is to define the necessary transfer functions. Give the transfer function for the following cases:
  - x = n
  - x = y + z
  - x = f(y)
  - x = [y]
  - [x] = y
  - if x
2. Show that one run of your analysis enables the code above to be optimized as follows, with dead code shown in gray:
  1. a = b + c;
  2. if (a) {
  3. b = c + 1;
  4. c = c + 1;
  5. }
  6. d = c;
  7. e = d + b;
  8. a = f(b);
Object layout
Consider the following class declarations in a Java-like language with subtyping, inheritance, and method overloading, but with a simple type hierarchy in which no class has two parents:
```
class A {
    int a;
    int f() { code 1 }
}

class B extends A {
    int b;
    int f() { code 2 }
    int g(int x) { code 3 }
    int h() { code 4 }
}

class C extends A {
    int c;
    void f(float x) { code 5 }
    int j() { code 6 }
    int g() { code 7 }
}
```
Assuming the simple dispatch table scheme with sequentially assigned method indices, draw the memory layout of three objects of classes A, B, and C respectively, including the objects themselves, their dispatch tables and the code for all their methods.

Now, suppose instead that the language supported multiple inheritance and there was a class D defined as follows.
```
class D extends B, C {
    int d; 
    int f() { code 8 }
}
```
Assuming the language implementation is based on sparse dispatch tables, draw what objects of all four classes might look like, and give a table showing the indices of each method.
Suppose Iota were extended with higher-order lexically-scoped functions. Then we could write the following code:
```
f(n:int, g1:()→int, g2:()→int) = {
    x: int = n + 12;
    g(): int = { return x; }
    if (n == 1)
	return g() + g1() + g2(); // X
    else
	return f(n-1, g, g1);
}
```
The function call f(3, g0, g0) returns 42 regardless of what function g0 is passed in. Assuming heap-allocated activation records, draw what the stack and heap would look like at the point where the statement marked X is executed. Identify any assumptions you are making about the implementation.

CS 4120: Introduction to Compilers Fall 2009

Problem Set 4: Program Analysis and Advanced Language Features

CS 4120: Introduction to Compilers
Fall 2009