Program Analysis as Non-Standard Denotational Semantics

We can use a non-standard denotational semantics to reason about
programs.  The basic insight is that if we abstract away from actual
values, keep our abstractions suitably finite, and are always
conservative, then we can compute conservative approximations of the
flow of values through a given program.

For example, let us consider the abstract domain D defined
as:

  D ::= Z | Z+ | Z- | 0

where Z represents the set of all integers, Z+ represents the set of
all positive integers, Z- all negative integers, and 0 represents the
singleton set { 0 }.  (We could also add an element {} representing
the empty set of integers, below all of the others, but we won't need
it here.)

This is an abstraction of our concrete domain of integers, and it's
only one abstraction that we could choose for a given analysis
problem.  Its elements come with an "information ordering" given by
set inclusion: Z is a superset of all of the other abstract domain
elements, so it carries less information than something like Z+ or 0.
The elements Z-, 0, and Z+ are incomparable with each other.

                          Z
                        / | \
                      /   |   \
                    Z-    0    Z+


In principle, we can use any lattice to abstract the domain of values,
but it helps to have a finite lattice or at least a lattice of finite
height (no infinite ascending or descending chains).
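A minimal sketch of D in SML (the language the homework below uses). The constructor names Top, Pos, Neg, and Zero, and the helpers leq (the information ordering) and member (which integers an element stands for), are our own illustrative choices, not definitions from these notes.

```sml
(* The abstract domain D = Z | Z+ | Z- | 0 *)
datatype absint = Top    (* Z  : every integer         *)
                | Pos    (* Z+ : the positive integers *)
                | Neg    (* Z- : the negative integers *)
                | Zero   (* 0  : exactly { 0 }         *)

(* Information ordering: leq (a, b) holds when the set that a stands
   for is a subset of the set that b stands for. *)
fun leq (_, Top) = true
  | leq (a : absint, b) = a = b

(* Membership: is the integer n in the set that a stands for? *)
fun member (_ : int, Top) = true
  | member (n, Pos) = n > 0
  | member (n, Neg) = n < 0
  | member (n, Zero) = n = 0
```

For example, leq (Pos, Top) is true while leq (Top, Pos) and leq (Pos, Neg) are false, matching the diagram.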

We can define a denotational semantics using this abstract domain as
follows:

First, we use abstract stores which will be functions from variables
to D instead of variables to integers.  In other words, we're going to
forget what specific integer value a variable holds and only remember
whether it's zero, positive, negative, etc.  If we don't know anything
about the variable, then we'll have to assume that it's any possible
integer (i.e., Z).

  S in AbsStore : Var -> D

Next, we provide an interpretation E' of expressions that respects our
abstraction:

  E'[i]S = 0   if i = 0
           Z-  if i < 0
           Z+  if i > 0

  E'[x]S = S(x)

  E'[e1 + e2]S = 0    if E'[e1]S = E'[e2]S = 0 
                 Z+   if (E'[e1]S = E'[e2]S = Z+) or 
                         (E'[e1]S = Z+ and E'[e2]S = 0) or
                         (E'[e1]S = 0 and E'[e2]S = Z+)
                 Z-   if (E'[e1]S = E'[e2]S = Z-) or 
                         (E'[e1]S = Z- and E'[e2]S = 0) or
                         (E'[e1]S = 0 and E'[e2]S = Z-)
                 Z    otherwise

Note that evaluating an integer i returns either 0, Z-, or Z+ but not
Z.  This reflects perfect information with respect to the abstraction.
We're returning the most precise thing that we possibly can.

The variable case is just as in the standard denotational semantics --
we look up the value in the store.  Of course, here, the store returns
an abstract domain value instead of an integer.

Finally, we had to interpret the operation + in a way that's
consistent with the domain.  For instance, we can only say that we get
a positive number out if we know that e1 and e2 both yield non-negative
numbers and at least one of them yields a positive number.
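The definition of E' transcribes almost directly into SML. This sketch assumes a hypothetical cut-down expression type aexp (integer literals, variables, and + only) instead of the homework's exp, and represents an abstract store as an SML function from variable names to D, as in AbsStore; absPlus is the table for e1 + e2.

```sml
datatype absint = Top | Pos | Neg | Zero

(* Hypothetical cut-down expression syntax: literals, variables, + *)
datatype aexp = Num of int | V of string | Add of aexp * aexp

(* S in AbsStore : Var -> D *)
type absstore = string -> absint

(* The table for E'[e1 + e2]S, with a1 = E'[e1]S and a2 = E'[e2]S *)
fun absPlus (Zero, Zero) = Zero
  | absPlus (Pos, Pos) = Pos
  | absPlus (Pos, Zero) = Pos
  | absPlus (Zero, Pos) = Pos
  | absPlus (Neg, Neg) = Neg
  | absPlus (Neg, Zero) = Neg
  | absPlus (Zero, Neg) = Neg
  | absPlus _ = Top

(* E'[e]S *)
fun absExp (Num i) (_ : absstore) =
      if i = 0 then Zero else if i < 0 then Neg else Pos
  | absExp (V x) s = s x
  | absExp (Add (e1, e2)) s = absPlus (absExp e1 s, absExp e2 s)

(* Example store: x is negative, everything else is unknown (Z) *)
val s : absstore = fn "x" => Neg | _ => Top
```

With this store, absExp (Add (V "x", Num 1)) s evaluates to Top, just as the table says for Z- plus Z+.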

In general, the property that we want is that if we run the real
semantics, whatever value we get out is contained in the set returned
by the abstract semantics.  Formally, we can say that an abstract
store S is faithful to a concrete store s if for all variables x, s(x)
is an element of S(x).

Then, we can say that E' is faithful to E if for all expressions e, all
stores s, and all abstract stores S that are faithful to s, E[e]s is
an element of E'[e]S.
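Faithfulness can be spot-checked mechanically. The sketch below (again assuming a hypothetical cut-down aexp type, plus a standard evaluator conExp) enumerates concrete stores over x and y with values in -5..5, pairs each with every abstract store that is faithful to it, and checks that E[e]s is a member of E'[e]S. This is testing by sampling, not a proof; the helper names are our own.

```sml
datatype absint = Top | Pos | Neg | Zero
datatype aexp = Num of int | V of string | Add of aexp * aexp

fun member (_ : int, Top) = true
  | member (n, Pos) = n > 0
  | member (n, Neg) = n < 0
  | member (n, Zero) = n = 0

fun absPlus (Zero, Zero) = Zero
  | absPlus (Pos, Pos) = Pos | absPlus (Pos, Zero) = Pos | absPlus (Zero, Pos) = Pos
  | absPlus (Neg, Neg) = Neg | absPlus (Neg, Zero) = Neg | absPlus (Zero, Neg) = Neg
  | absPlus _ = Top

(* Abstract semantics E'[e]S *)
fun absExp (Num i) (_ : string -> absint) =
      if i = 0 then Zero else if i < 0 then Neg else Pos
  | absExp (V x) s = s x
  | absExp (Add (e1, e2)) s = absPlus (absExp e1 s, absExp e2 s)

(* Standard semantics E[e]s *)
fun conExp (Num i) (_ : string -> int) = i
  | conExp (V x) s = s x
  | conExp (Add (e1, e2)) s = conExp e1 s + conExp e2 s

(* Check "E[e]s is in E'[e]S whenever S is faithful to s" for all
   stores over x and y with values in -5..5. *)
fun faithfulOn (e : aexp) : bool =
  let
    val ints = List.tabulate (11, fn k => k - 5)
    val absVals = [Top, Pos, Neg, Zero]
    fun check (n, m, a, b) =
      let
        fun s v = if v = "x" then n else m    (* concrete store *)
        fun sa v = if v = "x" then a else b   (* abstract store *)
      in
        (* skip pairs where the abstract store is not faithful *)
        not (member (n, a) andalso member (m, b))
        orelse member (conExp e s, absExp e sa)
      end
    fun each xs f = List.all f xs
  in
    each ints (fn n => each ints (fn m =>
      each absVals (fn a => each absVals (fn b => check (n, m, a, b)))))
  end
```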

We can define a similar function B' for boolean-valued expressions.
However, we'll need a new abstract domain for booleans:

  D2 ::= True | False | DontKnow

Here, DontKnow represents the set {true,false}, while True
represents {true} and False represents {false}.  Then we can define B'
as follows:

  B'[true]S  = True
  B'[false]S = False
  B'[e1 <= e2]S = True     if (E'[e1]S = Z- and E'[e2]S is 0 or Z+) or
                              (E'[e1]S = 0 and E'[e2]S is 0 or Z+)
                = False    if (E'[e1]S = Z+ and E'[e2]S is 0 or Z-) or
                              (E'[e1]S = 0 and E'[e2]S = Z-)
                = DontKnow otherwise
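A sketch of B' for <= in SML, written over the abstract values a1 = E'[e1]S and a2 = E'[e2]S rather than over syntax; the names absbool and absLeq are illustrative. The wildcard clause catches every pair for which the abstraction cannot decide, such as Z+ <= Z+.

```sml
datatype absint = Top | Pos | Neg | Zero
datatype absbool = True | False | DontKnow

(* B'[e1 <= e2]S, given a1 = E'[e1]S and a2 = E'[e2]S *)
fun absLeq (Neg, Zero) = True
  | absLeq (Neg, Pos) = True
  | absLeq (Zero, Zero) = True
  | absLeq (Zero, Pos) = True
  | absLeq (Pos, Zero) = False
  | absLeq (Pos, Neg) = False
  | absLeq (Zero, Neg) = False
  | absLeq _ = DontKnow
```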

Next we need to define our abstract semantics for commands, C'.  The
first few cases work just as before:

  C'[skip]S = S

  C'[x := e]S = S[x -> E'[e]S]

  C'[c1 ; c2]S = C'[c2](C'[c1]S)
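These three cases are short in SML. The sketch assumes a hypothetical command type com containing only Skip, Assign, and Seq, with abstract stores as functions; update (our name) builds S[x -> a].

```sml
datatype absint = Top | Pos | Neg | Zero
datatype aexp = Num of int | V of string | Add of aexp * aexp
(* Hypothetical command syntax with only the cases defined so far *)
datatype com = Skip | Assign of string * aexp | Seq of com * com

type absstore = string -> absint

fun absPlus (Zero, Zero) = Zero
  | absPlus (Pos, Pos) = Pos | absPlus (Pos, Zero) = Pos | absPlus (Zero, Pos) = Pos
  | absPlus (Neg, Neg) = Neg | absPlus (Neg, Zero) = Neg | absPlus (Zero, Neg) = Neg
  | absPlus _ = Top

fun absExp (Num i) (_ : absstore) =
      if i = 0 then Zero else if i < 0 then Neg else Pos
  | absExp (V x) s = s x
  | absExp (Add (e1, e2)) s = absPlus (absExp e1 s, absExp e2 s)

(* S[x -> a]: maps x to a, agrees with S everywhere else *)
fun update (s : absstore) (x : string) (a : absint) : absstore =
  fn y => if y = x then a else s y

(* C'[c]S for skip, assignment and sequencing *)
fun absCom Skip s = s
  | absCom (Assign (x, e)) s = update s x (absExp e s)
  | absCom (Seq (c1, c2)) s = absCom c2 (absCom c1 s)

(* Top: every variable is Z *)
val top : absstore = fn _ => Top
```

For instance, analyzing x := 3; y := x + x from top maps both x and y to Pos and leaves every other variable at Top.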

We run into problems with if-commands, because in general, we don't
know which branch will be taken.  In particular, if we have "if e1 <=
e2 then c1 else c2", and we only know that e1 and e2 are integers,
then we don't know which branch will be selected.  Thus, we must
conservatively look at both branches to compute the possible output
state, and we must somehow merge the information in the two output
states to get a single abstract store.

 C'[if e then c1 else c2]S = 

      C'[c1]S  if B'[e]S = True

      C'[c2]S  if B'[e]S = False

      merge_stores(C'[c1]S, C'[c2]S)  otherwise


where we define merge_stores(S1,S2) as:

      { (x,merge(S1(x),S2(x))) | x in Var }

and merge for two abstract domain elements as:

       merge(Z,_) = Z
       merge(_,Z) = Z
       merge(X,X) = X
       merge(X,Y) = Z   if X and Y are different

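In general, we calculate the union of the two sets and then find the
smallest abstract domain element that is big enough to cover the
union.  Since Z contains everything, merging it with any other
element always yields Z.  Likewise, merging two different elements
among Z-, 0, and Z+ yields Z, because Z is the only element that
covers both sets.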
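Because every pair of distinct elements of D merges to Z, merge is a single comparison in SML. This sketch keeps stores as functions, so merge_stores (here mergeStores, a name we chose) is the pointwise definition above, read literally.

```sml
datatype absint = Top | Pos | Neg | Zero
type absstore = string -> absint

(* Smallest element of D covering both arguments: equal elements stay
   put; any other pair, including any pair involving Z, goes to Z. *)
fun merge (a : absint, b : absint) : absint =
  if a = b then a else Top

(* merge_stores(S1,S2) = { (x, merge(S1(x), S2(x))) | x in Var } *)
fun mergeStores (s1 : absstore, s2 : absstore) : absstore =
  fn x => merge (s1 x, s2 x)
```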

As an example, consider what we get out of analyzing:

      if x <= 0 then x := x + 1 else skip

If we assume on input that x is any negative integer (i.e., S(x) = Z-)
then the analysis works as follows:

A. First, we need to compute E'[x]S = S(x) = Z-
B. Then we need to compute E'[0]S = 0
C. We know that all elements of Z- are less than 0, so B'[x <= 0]S =
   True and we only need to calculate C'[x := x + 1]S and return that.
D. C'[x := x + 1]S requires computing E'[x + 1]S.  Since E'[x]S = S(x) = Z-,
   and E'[1]S = Z+, we can only conclude that the result is Z.
   Thus, C'[x := x + 1]S = S[x -> Z].

Now let's analyze a different program:

      if x <= 0 then x := 3 else x := -4

and assume on input that x is any integer (i.e., S(x) = Z).

A. compute E'[x]S = S(x) = Z.
B. compute E'[0]S = 0.
C. We can't tell which way the if goes since we don't know whether
   x <= 0 (i.e., not all elements of Z are less than or equal to zero,
   nor are they all greater than 0, so B'[x <= 0]S = DontKnow).  So, we
   have to compute both possible outcomes and merge them.
D. Compute C'[x := 3]S = S[x -> Z+]  since E'[3]S = Z+.
E. Compute C'[x := -4]S = S[x -> Z-] since E'[-4]S = Z-.
F. Merge the two output states S[x -> Z+] and S[x -> Z-]
   which results in S[x -> Z] since in one case, x is positive
   and in another case it's negative.  The most we can say is
   that after executing the if-command, x is an integer.  
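Putting the pieces together, here is a minimal SML sketch of C' for skip, assignment, sequencing, and if, run on the two programs just analyzed. The syntax types (aexp, bexp with only <=, com) are hypothetical cut-down versions of the homework's datatypes, and stores are functions.

```sml
datatype absint = Top | Pos | Neg | Zero
datatype absbool = True | False | DontKnow

(* Hypothetical cut-down syntax for the examples *)
datatype aexp = Num of int | V of string | Add of aexp * aexp
datatype bexp = Le of aexp * aexp                     (* e1 <= e2 *)
datatype com = Skip | Assign of string * aexp | Seq of com * com
             | If of bexp * com * com

type absstore = string -> absint

fun absPlus (Zero, Zero) = Zero
  | absPlus (Pos, Pos) = Pos | absPlus (Pos, Zero) = Pos | absPlus (Zero, Pos) = Pos
  | absPlus (Neg, Neg) = Neg | absPlus (Neg, Zero) = Neg | absPlus (Zero, Neg) = Neg
  | absPlus _ = Top

fun absExp (Num i) (_ : absstore) =
      if i = 0 then Zero else if i < 0 then Neg else Pos
  | absExp (V x) s = s x
  | absExp (Add (e1, e2)) s = absPlus (absExp e1 s, absExp e2 s)

fun absLeq (Neg, Zero) = True | absLeq (Neg, Pos) = True
  | absLeq (Zero, Zero) = True | absLeq (Zero, Pos) = True
  | absLeq (Pos, Zero) = False | absLeq (Pos, Neg) = False
  | absLeq (Zero, Neg) = False
  | absLeq _ = DontKnow

fun absBexp (Le (e1, e2)) s = absLeq (absExp e1 s, absExp e2 s)

fun merge (a : absint, b : absint) = if a = b then a else Top
fun mergeStores (s1 : absstore, s2 : absstore) : absstore =
  fn x => merge (s1 x, s2 x)
fun update (s : absstore) (x : string) (a : absint) : absstore =
  fn y => if y = x then a else s y

(* C'[c]S, including the three cases for if *)
fun absCom Skip s = s
  | absCom (Assign (x, e)) s = update s x (absExp e s)
  | absCom (Seq (c1, c2)) s = absCom c2 (absCom c1 s)
  | absCom (If (b, c1, c2)) s =
      (case absBexp b s of
           True => absCom c1 s
         | False => absCom c2 s
         | DontKnow => mergeStores (absCom c1 s, absCom c2 s))

(* The two programs analyzed above *)
val ex1 = If (Le (V "x", Num 0), Assign ("x", Add (V "x", Num 1)), Skip)
val ex2 = If (Le (V "x", Num 0), Assign ("x", Num 3), Assign ("x", Num ~4))
```

Applying absCom ex1 to the store mapping x to Neg gives Top for x, and applying absCom ex2 to the all-Z store also gives Top, matching the hand analyses.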

So much for conditionals.  What about while loops?  

  C'[while e do c]S = ???

Suppose we could guess an output S' for this.  What properties
should S' have?  Well, it ought to be faithful to the original
semantics.  In particular, if we have a concrete store s and S is
faithful to s (i.e., for all x, s(x) is an element of S(x)), and
C[while e do c]s = s', then it ought to be the case that S' is
faithful to s'.

One thing we could guess for S' is the store that maps
every variable to Z.  Let's call that store Top.  Top is 
certainly faithful.  It also has the property that:

  C'[if e then (c; while e do c) else skip]S <= Top

where S1 <= S2 means for all x, S1(x) is a subset of S2(x).
So, we could just use Top, but that's a little imprecise.

I claim that we can take C'[while e do c]S = loop S, where:

  fun loop S =
      let val S' = merge_stores(S, C'[c]S)
      in if S' = S then S' else loop S'
      end

That is, start with the input state, compute C'[c]S (the result of
running the body of the while loop), and merge that with the input
state to produce an S'.  If S' is different from S (i.e., for some
variable x, S(x) != S'(x)), then we try again, using S' as the
input state.
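A sketch of loop in SML. SML cannot compare functions for equality, so this version assumes we are handed the finite list of variables the program mentions and compares stores only on those. The parameter body stands in for C'[c], and incX is a hypothetical abstract body for x := x + 1.

```sml
datatype absint = Top | Pos | Neg | Zero
type absstore = string -> absint

fun merge (a : absint, b : absint) = if a = b then a else Top
fun mergeStores (s1 : absstore, s2 : absstore) : absstore =
  fn x => merge (s1 x, s2 x)

(* Stores are functions, so compare them on the finitely many
   variables that the program mentions. *)
fun sameStore (vars : string list) (s1 : absstore, s2 : absstore) =
  List.all (fn x => s1 x = s2 x) vars

(* loop from the notes; body stands in for C'[c] *)
fun loop vars (body : absstore -> absstore) (s : absstore) : absstore =
  let val s' = mergeStores (s, body s)
  in if sameStore vars (s', s) then s' else loop vars body s'
  end

(* Hypothetical body  x := x + 1 : adds Z+ to the abstract x *)
fun incX (s : absstore) : absstore =
  fn y => if y = "x"
          then (case s "x" of Zero => Pos | Pos => Pos | _ => Top)
          else s y
```

Starting from x in Z+, the body keeps x positive and loop stops at once with x = Z+; starting from x = 0, the first pass moves x to Z and the second confirms the fixed point.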

Why would this work?  First, note that:

  (1) S1 <= merge_stores(S1,S2) and S2 <= merge_stores(S1,S2) and 
  (2) if S1 <= S2, then merge_stores(S,S1) <= merge_stores(S,S2).

So, that tells us that after we go around the loop, S' is at least as
abstract as S and at least as abstract as C'[c]S.  So, for instance,
if S'(x) = 0, then it must be the case that S(x) = 0 and
(C'[c]S)(x) = 0.
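Properties (1) and (2) are claims about finitely many cases, so for D they can be checked exhaustively. Since merge_stores and <= on stores are both defined variable by variable, checking merge on single elements is enough. The sketch below is a brute-force check, with helper names of our own choosing.

```sml
datatype absint = Top | Pos | Neg | Zero

fun leq (_, Top) = true
  | leq (a : absint, b) = a = b
fun merge (a : absint, b : absint) = if a = b then a else Top

val elems = [Top, Pos, Neg, Zero]
fun forall f = List.all f elems

(* (1) merge(a, b) is above both a and b *)
val upperBound =
  forall (fn a => forall (fn b =>
    leq (a, merge (a, b)) andalso leq (b, merge (a, b))))

(* (2) b1 <= b2 implies merge(a, b1) <= merge(a, b2) *)
val monotone =
  forall (fn a => forall (fn b1 => forall (fn b2 =>
    not (leq (b1, b2)) orelse leq (merge (a, b1), merge (a, b2)))))
```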

Second, suppose we've reached a point where S' = S.  That is,
merge_stores(S,C'[c]S) = S.  Well, for one thing, it tells us that going
around the loop again won't matter, because we'll always be getting
the same S out.  So we might as well stop.

Now I need to convince you of two things: First, using loop in this
fashion will terminate (i.e., we'll eventually find a fixed point for
the loop).  This is pretty easy to see -- each time we go around the
loop without stopping, S' is strictly more abstract than S (it is at
least as abstract by property (1), and it is not equal to S).  In this
lattice, that means that for some variable, we went from 0, Z+, or Z-
to Z.  That variable can never change again, since nothing is above
Z.  There can only be a finite number of variables that change (since
the program can only mention a finite number of variables), so sooner
or later, we'll stop.  In this case, we stop within n + 1 iterations,
where n is the number of program variables: at most n iterations can
change something, and one more confirms the fixed point.  (If we had
a deeper lattice, it might require more iterations.)
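The n + 1 bound can be observed directly. This sketch instruments loop to count how many times the body is analyzed, and uses a hypothetical body, z := y; y := x; x := x + 1, chosen so that each pass pushes Z one variable further along. Starting from all three variables at 0, it takes four passes: three that each raise one variable to Z, and one that confirms nothing changed.

```sml
datatype absint = Top | Pos | Neg | Zero
type absstore = string -> absint

fun merge (a : absint, b : absint) = if a = b then a else Top
fun mergeStores (s1 : absstore, s2 : absstore) : absstore =
  fn x => merge (s1 x, s2 x)
fun sameStore vars (s1 : absstore, s2 : absstore) =
  List.all (fn x => s1 x = s2 x) vars

(* Like loop, but also returns how many times the body was analyzed *)
fun loopCount vars body (s : absstore) (k : int) : absstore * int =
  let val s' = mergeStores (s, body s)
  in if sameStore vars (s', s) then (s', k + 1)
     else loopCount vars body s' (k + 1)
  end

(* Hypothetical body  z := y; y := x; x := x + 1.  Each assignment
   reads a variable not yet overwritten, so all reads use s. *)
fun chain (s : absstore) : absstore =
  let
    val z' = s "y"
    val y' = s "x"
    val x' = case s "x" of Zero => Pos | Pos => Pos | _ => Top
  in
    fn v => if v = "x" then x' else if v = "y" then y'
            else if v = "z" then z' else s v
  end
```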

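Second, I need to convince you that this is a faithful
approximation.  I won't prove this formally, but it's pretty easy to
see that for all finite unrollings of the while loop, we get out
something that is a faithful approximation.  The key facts are that
the final S' satisfies S <= S' (each pass only makes the store more
abstract, by property (1)) and C'[c]S' <= S' (by property (1) again,
since S' = merge_stores(S', C'[c]S') at the fixed point).  So S' is
faithful to the initial concrete store, and if S' is faithful to the
store before one iteration of the body, it is also faithful to the
store after it, because C'[c] is faithful and C'[c]S' <= S'.  By
induction on the number of iterations, S' is faithful to the store on
exit.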

--------------------------------------------------------------------
Homework 3:

1. Given the following datatypes for IMP programs:

   type var = string
   datatype oper = Plus | Times | Minus | Divide
   datatype exp = Var of var | Int of int | Oper of exp * oper * exp
   datatype bexp = True | False | LessThanEq of exp * exp
   datatype com = Skip | Assign of var * exp | Seq of com * com |
     If of bexp * com * com | While of bexp * com

Write an analysis which conservatively determines whether or not the
program can divide by zero.  You should build a non-standard
denotational semantics with an abstract domain that records whether an
integer value can be negative, positive, zero, non-negative,
non-positive, or simply an integer.  That is, your lattice will have
more elements than the example I gave above.

You'll need to define suitable merge operations for abstract integers
(and abstract booleans).  You'll want to use abstract stores that are
finite maps from variables to abstract integers (e.g., association
lists or something from the SML library) and define a suitable
merge_stores function.

2.  Suppose we modified IMP so that we allowed assigning boolean
values to variables, as well as integer values.  For instance,
we could write:

  x := true;
  x := 1 + 42;
  if x then x := 42 else x := true;

We want to allow variables to hold both integers and booleans, but we
would like to warn the programmer if they ever use a variable in a way
that is inconsistent with the way the variable was last assigned.  In
particular, if the variable was last assigned as an integer, but is
used as a boolean, then this should generate a warning.  Similarly, if
the variable was last assigned a boolean but is then used as an
integer, we should generate a warning.

Write down an analysis (on paper -- or as ML code) which generates
these warnings.