Problem Set 4: Code transformations and Induction

Due on October 27, 2005 at 11:59 pm.


Instructions

You will submit a total of four separate files: part1.sml , part2.sml , part3.sml, and part4.txt, Modify the .sml template files from ps4source.zip. As always, don't change function names, and remember: non-compiling code will receive an automatic zero. Do NOT compress the files into a zip file. You must submit each file individually using CMS.

You have the option to select a partner for this assignment. Only groups of at most two persons are permitted. You can register your group through CMS. Choose your partner carefully; it is not an acceptable excuse not to submit a finished homework because "the other partner did not work enough." All your files should have a comment at the top with the name, netID, and section  of both students.


Preamble: Mini-ML

We have introduced the substitution model in class in order to provide an execution model for a carefully chosen subset of SML. In this problem we give you a SML interpreter that implements the substitution model as we specified it in class. Parts 1, 2 and 3 are focused on extending the interpreter with several code transformations.

The Mini-ML Language

We will focus on a small functional language specified by the following grammar:
e ::= c                        (* constants *)
    | id                       (* variables *)
    | (fn id => e)             (* anonymous functions *) 
    | e1(e2)                   (* function applications *)
    | u e                      (* unary operations, ~, not, etc. *)
    | e1 b e2                  (* binary operations, +,*,etc. *)
    | (if e then e1 else e2)   (* if expressions *)
    | let d in e end           (* let expressions *)
    | (fun f x => e)            (* fun expressions *)

Declarations:
d ::= val id = e               (* value declarations *)
    | fun id(id1) = e          (* function declarations *)

Note that Mini-ML is not a proper subset of SML, since SML does not have fun expressions. Also, note that our language has neither case statements, nor pattern matching. All our functions have exactly one argument, and all let statements contain exactly one declaration.

To simplify the implementation, we will assume that the language has only two basic types, integers and booleans. No other types can be defined, also, no type specifications are accepted anywhere in the program.

The Abstract Syntax Tree (AST)

One of the basic problems we need to address is to find a suitable program representation that can serve as input for our interpreter. One possibility is to use plain-text representation. While most convenient for the user, such a representation is not very suitable for direct evaluation. The conversion to a more convenient representation could be implemented in SML, and it raises many interesting questions (which are addressed in other CS courses). These representation conversion issues are ancillary to our goal in this problem, thus, we will represent our programs as if the conversion step has already taken place. The representation we choose is named the abstract syntax tree (AST) and it is based on the following type definitions:
datatype decl = VAL_D of string * expr
              | FUN_D of string * string * expr

and expr = INT of int
         | BOOL of bool
         | FN of string * expr
         | ID_E of string
         | FAPP_E of expr * expr
         | FUN_E of string * string * expr (* name * arg * body *)
         | LET_E of decl * expr
         | IF_E of expr * expr * expr
         | BINOP_E of expr * string * expr
         | UNOP_E of string * expr

First, note the and keyword that links the two datatype definitions. This keyword makes it explicit that these two type definitions are mutually recursive (i.e. type decl refers to type expr in its definition, and vice-versa). Such simultaneous declarations are necessary for mutually recursive functions as well.

We use "_D" to mark a datatype constructor for declarations and "_E" to mark a datatype constructor for expressions. Note that INT, BOOL, and FN do not have any special marking. This is to emphasize their special status as both expressions and values. We could have defined "expression" constants and "value" constants, for example, and you should know that this is often done. Because of the peculiar features of Mini-ML, however, it turns out that our representation is more convenient. Make sure you do not get confused by the fact that our "values" really are, according to their type, expressions. Only integers, booleans, and anonymous functions can be values (i.e. results of an evaluation).

The abstract syntax tree represents the essential structure of the program, which is stripped by all unnecessary details. Here are a number of val definitions in SML that encode Mini-ML programs:
val e_3 = INT 3

val e_let = LET_E(VAL_D("x", INT 3),
                  BINOP_E(ID_E "x", "+", INT 7))

val e_fn = FAPP_E(FN("x", BINOP_E(ID_E "x", "+", INT 25)), INT 7)

val e_fun = FUN_E("function", "x", ID_E "x")

val e_fact = LET_E(FUN_D("fact",
			  "n",
			  IF_E(BINOP_E(ID_E "n", "=", INT 0),
			       INT 1,
			       BINOP_E(ID_E "n",
				       "*",
				       FAPP_E(ID_E "fact",
					      BINOP_E(ID_E "n",
						      "-",
						      INT 1))))),
		     FAPP_E(ID_E "fact",
			    INT 3))

Mini-ML Interpreter

We have provided a file interpreter.sml that implements an interpreter of Mini-ML. In that file you will find implementations of unary and binary operators, substitutions and evaluation of expressions. This code is provided to help you in two respects: to get familiar with manipulations over the AST, and to be able to debug your own assignment. Here are a few examples of evaluations using the expressions defined above:
- eval(e_3)
val it = INT 3 : expr
- eval e_let;
val it = INT 10 : expr
- eval e_fn;
val it = INT 32 : expr
- eval e_fun;
val it = FN ("x", ID_E "x") : expr
- eval e_fact;
val it = INT 6 : expr

The interpreter.sml file also includes the functions printDecl and printExpr that print a declaration and an expression, respectively. For example:
- printExpr(e_fact) "";
let
  fun fact (n) = if (n = 0) then 1 else (n * (fact (n - 1) ))
in
  fact 3
end
val it = () : unit

The file parser.sml provides functions parse and parseString which you can use to generate a Mini-ML AST from Mini-ML code text. Feel free to use these, but have in mind that the parser is more strict than in standard SML (e.g. arguments to functions cannot be specified without parenthesis). We have included some examples in file include.sml for your convenience. For example, you can write a Mini-ML expression in a file "a.sml", then do the following:
- CM.make();
- open Interpreter;
- case (Parser.parse (TextIO.openIn "a.sml")) of
      NONE     => raise Fail " Parsing Error "
    | SOME(s1) => (printExpr s1 ""; eval s1)

Problem 1: Code normalization

The first part of the assignment consists on normalizing Mini-ML programs. You are going to work on two normalizations:
  1. Removing anonymous functions:

    We would like to avoid expressions with anonymous functions. We can change the code to add new let constructs that will bind the functions to some name. For example
    let 
       val x = 4
    in
       (fn y => y+1) x
    end
    
    can be transformed to:
    let 
       val x = 4
    in
       (let fun f (y) = (y+1) in f end) x
    end
    
    Note that Mini-ML supports two function expressions: (fn x => e) and (fun f x => e). We will also change fun expressions by adding a let expression like in the example above. The advantage of fun expressions is that you already have a name for the function, for anonymous functions you need to give a fresh name to each function you add. For this purpose we give you a function Given.newFun() (in file include.sml) which returns a fresh new function name (f_i) each time is called.

    One small complication for this problem is that since we have no pairs in Mini-ML, functions that need several parameters are written in curried form. Ie. anonymous functions are used to pass several arguments. These anonymous functions are allowed. For instance:
    let
      fun f (y) = fn (z) => y + z
    in
      f 2 4
    end
    
    Should remain unchanged after the normalization.

    You need to implement this normalization in a function
    removeAnonymous: expr -> expr
    Given an input AST, the result of removeAnonymous is an AST that:

  2. Pulling let constructs to the outer-most level of their scope:

    We would like to have all let expressions at the top level. Since Mini-ML doesn't have a top level environment, you will produce nested let expressions, and place them at the outer-most level of their scope. For example the program on the left should be normalized to produce the program on the right:
    Input Output
    let 
       fun g (x) = 
         let  
           val y = 3 * x
         in
           x + (let val w = 2 * y in w end)
         end
    in
       (let fun f (y) = (g(y+1) + g(y-1)) in f end) 3
    end
    
    let 
       fun g (x) = 
         let  
           val y = 3 * x
         in
           let
              val w = 2 * y
            in
              x + w
            end
         end
    in
       let 
          fun f (y) = (g(y+1) + g(y-1))
       in
          f 3
       end
    end
    

    Notice, in the example above we didn't move the declaration of y and w outside function g: we don't lift let expressions outside function declarations.

    The main complication of this transformation is to make sure that you replace some declaration names to avoid name conflicts. For example:
    Input Output
    let 
      val x = 1
    in
      (let val x = 2 in x end) + x
    end
    
    let 
      val x = 1
    in 
      let 
        val v_1 = 2
      in
        v_1 + x
      end
    end
    
    You should use the functions Given.newVar() to create fresh variable names, and Given.newFun() to create fresh function names. You can assume that these fresh names are different from any name present in the input programs.

    Write your solution in the function
    liftLet: expr -> expr
    
    Given an input AST, the result of liftLet is an AST that:


File part1.sml contains the functions you need to implement.

Problem 2: Lambda Lifting

Lambda lifting is another kind of normalization. The idea is to remove the use of local function definitions. For example, consider the following SML program:
let
  fun f (x) = 
    let 
        fun g (z) =  z + (x + 4)
    in
        g 3
    end
in
  f 5
end
We would like to extract g outside f, and simplify the program as follows:
fun g (v_1) (z) = z + (v_1 + 4)
fun f (x) = g x 3
f 5

To do lambda-lifting we need to:

  1. Assume that the code doesn't contain anonymous functions, and that let expressions are lifted to the outer-most level of their scope.
  2. For each function declaration, we identify all free variables. Free variables are variable names that are not bounded by a declaration or an argument in the current scope. For example, x is a free variable in the following expression:
     fun g(z) = z + (x + 4)
    

    We will replace each free variable with an additional argument to the enclosing function, and pass that argument to every use of the function. This is why, in the example above, the function g now receives a new formal argument v_1, and we passed x as an actual argument in the body of function f.

  3. Replace every local function definition that has no free variables with an identical global function. This is when we moved g to the top level. This step may require renaming functions to avoid name conflicts.
Since in Mini-ML we don't have a top level environment, the result of the example above should look as follows:
let
  fun g(v_1) => fn (z) => z + (v_1 + 4)
in
  let
     fun f (x) = g x 3
  in
     f 5
  end
end
We ask you to implement the answer to this problem in several steps:
  1. freeVar: Implement a function
    val freeVar: string  * (string list) -> bool
    
    Which takes the name of a variable, and a list of variable names that are bounded. The result is true if the variable doesn't belong to the set of bounded variables. Eg:
    freeVar("x", ["y","z"]) = true
    freeVar("x", ["x","z"]) = false
    
  2. canLiftVar: Write a function
      val canLiftVar: expr * (string list) -> bool
    
    Its arguments are a Mini-ML expression (e) and a list of bounded variable names. It returns true if is any subexpression of e can be lifted. This is true in two possible cases:
  3. extendDecl: Write a function
      val extendDecl: decl * string list -> string * decl 
    
    extendDecl receives a declaration and a list of new variable names. The idea is to extend a function declaration with additional arguments. Eg:
       extendDecl (*val x = 5*,["v_1"]) = ("",*val x = 5*)
       extendDecl (*val f = fn y => y + v_1*,["v_1"]) = ("f", *val f = fn v_1 => fn y => y + v_1*)
       extendDecl (*fun f y = y + v_1*,["v_1"]) = ("f", *val f = fn v_1 => fn y => y + v_1*)
    
    The first component of the result is the name of the function being extended (if any).
  4. extendBody: Write a function
      val extendBody: expr * string * expr list -> expr 
    
    Which takes an expression, a function name, and a list of expression arguments. It produces a new expression by passing the arguments whenever the function name is used:
    extendBody (*f 1*,"f",[*y*]) = *f y 1*
    extendBody (*let fun f x = x in f 1 end*,"f",[*y*]) = *let fun f x = x in f 1 end*
    
    Note in the second example the let declaration redefines the name of f, thus f is not extended in this case. Make sure you deal correctly with these cases.
  5. liftVars: Write a function
      val liftVars: expr -> expr 
    
    That takes an expression, and recursively extends all function definitions to avoid having any free variables. For example,
    Input Output
       let
         fun f(x) = fn y =>
           let
             fun g(z) = (z + x)
           in
             (g 3 + y)
           end
       in
         f 3 4
       end
    
      let
         val f = fn x => fn y =>
         let
           val g = fn v_1 => fn z => (z + v_1)
         in
           (g x 3 + y)
         end
       in
         f 3 4
       end
    
    Notice that we removed the fun f x declaration and replace it by val f = fn x . This is a legal change in this example because the function is non-recursive. Mini-ML also allows this change for recursive functions. This is because val declarations in Mini-ML are like val rec declarations in SML. You can see how this is implemented in function eval_decl (file interpreter.sml).

    In summary, liftVars should take a normalized AST (according to what you did in part 1), and return a new AST, where all variables inside function declarations are bounded to a parameter for that function.

  6. liftGlobal: Finally, write a function
      val liftGlobal: expr -> expr
    
    Which takes the result of the last step and pulls all function declarations to the outer-most level. For example, liftGlobal applied on the previous result should return:
      let
         let
           val g = fn v_1 => fn z => (z + v_1)
         in 
            let
              val f = fn x => fn y => (g x 3 + y)
            in
              f 3 4
            end
         end
    
    As in the liftLet problem, this step may require renaming functions to avoid name conflicts. The result of this step is a new AST that contains all function declarations at the outer-most level. Let expressions may exist inside function declarations, but only to define local primitive values, not functions.

File part2.sml contains the functions you need to implement.

Problem 3: Fully Lazy Lambda Lifting

An optimization to lambda lifting is to give full laziness. The idea is to extract maximal free expressions, instead of only free variables. This allows to reduce the amount of recalculation in a program.

Free expressions are program subexpressions that are closed or only contain free variables. For instance, in
let
  fun f (x) = 
    let
        fun g (z) = z + (x + 4)
    in
        g 3
    end
in
  f 5
end 
x is the only free variable inside function g, and x + 4 is the largest expression that doesn't contain bounded variables (is a maximal free expression). We would like to lift the free expression as follows:
let
  fun g v_1 (z) = z + v_1
in
  let
    fun f (x) = g (x + 4) 3
  in
    f 5
  end
end 
Instead of simply lifting the free variable x, we have added an argument for the expression x + 4.

You will follow the same process as before to implement this part:

  1. freeExpr: Implement a function
    val freeExpr: expr  * (string list) -> bool
    
    Which takes an expression, and a list of variable names that are bounded. The result is true if the expression doesn't use any bounded variables. Eg:
    freeExpr(*x + 4*, ["y","z"]) = true
    freeExpr(*x + 4*, ["x","z"]) = false
    freeExpr(*let val x = 4 in x end*", ["x","z"]) = true
    
  2. canLiftExpr: Write a function
      val canLiftExpr: expr * (string list) -> bool
    
    like before, it returns true if:
  3. liftExpr: Write a function
      val liftExpr: expr -> expr 
    
    That takes an expression, and recursively extends all function definitions to avoid having any free subexpressions. Notice that you should be able to reuse the functions extendDecl and extendBody from the previous part:
      val extendDecl: decl * string list -> string * decl 
      val extendBody: expr * string * expr list -> expr 
    
    The resulting AST is similar as in liftVars, except that the extended function application may contain large expressions (not only variables), and the function declarations are simplified to contain smaller expressions.

Notice that liftGlobal can also be applied on expressions returned from liftExpr.


File part3.sml contains the functions you need to implement.

Problem 4: Induction

  1. Using the technique shown in class, find a formula for S3(n), the sum of the third powers of the integers from 1 through n. Prove by induction that your formula is correct.
  2. Use induction to show that there are at most mh leaves in an m-ary tree of height h.
  3. Emily is learning proofs by induction and she has come up with the following proof that a tree with n nodes has (n-1) edges.

    Proof:

    Base case: A tree with one node has zero edges because the only edges in such a tree would form a self-loop.

    Inductive hypothesis: Any tree with k nodes has (k-1) edges.

    Inductive step: Starting with a tree T that has k nodes, form a tree with (k+1) nodes by adding a new node and connecting it to a node in tree T. This construction required adding one extra node and one extra edge, so the theorem is true for this tree as well.

    Conclusion: By induction, we can conclude that a tree with n nodes has (n-1) edges.

    1. Is this is a valid proof by induction? If not, what is the problem with the proof.
    2. Is the theorem correct? If so, give a valid proof by induction of the theorem.
  4. Consider the following datatype.
    datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree
    
    Given the function rev defined below - intuitively, this creates a mirror image of the input tree.
    fun rev Empty = Empty
        | rev (Node(t1,n,t2)) = Node(rev(t2),n,rev(t1))
    
    Use structural induction to prove that rev(rev(t)) = t.

Write up your solutions in a text file part4.txt.