Problem Set 4: Code transformations and Induction

Due on October 27, 2005 at 11:59 pm.

Instructions

You will submit a total of four separate files: part1.sml , part2.sml , part3.sml, and part4.txt, Modify the .sml template files from ps4source.zip. As always, don't change function names, and remember: non-compiling code will receive an automatic zero. Do NOT compress the files into a zip file. You must submit each file individually using CMS.

You have the option to select a partner for this assignment. Only groups of at most two persons are permitted. You can register your group through CMS. Choose your partner carefully; it is not an acceptable excuse not to submit a finished homework because "the other partner did not work enough." All your files should have a comment at the top with the name, netID, and section of both students.

Preamble: Mini-ML

We have introduced the substitution model in class in order to provide an execution model for a carefully chosen subset of SML. In this problem we give you a SML interpreter that implements the substitution model as we specified it in class. Parts 1, 2 and 3 are focused on extending the interpreter with several code transformations.

The Mini-ML Language

We will focus on a small functional language specified by the following grammar:

e ::= c (* constants *) | id (* variables *) | (fn id => e) (* anonymous functions *) | e1(e2) (* function applications *) | u e (* unary operations, ~, not, etc. *) | e1 b e2 (* binary operations, +,*,etc. *) | (if e then e1 else e2) (* if expressions *) | let d in e end (* let expressions *) | (fun f x => e) (* fun expressions *)

Declarations:

d ::= val id = e (* value declarations *) | fun id(id1) = e (* function declarations *)

Note that Mini-ML is not a proper subset of SML, since SML does not have fun expressions. Also, note that our language has neither case statements, nor pattern matching. All our functions have exactly one argument, and all let statements contain exactly one declaration.

To simplify the implementation, we will assume that the language has only two basic types, integers and booleans. No other types can be defined, also, no type specifications are accepted anywhere in the program.

The Abstract Syntax Tree (AST)

One of the basic problems we need to address is to find a suitable program representation that can serve as input for our interpreter. One possibility is to use plain-text representation. While most convenient for the user, such a representation is not very suitable for direct evaluation. The conversion to a more convenient representation could be implemented in SML, and it raises many interesting questions (which are addressed in other CS courses). These representation conversion issues are ancillary to our goal in this problem, thus, we will represent our programs as if the conversion step has already taken place. The representation we choose is named the abstract syntax tree (AST) and it is based on the following type definitions:

datatype decl = VAL_D of string * expr | FUN_D of string * string * expr and expr = INT of int | BOOL of bool | FN of string * expr | ID_E of string | FAPP_E of expr * expr | FUN_E of string * string * expr (* name * arg * body *) | LET_E of decl * expr | IF_E of expr * expr * expr | BINOP_E of expr * string * expr | UNOP_E of string * expr

First, note the and keyword that links the two datatype definitions. This keyword makes it explicit that these two type definitions are mutually recursive (i.e. type decl refers to type expr in its definition, and vice-versa). Such simultaneous declarations are necessary for mutually recursive functions as well.

We use "_D" to mark a datatype constructor for declarations and "_E" to mark a datatype constructor for expressions. Note that INT, BOOL, and FN do not have any special marking. This is to emphasize their special status as both expressions and values. We could have defined "expression" constants and "value" constants, for example, and you should know that this is often done. Because of the peculiar features of Mini-ML, however, it turns out that our representation is more convenient. Make sure you do not get confused by the fact that our "values" really are, according to their type, expressions. Only integers, booleans, and anonymous functions can be values (i.e. results of an evaluation).

The abstract syntax tree represents the essential structure of the program, which is stripped by all unnecessary details. Here are a number of val definitions in SML that encode Mini-ML programs:

val e_3 = INT 3 val e_let = LET_E(VAL_D("x", INT 3), BINOP_E(ID_E "x", "+", INT 7)) val e_fn = FAPP_E(FN("x", BINOP_E(ID_E "x", "+", INT 25)), INT 7) val e_fun = FUN_E("function", "x", ID_E "x") val e_fact = LET_E(FUN_D("fact", "n", IF_E(BINOP_E(ID_E "n", "=", INT 0), INT 1, BINOP_E(ID_E "n", "*", FAPP_E(ID_E "fact", BINOP_E(ID_E "n", "-", INT 1))))), FAPP_E(ID_E "fact", INT 3))

Mini-ML Interpreter

We have provided a file interpreter.sml that implements an interpreter of Mini-ML. In that file you will find implementations of unary and binary operators, substitutions and evaluation of expressions. This code is provided to help you in two respects: to get familiar with manipulations over the AST, and to be able to debug your own assignment. Here are a few examples of evaluations using the expressions defined above:

- eval(e_3) val it = INT 3 : expr - eval e_let; val it = INT 10 : expr - eval e_fn; val it = INT 32 : expr - eval e_fun; val it = FN ("x", ID_E "x") : expr - eval e_fact; val it = INT 6 : expr

The interpreter.sml file also includes the functions printDecl and printExpr that print a declaration and an expression, respectively. For example:

- printExpr(e_fact) ""; let fun fact (n) = if (n = 0) then 1 else (n * (fact (n - 1) )) in fact 3 end val it = () : unit

The file parser.sml provides functions parse and parseString which you can use to generate a Mini-ML AST from Mini-ML code text. Feel free to use these, but have in mind that the parser is more strict than in standard SML (e.g. arguments to functions cannot be specified without parenthesis). We have included some examples in file include.sml for your convenience. For example, you can write a Mini-ML expression in a file "a.sml", then do the following:

- CM.make(); - open Interpreter; - case (Parser.parse (TextIO.openIn "a.sml")) of NONE => raise Fail " Parsing Error " | SOME(s1) => (printExpr s1 ""; eval s1)

Problem 1: Code normalization

The first part of the assignment consists on normalizing Mini-ML programs. You are going to work on two normalizations:

Removing anonymous functions:
We would like to avoid expressions with anonymous functions. We can change the code to add new let constructs that will bind the functions to some name. For example
let val x = 4 in (fn y => y+1) x end
can be transformed to:
let val x = 4 in (let fun f (y) = (y+1) in f end) x end
Note that Mini-ML supports two function expressions: (fn x => e) and (fun f x => e). We will also change fun expressions by adding a let expression like in the example above. The advantage of fun expressions is that you already have a name for the function, for anonymous functions you need to give a fresh name to each function you add. For this purpose we give you a function Given.newFun() (in file include.sml) which returns a fresh new function name (f_i) each time is called.
One small complication for this problem is that since we have no pairs in Mini-ML, functions that need several parameters are written in curried form. Ie. anonymous functions are used to pass several arguments. These anonymous functions are allowed. For instance:
let fun f (y) = fn (z) => y + z in f 2 4 end
Should remain unchanged after the normalization.
You need to implement this normalization in a function
removeAnonymous: expr -> expr
Given an input AST, the result of removeAnonymous is an AST that:
- Has no FUN_E expressions
- The only FN expressions in the AST are used for functions with multiple arguments (in curried form).

Pulling let constructs to the outer-most level of their scope:

We would like to have all let expressions at the top level. Since Mini-ML doesn't have a top level environment, you will produce nested let expressions, and place them at the outer-most level of their scope. For example the program on the left should be normalized to produce the program on the right:

Input Output

let fun g (x) = let val y = 3 * x in x + (let val w = 2 * y in w end) end in (let fun f (y) = (g(y+1) + g(y-1)) in f end) 3 end

let fun g (x) = let val y = 3 * x in let val w = 2 * y in x + w end end in let fun f (y) = (g(y+1) + g(y-1)) in f 3 end end

Notice, in the example above we didn't move the declaration of y and w outside function g: we don't lift let expressions outside function declarations.

The main complication of this transformation is to make sure that you replace some declaration names to avoid name conflicts. For example:

Input Output

let val x = 1 in (let val x = 2 in x end) + x end

let val x = 1 in let val v_1 = 2 in v_1 + x end end

You should use the functions Given.newVar() to create fresh variable names, and Given.newFun() to create fresh function names. You can assume that these fresh names are different from any name present in the input programs.

Write your solution in the function

liftLet: expr -> expr

Given an input AST, the result of liftLet is an AST that:

All let declarations are group together
All group of let declarations are either:
- At the top-level (like functions g and f in the first example above)
- At the outer-most level of a function declaration (like y and w inside function g)
Tip: to check your implementation, make sure your resulting AST doesn't have any let expression inside another kind of expression (such as binary operations).
Tip: look at interpreter.sml you might find functions that are useful and will save you some time and work.

File part1.sml contains the functions you need to implement.

Problem 2: Lambda Lifting

Lambda lifting is another kind of normalization. The idea is to remove the use of local function definitions. For example, consider the following SML program:

let
  fun f (x) = 
    let 
        fun g (z) =  z + (x + 4)
    in
        g 3
    end
in
  f 5
end

We would like to extract g outside f, and simplify the program as follows:

fun g (v_1) (z) = z + (v_1 + 4)
fun f (x) = g x 3
f 5

To do lambda-lifting we need to:

Assume that the code doesn't contain anonymous functions, and that let expressions are lifted to the outer-most level of their scope.
For each function declaration, we identify all free variables. Free variables are variable names that are not bounded by a declaration or an argument in the current scope. For example, x is a free variable in the following expression:
fun g(z) = z + (x + 4)
We will replace each free variable with an additional argument to the enclosing function, and pass that argument to every use of the function. This is why, in the example above, the function g now receives a new formal argument v_1, and we passed x as an actual argument in the body of function f.
Replace every local function definition that has no free variables with an identical global function. This is when we moved g to the top level. This step may require renaming functions to avoid name conflicts.

Since in Mini-ML we don't have a top level environment, the result of the example above should look as follows:

let
  fun g(v_1) => fn (z) => z + (v_1 + 4)
in
  let
     fun f (x) = g x 3
  in
     f 5
  end
end

We ask you to implement the answer to this problem in several steps:

freeVar: Implement a function
val freeVar: string * (string list) -> bool
Which takes the name of a variable, and a list of variable names that are bounded. The result is true if the variable doesn't belong to the set of bounded variables. Eg:
freeVar("x", ["y","z"]) = true freeVar("x", ["x","z"]) = false
canLiftVar: Write a function
val canLiftVar: expr * (string list) -> bool
Its arguments are a Mini-ML expression (e) and a list of bounded variable names. It returns true if is any subexpression of e can be lifted. This is true in two possible cases:
- There is a free variable in the expression e. For instance:
  canLiftVar (*x*, ["y"]) = true canLiftVar (*y*, ["y"]) = false
  (Notice: we write *x* to simplify the presentation in this page, you need to replace it with the full expression, in this case ID_E("x"). We will use this same notation though the rest of this document)
- There is a function declaration that contains free variables with respect to the function arguments (regardless of the set of bounded names). For instance,
  canLiftVar (*let fun f x = x + y in f 3 end*, ["y"]) = true canLiftVar (*let fun f x = x + y in f 3 end*, []) = true canLiftVar (*let fun f x = x + y in f 3 end*, ["x","y"]) = true
  In all these cases, y is a free variable inside the body of f, and we would like to lift it (by adding an additional argument in f). Essentially, this scenario applies for any set of bounded names, what is relevant is that: a) a function is declared, b) some variable is used in the body of the function, and c) that variable is not an argument of the function. If in the examples above we add to f the argument y, canLiftVar should return false:
  canLiftVar (*let fun f x = x + y in f 3 end*, _) = true canLiftVar (*let fun f x = fn y => x + y in f 3 end*, _ ) = false

extendDecl: Write a function

  val extendDecl: decl * string list -> string * decl

extendDecl receives a declaration and a list of new variable names. The idea is to extend a function declaration with additional arguments. Eg:

   extendDecl (*val x = 5*,["v_1"]) = ("",*val x = 5*)
   extendDecl (*val f = fn y => y + v_1*,["v_1"]) = ("f", *val f = fn v_1 => fn y => y + v_1*)
   extendDecl (*fun f y = y + v_1*,["v_1"]) = ("f", *val f = fn v_1 => fn y => y + v_1*)

The first component of the result is the name of the function being extended (if any).

extendBody: Write a function
val extendBody: expr * string * expr list -> expr
Which takes an expression, a function name, and a list of expression arguments. It produces a new expression by passing the arguments whenever the function name is used:
extendBody (*f 1*,"f",[*y*]) = *f y 1* extendBody (*let fun f x = x in f 1 end*,"f",[*y*]) = *let fun f x = x in f 1 end*
Note in the second example the let declaration redefines the name of f, thus f is not extended in this case. Make sure you deal correctly with these cases.

liftVars: Write a function

  val liftVars: expr -> expr

That takes an expression, and recursively extends all function definitions to avoid having any free variables. For example,

Input	Output
let fun f(x) = fn y => let fun g(z) = (z + x) in (g 3 + y) end in f 3 4 end	let val f = fn x => fn y => let val g = fn v_1 => fn z => (z + v_1) in (g x 3 + y) end in f 3 4 end

Notice that we removed the fun f x declaration and replace it by val f = fn x . This is a legal change in this example because the function is non-recursive. Mini-ML also allows this change for recursive functions. This is because val declarations in Mini-ML are like val rec declarations in SML. You can see how this is implemented in function eval_decl (file interpreter.sml).

In summary, liftVars should take a normalized AST (according to what you did in part 1), and return a new AST, where all variables inside function declarations are bounded to a parameter for that function.

liftGlobal: Finally, write a function
val liftGlobal: expr -> expr
Which takes the result of the last step and pulls all function declarations to the outer-most level. For example, liftGlobal applied on the previous result should return:
let let val g = fn v_1 => fn z => (z + v_1) in let val f = fn x => fn y => (g x 3 + y) in f 3 4 end end
As in the liftLet problem, this step may require renaming functions to avoid name conflicts. The result of this step is a new AST that contains all function declarations at the outer-most level. Let expressions may exist inside function declarations, but only to define local primitive values, not functions.

File part2.sml contains the functions you need to implement.

Problem 3: Fully Lazy Lambda Lifting

An optimization to lambda lifting is to give full laziness. The idea is to extract maximal free expressions, instead of only free variables. This allows to reduce the amount of recalculation in a program.

Free expressions are program subexpressions that are closed or only contain free variables. For instance, in

let fun f (x) = let fun g (z) = z + (x + 4) in g 3 end in f 5 end

x is the only free variable inside function g, and x + 4 is the largest expression that doesn't contain bounded variables (is a maximal free expression). We would like to lift the free expression as follows:

let fun g v_1 (z) = z + v_1 in let fun f (x) = g (x + 4) 3 in f 5 end end

Instead of simply lifting the free variable x, we have added an argument for the expression x + 4.

You will follow the same process as before to implement this part:

freeExpr: Implement a function
val freeExpr: expr * (string list) -> bool
Which takes an expression, and a list of variable names that are bounded. The result is true if the expression doesn't use any bounded variables. Eg:
freeExpr(*x + 4*, ["y","z"]) = true freeExpr(*x + 4*, ["x","z"]) = false freeExpr(*let val x = 4 in x end*", ["x","z"]) = true
canLiftExpr: Write a function
val canLiftExpr: expr * (string list) -> bool
like before, it returns true if:
- The expression contains a free subexpression that can be lifted, or
- There is a function declaration that contains a free subexpression (which is free with respect to the set of arguments)
liftExpr: Write a function
val liftExpr: expr -> expr
That takes an expression, and recursively extends all function definitions to avoid having any free subexpressions. Notice that you should be able to reuse the functions extendDecl and extendBody from the previous part:
val extendDecl: decl * string list -> string * decl val extendBody: expr * string * expr list -> expr
The resulting AST is similar as in liftVars, except that the extended function application may contain large expressions (not only variables), and the function declarations are simplified to contain smaller expressions.

Notice that liftGlobal can also be applied on expressions returned from liftExpr.

File part3.sml contains the functions you need to implement.

Problem 4: Induction

Using the technique shown in class, find a formula for S3(n), the sum of the third powers of the integers from 1 through n. Prove by induction that your formula is correct.
Use induction to show that there are at most m^h leaves in an m-ary tree of height h.
Emily is learning proofs by induction and she has come up with the following proof that a tree with n nodes has (n-1) edges.

Proof:
Base case: A tree with one node has zero edges because the only edges in such a tree would form a self-loop.
Inductive hypothesis: Any tree with k nodes has (k-1) edges.
Inductive step: Starting with a tree T that has k nodes, form a tree with (k+1) nodes by adding a new node and connecting it to a node in tree T. This construction required adding one extra node and one extra edge, so the theorem is true for this tree as well.
Conclusion: By induction, we can conclude that a tree with n nodes has (n-1) edges.
1. Is this is a valid proof by induction? If not, what is the problem with the proof.
2. Is the theorem correct? If so, give a valid proof by induction of the theorem.
Consider the following datatype.
datatype 'a tree = Empty | Node of 'a tree * 'a * 'a tree
Given the function rev defined below - intuitively, this creates a mirror image of the input tree.
fun rev Empty = Empty | rev (Node(t1,n,t2)) = Node(rev(t2),n,rev(t1))
Use structural induction to prove that rev(rev(t)) = t.

Write up your solutions in a text file part4.txt.

Input	Output
let val x = 1 in (let val x = 2 in x end) + x end	let val x = 1 in let val v_1 = 2 in v_1 + x end end