CS312 PROBLEM SET 3

Issued: Wednesday, September 24

Due: Wednesday, October 8, 11:59 PM

You may do this problem set alone or in pairs. You will submit the files you changed through CMS.

Part I - Higher Order Fun with Higher Order Functions

Functions in SML can be treated as regular values, hence we can write expressions that "compute" functions. Such flexibility does not exist, for example, in Java and C/C++/C#.

In answering the questions below you must use only functions whose result depends explicitly on all of their arguments (e.g. you can not use constant functions, which are functions that return the same value irrespective of their arguments' values).

Problem 1

Consider the following definition (we have ommitted all type specifications):
fun O x y = x y y

Function O is a curried function of two arguments. The first argument is a 2-argument function, which is applied to the duplicated second argument. The type of O is (t1 -> t1 -> t2) -> t1 -> t2, where t1 and t2 are generic, unspecified types. You can check this type signature if you type its definition at the SML prompt (note that SML uses notations like 'a and ?.X1 for types that are not fully specified).

For this problem, edit prob1.sml.

(a) Provide specific values for x and y such that the value of expression (O x y) is the integer 312.

(b) In the expression O (O (O <X>)) <Y> <Z>, replace each of the placeholders <X> , <Y>, and <Z> with one or more appropriate expressions, such that the entire expression evaluates to 2003.

Problem 2

Consider the definition below:

fun O2 x y z = x (y z)

Operator O2 is a curried function of three arguments that applies the 1-argument function f to the result of applying 1-argument function y to value z.

For this problem, put your answers in prob2.txt.

(a) Define in a manner similar to the explanation above, what function O2 O2 O2 does.

(b) Prove that O2 o O2 = O2 O2 O2. The o operator is the function composition operator defined in SML as (f o g) x = f (g (x)). Hint: Your proof should not be a formal proof within the framework of the environment model, but rather a sequence of transformations that show the equivalence of the two expressions.

Part II - Proof by Induction

In lecture, we discussed proofs of algorithm correctness by induction. Here, you will need to construct such a proof.

fun sum(l:int list):int =
  if (null l) then 0
  else (hd l) + sum (tl l)

For this problem, put your answer in prob3.txt.

Problem 3

Prove that the function sum computes the sum of the integers in the provided list. Use the substitution model to explain its behavior. Your proof must contain all four parts (as defined in lecture) of an induction proof, clearly labeled. If a part is not labelled you will not receive credit for it.

Part III - Mini-ML Interpreter

In the last problem set, you were introduced to the idea of a calculator, that is, an arithmetic expression evaluator. Now, we are introducing an evaluator for a subset of ML. In this problem set, you will work with a fairly simple ML interpreter.

The most important part of this part of the problem set is understanding the interpreter. If you take the time to understand the interpreter before you begin, the following exercises will all be easy. If instead you start typing before thinking, expect to spend a long time on this section.

The Mini-ML Language

The Mini-ML language attempts to be a subset of ML. For reasons of simplicity, though, it does not behave exactly as ML does in some aspects.

Fewer types, no type conversions

The Mini-ML language has very few types. It has integers, booleans, functions, lists, tuples, and unit. There are no type conversions.

Basic syntax

Because of the few types, there are fewer operators and expressions. The following is a representative list of what is still in the language:

+ - * mod div = > < >= <= :: @ andalso orelse
~ not
234 true () foo 
if/then/else let/in/end
fn (x) => x
#1 (3, true)
val f = 3
hd tl null

The syntax of Mini-ML is very similar to that of ML. A formal BNF definition can be found in mini-ml.grm, which is a file that generates the parser. The semantics of Mini-ML is defined by the evaluator, found in evaluator.sml.

No fun

The fun keyword is not present. You can declare functions using fn and bind them to a variable:

val f = fn (x) => x + 1

(The parentheses around function arguments are mandatory.)

Note that recursive functions, as we know them, are impossible. (In the next problem set, we will discuss how it is actually possible to implement them. But for now, let us assume there is no way of doing this.)

No pattern matching

No form of pattern matching exists. This means that any function acting on a list needs to use the functions hd and tl rather than using pattern matching.

Dynamic typing

All typing is done at runtime. When an attempt is made to apply a primitive operation on a type which is not meaningful (i.e. adding booleans, 87 as the condition of an if) an error is raised. If this code is never executed, though, no error is raised. For example,

if true then 3 else 8 + [34]

is a perfectly acceptable expression. This also means that you do not need to declare the types of any expressions in Mini-ML.

No user-defined types

The only types in Mini-ML are those defined by the interpreter. Users cannot create their own.

No records, exceptions, structures, or signatures

These language features are missing from Mini-ML.

Colon commands

Much like the calculator of problem set 2, the evaluator contains a few colon commands. Type :h at the Mini-ML prompt to get a list.

Predefined values

There are a few predefined values in Mini-ML. You can see what they are in ListEnvironment.

Lists are untyped

The dynamic typing allows you to put anything you want in a list. So

[2, true, (), fn (x) => x]

is a perfectly legal list. It is recommended that you keep your lists homogenous, however.

Structure of the interpreter

The outer structure of the interpreter is in the file interpreter.sml. The main read-eval-print loop is here. Basically, it takes input from the user and hands it to one of the functions in the Evaluator structure.

The Evaluator signature, found in evaluator.sml, only defines two functions, but they are integral. The evaluate function turns an expression into a value (as defined in values.sml). The evaluateDeclare function uses a declaration to modify an environment.

An environment is a mapping from names to values. It is where the active variables are stored. (More on this later.) The Interpreter is specialized over the particular environment using a functor, so to actually use the interpreter with ListEnvironment, for instance, you will have to do

structure I = InterpreterFn(ListEnvironment)
I.run()

Once a value has been determined, the read-eval-print loop calls the Printer.show function defined in printer.sml to display the computed value. The other structures exist mainly to define the types on which these ones operate.

Problem 4

Changing the evaluated language

The meaning of life is six times nine is forty-two. Unfortunately, the evaluator does not yet know this. Your job will be to enlighten it.

For questions a and b, put your answers in prob5.txt. Just include the line numbers you changed and what you changed those lines to.

a) Change the evaluator so that any expression that evaluates to 6 multiplied by any expression that evaluates to 9, or vice versa, evaluates to 42.

For instance:

Mini-ML =) 6 * 9;
42
Mini-ML =) (4 + 5) * (12 div 2)
42

b) Change the evaluator so that only an integer constant 6 multiplied directly by another integer constant 9 evaluates to 42. For instance:

Mini-ML =) 6 * 9
42
Mini-ML =) 9 * 6
54
Mini-ML =) (5+1) * 9
54

For question c, submit the changed file.

c) Implement lexicographical comparisons on lists using the <, <=, >, >= operators. Lexicographical essentially means dictionary order.

For nonempty lists L1 and L2,

For any lists L1 and L2,

Some examples:

Mini-ML =) [] < [1,2,3]
true
Mini-ML =) [1] > [1]
false
Mini-ML =) [1,2,4] > [1,2,3]
true
Mini-ML =) [1,2,3] >= [1,2]
true
Mini-ML =) [] <= []
true

Problem 5

Changing the implementation: Environments

The meaning of an expression like x + 3 is totally dependent on what x represents. In programming languages as in mathematics, we call symbols like x variables. A natural question to ask is how are values assigned to variables?

An environment is a mapping of variables to values. Any expression that involves some number of unbound variables will be meaningless without an environment to translate those variables into values. In this problem, we investigate the possibility of using data structures to make this process of looking up variables in the environment more efficient.

The default implementation of environment variables uses lists to store these values. To add a new variable binding to the environment, we simply add it with the desired value to the front of the list. To look up a variable, we search along the list until we either find the variable and get its value or fail to find it (causing an error).

There is a top-level environment which consists first of all builtin functions and also of any val bindings made at the top level. A sequence of variable bindings and expressions at the top level like

val x = 2
val y = x * x
val z = y + x
z

is equivalent to the following. On the right, we show the environment as it is represented by our ListEnvironment.

                       []
let
  val x = 2           [("x",Int_v 2)]
in
  let
    val y = x * x     [("y",Int_v 4), ("x",Int_v 2)]
  in
    let
      val z = y + x   [("z",Int_v 6), ("y",Int_v 4), ("x",Int_v 2)]
    in
      z
    end               [("y",Int_v 4), ("x",Int_v 2)]
  end                 [("x",Int_v 2)]
end                   []

Each new variable binding is equivalent to entering a new let. Expressions have no effect on the Mini-ML environment. So when an expression like val x = 2 is evaluated at the Mini-ML prompt, first the value of the expression (2) is determined. Then the binding of "x" to 2 is made. It is in this modified environment that all future expressions and declarations are evaluated.

Here is a somewhat more complicated example involving scope.

let
  val x = 2
  val y = 3 + x
in
  let 
    val x = (let val y = 2 in y * y end)
  in
    x + y
  end
end

When multiple val declarations are made within the same let, this is equivalent to entering a new let for each new declaration. So the above is equivalent to the following. Again the right hand side shows our ListEnvironment.

                    []
let
  val x = 2         [("x", Int_v 2)]
in
  let
    val y = 3 + x   [("y", Int_v 5),("x", Int_v 2)]
  in
    let
      val x =
        let
          val y = 2 [("y", Int_v 2),("y", Int_v 5),("x", Int_v 2)]
        in
          y * y
        end         [("y", Int_v 5),("x", Int_v 2)]
                    [("x", Int_v 4),("y", Int_v 5),("x", Int_v 2)]
    in
      x + y
    end             [("y", Int_v 5),("x", Int_v 2)]
  end               [("x", Int_v 2)]
end                 []

Note that looking up a variable must use the most recent binding of a value to that variable. The ListEnvironment works by adding new bindings to the front of the list, and returning the first binding that matches. So in the expression x + y above, the x that is referred to is the first x reading from left to right: that is, the most recently added x.

When a function expression is evaluated, it is necessary to generate a closure for the body of that function. Briefly, suppose you have something like

val f = let val y = 3 in fn (x) => y end
Later, you evaluate
let val y = 4 in f 2 end
Now, you want y in the body of the function that is bound here to f to refer to the y that existed when f was defined, not to the y that has replaced it. f is defined, we save a closure, which consists of a copy of the environment together with the body expression of the function, inside every function value. For technical reasons relating to SML's structure system, the closure does not actually contain an environment but rather an exception that wraps around an environment.

The reasons behind this involve mutual recursion among structures. Basically, you can have

datatype Foo = Blank | F of Bar
and datatype Bar = B of Foo

within the same structure. That is what we want to do with environments and values in Mini-ML: An environment should be able to contain function values, and a function value should be able to contain an environment. Unfortunately, types cannot be mutually recursive if they are defined in distinct structures. The workaround we use involves a peculiar property of exceptions. We declare a value as

  datatype value =
    Int_v      of int
  | Bool_v     of bool
...
  | Fn_v       of (string option     (* function name        *)
                   * exn             (* environment          *)
                   * string list     (* names of arguments   *)
                   * exp)            (* body                 *)

In the evaluator functor, we declare

  exception ClosureEnv of env

and it is this exception that we store in a Fn_v. So when a function expression is evaluated, we take the current environment, wrap it in a ClosureEnv, and store it in the Fn_v that we produce.

(If you are interested in why we use an exception here, the reason is that the exn type is the only type in SML that is defined at one place but can be extended at another place. This is admittedly a hack, and there are better ways of handling closures, with techniques we will describe later in the course.)

When a function is applied to its arguments, the evaluator processes this first by evaluating the function expression and its argument expression. Then, a new environment is created by adding the names and values of the arguments to the environment inside the function closure. Finally, the expression representing the body of the function is evaluated in this environment.

The implementation of environments using lists has some advantages; it is very simple and can be fast when the number of variables is small. Also, it handles shadowing well. For instance, the expression

let val x = 2 in (let val x = 3 in x end) end
should evaluate to 3 rather than 2, as "inner" bindings override "outer" ones. The list implementation simply puts newer bindings on the front of the list and searches starting at the beginning. It will use the first matching string it finds to determine a value, so this process is handled fairly easily.

For this problem, you will implement RedBlackTreeEnvironment (in the file rbtreeEnvironment.sml), which uses red-black trees to handle lookup. The implementation of environment lookup is changeable simply by changing the various functors to use something other than ListEnvironment, so to test your environment you need to run

structure I = InterpreterFn(RedBlackTreeEnvironment)
I.run()

The most important part of this problem is that the RedBlackTreeEnvironment behaves exactly as the ListEnvironment does with respect to such things as scope and variable shadowing. Only its running time should differ from that of ListEnvironment.

You may adapt the red-black tree code we have given you, but note that it currently provides functionality for a set of integers, when you need to implement a mapping from strings to values. Take special care that shadowed variables are handled correctly.

You must implement all the functions in environment.sig. The function lookupBinding looks up the specified string in the specified environment and returns NONE if it is not found, or SOME v where v is the appropriate value. The function insertBinding inserts the specified value bound to the specified string in the specified environment. It returns the new environment. The topLevel environment is an environment containing the basic primitives. These should be the same as the ones defined in listEnvironment.sml.

The toString function deserves special attention. You need to implement this function so we can check that your implementation of red-black trees works correctly. The function should behave the following way:

For example (using R and B to represent red and black nodes):

B => bEE

  B
 / \  => brEErEE
R   R

  B
 / \
B   R   => bbEErbEEbEE
   / \
  B   B

This is not a difficult function to write, however, it is crucial that you get it right.

Timing

We have included support in the interpreter (in the colon functions) for timing how long an evaluation takes. Time 10,000 evaluations of the following (admittedly contrived) expression:

[allSet,arguments,alrm,andb,Array,anySet,app,array,abrt,Abs]

Time this using ListEnvironment and using your RBTreeEnvironment. Put the two times in a comment at the top of RBTreeEnvironment.