Problem Set 4: A Scheme Interpreter

Part 1: Logic

Overview

We have provided a proof assistant, NatDed, that implements natural deduction. You are encouraged to use it to solve the following problems, although proofs by hand are also acceptable. However, all submissions should be formatted in LaTeX and submitted in .pdf form. NatDed can automatically output the finished proof in a LaTeX file for you, from which you can create a .pdf file for submission.

Questions

Note: In some browsers, the symbol for logical "and" may be rendered incorrectly as a small circle. It should look like an inverted ∨.
1. Propositional Logic

Prove the following statements using the inference rules provided in Lecture 13. Prove them from scratch, don't use any of the tautologies as lemmas.

1. (A ⇒ B) ⇒ ¬(A ∧ ¬B)
2. ((P ⇒ R) ∨ (Q ⇒ R)) ⇒ (P ∧ Q ⇒ R)
3. ¬(A ∨ B) ⇔ ¬(¬A ⇒ B)
2. Predicate Logic

Prove the following statements using the inference rules for propositional and predicate logic in Lectures 13 and 14. Note that P, Q are names of predicates.

1. (∃x. P(x)) ⇒ (¬∀x.¬P(x))
2. ∀x.(P(x) ⇒ P(f(x))) ⇒ ∀x.(P(x) ⇒ P(f(f(x))))
3. ∃x. ∀y. P(x,y) ⇒ ∀y. ∃x. P(x,y)

What to Submit

Submit a file logic.tex or logic.pdf containing your solutions. If you find this fun, include proofs of other interesting theorems for extra karma.

Part 2: A Scheme Interpreter

Overview

For this problem, you will implement an interpreter for simplified version of the Scheme language. Scheme is a functional language whose semantics is very much like OCaml. Evaluation follows the environment model presented in class very closely.

We have provided a lexical analyzer and parser that can take a string input, parse it as a Scheme program, and output an abstract syntax tree (AST) representing the code. The input program is converted to a tree of type Ast.expr, defined in the `Ast` module. Unlike OCaml, there is not a lot of static typechecking done, so you will find the syntax of Scheme quite a bit less restrictive than OCaml (not necessarily a good thing).

Your job will be to write functions for evaluating the AST that is returned from the parser. You will also have to check for various runtime type errors (such as applying an if-then-else to a non-Booolean), and to raise a runtime type error if the types of the expressions do not make sense in a given context.

A few Scheme programs are provided in the file `etc/input.txt`. Here is a sample run with our solution code.

```? (load "etc/input.txt")

>>
>>
>>
>>
>>
? (fact 4)

>> 24
? (length (list 0 0 0))

>> 3
? (rev (list 1 2 3 4))

>> (4 3 2 1)
? (cons 1 2)

>> (1 . 2)
? (fib 10)

>> (55 34 21 13 8 5 3 2 1 1 0)
? ((lambda (x y) (- x y)) 3110 2110)

>> 1000
?
```

The Scheme Language

Values

The language has six types of values:
1. Integers, which correspond to the OCaml type int
2. Strings, which correspond to the OCaml type string
3. Bools, which correspond to the OCaml type bool
4. Closures, which represent functions and their environments
5. `cons` pairs
6. `nil`.
There is also a seventh type `Undef` that is used internally for creating recursive definitions and is not available to the Scheme programmer. The following code, taken from heap.ml, declares the type `value`. When an expression is evaluated successfully, the resulting value is an entity of this type.

```type value = Int of int | Str of string | Bool of bool
| Closure of expr * env
| Cons of value * value | Nil
| Undef
```

Expressions

The lexical analyzer and parser convert an input program formatted in plain text to an AST, an entity of type Ast.expr, which can then be evaluated. The parser recognizes the following patterns:
• Integers (Ast.Int_e of int) consist of sequences of one or more numeric characters. In our version of Scheme, this excludes negative integers, so that 42 and 007 are considered valid integers, whereas -1 is not
• .
• Strings (Ast.Str_e of string) consist of sequences of zero or more characters enclosed in double quotes, e.g. "Hello". Certain strings can be escaped by the backslash character:
CharacterEscaped
\\\
"\"
<newline>\n
<tab>\t
• Bools (Ast.Bool_e of bool) are specified by the strings #t for true or #f for false
• Identifiers (ids) (Ast.Id_e of Ast.id) consist of a single lower- or upper-case letter followed by zero or more lower- or upper-case letters, numeric digits, underscores, and single-quote characters.
In the following structures, the parentheses must be present:
• (lambda x e) represents a function that takes a single value v as its input and evaluates e with x bound to v. This is represented as the type
`Ast.Fun_e of Ast.id list * Ast.expr`
• (lambda (x1 x2 x3 ... xn) e) represents a function that takes n values v1 ... vn as its input and evaluates e with xi bound to vi for all i. This is represented as the type
`Ast.Fun_e of Ast.id list * Ast.expr`
• (define x e) evaluates the expression e and binds it to the id x at the top level. It is represented in the AST as
`Ast.Def_e of Ast.id list * Ast.expr`
There is also a definerec version for creating recursive functions.
• (define (f x1 ... xn) e) defines a function that takes n values (v1 ... vn) as its input and evaluates the expression e. The evaluation function should return a closure containing the AST representation of the function and the current environment. This is then bound to f at the top level. It is represented in the AST as
`Ast.Def_e of Ast.id list * Ast.expr`
There is also a definerec version for creating recursive functions.
• (if e0 e1 e2) first evaluates e0. If the result is not a boolean, then the interpreter should raise a runtime exception. If the result is true, it should evaluate and return the result of e1. If the result is false, it should evaluate and return the result of e2. It is represented in the AST as
`Ast.If_e of Ast.expr * Ast.expr * Ast.expr`
.
• (BO e1 e2) represents one of the following binary operations:
BOAST NameMeaning
-Ast.MinusInteger subtraction
*Ast.Mul Integer multiplication
/Ast.DivInteger division
%Ast.ModInteger modulus
=Ast.EqComparison (equality)
!=Ast.Neq Comparison (inequality)
<Ast.Lt Comparison (less than)
<=Ast.Leq Comparison (less than or equal to)
>Ast.Gt Comparison (greater than)
>=Ast.Geq Comparison (greater than or equal to)
&Ast.AndLogical AND
|Ast.OrLogical OR
A binary operation is represented by the following AST type:
`Ast.Binop_e of op * expr * expr`
• (UO e1) represents one of the following unary operations:
UOAST NameMeaning
~Ast.NotLogical complement
carAst.Car Take the first element of a cons pair
cdrAst.Cdr Take the second element of a cons pair
nullAst.Null Return true if the argument is nil, otherwise return false
A unary operation is represented by the following AST type:
`Ast.Unop_e of op * expr`
The minus sign is the only operator that can be used as both a unary and a binary operator.

Lists

The binary operator cons is used to create a pair. The expression `(cons e1 e2)`, when evaluated, should recursively evaluate `e1` and `e2`, then create a `value` of the form `Cons (v1, v2)` from the resulting values. There are corresponding projections `car` and `cdr` that extract the first and second components of a pair, respectively; thus

```(car (cons 1 2)) = 1
(cdr (cons 1 2)) = 2
```

The names `car` and `cdr` are historical; they stand for contents of the address register and contents of the decrement register, respectively, and refer to hardware components of the IBM 704 computer on which LISP was originally implemented in the late 1950s.

The special entity `nil` is used with `cons` to form lists. Lists in Scheme are sequences of objects combined using `cons` and terminated by `nil`. For example, the list

```(cons 1 (cons 2 (cons 3 nil)))
```
is the Scheme equivalent of the OCaml list `[1;2;3]`. The object `nil` serves as the empty list, and is analogous to the OCaml `[]`. The AST representation of `nil` is Ast.Nil_e.

There is a shorthand way to create lists in Scheme. Instead of typing

```(cons 1 (cons 2 (cons 3 nil)))
```
one can type
```(list 1 2 3)
```
and the resulting value is the same list. The output form of a list, i.e. the string that should be printed by your interpreter when you type either of the above expressions at the prompt, is `(1 2 3)`.

The display form of the value `(cons 1 2)` is `(1 . 2)`. Thus if you type `(cons 1 2)` at your Scheme interpreter, it should respond with `(1 . 2)`. Note, however, that if the second component is `nil` instead of 2, it should respond with `(1)`, because in that case it is a list of one element 1.

More generally, if you have any sequence of objects combined with `cons`, but not terminated with `nil`, then the output form puts a dot before the last element. Thus

```(cons 1 (cons 2 3)) = (1 2 . 3)
(cons 1 (cons 2 nil)) = (1 2)
```
and
```(cons (cons 1 2) (cons 3 4)) = ((1 . 2) 3 . 4)
(cons (cons 1 2) (cons 3 nil)) = ((1 . 2) 3)
```

Semantic Overview

Operations in Scheme are all represented in prefix notation and must be enclosed in parentheses. For instance, to add 5 and 7, we would type `(+ 5 7)`. There is no notion of precedence and all compound expressions must be enclosed in parentheses. For instance, the OCaml expression `5 + (2 * 7)` would be written (+ (5 (* 2 7))) in Scheme.

Comparisons may be performed on Int, String, and Bool types, while arithmetic operators are valid only on Ints, and boolean operations are valid only on Bools.

Variables can be bound to values in several ways. Top-level global variables are bound using the `define` keyword. Local variables are bound using the `let` keyword. The expression (let x 5 (* 2 x)) is equivalent to the OCaml `let x = 5 in 2*x`.

Finally, anonymous functions are defined using the `lambda` keyword. These can be bound to variables using `define` or let. For example, (lambda x (* 2 x)) in Scheme is equivalent to `fun x -> 2*x` in OCaml, and (let f (lambda x (* 2 x)) (f 4)) in Scheme is equivalent to `let f = fun x -> 2*x in (f 4)` in OCaml.

Furthermore, anonymous functions of several variables can also be defined with `lambda`. For example,
`(let f (lambda (x y z) (+ (+ x y) z)) (f 1 2 3))`
is equivalent to the OCaml
`let f = fun x y z -> x + y + z in f 1 2 3`.
If we would like to use `define` to bind functions of one or more variables at the top level, we can write
` (define (f x y) (+ x y))`
or
` (define f (lambda (x y) (+ x y))) `.

If the function is recursive, use `letrec` and `definerec` instead. These behave identically to `let` and `define`, respectively, except that before the second argument is evaluated, the function name is included in the environment bound to a dummy value (`Undef`); then after the second argument is evaluated, the binding is updated to that value. This means that the function name may occur in the function body so that the function may call itself recursively.

An `if` expression evaluates its first argument, checks that it is a boolean, and if so, evaluates the second or third argument according as the value of the first argument is `true` or `false`, respectively. The values of second and third arguments need not be of the same type.

When applied to a `cons` pair, `car` returns the first component, while `cdr` returns the second. You can test whether a list is empty using the `null` operator.

The `load` operator can be used to load and interpret the contents of a file. The operators `load`, `define`, and `definerec` may be used only at the top level.

All code you submit must adhere to the specifications defined in the respective `mli` files. You must implement functions in the following files:
1. eval.ml This module is responsible for evaluating Scheme code that has been converted to an AST. Complete the function `eval`. This should take in an `Ast.expr` and a `Heap.env` and return a `Heap.value`.
2. heap.ml This module contains data types for scheme values, as well as types for keeping track of variable bindings. You must complete all functions outlined in `heap.mli`. This will include a function `value_to_string`. For `Str` values, simply output the appropriate string, and for `Int` and `Bool` values simply use OCaml's string casting methods. Closures should output as and `nil` as `()`.
Cons cells and lists will need to be handled more carefully. The convention in Scheme is to output `cons` values in the form `(x . y)` for a single pair of values, or for nested Cons of the form `(Cons x (Cons y (Cons z t)))` output the value as `(x y z . t)`, placing a dot only before the final element. Note that if the Cons is nested differently, say as `(Cons (Cons x y) Cons(z t))`, then we would represent this as `((x . y) z . t)`, since the first element of the Cons is now itself the value `(Cons x y)` rather than simply `x`. Finally, lists are represented as `(x y z t)`, without any dot. This format is used only for displaying values in the output, and will not actually work as valid Scheme expressions.
3. repl.ml This module contains functions for parsing input code and starting evaluation. First, complete the function `eval_one`. This should handle any top level operations (declarations and loads) that can't exist elsewhere in a program. Any declarations made in a loaded file should be included in the scope of the remainder of the calling program. Next, implement the function `read_file`, which should be able to read code from a specified input file.
You may add function definitions to any `mli` files, however you should be sure not to change any existing defintions.

We have provided a `Makefile` that you can use to build your project. This will output a file `interpreter.exe` which you can execute to launch the command line interface.

What to submit

Submit a zip file `ps4.zip` containing the following files: `eval.ml`, `repl.ml`, `heap.ml`, `ast.ml`, `util.ml`, `eval.mli`, `repl.mli`, `heap.mli`, `ast.mli`, and `util.mli`.

Part 3: Induction

Consider the following implementation of strand sort. The specification for the function `strandSort` itself is provided; specifications for the helper functions `merge`, `getSubLists`, and `strandHelper` are not.

```let rec getSubLists l1 l2 =
match l1, l2 with
| h1 :: t1, [] -> getSubLists t1 [h1]
| h1 :: t1, h2 :: _ ->
if h1 > h2 then getSubLists t1 (h1 :: l2)
else let (r1, r2) = getSubLists t1 l2 in (h1 :: r1, r2)
| [], _ -> (l1, l2)

let rec merge l1 l2 : 'a list =
match l1, l2 with
| [], _ -> l2
| _, [] -> l1
| h1 :: t1, h2 :: t2 ->
if h1 < h2 then h1 :: merge t1 l2
else h2 :: merge l1 t2

let rec strandHelper l1 l2 =
if l1 = [] then l2 else
let (us, sl) = getSubLists l1 [] in
let ns = merge (List.rev sl) l2 in
strandHelper us ns

(* Requires: true
Returns: a list l' containing the same elements as l
and l' is sorted in ascending order *)
let strandSort l = strandHelper l []
```

1. Write specifications for `strandHelper`, `merge`, and `getSubLists`. Write your specifications in English, but try to be as precise as possible. You should strive to write specifications that are not just correct, but also reasonably complete, meaning that they do not exclude legitimate inputs.
2. Using the specification of `strandSort` and your specifications for `strandHelper`, `merge`, and `getSubLists`, prove the correctness of all the first three functions by induction, and then from this show that `strandSort` is correct.

What to submit

A file induction.txt or induction.pdf containing your proof.

Karma: Fun with Scheme!

Once you have completed your Scheme interpreter, why not write some useful code to test it out? Complete the following functions to gain valuable practice in coding in Scheme (oh, and also you may get some karma).
1. map and fold: Wish you had your favorite list traversal functions available in Scheme? Why not code them yourself? Write functions `fold_left`, `fold_right`, and `map` that behave the same as their OCaml counterparts.
2. flatten: As in OCaml, `flatten` should take in a list of lists and return a single list that is the concatenation of all the input lists. The ordering of the elements should be preserved.
3. quicksort: Implement `quicksort` such that `(quicksort l)` will return a list with the same elements as `l`, but sorted in ascending order.

What to submit

A file `karma.txt` containing your implementations, along with some sample runs of your interpreter on them.