Problem Set 4: A Scheme Interpreter

Due Thursday 3/17/11 11:59pm

Part 1: Logic

Overview

We have provided a proof assistant, NatDed, that implements natural deduction. You are encouraged to use it to solve the following problems, although proofs by hand are also acceptable. However, all submissions should be formatted in LaTeX and submitted in .pdf form. NatDed can automatically output the finished proof in a LaTeX file for you, from which you can create a .pdf file for submission.

[NatDed tutorial] [Natded download]

Questions

Note: In some browsers, the symbol for logical "and" may be rendered incorrectly as a small circle. It should look like an inverted ∨.

Propositional Logic
Prove the following statements using the inference rules provided in Lecture 13. Prove them from scratch, don't use any of the tautologies as lemmas.
1. (A ⇒ B) ⇒ ¬(A ∧ ¬B)
2. ((P ⇒ R) ∨ (Q ⇒ R)) ⇒ (P ∧ Q ⇒ R)
3. ¬(A ∨ B) ⇔ ¬(¬A ⇒ B)
Predicate Logic
Prove the following statements using the inference rules for propositional and predicate logic in Lectures 13 and 14. Note that P, Q are names of predicates.
1. (∃x. P(x)) ⇒ (¬∀x.¬P(x))
2. ∀x.(P(x) ⇒ P(f(x))) ⇒ ∀x.(P(x) ⇒ P(f(f(x))))
3. ∃x. ∀y. P(x,y) ⇒ ∀y. ∃x. P(x,y)

What to Submit

Submit a file logic.tex or logic.pdf containing your solutions. If you find this fun, include proofs of other interesting theorems for extra karma.

Part 2: A Scheme Interpreter

Overview

For this problem, you will implement an interpreter for simplified version of the Scheme language. Scheme is a functional language whose semantics is very much like OCaml. Evaluation follows the environment model presented in class very closely.

We have provided a lexical analyzer and parser that can take a string input, parse it as a Scheme program, and output an abstract syntax tree (AST) representing the code. The input program is converted to a tree of type Ast.expr, defined in the Ast module. Unlike OCaml, there is not a lot of static typechecking done, so you will find the syntax of Scheme quite a bit less restrictive than OCaml (not necessarily a good thing).

Your job will be to write functions for evaluating the AST that is returned from the parser. You will also have to check for various runtime type errors (such as applying an if-then-else to a non-Booolean), and to raise a runtime type error if the types of the expressions do not make sense in a given context.

A few Scheme programs are provided in the file etc/input.txt. Here is a sample run with our solution code.

? (load "etc/input.txt")

>> 
>> 
>> 
>> 
>> 
? (fact 4)

>> 24
? (length (list 0 0 0))

>> 3
? (rev (list 1 2 3 4))

>> (4 3 2 1)
? (cons 1 2)

>> (1 . 2)
? (fib 10)

>> (55 34 21 13 8 5 3 2 1 1 0)
? ((lambda (x y) (- x y)) 3110 2110)

>> 1000
?

The Scheme Language

Values

The language has six types of values:

Integers, which correspond to the OCaml type int
Strings, which correspond to the OCaml type string
Bools, which correspond to the OCaml type bool
Closures, which represent functions and their environments
cons pairs
nil.

There is also a seventh type Undef that is used internally for creating recursive definitions and is not available to the Scheme programmer. The following code, taken from heap.ml, declares the type value. When an expression is evaluated successfully, the resulting value is an entity of this type.

type value = Int of int | Str of string | Bool of bool
           | Closure of expr * env
           | Cons of value * value | Nil
           | Undef

Expressions

The lexical analyzer and parser convert an input program formatted in plain text to an AST, an entity of type Ast.expr, which can then be evaluated. The parser recognizes the following patterns:

Integers (Ast.Int_e of int) consist of sequences of one or more numeric characters. In our version of Scheme, this excludes negative integers, so that 42 and 007 are considered valid integers, whereas -1 is not
Strings (Ast.Str_e of string) consist of sequences of zero or more characters enclosed in double quotes, e.g. "Hello". Certain strings can be escaped by the backslash character:

Character Escaped

\ \\

" \"

<newline> \n

<tab> \t
Bools (Ast.Bool_e of bool) are specified by the strings #t for true or #f for false
Identifiers (ids) (Ast.Id_e of Ast.id) consist of a single lower- or upper-case letter followed by zero or more lower- or upper-case letters, numeric digits, underscores, and single-quote characters.

Character	Escaped
\	\\
"	\"
<newline>	\n
<tab>	\t

In the following structures, the parentheses must be present:

(lambda x e) represents a function that takes a single value v as its input and evaluates e with x bound to v. This is represented as the type
```
Ast.Fun_e of Ast.id list * Ast.expr
```
(lambda (x1 x2 x3 ... xn) e) represents a function that takes n values v1 ... vn as its input and evaluates e with xi bound to vi for all i. This is represented as the type
```
Ast.Fun_e of Ast.id list * Ast.expr
```
(define x e) evaluates the expression e and binds it to the id x at the top level. It is represented in the AST as
```
Ast.Def_e of Ast.id list * Ast.expr
```
There is also a definerec version for creating recursive functions.
(define (f x1 ... xn) e) defines a function that takes n values (v1 ... vn) as its input and evaluates the expression e. The evaluation function should return a closure containing the AST representation of the function and the current environment. This is then bound to f at the top level. It is represented in the AST as
```
Ast.Def_e of Ast.id list * Ast.expr
```
There is also a definerec version for creating recursive functions.
(if e0 e1 e2) first evaluates e0. If the result is not a boolean, then the interpreter should raise a runtime exception. If the result is true, it should evaluate and return the result of e1. If the result is false, it should evaluate and return the result of e2. It is represented in the AST as
```
Ast.If_e of Ast.expr * Ast.expr * Ast.expr
```
.

(BO e1 e2) represents one of the following binary operations:

BO	AST Name	Meaning
+	`Ast.Plus`	Integer addition
-	`Ast.Minus`	Integer subtraction
*	`Ast.Mul`	Integer multiplication
/	`Ast.Div`	Integer division
%	`Ast.Mod`	Integer modulus
=	`Ast.Eq`	Comparison (equality)
!=	`Ast.Neq`	Comparison (inequality)
<	`Ast.Lt`	Comparison (less than)
<=	`Ast.Leq`	Comparison (less than or equal to)
>	`Ast.Gt`	Comparison (greater than)
>=	`Ast.Geq`	Comparison (greater than or equal to)
&	`Ast.And`	Logical AND
\|	`Ast.Or`	Logical OR

A binary operation is represented by the following AST type:

Ast.Binop_e of op * expr * expr

(UO e1) represents one of the following unary operations:

UO	AST Name	Meaning
~	`Ast.Not`	Logical complement
car	`Ast.Car`	Take the first element of a `cons` pair
cdr	`Ast.Cdr`	Take the second element of a `cons` pair
null	`Ast.Null`	Return true if the argument is `nil`, otherwise return false
load	`Ast.Load`	Read a file (specified by a string-valued argument) from disk

A unary operation is represented by the following AST type:

Ast.Unop_e of op * expr

The minus sign is the only operator that can be used as both a unary and a binary operator.

Lists

The binary operator cons is used to create a pair. The expression (cons e1 e2), when evaluated, should recursively evaluate e1 and e2, then create a value of the form Cons (v1, v2) from the resulting values. There are corresponding projections car and cdr that extract the first and second components of a pair, respectively; thus

(car (cons 1 2)) = 1
(cdr (cons 1 2)) = 2

The names car and cdr are historical; they stand for contents of the address register and contents of the decrement register, respectively, and refer to hardware components of the IBM 704 computer on which LISP was originally implemented in the late 1950s.

The special entity nil is used with cons to form lists. Lists in Scheme are sequences of objects combined using cons and terminated by nil. For example, the list

(cons 1 (cons 2 (cons 3 nil)))

is the Scheme equivalent of the OCaml list [1;2;3]. The object nil serves as the empty list, and is analogous to the OCaml []. The AST representation of nil is Ast.Nil_e.

There is a shorthand way to create lists in Scheme. Instead of typing

(cons 1 (cons 2 (cons 3 nil)))

one can type

(list 1 2 3)

and the resulting value is the same list. The output form of a list, i.e. the string that should be printed by your interpreter when you type either of the above expressions at the prompt, is (1 2 3).

The display form of the value (cons 1 2) is (1 . 2). Thus if you type (cons 1 2) at your Scheme interpreter, it should respond with (1 . 2). Note, however, that if the second component is nil instead of 2, it should respond with (1), because in that case it is a list of one element 1.

More generally, if you have any sequence of objects combined with cons, but not terminated with nil, then the output form puts a dot before the last element. Thus

(cons 1 (cons 2 3)) = (1 2 . 3)
(cons 1 (cons 2 nil)) = (1 2)

and

(cons (cons 1 2) (cons 3 4)) = ((1 . 2) 3 . 4)
(cons (cons 1 2) (cons 3 nil)) = ((1 . 2) 3)

Semantic Overview

Operations in Scheme are all represented in prefix notation and must be enclosed in parentheses. For instance, to add 5 and 7, we would type (+ 5 7). There is no notion of precedence and all compound expressions must be enclosed in parentheses. For instance, the OCaml expression 5 + (2 * 7) would be written (+ (5 (* 2 7))) in Scheme.

Comparisons may be performed on Int, String, and Bool types, while arithmetic operators are valid only on Ints, and boolean operations are valid only on Bools.

Variables can be bound to values in several ways. Top-level global variables are bound using the define keyword. Local variables are bound using the let keyword. The expression (let x 5 (* 2 x)) is equivalent to the OCaml let x = 5 in 2*x.

Finally, anonymous functions are defined using the lambda keyword. These can be bound to variables using define or let. For example, (lambda x (* 2 x)) in Scheme is equivalent to fun x -> 2*x in OCaml, and (let f (lambda x (* 2 x)) (f 4)) in Scheme is equivalent to let f = fun x -> 2*x in (f 4) in OCaml.

Furthermore, anonymous functions of several variables can also be defined with lambda. For example,
(let f (lambda (x y z) (+ (+ x y) z)) (f 1 2 3))
is equivalent to the OCaml
let f = fun x y z -> x + y + z in f 1 2 3.
If we would like to use define to bind functions of one or more variables at the top level, we can write
(define (f x y) (+ x y))
or
(define f (lambda (x y) (+ x y))) .

If the function is recursive, use letrec and definerec instead. These behave identically to let and define, respectively, except that before the second argument is evaluated, the function name is included in the environment bound to a dummy value (Undef); then after the second argument is evaluated, the binding is updated to that value. This means that the function name may occur in the function body so that the function may call itself recursively.

An if expression evaluates its first argument, checks that it is a boolean, and if so, evaluates the second or third argument according as the value of the first argument is true or false, respectively. The values of second and third arguments need not be of the same type.

When applied to a cons pair, car returns the first component, while cdr returns the second. You can test whether a list is empty using the null operator.

The load operator can be used to load and interpret the contents of a file. The operators load, define, and definerec may be used only at the top level.

Your Tasks

All code you submit must adhere to the specifications defined in the respective mli files. You must implement functions in the following files:

eval.ml This module is responsible for evaluating Scheme code that has been converted to an AST. Complete the function eval. This should take in an Ast.expr and a Heap.env and return a Heap.value.
heap.ml This module contains data types for scheme values, as well as types for keeping track of variable bindings. You must complete all functions outlined in heap.mli. This will include a function value_to_string. For Str values, simply output the appropriate string, and for Int and Bool values simply use OCaml's string casting methods. Closures should output as and nil as ().
Cons cells and lists will need to be handled more carefully. The convention in Scheme is to output cons values in the form (x . y) for a single pair of values, or for nested Cons of the form (Cons x (Cons y (Cons z t))) output the value as (x y z . t), placing a dot only before the final element. Note that if the Cons is nested differently, say as (Cons (Cons x y) Cons(z t)), then we would represent this as ((x . y) z . t), since the first element of the Cons is now itself the value (Cons x y) rather than simply x. Finally, lists are represented as (x y z t), without any dot. This format is used only for displaying values in the output, and will not actually work as valid Scheme expressions.
repl.ml This module contains functions for parsing input code and starting evaluation. First, complete the function eval_one. This should handle any top level operations (declarations and loads) that can't exist elsewhere in a program. Any declarations made in a loaded file should be included in the scope of the remainder of the calling program. Next, implement the function read_file, which should be able to read code from a specified input file.

You may add function definitions to any mli files, however you should be sure not to change any existing defintions.

Building Your Code

We have provided a Makefile that you can use to build your project. This will output a file interpreter.exe which you can execute to launch the command line interface.

What to submit

Submit a zip file ps4.zip containing the following files: eval.ml, repl.ml, heap.ml, ast.ml, util.ml, eval.mli, repl.mli, heap.mli, ast.mli, and util.mli.

Part 3: Induction

Consider the following implementation of strand sort. The specification for the function strandSort itself is provided; specifications for the helper functions merge, getSubLists, and strandHelper are not.

let rec getSubLists l1 l2 = 
  match l1, l2 with
  | h1 :: t1, [] -> getSubLists t1 [h1]
  | h1 :: t1, h2 :: _ ->
      if h1 > h2 then getSubLists t1 (h1 :: l2)
      else let (r1, r2) = getSubLists t1 l2 in (h1 :: r1, r2)
  | [], _ -> (l1, l2)
  
let rec merge l1 l2 : 'a list =
  match l1, l2 with
  | [], _ -> l2
  | _, [] -> l1
  | h1 :: t1, h2 :: t2 ->
      if h1 < h2 then h1 :: merge t1 l2
      else h2 :: merge l1 t2

let rec strandHelper l1 l2 =
  if l1 = [] then l2 else
  let (us, sl) = getSubLists l1 [] in
  let ns = merge (List.rev sl) l2 in
  strandHelper us ns

(* Requires: true
   Returns: a list l' containing the same elements as l
   and l' is sorted in ascending order *)
let strandSort l = strandHelper l []

Your tasks

Write specifications for strandHelper, merge, and getSubLists. Write your specifications in English, but try to be as precise as possible. You should strive to write specifications that are not just correct, but also reasonably complete, meaning that they do not exclude legitimate inputs.
Using the specification of strandSort and your specifications for strandHelper, merge, and getSubLists, prove the correctness of all the first three functions by induction, and then from this show that strandSort is correct.

What to submit

A file induction.txt or induction.pdf containing your proof.

Karma: Fun with Scheme!

Once you have completed your Scheme interpreter, why not write some useful code to test it out? Complete the following functions to gain valuable practice in coding in Scheme (oh, and also you may get some karma).

map and fold: Wish you had your favorite list traversal functions available in Scheme? Why not code them yourself? Write functions fold_left, fold_right, and map that behave the same as their OCaml counterparts.
flatten: As in OCaml, flatten should take in a list of lists and return a single list that is the concatenation of all the input lists. The ordering of the elements should be preserved.
quicksort: Implement quicksort such that (quicksort l) will return a list with the same elements as l, but sorted in ascending order.

What to submit

A file karma.txt containing your implementations, along with some sample runs of your interpreter on them.