Problem Set 4: A Scheme Interpreter

Due: October 20, 2011

Version: 1
Last Modified: October 6, 23:59
Changes:

Version 1 is the original release.

Part 1: An Interpreter for "Scheme311" (75 points)

Overview

For this problem set, you will be implementing an interpreter for a subset of the Scheme programming language. Scheme is a dialect of Lisp, a language centered around the list data structure. Notably, programs are written as lists of symbols and can be manipulated and constructed just as with any list. Hence, evaluation is a function on lists, as you will see as you fill out your Scheme implementation. Evaluation follows the environment model presented in class very closely.

We have provided for you a lexical analyzer and parser that reads strings into the lists suitable for processing by the evaluator. The input programs are converted into data of type Types.lisp_expr, defined in the Types module. Unlike OCaml, Scheme does not provide any type checking or nontrivial syntactic analysis until evaluation, so you will find the syntax and typing of Scheme a bit less restrictive than OCaml's (not necessarily a good thing).

Your job will be to write functions that evaluate the lists generated by the parser. Additionally, you will have to check for runtime errors, such as malformed expressions and type errors (such as adding a string to an integer). We have provided an exception instance and several functions for raising exceptions in the Types module.

You can build your interpreter by invoking the make command in the solution directory, which should work out of the box on Mac OSX and Linux platforms. On Windows, either install GNU make from GnuWin32 or a shell like MinGW or Cygwin. MinGW is probably the easiest way to get up and running. If you wish to remove the compiled files, invoke the command make clean.

We have also included several Scheme programs in the file etc/examples.scm to test your implementation. Here is a sample run with our solution code.

$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands

scheme311> (load "etc/examples.scm")
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
scheme311> (map square (list 1 2 3 4 5))
=> (1 4 9 16 25)
scheme311> (null? (list "I'm" "not" "empty"))
=> #f
scheme311> (definerec (neverender x) (neverender x))
=> <fun>
scheme311> (even? (fib 12))
=> #t
scheme311> (definerec (make n x) (if (<= n 0) '() (cons x (make (- n 1) x))))
=> <fun>
scheme311> (make 10 'speed)
=> (speed speed speed speed speed speed speed speed speed speed)
scheme311> :quit

Our Dialect of Scheme

Values

Our language has four types of primitives:

Integer, which correspond to the OCaml type int,
Float, which correspond to the OCaml type float,
String, which correspond to the OCaml type string, and
Bool, which correspond to the OCaml type bool.

From this primitives, we construct five different types of expressions.

Atoms, which represent symbols. Symbols do not have an equivalent in OCaml, but you may think of them as named atomic values. Symbols support testing for equality using the keyword "eq?" (described below). Before evaluation, symbols often correspond to keywords, operators, and variable names.
Lists, which represent, well, lists. Lists serve many purposes in Scheme. Code is written in the form of nested lists, and we may also use lists for more mundane purposes, such as storing ordered collections. Idiomatic Scheme often creates more complex data structures, such as trees, from lists. Much like in OCaml, lists are chained cons pairs. Just like [1;2;3] is equivalent to 1 :: 2 :: 3 :: [], the list (1 2 3) is the same as (cons 1 (cons 2 (cons 3 '()))).
Dotted lists, which represent lists whose last element is not the empty list. For example, (cons 1 2) yields the dotted list (1 . 2). These dotted lists are often used in a similar manner as tuples.
Primitives, as described above.
Closures, which represent functions and their environments.

There is also a sixth type Undef which is used internally for creating recursive bindings and is not available to the Scheme programmer. The following code, taken from types.ml, declares the type lisp_expr. Evaluation returns data of this type.

type lisp_expr =
    Atom of symbol
  | List of lisp_expr list
  | Dotted of (lisp_expr list * lisp_expr)
  | Primitive of primitive
  | Closure of lisp_expr * symbol * env
  | Undef

The Reader

The lexical analyzer and parser create an input program formatted in plain text into a lisp_expr ready for evaluation. The parser recognizes the following patterns.

Integers consist of an optional plus or minus sign followed by a sequence of one or more numeric characters. Thus, both negative and positive integers are allowed in our implementation.
Floating point numbers consist of an optional plus or minus sign followed by one or more numeric characters followed by a dot and then an optional sequence of numeric characters. Note that a number before the point is required: -.1 is a syntax error.
Strings consist of sequences of zero or more characters enclosed in double quotes, e.g. "Hello". Certain characters, such as double quotes, backslashes, newlines, and tabs can be escaped with a backslash.
Booleans are specified by the strings #t for true and #f for false.
Atoms are more or less sequences of arbitrary characters. We do not allow atoms to start with a number or an underscore. As you will see, variables whose name begins with an underscore are reserved for the interpreter. If you would like to learn the nuances of what an atom is, look at the regular expression atom in the lexer.mll file.
A list begins with a left parenthesis and ends with a right one. Elements are separated with spaces. The empty list is designated simply as ().
A dotted list is a list with a dot between its second to last and last element. For example, ("this" "is" "a" "dotted" . "list") is a dotted list.
Prefixing an expression with a single quote ' quotes that expression. A quoted expression is not evaluated; e.g., '(+ 1 2 3) evaluates to (+ 1 2 3). We will discuss this more later.
Comments begin with ; and end with a semicolon as well.

The List Data Structure

The keyword cons is used to construct a list. Cons is directly analogous to :: in OCaml. There are two natural projections on this type, car and cdr, which return the first and second elements of a cons pair, respectively.

(car (cons 1 2)) = 1
(cdr (cons 1 2)) = 2

The names "car" and "cdr" are historical: They stand for "contents of the address register" and "contents of the decrement register," respectively, and refer to hardware components of the IBM 704 computer on which LISP was originally implemented in the late 1950s. Lists are built from the empty list () using successive cons pairs. Thus, lists in Scheme are sequences of objects combined using cons and terminated with the empty list (). For example, the list

(cons 1 (cons 2 (cons 3 ())))

is the Scheme equivalent of the OCaml list [1;2;3]. There are two shorthands for creating lists in Scheme. The first is the list keyword. Instead of the previous construction, we may have used

(list 1 2 3)

and the resulting value is the same list. Alternatively, we could have quoted the list:

'(1 2 3)

Upon evaluation, all three constructions should yield

(1 2 3)

Dotted lists are constructed via cons sequences not terminated with the empty list. For example, (cons 1 (cons 2 3)) evaluates to (1 2 . 3)

Semantic Overview

Operations in Scheme are all represented in prefix notation and must be enclosed in parentheses. For instance, to add 5 and 7, we would type (+ 5 7). There is no notion of precedence and all compound expressions must be enclosed in parentheses. For instance, the OCaml expression 5 + (2 * 7) would be written (+ (5 (* 2 7))) in Scheme.

Comparisons may be performed on all types, and the resulting behavior is governed by the behavior of OCaml's compare function. Arithmetic operations may be performed on both integers and floats. If they are mixed, the evaluator upgrades all integers involved to floats. Boolean operations such as and, or, and not only operate on boolean values.

Variables can be bound to values in several ways. Top-level global variables are bound using the define keyword. Local variables are bound using the let keyword. The expression (let ((x 5) (y 6)) (* x y)) is equivalent to the OCaml let x = 5 in let y = 6 in x * y.

Finally, anonymous functions are defined using the lambda keyword. These can be bound to variables using define or let. For example, (lambda (x) (* 2 x)) in Scheme is equivalent to fun x -> 2*x in OCaml. Scheme provides syntactic sugar for multiple variables as well: (lambda (x y) (+ x y)) is equivalent to (lambda (x) (lambda (y) (+ x y))).

If we would like to use define to bind functions of one or more variables at the top level, we can write
(define (f x y) (+ x y))
or
(define f (lambda (x y) (+ x y))).

If the function is recursive, use letrec and definerec instead. These behave identically to let and define, respectively, except that before the second argument is evaluated, the function name is included in the environment bound to a dummy value (Undef). Then after the second argument is evaluated, the binding is updated to that value. This means that the function name may occur in the function body so that the function may call itself recursively.

The load operator can be used to load and interpret the contents of a file. The keywords load, define, and definerec may be used only at the top level.

Program Structure

Our implementation has four main modules:

Types: The types module contains the type definitions for this project and some useful helper functions.
Repl: Short for "Read Eval Print Loop," the Repl module handles the interpreter functionality. The REPL loop reads in a string from either the console input or a file, calls the parser on the input, calls eval on the expression, and prints the result. The REPL loop also manages the global environment between expression evaluations.
Eval: The Eval module forms the basis for the procedure of taking in a Scheme expression, encoded as data of type lisp_expr and returning the evaluated result. This module itself is fairly succinct, but relies heavily on the keyword and operator functionality managed by the Dispatch module.
Dispatch: Upon encountering a keyword or operator's symbol during evaluation, the Eval module dispatches on that symbol to receive the appropriate function to handle that keyword. In the Dispatch module, we store an association map between symbols and functions, where each function handles evaluation for a special form of that keyword or symbol. Each handler takes three arguments, a callback to eval, the remaining elements of the list to be processed, and the current environment. The callback, which is accomplished by simply passing in eval as an argument, is necessary as many of the handler functions will need to call eval, but since they live in a different file, the direct recursive implementation is not possible.

In addition to these modules, the interpreter requires two additional files, keywords.ml and operators.ml. These two files specify the handling functions referred to in the dispatch table within the Dispatch module. Most of your task will involve filling in these functions according to our specification.

Keywords and Operators

Keyword Handlers

Be sure to use the given malformed exception function for the case of malformed input and runtime for general errors. Ellipsis is used in the following descriptions in the natural way (it is not part of the Scheme syntax).

(apply h t) evaluates t to a list (t1 t2 ...) and evaluates (h t1 t2 ...).
(car l) evaluates l and returns the head of l, raising a runtime exception if l is empty. Be sure to handle both lists and dotted lists.
(cdr l) evaluates l and returns the tail in a similar manner to car.
(cons h t) evaluates h and t and returns their cons pair. Note that consing an element onto a list, a dotted list, and any other value yield different outputs. If cons is called with only one argument, yield a lambda that takes the tail in as an argument.
(cond (p1 c1) (p2 c2) ... (pn cn)) is a conditional statement. Each (p c) is a pair of a predicate (Boolean) and a consequent. Step through the pairs one by one, evaluating the predicates. If a predicate is true, evaluate the corresponding consequent and return its value. If all pairs' predicates are false, return false. The predicate else is a synonym for #t.
(eq? a b) evaluates a and b and returns true if they are both atoms with the same name. It also returns true if they are both primitives with the same value. For all other properly formed input, eq? returns false.
(equal? l1 l2) defaults to eq?'s behavior for atoms and primitives. For lists and dotted lists, equal? recursively calls equal? on corresponding pairs of elements in l1 and l2 (think List.fold_left2). If the lists have different length or the input is properly formed but does not fit into the previous cases, return false.
(eval expr) evaluates expr and then evaluates it again in the current environment (e.g. (eval '(+ 1 2)) => 3)
(if pred conseq alt) evaluates the predicate and, if it is true, evaluates and returns the consequent. Otherwise, the alternative is used.
(lambda () body) returns a closure of the body under the current environment. Since there are no arguments, you may use a dummy variable (variables beginning with an underscore are reserved for the interpreter) in place of the argument in the closure.
(lambda (x) body) returns a closure in the natural way.
(lambda (x y z ...) expr) returns a closure whose argument is x and whose body is a lambda of (y z ...) to expr.
(let ((f1 e1) (f2 e2) ...) body) is equivalent to the OCaml let f1 = e1 in let f2 = e2 in ... in body. Evaluate the bindings from left to right and finally evaluate the body
(letrec ((f1 e1) (f2 e2) ...) body) is equivalent to the OCaml let rec f1 = e1 and f2 = e2 and ... in body. Note that all the f are in scope of each other and themselves.
(list x1 x2 x3 ...) evaluates all of the x to x' and returns the list (x1' x2' x3' ...). Note that using list to build a list involves evaluating each element, but quotation does not.
(list? expr) returns true if the evaluation of expr is a list and false if a single argument is passed, but it is not a list.
(null? l) returns true if l is empty and false if exactly one nonempty argument is passed.
(quote expr) is equivalent to 'expr and returns expr unevaluated.

The following keywords are only allowed in the toplevel and are handled in repl.ml.

(define f expr) binds the evaluation of expr to f in the toplevel environment.
(definerec f expr) does the same, but with the added complication of recursion. Note that only one recursive binding is allowed at a time with definerec.
(load path) loads the Scheme file at the specified path, parses it, and evaluates its list of expressions in the toplevel.

Operator Handlers

Operators fall into two main categories: numeric operators and boolean operators. We suggest handling each category all at once, using your knowledge of partial application and higher order functions. We have provided a type checker that returns Int if all arguments are Integers, Flt if the arguments are all Floats or mixed Integers and Floats, and Fail in any other case. For all of the numeric operators described below, the "weak typing" model is used where Integers are upgraded to Floats if one or more Floats are present in the argument list. Also, be sure to evaluate all terms before performing the appropriate reduction. For and, or, and not, use the helper function to_bool to convert data of type lisp_expr to OCaml's boolean type. Finally, note that we do not require boolean operations to be short-circuited, although you may do this if you would like.

(* n1 n2 n3 ...) returns the product of all the n.
(+ n1 n2 n3 ...) returns the sum of all the n.
(- n1 n2 n3 ...) returns n1 minus n2 minus n3 and so on.
(/ n1 n2 n3 ...) returns n1 divided by n2 divided by n3 and so on.
(< n1 n2 n3 ...) returns true if n1 < n2 and n2 < n3 and so on. You may use OCaml's ordering on lisp_expr here.
(<= n1 n2 n3 ...) returns true if n1 <= n2 and n2 <= n3 and so on.
(= n1 n2 n3 ...) returns true if n1 = n2 and n2 = n3 and so on. You may use OCaml's equality semantics on lisp_expr here.
(> n1 n2 n3 ...) returns true if n1 > n2 and n2 > n3 and so on.
(>= n1 n2 n3 ...) returns true if n1 >= n2 and n2 >= n3 and so on.
(and p1 p2 p3 ...) returns true if all the predicates are true and false if any one of them is false but the input is well-formed.
(or p1 p2 p3 ...) returns true if any of its predicates are true and false if none of them are but the input is well-formed.
(not p) returns true if p is false and false if p is true.

Your Tasks

All code submitted must adhere to the interfaces specified in the .mli files and use the given helper functions in an intelligent manner. No imperative usage of OCaml is allowed outside of interacting with environments.

types.ml

Complete the pretty_print function in the following manner.

An atom prints to itself.
A list yields a left parenthesis followed by all of its elements pretty printed, with each element separated by a space, followed by a right parenthesis.
A dotted list prints the same way as a list but has a dot between its second to last and last elements.
Integers and floats print to their OCaml equivalents (you may use string_of_int and string_of_float).
Strings print to their value, but with quotes surrounding them.
Closures print to "<fun>". We have provided a useful alternative for debugging, but your final version should simply return "<fun>".
Undefs should not be passed to the pretty printer and should yield a foreboding message such as "Yikes, Undef!"

eval.ml

Fill in the two functions eval and apply so that they have the following properties.

Primitives, Closures, and the empty list evaluate to themselves.
Evaluating an Atom looks up that Atom's value in the current environment. If it is found, it returns the value in the environment. If not, a runtime exception is raised. Use the given function runtime to raise an exception with a descriptive error message and the current expression being evaluated.
Evaluating a list with an atom in the first position has three cases. If the atom is an operator, the appropriate function in the operator dispatch table is called. Similarly, if the atom is a keyword, the corresponding function in the keyword dispatch table is called. Finally, if the atom is neither a keyword nor an operator, apply the rest of the list as arguments to the first element.
Our reference implementation has eval split into two functions, eval and list_eval for clarity in handling the aforementioned cases, but this is not strictly necessary.
Any other form passed to eval is considered an application, as described in the previous bullet.
The function apply takes in a Closure as its first argument, a list of arguments, and an environment. It evaluates each argument in the current environment and then applies them in sequence to the Closure. If input without this structure is passed to apply, a runtime exception is raised.
Your implementation of apply should support partial application.

repl.ml

Complete the functions eval_one and load_file. The eval_one function evaluates a given expression in an environment and returns a new environment. This function should handle the special cases of define, definerec, and load, which can only exist on the toplevel. Finally, load_file reads code from a specified input path. Note that you may load a file from within another source file, but the path is always relative to the interpreter's working directory.

keywords.ml, operators.ml

Complete all functions in each file with respect to the specifications given in the previous section. You may add in rec keywords. We have provided a useful helper function desugar in the keywords file which maps the following syntactic transformation over a list of bindings.

((f x y z) body) -> (f (lambda (x y z) body))

In the operators file, we have also provided check_type and to_flt for dealing with Scheme's weak typing on numeric types. Additionally, to_bool is defined in the Types module.

Part 2: Tweaking the Interpreter (15 points)

Extend our interpreter by adding a display keyboard that prints its argument using the pretty printer you wrote. It should take exactly one argument and return the empty list. Remember that whenever you add a keyword, you have to add it to the relevant dispatch table.
Currently the evaluator evaluates the arguments to a function as they are applied, meaning that the list of arguments is evaluated from left to right. Implement a new keyword right-apply in the spirit of (but slightly different than) apply that evaluates its arguments from right-to-left. More precisely, (right-apply f (arg1 arg2 ... argn)) evaluates argn, ..., then arg2, and finally arg1 and then applies each argument to f. You may validate your keyword's evaluation order by using the display keyword you wrote previously. Note that this isn't as easy as it looks: For a function f that prints and returns its argument, ensure that (apply-right + '((f 0) (f 1) (f 2))) works as expected.
The let keyword we wrote before is sequential in that each binding is in scope in all the bindings following it. Write a parallel-let keyword that evaluates each of its bindings such that none of them are in scope of each other.
Lazy evaluation is an alternative to the eager evaluation semantics we implemented in our interpreter. We will simulate lazy evaluation within our interpreter with two new keywords: delay and force. Delay should suspend a computation, creating a "thunk" that is not evaluated until it is passed into a force expression. Forcing a thunk evaluates the delayed expression in the environment in which it was delayed. Note that force should not do anything on expressions that are not thunks; you may find using a reserved variable (one whose identifier begins with an underscore) helpful for this. Note that you are not allowed to change any of the interpreter's types or fundamental structure: You may only add two new handler functions and their entries in the dispatch table. We have provided sample code demonstrating streams in "etc/lazy.scm" along with a representative session below. As always, the ellipsis is not actually the output from the interpreter. If (and only if) you're feeling particularly adventurous, write a lazy mergesort such that (take k (mergesort l)) can be performed for fixed k in time linear in the length of the list.

$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands

scheme311> (load "etc/lazy.scm")
...
scheme311> (nth primes 10)
=> 31
scheme311> (take 10 fib-stream)
=> (1 1 2 3 5 8 13 21 34 55)
scheme311> (take 10 (drop 5 (sift 3 naturals)))
=> (8 10 11 13 14 16 17 19 20 22)
scheme311> (take 10 (merge (sift 5 primes) (sift 7 naturals)))
=> (1 2 2 3 3 4 5 6 7 8)
scheme311> :quit

Part 3: Fun with Scheme! (10 points)

Now that you have finished your Scheme interpreter, let's implement something that's particularly well suited to Scheme's symbolic computation. In most languages, defining new syntactic constructs requires the use of macros and other features not intrinsic to the language. Not so with Scheme!

Recall that let bindings can be reformulated in terms of lambda applications in a straightforward manner when there are no data dependencies among the bindings. For example the code let x = 3 in let y = 4 in x + y can be equivalently restated as (fun x y -> x + y) 3 4. We will now together implement a Scheme function that will perform this transformation on code fragments it takes as input.

Your job is to fill out the map, foldl, reverse, first, second, and third list functions in etc/let-desugar.scm in the natural way. We will use most of these functions in the provided code for the desugaring process.

The first function provided single-binding-desugar transforms an expression in the form ((f x y ...) expr) to (f (lambda (x y ...) expr)) and leaves other bindings alone. The second function, desugar-bindings maps this over a binding list, doing the same thing as the provided OCaml function desugar, but within Scheme! Finally, the function let-desugar performs the promised transformation.

Our let-desugar function first splits the code into two parts: the list of bindings and the body of the let statement. It then desugars the bindings, using our previously defined function. Next, it splits the binding list in half, forming a list of identifiers and a list of values. It then constructs the lambda form as follows: (('lambda identifiers body) values). Here's an example session.

$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands

scheme311> (load "etc/let-desugar.scm")
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> (let ((x 3) (y 4) (z 5)) (+ x y z))
=> (let (((f x y) (+ x y)) ((g x y) ( * x y)) (a 1) (b 2) (c 3) (d 4)) (f (g a b) (g c d)))
scheme311> (let-desugar test-code-1)
=> ((lambda (x y z) (+ x y z)) 3 4 5)
scheme311> (let-desugar test-code-2)
=> ((lambda (f g a b c d) (f (g a b) (g c d))) (lambda (x y) (+ x y)) (lambda (x y) ( * x y)) 1 2 3 4)
scheme311> (let-desugar '(let ((f (cons 1))) (f 2)))
=> ((lambda (f) (f 2)) (cons 1))
scheme311> (eval (let-desugar test-code-1)) (eval (let-desugar test-code-2))
=> 12
=> 14
scheme311> (eq? (eval (let-desugar test-code-1)) (eval test-code-1))
=> #t
scheme311> (eq? (eval (let-desugar test-code-2)) (eval test-code-2))
=> #t
scheme311> :quit