Problem Set 4: A Scheme Interpreter

Due: October 20, 2011

Version: 1
Last Modified: October 6, 23:59
Changes:

Part 1: An Interpreter for "Scheme311" (75 points)

Overview

For this problem set, you will be implementing an interpreter for a subset of the Scheme programming language. Scheme is a dialect of Lisp, a language centered around the list data structure. Notably, programs are written as lists of symbols and can be manipulated and constructed just as with any list. Hence, evaluation is a function on lists, as you will see as you fill out your Scheme implementation. Evaluation follows the environment model presented in class very closely.

We have provided for you a lexical analyzer and parser that reads strings into the lists suitable for processing by the evaluator. The input programs are converted into data of type Types.lisp_expr, defined in the Types module. Unlike OCaml, Scheme does not provide any type checking or nontrivial syntactic analysis until evaluation, so you will find the syntax and typing of Scheme a bit less restrictive than OCaml's (not necessarily a good thing).

Your job will be to write functions that evaluate the lists generated by the parser. Additionally, you will have to check for runtime errors, such as malformed expressions and type errors (such as adding a string to an integer). We have provided an exception instance and several functions for raising exceptions in the Types module.

You can build your interpreter by invoking the make command in the solution directory, which should work out of the box on Mac OSX and Linux platforms. On Windows, either install GNU make from GnuWin32 or a shell like MinGW or Cygwin. MinGW is probably the easiest way to get up and running. If you wish to remove the compiled files, invoke the command make clean.

We have also included several Scheme programs in the file etc/examples.scm to test your implementation. Here is a sample run with our solution code.

$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands

scheme311> (load "etc/examples.scm")
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
scheme311> (map square (list 1 2 3 4 5))
=> (1 4 9 16 25)
scheme311> (null? (list "I'm" "not" "empty"))
=> #f
scheme311> (definerec (neverender x) (neverender x))
=> <fun>
scheme311> (even? (fib 12))
=> #t
scheme311> (definerec (make n x) (if (<= n 0) '() (cons x (make (- n 1) x))))
=> <fun>
scheme311> (make 10 'speed)
=> (speed speed speed speed speed speed speed speed speed speed)
scheme311> :quit

Our Dialect of Scheme

Values

Our language has four types of primitives:
  1. Integer, which correspond to the OCaml type int,
  2. Float, which correspond to the OCaml type float,
  3. String, which correspond to the OCaml type string, and
  4. Bool, which correspond to the OCaml type bool.
From this primitives, we construct five different types of expressions.
  1. Atoms, which represent symbols. Symbols do not have an equivalent in OCaml, but you may think of them as named atomic values. Symbols support testing for equality using the keyword "eq?" (described below). Before evaluation, symbols often correspond to keywords, operators, and variable names.
  2. Lists, which represent, well, lists. Lists serve many purposes in Scheme. Code is written in the form of nested lists, and we may also use lists for more mundane purposes, such as storing ordered collections. Idiomatic Scheme often creates more complex data structures, such as trees, from lists. Much like in OCaml, lists are chained cons pairs. Just like [1;2;3] is equivalent to 1 :: 2 :: 3 :: [], the list (1 2 3) is the same as (cons 1 (cons 2 (cons 3 '()))).
  3. Dotted lists, which represent lists whose last element is not the empty list. For example, (cons 1 2) yields the dotted list (1 . 2). These dotted lists are often used in a similar manner as tuples.
  4. Primitives, as described above.
  5. Closures, which represent functions and their environments.
There is also a sixth type Undef which is used internally for creating recursive bindings and is not available to the Scheme programmer. The following code, taken from types.ml, declares the type lisp_expr. Evaluation returns data of this type.
type lisp_expr =
    Atom of symbol
  | List of lisp_expr list
  | Dotted of (lisp_expr list * lisp_expr)
  | Primitive of primitive
  | Closure of lisp_expr * symbol * env
  | Undef

The Reader

The lexical analyzer and parser create an input program formatted in plain text into a lisp_expr ready for evaluation. The parser recognizes the following patterns.

The List Data Structure

The keyword cons is used to construct a list. Cons is directly analogous to :: in OCaml. There are two natural projections on this type, car and cdr, which return the first and second elements of a cons pair, respectively.
(car (cons 1 2)) = 1
(cdr (cons 1 2)) = 2
The names "car" and "cdr" are historical: They stand for "contents of the address register" and "contents of the decrement register," respectively, and refer to hardware components of the IBM 704 computer on which LISP was originally implemented in the late 1950s. Lists are built from the empty list () using successive cons pairs. Thus, lists in Scheme are sequences of objects combined using cons and terminated with the empty list (). For example, the list
(cons 1 (cons 2 (cons 3 ())))
is the Scheme equivalent of the OCaml list [1;2;3]. There are two shorthands for creating lists in Scheme. The first is the list keyword. Instead of the previous construction, we may have used
(list 1 2 3)
and the resulting value is the same list. Alternatively, we could have quoted the list:
'(1 2 3)
Upon evaluation, all three constructions should yield
(1 2 3)
Dotted lists are constructed via cons sequences not terminated with the empty list. For example, (cons 1 (cons 2 3)) evaluates to (1 2 . 3)

Semantic Overview

Operations in Scheme are all represented in prefix notation and must be enclosed in parentheses. For instance, to add 5 and 7, we would type (+ 5 7). There is no notion of precedence and all compound expressions must be enclosed in parentheses. For instance, the OCaml expression 5 + (2 * 7) would be written (+ (5 (* 2 7))) in Scheme.

Comparisons may be performed on all types, and the resulting behavior is governed by the behavior of OCaml's compare function. Arithmetic operations may be performed on both integers and floats. If they are mixed, the evaluator upgrades all integers involved to floats. Boolean operations such as and, or, and not only operate on boolean values.

Variables can be bound to values in several ways. Top-level global variables are bound using the define keyword. Local variables are bound using the let keyword. The expression (let ((x 5) (y 6)) (* x y)) is equivalent to the OCaml let x = 5 in let y = 6 in x * y.

Finally, anonymous functions are defined using the lambda keyword. These can be bound to variables using define or let. For example, (lambda (x) (* 2 x)) in Scheme is equivalent to fun x -> 2*x in OCaml. Scheme provides syntactic sugar for multiple variables as well: (lambda (x y) (+ x y)) is equivalent to (lambda (x) (lambda (y) (+ x y))).

If we would like to use define to bind functions of one or more variables at the top level, we can write
(define (f x y) (+ x y))
or
(define f (lambda (x y) (+ x y))).

If the function is recursive, use letrec and definerec instead. These behave identically to let and define, respectively, except that before the second argument is evaluated, the function name is included in the environment bound to a dummy value (Undef). Then after the second argument is evaluated, the binding is updated to that value. This means that the function name may occur in the function body so that the function may call itself recursively.

The load operator can be used to load and interpret the contents of a file. The keywords load, define, and definerec may be used only at the top level.

Program Structure

Our implementation has four main modules:
  1. Types: The types module contains the type definitions for this project and some useful helper functions.
  2. Repl: Short for "Read Eval Print Loop," the Repl module handles the interpreter functionality. The REPL loop reads in a string from either the console input or a file, calls the parser on the input, calls eval on the expression, and prints the result. The REPL loop also manages the global environment between expression evaluations.
  3. Eval: The Eval module forms the basis for the procedure of taking in a Scheme expression, encoded as data of type lisp_expr and returning the evaluated result. This module itself is fairly succinct, but relies heavily on the keyword and operator functionality managed by the Dispatch module.
  4. Dispatch: Upon encountering a keyword or operator's symbol during evaluation, the Eval module dispatches on that symbol to receive the appropriate function to handle that keyword. In the Dispatch module, we store an association map between symbols and functions, where each function handles evaluation for a special form of that keyword or symbol. Each handler takes three arguments, a callback to eval, the remaining elements of the list to be processed, and the current environment. The callback, which is accomplished by simply passing in eval as an argument, is necessary as many of the handler functions will need to call eval, but since they live in a different file, the direct recursive implementation is not possible.

In addition to these modules, the interpreter requires two additional files, keywords.ml and operators.ml. These two files specify the handling functions referred to in the dispatch table within the Dispatch module. Most of your task will involve filling in these functions according to our specification.

Keywords and Operators

Keyword Handlers

Be sure to use the given malformed exception function for the case of malformed input and runtime for general errors. Ellipsis is used in the following descriptions in the natural way (it is not part of the Scheme syntax). The following keywords are only allowed in the toplevel and are handled in repl.ml.

Operator Handlers

Operators fall into two main categories: numeric operators and boolean operators. We suggest handling each category all at once, using your knowledge of partial application and higher order functions. We have provided a type checker that returns Int if all arguments are Integers, Flt if the arguments are all Floats or mixed Integers and Floats, and Fail in any other case. For all of the numeric operators described below, the "weak typing" model is used where Integers are upgraded to Floats if one or more Floats are present in the argument list. Also, be sure to evaluate all terms before performing the appropriate reduction. For and, or, and not, use the helper function to_bool to convert data of type lisp_expr to OCaml's boolean type. Finally, note that we do not require boolean operations to be short-circuited, although you may do this if you would like.

Your Tasks

All code submitted must adhere to the interfaces specified in the .mli files and use the given helper functions in an intelligent manner. No imperative usage of OCaml is allowed outside of interacting with environments.

types.ml

Complete the pretty_print function in the following manner.

eval.ml

Fill in the two functions eval and apply so that they have the following properties.

repl.ml

Complete the functions eval_one and load_file. The eval_one function evaluates a given expression in an environment and returns a new environment. This function should handle the special cases of define, definerec, and load, which can only exist on the toplevel. Finally, load_file reads code from a specified input path. Note that you may load a file from within another source file, but the path is always relative to the interpreter's working directory.

keywords.ml, operators.ml

Complete all functions in each file with respect to the specifications given in the previous section. You may add in rec keywords. We have provided a useful helper function desugar in the keywords file which maps the following syntactic transformation over a list of bindings.
((f x y z) body) -> (f (lambda (x y z) body))
In the operators file, we have also provided check_type and to_flt for dealing with Scheme's weak typing on numeric types. Additionally, to_bool is defined in the Types module.

Part 2: Tweaking the Interpreter (15 points)

  1. Extend our interpreter by adding a display keyboard that prints its argument using the pretty printer you wrote. It should take exactly one argument and return the empty list. Remember that whenever you add a keyword, you have to add it to the relevant dispatch table.
  2. Currently the evaluator evaluates the arguments to a function as they are applied, meaning that the list of arguments is evaluated from left to right. Implement a new keyword right-apply in the spirit of (but slightly different than) apply that evaluates its arguments from right-to-left. More precisely, (right-apply f (arg1 arg2 ... argn)) evaluates argn, ..., then arg2, and finally arg1 and then applies each argument to f. You may validate your keyword's evaluation order by using the display keyword you wrote previously. Note that this isn't as easy as it looks: For a function f that prints and returns its argument, ensure that (apply-right + '((f 0) (f 1) (f 2))) works as expected.
  3. The let keyword we wrote before is sequential in that each binding is in scope in all the bindings following it. Write a parallel-let keyword that evaluates each of its bindings such that none of them are in scope of each other.
  4. Lazy evaluation is an alternative to the eager evaluation semantics we implemented in our interpreter. We will simulate lazy evaluation within our interpreter with two new keywords: delay and force. Delay should suspend a computation, creating a "thunk" that is not evaluated until it is passed into a force expression. Forcing a thunk evaluates the delayed expression in the environment in which it was delayed. Note that force should not do anything on expressions that are not thunks; you may find using a reserved variable (one whose identifier begins with an underscore) helpful for this. Note that you are not allowed to change any of the interpreter's types or fundamental structure: You may only add two new handler functions and their entries in the dispatch table. We have provided sample code demonstrating streams in "etc/lazy.scm" along with a representative session below. As always, the ellipsis is not actually the output from the interpreter. If (and only if) you're feeling particularly adventurous, write a lazy mergesort such that (take k (mergesort l)) can be performed for fixed k in time linear in the length of the list.
$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands

scheme311> (load "etc/lazy.scm")
...
scheme311> (nth primes 10)
=> 31
scheme311> (take 10 fib-stream)
=> (1 1 2 3 5 8 13 21 34 55)
scheme311> (take 10 (drop 5 (sift 3 naturals)))
=> (8 10 11 13 14 16 17 19 20 22)
scheme311> (take 10 (merge (sift 5 primes) (sift 7 naturals)))
=> (1 2 2 3 3 4 5 6 7 8)
scheme311> :quit

Part 3: Fun with Scheme! (10 points)

Now that you have finished your Scheme interpreter, let's implement something that's particularly well suited to Scheme's symbolic computation. In most languages, defining new syntactic constructs requires the use of macros and other features not intrinsic to the language. Not so with Scheme!

Recall that let bindings can be reformulated in terms of lambda applications in a straightforward manner when there are no data dependencies among the bindings. For example the code let x = 3 in let y = 4 in x + y can be equivalently restated as (fun x y -> x + y) 3 4. We will now together implement a Scheme function that will perform this transformation on code fragments it takes as input.

Your job is to fill out the map, foldl, reverse, first, second, and third list functions in etc/let-desugar.scm in the natural way. We will use most of these functions in the provided code for the desugaring process.

The first function provided single-binding-desugar transforms an expression in the form ((f x y ...) expr) to (f (lambda (x y ...) expr)) and leaves other bindings alone. The second function, desugar-bindings maps this over a binding list, doing the same thing as the provided OCaml function desugar, but within Scheme! Finally, the function let-desugar performs the promised transformation.

Our let-desugar function first splits the code into two parts: the list of bindings and the body of the let statement. It then desugars the bindings, using our previously defined function. Next, it splits the binding list in half, forming a list of identifiers and a list of values. It then constructs the lambda form as follows: (('lambda identifiers body) values). Here's an example session.

$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands

scheme311> (load "etc/let-desugar.scm")
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> (let ((x 3) (y 4) (z 5)) (+ x y z))
=> (let (((f x y) (+ x y)) ((g x y) ( * x y)) (a 1) (b 2) (c 3) (d 4)) (f (g a b) (g c d)))
scheme311> (let-desugar test-code-1)
=> ((lambda (x y z) (+ x y z)) 3 4 5)
scheme311> (let-desugar test-code-2)
=> ((lambda (f g a b c d) (f (g a b) (g c d))) (lambda (x y) (+ x y)) (lambda (x y) ( * x y)) 1 2 3 4)
scheme311> (let-desugar '(let ((f (cons 1))) (f 2)))
=> ((lambda (f) (f 2)) (cons 1))
scheme311> (eval (let-desugar test-code-1)) (eval (let-desugar test-code-2))
=> 12
=> 14
scheme311> (eq? (eval (let-desugar test-code-1)) (eval test-code-1))
=> #t
scheme311> (eq? (eval (let-desugar test-code-2)) (eval test-code-2))
=> #t
scheme311> :quit