Problem Set 4: A Scheme Interpreter
Due: October 20, 2011
Version: 1
Last Modified: October 6, 23:59
Changes:
Part 1: An Interpreter for "Scheme311" (75 points)
Overview
For this problem set, you will be implementing an interpreter for a subset of
the Scheme
programming language. Scheme is a dialect of Lisp, a language centered around the list
data structure. Notably, programs are written as lists of symbols and can be manipulated
and constructed just as with any list. Hence, evaluation is a function on lists, as you
will see as you fill out your Scheme implementation. Evaluation follows the environment
model presented in class very closely.
We have provided for you a lexical analyzer and parser that reads strings into the
lists suitable for processing by the evaluator. The input programs are converted into
data of type Types.lisp_expr, defined in the Types module. Unlike
OCaml, Scheme does not provide any type checking or nontrivial syntactic analysis until
evaluation, so you will find the syntax and typing of Scheme a bit less restrictive
than OCaml's (not necessarily a good thing).
Your job will be to write functions that evaluate the lists generated by the parser.
Additionally, you will have to check for runtime errors, such as malformed expressions and
type errors (such as adding a string to an integer). We have provided an exception instance
and several functions for raising exceptions in the Types module.
You can build your interpreter by invoking the make command in the solution directory,
which should work out of the box on Mac OSX and Linux platforms. On Windows, either install
GNU make from GnuWin32 or a shell like MinGW or Cygwin. MinGW is probably the easiest
way to get up and running. If you wish to remove the compiled files, invoke the command
make clean.
We have also included several Scheme programs in the file etc/examples.scm to test your
implementation. Here is a sample run with our solution code.
$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands
scheme311> (load "etc/examples.scm")
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
scheme311> (map square (list 1 2 3 4 5))
=> (1 4 9 16 25)
scheme311> (null? (list "I'm" "not" "empty"))
=> #f
scheme311> (definerec (neverender x) (neverender x))
=> <fun>
scheme311> (even? (fib 12))
=> #t
scheme311> (definerec (make n x) (if (<= n 0) '() (cons x (make (- n 1) x))))
=> <fun>
scheme311> (make 10 'speed)
=> (speed speed speed speed speed speed speed speed speed speed)
scheme311> :quit
Our Dialect of Scheme
Values
Our language has four types of primitives:
- Integer, which correspond to the OCaml type int,
- Float, which correspond to the OCaml type float,
- String, which correspond to the OCaml type string, and
- Bool, which correspond to the OCaml type bool.
From this primitives, we construct five different types of expressions.
- Atoms, which represent symbols. Symbols do not have an equivalent
in OCaml, but you may think of them as named atomic values. Symbols support testing
for equality using the keyword "eq?" (described below). Before evaluation,
symbols often correspond to keywords, operators, and variable names.
- Lists, which represent, well, lists. Lists serve many purposes in Scheme.
Code is written in the form of nested lists, and we may also use lists for
more mundane purposes, such as storing ordered collections. Idiomatic Scheme
often creates more complex data structures, such as trees, from lists. Much
like in OCaml, lists are chained cons pairs. Just like [1;2;3] is
equivalent to 1 :: 2 :: 3 :: [], the list (1 2 3) is the
same as (cons 1 (cons 2 (cons 3 '()))).
- Dotted lists, which represent lists whose last element is not the empty
list. For example, (cons 1 2) yields the dotted list (1 . 2).
These dotted lists are often used in a similar manner as tuples.
- Primitives, as described above.
- Closures, which represent functions and their environments.
There is also a sixth type Undef which is used internally for creating
recursive bindings and is not available to the Scheme programmer. The following
code, taken from types.ml, declares the type lisp_expr. Evaluation
returns data of this type.
type lisp_expr =
Atom of symbol
| List of lisp_expr list
| Dotted of (lisp_expr list * lisp_expr)
| Primitive of primitive
| Closure of lisp_expr * symbol * env
| Undef
The Reader
The lexical analyzer and parser create an input program formatted in plain
text into a lisp_expr ready for evaluation. The parser recognizes
the following patterns.
- Integers consist of an optional plus or minus sign followed by a sequence
of one or more numeric characters. Thus, both negative and positive integers
are allowed in our implementation.
- Floating point numbers consist of an optional plus or minus sign followed
by one or more numeric characters followed by a dot and then an optional sequence
of numeric characters. Note that a number before the point is required: -.1
is a syntax error.
- Strings consist of sequences of zero or more characters enclosed
in double quotes, e.g. "Hello". Certain characters, such as double quotes,
backslashes, newlines, and tabs can be escaped with a backslash.
- Booleans are specified by the strings #t for true and #f for false.
- Atoms are more or less sequences of arbitrary characters. We do not allow atoms to
start with a number or an underscore. As you will see, variables whose name begins
with an underscore are reserved for the interpreter. If you would like to learn the nuances
of what an atom is, look at the regular expression atom in the lexer.mll
file.
- A list begins with a left parenthesis and ends with a right one. Elements are separated
with spaces. The empty list is designated simply as ().
- A dotted list is a list with a dot between its second to last and last element. For
example, ("this" "is" "a" "dotted" . "list") is a dotted list.
- Prefixing an expression with a single quote ' quotes that expression. A quoted
expression is not evaluated; e.g., '(+ 1 2 3) evaluates to (+ 1 2 3).
We will discuss this more later.
- Comments begin with ; and end with a semicolon as well.
The List Data Structure
The keyword cons is used to construct a list. Cons is directly analogous
to :: in OCaml. There are two natural projections on this type, car
and cdr, which return the first and second elements of a cons pair, respectively.
(car (cons 1 2)) = 1
(cdr (cons 1 2)) = 2
The names "car" and "cdr" are historical: They stand for "contents of the address
register" and "contents of the decrement register," respectively, and refer to hardware
components of the IBM 704 computer on which LISP was originally implemented in the late
1950s.
Lists are built from the empty list () using successive cons pairs. Thus, lists
in Scheme are sequences of objects combined using cons and terminated with the
empty list (). For example, the list
(cons 1 (cons 2 (cons 3 ())))
is the Scheme equivalent of the OCaml list [1;2;3]. There are two shorthands for
creating lists in Scheme. The first is the list keyword. Instead of the previous construction,
we may have used
(list 1 2 3)
and the resulting value is the same list. Alternatively, we could have quoted the list:
'(1 2 3)
Upon evaluation, all three constructions should yield
(1 2 3)
Dotted lists are constructed via cons sequences not terminated with the empty list. For
example, (cons 1 (cons 2 3)) evaluates to (1 2 . 3)
Semantic Overview
Operations in Scheme are all represented in prefix notation and
must be enclosed in parentheses. For instance, to add 5 and 7, we
would type (+ 5 7). There is no notion of precedence and
all compound expressions must be enclosed in parentheses. For
instance, the OCaml expression 5 + (2 * 7) would be
written (+ (5 (* 2 7))) in Scheme.
Comparisons may be performed on all types, and the resulting
behavior is governed by the behavior of OCaml's compare
function. Arithmetic operations may be performed on both integers and
floats. If they are mixed, the evaluator upgrades all integers
involved to floats. Boolean operations such as and, or,
and not only operate on boolean values.
Variables can be bound to values in several ways. Top-level global
variables are bound using the define keyword. Local
variables are bound using the let keyword. The
expression (let ((x 5) (y 6)) (* x y)) is equivalent to the
OCaml let x = 5 in let y = 6 in x * y.
Finally, anonymous functions are defined using the lambda
keyword. These can be bound to variables using define
or let. For example, (lambda (x) (* 2 x)) in Scheme
is equivalent to fun x -> 2*x in OCaml. Scheme provides
syntactic sugar
for multiple variables as well: (lambda (x y) (+ x
y)) is equivalent to (lambda (x) (lambda (y) (+ x y))).
If we would like to use define to bind functions of one
or more variables at the top level, we can write
(define (f x y) (+ x y))
or
(define f (lambda (x y) (+ x y))).
If the function is recursive, use letrec
and definerec instead. These behave identically
to let and define, respectively, except that
before the second argument is evaluated, the function name is included
in the environment bound to a dummy value (Undef). Then
after the second argument is evaluated, the binding is updated to that
value. This means that the function name may occur in the function
body so that the function may call itself recursively.
The load operator can be used to load and interpret the
contents of a file. The keywords load, define, and
definerec may be used only at the top level.
Program Structure
Our implementation has four main modules:
- Types: The types module contains the type definitions for this
project and some useful helper functions.
- Repl: Short for "Read Eval Print Loop," the Repl module handles
the interpreter functionality. The REPL loop reads in a string from
either the console input or a file, calls the parser on the input,
calls eval on the expression, and prints the result. The
REPL loop also manages the global environment between expression
evaluations.
- Eval: The Eval module forms the basis for the procedure of
taking in a Scheme expression, encoded as data of
type lisp_expr and returning the evaluated result. This
module itself is fairly succinct, but relies heavily on the keyword
and operator functionality managed by the Dispatch module.
- Dispatch: Upon encountering a keyword or operator's symbol
during evaluation, the Eval module dispatches on that symbol to
receive the appropriate function to handle that keyword. In the
Dispatch module, we store an association map between symbols and
functions, where each function handles evaluation for a special form
of that keyword or symbol. Each handler takes three arguments, a
callback
to eval, the remaining elements of the list to be
processed, and the current environment. The callback, which is
accomplished by simply passing in eval as an argument, is
necessary as many of the handler functions will need to
call eval, but since they live in a different file, the
direct recursive implementation is not possible.
In addition to these modules, the interpreter requires two additional files,
keywords.ml and operators.ml. These two files specify the
handling functions referred to in the dispatch table within the Dispatch module. Most
of your task will involve filling in these functions according to our specification.
Keywords and Operators
Keyword Handlers
Be sure to use the given malformed exception function for the case of
malformed input and runtime for general errors. Ellipsis is used in the
following descriptions in the natural way (it is not part of the Scheme syntax).
- (apply h t) evaluates t to a list (t1 t2 ...) and
evaluates (h t1 t2 ...).
- (car l) evaluates l and returns the head of l,
raising a runtime exception if l is empty. Be sure to handle both
lists and dotted lists.
- (cdr l) evaluates l and returns the tail in a similar
manner to car.
- (cons h t) evaluates h and t and returns their
cons pair. Note that consing an element onto a list, a dotted list, and
any other value yield different outputs. If cons is called with only one
argument, yield a lambda that takes the tail in as an argument.
- (cond (p1 c1) (p2 c2) ... (pn cn)) is a conditional statement. Each
(p c) is a pair of a predicate (Boolean) and a consequent. Step through
the pairs one by one, evaluating the predicates. If a predicate is true, evaluate
the corresponding consequent and return its value. If all pairs' predicates are
false, return false. The predicate else is a synonym for #t.
- (eq? a b) evaluates a and b and returns true if they
are both atoms with the same name. It also returns true if they are both primitives
with the same value. For all other properly formed input, eq? returns false.
- (equal? l1 l2) defaults to eq?'s behavior for atoms and
primitives. For lists and dotted lists, equal? recursively calls equal? on
corresponding pairs of elements in l1 and l2 (think List.fold_left2).
If the lists have different length or the input is properly formed but does not fit
into the previous cases, return false.
- (eval expr) evaluates expr and then evaluates it again in the
current environment (e.g. (eval '(+ 1 2)) => 3)
- (if pred conseq alt) evaluates the predicate and, if it is true, evaluates
and returns the consequent. Otherwise, the alternative is used.
- (lambda () body) returns a closure of the body under the current environment.
Since there are no arguments, you may use a dummy variable (variables beginning with an
underscore are reserved for the interpreter) in place of the argument in the closure.
- (lambda (x) body) returns a closure in the natural way.
- (lambda (x y z ...) expr) returns a closure whose argument is x and
whose body is a lambda of (y z ...) to expr.
- (let ((f1 e1) (f2 e2) ...) body) is equivalent to the OCaml let f1 = e1 in
let f2 = e2 in ... in body. Evaluate the bindings from left to right and finally
evaluate the body
- (letrec ((f1 e1) (f2 e2) ...) body) is equivalent to the OCaml let rec f1 = e1
and f2 = e2 and ... in body. Note that all the f are in scope of each other
and themselves.
- (list x1 x2 x3 ...) evaluates all of the x to x' and returns the list
(x1' x2' x3' ...). Note that using list to build a list involves evaluating
each element, but quotation does not.
- (list? expr) returns true if the evaluation of expr is a list and false
if a single argument is passed, but it is not a list.
- (null? l) returns true if l is empty and false if exactly one nonempty argument
is passed.
- (quote expr) is equivalent to 'expr and returns expr unevaluated.
The following keywords are only allowed in the toplevel and are handled in repl.ml.
- (define f expr) binds the evaluation of expr to f in
the toplevel environment.
- (definerec f expr) does the same, but with the added complication of recursion.
Note that only one recursive binding is allowed at a time with definerec.
- (load path) loads the Scheme file at the specified path, parses it, and evaluates
its list of expressions in the toplevel.
Operator Handlers
Operators fall into two main categories: numeric operators and boolean operators.
We suggest handling each category all at once, using your knowledge of partial application
and higher order functions. We have provided a type checker that returns Int
if all arguments are Integers, Flt if the arguments are all Floats or mixed Integers
and Floats, and Fail in any other case. For all of the numeric operators described
below, the "weak typing" model is used where Integers are upgraded to Floats if one or
more Floats are present in the argument list. Also, be sure to evaluate all terms before
performing the appropriate reduction. For and, or, and not, use
the helper function to_bool to convert data of type lisp_expr to OCaml's boolean type.
Finally, note that we do not require boolean operations to be short-circuited, although
you may do this if you would like.
- (* n1 n2 n3 ...) returns the product of all the n.
- (+ n1 n2 n3 ...) returns the sum of all the n.
- (- n1 n2 n3 ...) returns n1 minus n2 minus n3
and so on.
- (/ n1 n2 n3 ...) returns n1 divided by n2 divided by n3
and so on.
- (< n1 n2 n3 ...) returns true if n1 < n2 and n2 < n3
and so on. You may use OCaml's ordering on lisp_expr here.
- (<= n1 n2 n3 ...) returns true if n1 <= n2 and n2 <= n3
and so on.
- (= n1 n2 n3 ...) returns true if n1 = n2 and n2 = n3
and so on. You may use OCaml's equality semantics on lisp_expr here.
- (> n1 n2 n3 ...) returns true if n1 > n2 and n2 > n3
and so on.
- (>= n1 n2 n3 ...) returns true if n1 >= n2 and n2 >= n3
and so on.
- (and p1 p2 p3 ...) returns true if all the predicates are true and false
if any one of them is false but the input is well-formed.
- (or p1 p2 p3 ...) returns true if any of its predicates are true and
false if none of them are but the input is well-formed.
- (not p) returns true if p is false and false if p is
true.
Your Tasks
All code submitted must adhere to the interfaces specified in the .mli
files and use the given helper functions in an intelligent manner. No imperative usage
of OCaml is allowed outside of interacting with environments.
types.ml
Complete the pretty_print function in the following manner.
- An atom prints to itself.
- A list yields a left parenthesis followed by all of its elements pretty printed,
with each element separated by a space, followed by a right parenthesis.
- A dotted list prints the same way as a list but has a dot between its second
to last and last elements.
- Integers and floats print to their OCaml equivalents (you may use string_of_int
and string_of_float).
- Strings print to their value, but with quotes surrounding them.
- Closures print to "<fun>". We have provided a useful alternative for debugging, but
your final version should simply return "<fun>".
- Undefs should not be passed to the pretty printer and should yield a foreboding
message such as "Yikes, Undef!"
eval.ml
Fill in the two functions eval and apply so that they have the
following properties.
- Primitives, Closures, and the empty list evaluate to themselves.
- Evaluating an Atom looks up that Atom's value in the current environment.
If it is found, it returns the value in the environment. If not, a runtime exception
is raised. Use the given function runtime to raise an exception with a
descriptive error message and the current expression being evaluated.
- Evaluating a list with an atom in the first position has three cases. If
the atom is an operator, the appropriate function in the operator dispatch table
is called. Similarly, if the atom is a keyword, the corresponding function in the
keyword dispatch table is called. Finally, if the atom is neither a keyword nor an
operator, apply the rest of the list as arguments to the first element.
- Our reference implementation has eval split into two functions,
eval and list_eval for clarity in handling the aforementioned cases, but
this is not strictly necessary.
- Any other form passed to eval is considered an application, as described in
the previous bullet.
- The function apply takes in a Closure as its first argument, a list of arguments,
and an environment. It evaluates each argument in the current environment and then
applies them in sequence to the Closure. If input without this structure is passed to
apply, a runtime exception is raised.
- Your implementation of apply should support partial application.
repl.ml
Complete the functions eval_one and load_file. The
eval_one function evaluates a given expression in an environment and returns
a new environment. This function should handle the special cases of define,
definerec, and load, which can only exist on the toplevel. Finally,
load_file reads code from a specified input path. Note that you may load a
file from within another source file, but the path is always relative to the interpreter's
working directory.
keywords.ml, operators.ml
Complete all functions in each file with respect to the specifications
given in the previous section. You may add in rec keywords. We have provided a
useful helper function desugar in the keywords file which
maps the following syntactic transformation over a list of bindings.
((f x y z) body) -> (f (lambda (x y z) body))
In the operators file, we have also provided check_type and to_flt for dealing
with Scheme's weak typing on numeric types. Additionally, to_bool is defined in the Types
module.
Part 2: Tweaking the Interpreter (15 points)
- Extend our interpreter by adding a display keyboard that prints its argument using the pretty
printer you wrote. It should take exactly one argument and return the empty list. Remember
that whenever you add a keyword, you have to add it to the relevant dispatch table.
- Currently the evaluator evaluates the arguments to a function as
they are applied, meaning that the list of arguments is evaluated from
left to right. Implement a new keyword right-apply in the
spirit of (but slightly different than) apply that evaluates
its arguments from right-to-left. More precisely,
(right-apply f (arg1 arg2 ... argn)) evaluates argn, ..., then arg2, and
finally arg1 and then applies each argument to f. You may validate your keyword's
evaluation order by using the display keyword you wrote previously. Note that this isn't
as easy as it looks: For a function f that prints and returns its argument, ensure that
(apply-right + '((f 0) (f 1) (f 2))) works as expected.
- The let keyword we wrote before is sequential in that each binding is in scope
in all the bindings following it. Write a parallel-let keyword that evaluates each
of its bindings such that none of them are in scope of each other.
- Lazy
evaluation is an alternative to the eager evaluation semantics we
implemented in our interpreter. We will simulate lazy evaluation
within our interpreter with two new keywords: delay
and force. Delay should suspend a computation, creating a
"thunk" that is not evaluated until it is passed into a force
expression. Forcing a thunk evaluates the delayed expression in the
environment in which it was delayed. Note that force should
not do anything on expressions that are not thunks; you may find using
a reserved variable (one whose identifier begins with an underscore)
helpful for this. Note that you are not allowed to change any of the
interpreter's types or fundamental structure: You may only add two new
handler functions and their entries in the dispatch table. We have
provided sample code demonstrating streams in "etc/lazy.scm" along
with a representative session below. As always, the ellipsis is not
actually the output from the interpreter. If (and only if) you're
feeling particularly adventurous, write a lazy mergesort such that
(take k (mergesort l)) can be performed for fixed k in time
linear in the length of the list.
$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands
scheme311> (load "etc/lazy.scm")
...
scheme311> (nth primes 10)
=> 31
scheme311> (take 10 fib-stream)
=> (1 1 2 3 5 8 13 21 34 55)
scheme311> (take 10 (drop 5 (sift 3 naturals)))
=> (8 10 11 13 14 16 17 19 20 22)
scheme311> (take 10 (merge (sift 5 primes) (sift 7 naturals)))
=> (1 2 2 3 3 4 5 6 7 8)
scheme311> :quit
Part 3: Fun with Scheme! (10 points)
Now that you have finished your Scheme interpreter, let's implement something that's particularly
well suited to Scheme's symbolic computation. In most languages, defining new syntactic constructs
requires the use of macros and other features not intrinsic to the language. Not so with Scheme!
Recall that let bindings can be reformulated in terms of lambda applications in a straightforward
manner when there are no data dependencies among the bindings. For example
the code let x = 3 in let y = 4 in x + y can be equivalently restated as (fun x y ->
x + y) 3 4. We will now together implement a Scheme function that will perform this
transformation on code fragments it takes as input.
Your job is to fill out the map, foldl, reverse, first,
second, and third list functions in etc/let-desugar.scm in the
natural way. We will use most of these functions in the provided code for the desugaring process.
The first function provided single-binding-desugar transforms an expression in the
form ((f x y ...) expr) to (f (lambda (x y ...) expr)) and leaves other
bindings alone. The second function, desugar-bindings maps this over a binding list,
doing the same thing as the provided OCaml function desugar, but within Scheme!
Finally, the function let-desugar performs the promised transformation.
Our let-desugar function first splits the code into two parts: the list of bindings
and the body of the let statement. It then desugars the bindings, using our previously defined function.
Next, it splits the binding list in half, forming a list of identifiers and a list of values.
It then constructs the lambda form as follows: (('lambda identifiers body) values).
Here's an example session.
$ ./scheme311
scheme311 interpreter version 1.0
type ":help" for a list of toplevel commands
scheme311> (load "etc/let-desugar.scm")
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> <fun>
=> (let ((x 3) (y 4) (z 5)) (+ x y z))
=> (let (((f x y) (+ x y)) ((g x y) ( * x y)) (a 1) (b 2) (c 3) (d 4)) (f (g a b) (g c d)))
scheme311> (let-desugar test-code-1)
=> ((lambda (x y z) (+ x y z)) 3 4 5)
scheme311> (let-desugar test-code-2)
=> ((lambda (f g a b c d) (f (g a b) (g c d))) (lambda (x y) (+ x y)) (lambda (x y) ( * x y)) 1 2 3 4)
scheme311> (let-desugar '(let ((f (cons 1))) (f 2)))
=> ((lambda (f) (f 2)) (cons 1))
scheme311> (eval (let-desugar test-code-1)) (eval (let-desugar test-code-2))
=> 12
=> 14
scheme311> (eq? (eval (let-desugar test-code-1)) (eval test-code-1))
=> #t
scheme311> (eq? (eval (let-desugar test-code-2)) (eval test-code-2))
=> #t
scheme311> :quit