Problem Set 4: A Scheme Interpreter

Due Thursday 10/18/12 11:59pm

Version: 3

Last Modified: October 11, 2012, 5:47pm

Changes:

Clarified part 1: no memoization for fibonacci
Typo fix in Semantic Overview
Typo fix in cons example

Important notes:

No partners: You must work on this assignment individually.
Compile errors: All programs that you submit must compile. Programs that do not compile will receive an automatic zero. If you are having trouble getting your assignment to compile, please visit consulting hours. If you run out of time, it is better to comment out the parts that do not compile and hand in a file that compiles than hand in a more complete file that does not compile.
Missing functions: We will be using an automatic grading script, so it is crucial that you name your functions and order their arguments according to the problem set instructions, and that you place the functions in the correct files. Otherwise you may not receive credit for a function properly written. Furthermore, please ensure that all functions we provide you initially are still present in your submission.
Code style: Please pay attention to style. Refer to the CS 3110 style guide and lecture notes. Ugly code that is functionally correct may still lose points. Take the extra time to think through the problems and find the most elegant solutions before coding them up.
Wrong versions: Please double-check that you are submitting the correct versions of your files to CMS. The excuse, "I had it all done, but I accidentally submitted the wrong files" will not be accepted.

Part 1: Side Effects (15 points)

From this point onwards, we will be allowing the use of side effects and other imperative features in OCaml. The most basic of these features is the ref, which you will be using in the main part of this assignment. The following few exercises are meant as a brief introduction to refs and side effects.

The Fibonacci sequence is defined as a₀ = 0, a₁ = 1, and a_n = a_n-1 + a_n-2 for n > 1. We have written a function fib : int -> int that calculates the nth Fibonacci number in exponential time. Write the function fib_refs : int -> int that does the same thing, but in linear time, using refs. Do not use memoization (3 points)
The function tabulate : (int -> 'a) -> int -> 'a list takes a function f and a length n, and returns a list of length n where the element at index i is the result of the application f i. For example, tabulate (fun x -> 2*x) 5 yields the list [0; 2; 4; 6; 8]. Write the function tabulate_refs : (int -> 'a) -> int -> 'a list that accomplishes the same thing, using refs. (3 points)
As you may have found, the library function String.rev : string -> string does not exist. If it did, however, it would return the reversed version of the string it was passed. Write the function string_rev : string -> string that does this, without folding or recursion. Do not use String.iteri. You may, however, be interested in the library function List.iter. (4 points)
Suppose we have a list of how much of a commodity we bought each day over some time period (call each day's amount a purchase), and an initial bank balance. We also have an inflation problem, so much that the given inflation rate is daily rather than yearly. We want to calculate the final value of our purchases and remaining money, relative to inflation at the beginning of the period. When we process another purchase, we do three things:
1. Subtract the cost of the purchase from our bank account.
2. Add the cost of the purchase, divided by the cumulative inflation rate, to our holdings. The cumulative inflation rate is the inflation from all days so far. This is how much more the purchase costs now than it would have, had we bought it on day 1.
3. Multiply the cumulative inflation rate by 1 plus the daily rate. Note that the cumulative rate begins at 1. So if the daily inflation rate is 0.2, then after 2 days the cumulative rate would be 1*1.2*1.2 = 1.44.
What we will then return is the amount of money in the account, divided by the final cumulative inflation rate (representing the fact that the buying power of our money has decreased), and the value of our holdings, not divided by anything, tupled in that order. Write a function inflate : float list -> float -> float -> float * float such that inflate lst money rate processes all the purchases in lst in this fashion, given initial balance money and inflation rate rate. Do not use folds or recursion. We have provided you a purely functional version, inflate_fn, for you to check your results against. (5 points)

Disclaimer: CS 3110 is not a substitute for an economics degree.

What to submit

Submit your code in the file part1.ml.

Part 2: A Scheme Interpreter (65 points)

Overview

For this problem, you will implement an interpreter for a simplified version of the Scheme language. Scheme is a functional language whose semantics are very much like OCaml. Evaluation follows the environment model presented in class very closely.

We have provided a lexical analyzer and parser that can take a string input, parse it as a Scheme program, and output an abstract syntax tree (AST) representing the code. The input program is converted to a tree of type Ast.expr, defined in the Ast module. Unlike OCaml, there is not a lot of static typechecking done, so you will find the syntax of Scheme quite a bit less restrictive than OCaml (not necessarily a good thing).

Your job will be to write functions for evaluating the AST that is returned from the parser. You will also have to check for various runtime type errors (such as applying an if-then-else to a non-Boolean), and to raise a runtime type error if the types of the expressions do not make sense in a given context.

A few Scheme programs are provided in the file etc/input.txt. Here is a sample run with our solution code.

zardoz > (load "etc/input.txt")

>> 
>> 
>> 
>> 
>> 
zardoz > (fact 4)

>> 24
zardoz > (length (list 0 0 0))

>> 3
zardoz > (rev (list 1 2 3 4))

>> (4 3 2 1)
zardoz > (cons 1 2)

>> (1 . 2)
zardoz > (fib 10)

>> (55 34 21 13 8 5 3 2 1 1 0)
zardoz > ((lambda (x y) (- x y)) 3110 2110)

>> 1000
zardoz >

The Scheme Language

Values

The language has six types of values:

Integers, which correspond to the OCaml type int
Strings, which correspond to the OCaml type string
Bools, which correspond to the OCaml type bool
Closures, which represent functions and their environments
cons pairs
nil.

There is also a seventh type Undef that is used internally for creating recursive definitions and is not available to the Scheme programmer. The following code, taken from heap.ml, declares the type value. When an expression is evaluated successfully, the resulting value is an entity of this type.

type value = Int of int | Str of string | Bool of bool
           | Closure of expr * env
           | Cons of value * value | Nil
           | Undef

Expressions

The lexical analyzer and parser convert an input program formatted in plain text to an AST, an entity of type Ast.expr, which can then be evaluated. The parser recognizes the following patterns:

Integers (Ast.Int_e of int) consist of sequences of one or more numeric characters. In our version of Scheme, this excludes negative integers, so that 42 and 007 are considered valid integers, whereas -1 is not.
Strings (Ast.Str_e of string) consist of sequences of zero or more characters enclosed in double quotes, e.g. "Hello". Certain strings can be escaped by the backslash character:

Character Escaped

\ \\

" \"

<newline> \n

<tab> \t
Bools (Ast.Bool_e of bool) are specified by the strings #t for true or #f for false
Identifiers (ids) (Ast.Id_e of Ast.id) consist of a single lower- or upper-case letter followed by zero or more lower- or upper-case letters, numeric digits, underscores, and single-quote characters.

Character	Escaped
\	\\
"	\"
<newline>	\n
<tab>	\t

In the following structures, the parentheses must be present:

(lambda x e) represents a function that takes a single value v as its input and evaluates e with x bound to v. This is represented as the type
```
Ast.Fun_e of Ast.id list * Ast.expr
```
(lambda (x1 x2 x3 ... xn) e) represents a function that takes n values v1 ... vn as its input and evaluates e with xi bound to vi for all i. This is represented as the type
```
Ast.Fun_e of Ast.id list * Ast.expr
```
(define x e) evaluates the expression e and binds it to the id x at the top level. It is represented in the AST as
```
Ast.Def_e of Ast.id list * Ast.expr
```
There is also a definerec version for creating recursive functions.
(define (f x1 ... xn) e) defines a function that takes n values (v1 ... vn) as its input and evaluates the expression e. The evaluation function should return a closure containing the AST representation of the function and the current environment. This is then bound to f at the top level. It is represented in the AST as
```
Ast.Def_e of Ast.id list * Ast.expr
```
There is also a definerec version for creating recursive functions.
(if e0 e1 e2) first evaluates e0. If the result is not a boolean, then the interpreter should raise a runtime exception. If the result is true, it should evaluate and return the result of e1. If the result is false, it should evaluate and return the result of e2. It is represented in the AST as
```
Ast.If_e of Ast.expr * Ast.expr * Ast.expr
```
.

(BO e1 e2) represents one of the following binary operations:

BO	AST Name	Meaning
+	`Ast.Plus`	Integer addition
-	`Ast.Minus`	Integer subtraction
*	`Ast.Mul`	Integer multiplication
/	`Ast.Div`	Integer division
%	`Ast.Mod`	Integer modulus
=	`Ast.Eq`	Comparison (equality)
!=	`Ast.Neq`	Comparison (inequality)
<	`Ast.Lt`	Comparison (less than)
<=	`Ast.Leq`	Comparison (less than or equal to)
>	`Ast.Gt`	Comparison (greater than)
>=	`Ast.Geq`	Comparison (greater than or equal to)
&	`Ast.And`	Logical AND
\|	`Ast.Or`	Logical OR

A binary operation is represented by the following AST type:

Ast.Binop_e of op * expr * expr

(UO e1) represents one of the following unary operations:

UO	AST Name	Meaning
-	`Ast.Neg`	Negate (subtract from 0)
~	`Ast.Not`	Logical complement
car	`Ast.Car`	Take the first element of a `cons` pair
cdr	`Ast.Cdr`	Take the second element of a `cons` pair
null	`Ast.Null`	Return true if the argument is `nil`, otherwise return false
load	`Ast.Load`	Read a file (specified by a string-valued argument) from disk

A unary operation is represented by the following AST type:

Ast.Unop_e of op * expr

The minus sign is the only operator that can be used as both a unary and a binary operator.

Lists

The binary operator cons is used to create a pair. The expression (cons e1 e2), when evaluated, should recursively evaluate e1 and e2, then create a value of the form

Cons (v1, v2)

from the resulting values. There are corresponding projections car and cdr that extract the first and second components of a pair, respectively; thus

(car (cons 1 2)) = 1
(cdr (cons 1 2)) = 2

The names car and cdr are historical; they stand for contents of the address register and contents of the decrement register, respectively, and refer to hardware components of the IBM 704 computer on which LISP was originally implemented in the late 1950s.

The special entity nil is used with cons to form lists. Lists in Scheme are sequences of objects combined using cons and terminated by nil. For example, the list

(cons 1 (cons 2 (cons 3 nil)))

is the Scheme equivalent of the OCaml list [1;2;3]. The object nil serves as the empty list, and is analogous to the OCaml []. The AST representation of nil is Ast.Nil_e.

There is a shorthand way to create lists in Scheme. Instead of typing

(cons 1 (cons 2 (cons 3 nil)))

one can type

(list 1 2 3)

and the resulting value is the same list. The output form of a list, i.e. the string that should be printed by your interpreter when you type either of the above expressions at the prompt, is (1 2 3).

The display form of the value (cons 1 2) is (1 . 2). Thus if you type (cons 1 2) at your Scheme interpreter, it should respond with (1 . 2). Note, however, that if the second component is nil instead of 2, it should respond with (1), because in that case it is a list of one element 1.

More generally, if you have any sequence of objects combined with cons, but not terminated with nil, then the output form puts a dot before the last element. Thus

(cons 1 (cons 2 3)) = (1 2 . 3)
(cons 1 (cons 2 nil)) = (1 2)

and

(cons (cons 1 2) (cons 3 4)) = ((1 . 2) 3 . 4)
(cons (cons 1 2) (cons 3 nil)) = ((1 . 2) 3)

Semantic Overview

Operations in Scheme are all represented in prefix notation and must be enclosed in parentheses. For instance, to add 5 and 7, we would type (+ 5 7). There is no notion of precedence and all compound expressions must be enclosed in parentheses. For instance, the OCaml expression 5 + (2 * 7) would be written (+ 5 (* 2 7)) in Scheme.

Comparisons may be performed on Int, String, and Bool types, while arithmetic operators are valid only on Ints, and boolean operations are valid only on Bools.

Variables can be bound to values in several ways. Top-level global variables are bound using the define keyword. Local variables are bound using the let keyword. The expression (let x 5 (* 2 x)) is equivalent to the OCaml let x = 5 in 2*x.

Finally, anonymous functions are defined using the lambda keyword. These can be bound to variables using define or let. For example, (lambda x (* 2 x)) in Scheme is equivalent to fun x -> 2*x in OCaml, and (let f (lambda x (* 2 x)) (f 4)) in Scheme is equivalent to let f = fun x -> 2*x in (f 4) in OCaml.

Furthermore, anonymous functions of several variables can also be defined with lambda. For example,
(let f (lambda (x y z) (+ (+ x y) z)) (f 1 2 3))
is equivalent to the OCaml
let f = fun x y z -> x + y + z in f 1 2 3.
If we would like to use define to bind functions of one or more variables at the top level, we can write
(define (f x y) (+ x y))
or
(define f (lambda (x y) (+ x y))) .

If the function is recursive, use letrec and definerec instead. These behave identically to let and define, respectively, except that before the second argument is evaluated, the function name is included in the environment bound to a dummy value (Undef); then after the second argument is evaluated, the binding is updated to that value. This means that the function name may occur in the function body so that the function may call itself recursively.

An if expression evaluates its first argument, checks that it is a boolean, and if so, evaluates the second or third argument according as the value of the first argument is true or false, respectively. The values of second and third arguments need not be of the same type.

When applied to a cons pair, car returns the first component, while cdr returns the second. You can test whether a list is empty using the null operator.

The load operator can be used to load and interpret the contents of a file. The operators load, define, and definerec may be used only at the top level.

Simulating Lazy Evaluation

The interpreter we implemented exhibits an eager evaluation strategy by default. That is to say, expressions are thoroughly evaluated when they are bound to a variable, such as by a let statement, or by being passed as an argument to a function. Lazy evaluation is an alternative to this strategy. Under lazy evaluation, expressions are not evaluated until it is absolutely necessary to do so. We will simulate lazy evaluation through two additional keywords: delay and force.

Applying delay to an expression creates a thunk, which can be thought of as a closure that takes no parameters but still preserves its environment at the time of the delay. The thunk represents a suspended computation that will not proceed until forced.

Applying force to a delayed expression causes the expression to be evaluated in the environment from when it was delayed. If the expression being forced was not delayed, then the expression is just evaluated normally.

With lazy evaluation, we can create a stream, which is basically a lazy list. The only things that are stored at any one time are the head of the list, and the delayed expression that would generate the remainder of the stream. Here is a sample run with our code.

zardoz > (load "etc/lazy.txt")

[...]

zardoz > (take 10 naturals)

>> (0 1 2 3 4 5 6 7 8 9)
zardoz > (take 10 (drop 50 naturals))

>> (50 51 52 53 54 55 56 57 58 59)
zardoz > (define st (from_list (list 4 86 6 9 3 0)))

>> (4 . )
zardoz > (stream_hd st)

>> 4
zardoz > (to_list (stream_tl st))

>> (86 6 9 3 0)
zardoz > (nth st 4)

>> 3

Your Tasks

All code you submit must adhere to the specifications defined in the respective mli files. You must implement functions in the following files:

heap.ml: This module contains data types for scheme values, as well as types for keeping track of variable bindings.
1. Complete the function value_to_string.
  - For Str values, simply output the appropriate string, and for Int and Bool values, simply use OCaml's string casting methods.
  - Closures should output as and nil as ().
  - For cons cells, the convention in Scheme is to output cons values in the form (x . y) for a single pair of values, or for nested Cons of the form (Cons x (Cons y (Cons z t))) output the value as (x y z . t), placing a dot only before the final element. Note that if the Cons is nested differently, say as (Cons (Cons x y) (Cons z t)), then we would represent this as ((x . y) z . t), since the first element of the Cons is now itself the value (Cons x y) rather than simply x.
  - Lists are represented as (x y z t), without any dot. This format is used only for displaying values in the output, and will not actually work as valid Scheme expressions.
  - Finally, Undef should print to a foreboding message, because they should never appear in the top level.
2. Complete the functions lookup, update, and bind, according to their specifications in the interface.
eval.ml: This module is responsible for evaluating Scheme code that has been converted to an AST.
1. Complete the function eval. This should take in an Ast.expr and a Heap.env and return a Heap.value, which is the result of the expression, evaluated in the environment. If type errors occur, raise a runtime exception.
2. Complete the function apply. This takes in two Heap.value items, the first being a closure and the second being the argument, and returns another Heap.value, the result of the function applied to the argument in the closure's environment. If the first value is not a closure containing a function, raise a runtime exception.
Keep in mind that functions in Scheme are curried. How you handle this is up to you. However, your interpreter must support partial function applications.
repl.ml: This module contains functions for parsing input code and starting evaluation.
1. Complete the function eval_one. This should handle any top level operations (declarations and loads) that can't exist elsewhere in a program. Any declarations made in a loaded file should be included in the scope of the remainder of the calling program.

You may add function definitions to any mli files. However, you should be sure not to change any existing definitions or types.

Building Your Code

We have provided build scripts that you can use to build your project. This will output a file interpreter.exe which you can execute to launch the command line interface.

What to submit

Submit a zip file ps4.zip containing the following files: eval.ml, repl.ml, heap.ml, ast.ml, util.ml, eval.mli, repl.mli, heap.mli, ast.mli, and util.mli.

Part 3: Fun with Scheme! (20 points)

Once you have completed your Scheme interpreter, why not write some useful code to test it out? Complete the following functions to gain valuable practice in coding in Scheme, and also some points.

map and fold: Wish you had your favorite list traversal functions available in Scheme? Why not code them yourself? Write functions fold_left, fold_right, and map that behave the same as their OCaml counterparts. (3 points each = 9 total)
append: As the @ sign in OCaml does, this function should take two lists and return the second one appended to the first one, preserving order. (2 points)
flatten: As in OCaml, flatten should take in a list of lists and return a single list that is the concatenation of all the input lists. Ordering should be preserved. (2 points)
quicksort: Implement quicksort such that (quicksort l) will return a list with the same elements as l, but sorted in ascending order. (7 points)

What to submit

A file scheme.scm containing your implementations.