Problem Set 4: An OCaml Interpreter

Due Thursday, March 14, 23:59

Version: 6

Last Modified: Mon Mar 4 11:55:23 CST 2013

Changes:


Important notes:

  1. No partners: You must work on this assignment individually.
  2. Compile errors: All programs that you submit must compile. Programs that do not compile will receive an automatic zero. If you are having trouble getting your assignment to compile, please visit consulting hours. If you run out of time, it is better to comment out the parts that do not compile and hand in a file that compiles than hand in a more complete file that does not compile.
  3. Missing functions: We will be using an automatic grading script, so it is crucial that you name your functions and order their arguments according to the problem set instructions, and that you place the functions in the correct files. Otherwise you may not receive credit for a function properly written. Furthermore, please ensure that all functions we provide you initially are still present in your submission.
  4. Code style: Please pay attention to style. Refer to the CS 3110 style guide and lecture notes. Ugly code that is functionally correct may still lose points. Take the extra time to think through the problems and find the most elegant solutions before coding them up.
  5. Wrong versions: Please double-check that you are submitting the correct versions of your files to CMS. The excuse, "I had it all done, but I accidentally submitted the wrong files" will not be accepted.

Part 1: Side Effects (20 points)

From this point onwards, we will be allowing the use of side effects and other imperative features in OCaml. The most basic of these features is the ref, which you will be using in the main part of this assignment. The following few exercises are meant as a brief introduction to refs and side effects.
  1. The Fibonacci sequence is defined as a0 = 0, a1 = 1, and an = an-1 + an-2 for n > 1. We have written a function fib : int -> int that calculates the nth Fibonacci number in exponential time. Write the function fib_refs : int -> int that does the same thing, but in linear time, using refs. Do not use memoization (3 points)
  2. The function tabulate : (int -> 'a) -> int -> 'a list takes a function f and a length n, and returns a list of length n where the element at index i is the result of the application f i. For example, tabulate (fun x -> 2*x) 5 yields the list [0; 2; 4; 6; 8]. Write the function tabulate_refs : (int -> 'a) -> int -> 'a list that accomplishes the same thing, using refs. (3 points)
  3. Implement fold_left without using the rec keyword. Do not use library functions other that those included by default in Pervasives.ml; keywords should be sufficient. Karma will be awarded for solutions that do not use loops.(6 points)
  4. Implement merge_sort : ('a -> 'a -> int) ->'a list -> 'a list with and without refs. (merge_sort f xs uses the comparison function f to sort the elements of xs in ascending order.) Ask yourself, which solution is more efficient? Which is more parallelizable? (8 points)

What to submit

Submit your code in the file part1.ml. Make sure it matches the interface part1.mli.

Part 2: An OCaml Interpreter (80 points)

Overview

For this problem, you will implement an interpreter for a simplified version of the OCaml language. Evaluation follows the environment model presented in class very closely.

We have provided a lexical analyzer and parser that can take a string input, parse it as an OCaml program, and output an abstract syntax tree (AST) representing the code. The input program is converted to a tree of type Ast.expr, defined in the Ast module.

Your job will be to write functions for typechecking and evaluating the AST that is returned from the parser.

Our subset of OCaml

Values

Our subset has five types of values:
  1. Integers, which correspond to the type int
  2. Bools, which correspond to the type bool
  3. Closures, which represent functions and their environments
  4. Unit
  5. Lists

There is also a sixth type VUndef that is used internally for creating recursive definitions and is not available to the programmer. Take care to ensure that your interpreter never evaluates an expression to VUndef (raise an exception instead). The following code, taken from ast.ml, declares the type value. When an expression is evaluated successfully, the resulting value is an entity of this type.

type value =
  | VUndef
  | VBool of bool
  | VInt of int
  | VNil
  | VCons of value * value
  | VClosure of ((value ref) IdMap.t) * id * expr

Expressions

We will be interpreting a small subset of OCaml. Notably, this subset only supports anonymous functions. But we still have let bindings; if you wanted to create the identity function and bind it to say id, you could enter this into the read-eval-print loop:
? let id = fun x -> x
If you wanted to make a recursive function, you could do something like this:
? let rec fact = fun n -> if n = 0 then 1 else n * fact (n - 1)
Also notable: no unary arithmetic negation, no strings, and no tuples. Expressions have the following type: (from ast.ml)
(* expressions *)
type expr =
  | Constant of constant
  | Cons of expr * expr
  | IfThenElse of expr * expr * expr
  | Let of id * expr * expr
  | LetRec of id * expr * expr
  | Plus of expr * expr
  | Minus of expr * expr
  | Mult of expr * expr
  | Divide of expr * expr
  | Mod of expr * expr
  | Gt of expr * expr
  | Lt of expr * expr
  | Ltq of expr * expr
  | Gtq of expr * expr
  | Neq of expr * expr
  | Eq of expr * expr
  | Not of expr
  | And of expr * expr
  | Or of expr * expr
  | Fun of id * expr
  | App of expr * expr
  | Var of id
  | Match of expr * ((pattern * expr) list)
Overview of binary operators:
OpAST NameMeaning
+Ast.PlusInteger addition
-Ast.MinusInteger subtraction
*Ast.Mult Integer multiplication
/Ast.DivideInteger division
modAst.ModInteger modulus
=Ast.EqComparison (equality)
<>Ast.Neq Comparison (inequality)
<Ast.Lt Comparison (less than)
<=Ast.Ltq Comparison (less than or equal to)
>Ast.Gt Comparison (greater than)
>=Ast.Gtq Comparison (greater than or equal to)
&&Ast.AndLogical AND
||Ast.OrLogical OR
::Ast.ConsAttach an item to front of list

Overview of unary operations:
OpAST NameMeaning
notAst.NotLogical complement

Pattern Matching

The subset of OCaml also supports simple pattern matching which will be useful for working with lists. A pattern is either a constant, a variable or a cons of two patterns.
type pattern = 
  | PConstant of constant
  | PVar of id
  | PCons of pattern * pattern
This will allow us to deconstruct lists. If there is the possibility that a value does not match any of the patterns in a match expression, the pattern-matching is said to be inexhaustive. You do not need to perform any exhaustiveness checking or output examples of values which will not pattern match. You will still need to throw an exception if a value if found to not match any pattern.

Types

Expressions can have the following types:

(* type expressions *)
type typ =
  | TBool
  | TInt
  | TList of typ
  | TVar of id
  | Arrow of typ * typ

As you know, OCaml can infer the type of expressions. This is done through unification, which basically satisfies constraints on types through substitutions. (This problem set was based on the code at the bottom of that link, see acknowledgements). The first thing you need to do is implement the function collect to collect the constraints to be unified. Type inference works as follows:

Constraints

Expressions and their constraints:
OCamlConstraints
e1 :: e2 type(e2) = type(e1) list
if e1 then e2 else e3 type(e1) = bool, type(e2) = type(e3)
let [rec] x = e1 in e2 type(x) = type(e1)
e1 [+|-|/|*|mod] e2 type(e1) = type(e2) = int
e1 [<|<=|=|=>|>|<>] e2 type(e1) = type(e2)
not etype(e) = bool
e1 [&&|(||)] e2 type(e1) = type(e2) = bool
e1 e2type(e1) = type(e2) -> type(e1 e2)
match e with p_i -> e_i type(e) = type(p_i), type(e_i) = type(match e with p_i -> e_i) (There may be multiple p_i/e_i pairs of course)
p1 :: p2 type(p2) = type(p1) list (where p1 and p2 are patterns)

Unification

The heart of the unification algorithm is that to force a type variable to be equal to a type we can substitute all its occurrences for that type. The following is from ast.ml
type substitution = (id * typ) list
Each id * typ pair represents a substitution of an id for a typ. Composing multiple substitutions has been implemented already; you only need to complete the function unify_one. Handling type variables is the interesting part of this function. If neither of the types to be unified is a type variable then either they are equal or they aren't: if they're equal then carry on merrily, otherwise the program does not type check so throw an appropriate exception. One subtlety: be sure to perform an occurs check for comparing a type variable and a type that may contain that type variable. For instance a constraint like 'a = 'a -> 'b would not pass the occurs check and you should raise an appropriate exception. We have provided a function occurs which may be useful here.

Semantic Overview

You must complete the function eval which takes an expression and an environment and evaluates it to a value. Refer to the lecture notes here on the environment model for more information on how this function should work. One tricky part of writing this function is evaluating let rec f = e1 in e2. This is where VUndef comes in. You need a temporary value to add to the environment f while evaluating e1. Once you have evaluated e1, you have the desired value for f, so update f to the right value and evaluate e2. This trick is sometimes called Landin's knot. If e1 evaluates to VUndef, throw an appropriate exception (Try and find an expression that causes this to happen to test it!)

Your Tasks

We recommend starting by familiarizing yourself with ast.ml. You will likely refer to this file frequently. You must implement functions in the following files:
  1. eval.ml: (40 points) This module contains functions for evaluating expressions.
    1. Complete the function eval.
  2. infer.ml: (20 points) This module is used for inferring the type of an expression.
    1. Complete the function collect.
  3. unify.ml: (20 points) This module is used for solving type constraints.
    1. Complete the function unify_one.
You do not need to modify any other files.

Building Your Code

To build:
$ make
or
$ ./build.sh
or (Windows)
> build.bat
To clean:
$ make clean
or
$ ./clean.sh
or (Windows)
> clean.bat

Running your interpreter

To run:
$ ./interpreter.exe
or (Windows)
> interpreter.exe
However, we recommend using either of the following (for linux, supports history)
$ rlwrap ./interpreter.exe
or
$ ledit ./interpreter.exe
To load a file of definitions: (We recommend using rlwrap or ledit when loading a file as well.)
$ ./interpreter.exe <file>
or (Windows)
> interpreter.exe <file>
Exiting the interpreter:
? #quit

Debugging

A boolean flag debug is provided in ast.ml. Set it to true to get interesting feedback about what is going on during type inference. You are free to add additional debugging print statements, but please structure your code such that if this flag is set to false, only what we have provided in repl.ml prints anything. Use debug_print or write something similar. We have provided a library of functions for you to play with in lib.ml.

What to submit

For Linux or Mac OS, run the following, then submit the zip file produced. Your code must compile.
$ ./submit.sh
For windows, run clean.bat and then zip up what's left. Make sure your code compiles.

Acknowledgments

Many thanks to Nate Foster and Andrew Myers for providing starter code for type inference. That code may be found here, and you may refer to it while implementing your type inference.