A4: OCalf

# A4: OCalf **Soft deadline:** Thursday, 10/29/15, 11:59 pm **Hard deadline:** Saturday, 10/31/15, 11:59 pm *This assignment may be done as an individual or with one partner. Sharing of code is permitted only between you and your partner; sharing among larger groups is prohibited.* A baby camel is called a "calf." In this assignment you will implement an interpreter for OCalf, a functional language containing many of the core features of OCaml. ## Overview Here is a Backus-Naur form (BNF) grammar for the syntax of OCalf expressions, to give you a sense of what features are in the language: ``` e ::= (* expressions *) | () | i | b | s | x | e1 bop e2 | if e1 then e2 else e3 | let [rec] x = e1 in e2 | e1 e2 | fun x -> e | (e1,e2) | C e | match e with p1 -> e1 | ... | pn -> en bop ::= (* binary operators *) | + | - | * | > | < | = | >= | <= | <> | ^ p ::= (* patterns *) | () | i | b | s | x | C p | (p1,p2) i ::= integers b ::= booleans s ::= strings x ::= variable identifiers C ::= constructor identifiers ``` Some of the ways that OCalf differs from OCaml are the following: * All OCalf variants must be non-constant. * OCalf has pairs, but not \$k\$-tuples for \$k \geq 3\$. * OCalf does not have built-in (syntactic support for) lists. * OCalf does not have a module system. * OCalf has no imperative features. Some of the OCalf interpreter is already implemented for you. Your task is to implement evaluation and type inference, as well as to build a test suite that you will submit. You will need to understand the big-step environment model of OCaml to complete the interpreter. For karma, you can implement many extensions to the interpreter. ## Objectives * Know the design of an interpreter and REPL. * Implement a big-step environment model semantics, including closures. * Better understand OCaml itself by implementing many of its main features. * Comprehend the important building blocks of type inference, and implement one of them. * Formulate a test suite that can detect errors in implementations other than your own. ## Recommended reading * [Lectures and recitations 12–15][web] * [Recitation 10 (Testing)][web] * The [CS 3110 Style Guide][style] [web]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/ [style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html ## Requirements * You must implement the evaluation of OCalf expressions, as described below under Part 1. * You must submit a test suite for evaluation, as described below under Part 2. * You must implement the type inference and checking of OCalf expressions, as described below under Part 3. * We must be able to compile and run your interpreter as described below under "Compiling and running". * Your code must be written with good style and be well documented. * You must use Git (or another version control system) as part of your development process. ## What we provide In the release code you will find these files: * Many `.mli` and `.ml` files, which are described below under Part 0. * A template file `a4.txt` for submitting your written feedback. ## What to turn in Submit files with these names on [CMS][cms]: * `a4src.zip`, containing your solution code and your test suite. * `vclog.txt` containing your version control log. * `a4.txt`, containing your written feedback. [cms]: https://cms.csuglab.cornell.edu/ **To prepare `a4src.zip` for submission:** From the directory that contains `main.ml`, bundle all your source code and your test suite into a zip file with this command: ``` $ zip a4src.zip *.ml{,i,y,l} ``` Do not include any compiled bytecode files, otherwise your submission might become too big to upload. Double check that you got all your files with this command: ``` $ zipinfo -1 a4src.zip ``` **To prepare `vclog.txt` for submission:** Run the following command in the directory containing `main.ml`: ``` $ git log --stat > vclog.txt ``` ## Git You are required to use [Git][git] (or another version control system) whether you are working as an individual or with a partner. Throughout your development of A4, commit your changes to a repo. Use those checkins to provide checkpoints, in case you need to restore your development to a previous point. Synch with a remote repo to communicate code between you and your partner, or simply to backup your development if you are working as an individual. **Private repos are of the utmost importance. A public repo would share your code with the entire world, including your classmates, thus violating the course policy on academic integrity. Therefore we require that you keep all your CS 3110 related code in private repos.** [git]: https://git-scm.com/ ## Grading issues * **Compiling and running:** Your interpreter must compile according to the instructions given below in Part 0 under "Compiling and running". You may not change the names and types appearing in `.mli` files. Solutions that do not obey these stipulations will receive minimal credit. * **Code style:** Refer to the [CS 3110 style guide][style]. Ugly code that is functionally correct will nonetheless be penalized. Take extra time to think and find elegant solutions. * **Late submissions:** Carefully review the [course policies][syllabus] on submission and late assignments. Verify before the deadline on CMS that you have submitted the correct version. * **Environment:** Your solution must function properly in the [3110 virtual machine][vm], which is the official grading environment for the course. [style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html [syllabus]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/syllabus.php [vm]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/vm.html ## Permitted OCaml features Imperative data structures are now permitted. Side effects are not necessarily bad. But we urge you to think carefully before choosing to use imperative data structures. ## Part 0: Understand the OCalf codebase Your preliminary task is to familiarize yourself with the structure of the code we have shipped to you. We provide the following modules, comprising both `.ml` and `.mli` files, in the release code: * `Eval` and `Infer` have skeleton code for the functions that implement evaluation and type inference. There are some unimplemented helper functions in these files that the course staff found helpful in their own solution. You are free to implement them, change them, or remove them entirely. * `Ast` and `TypedAst` contain the type definitions used in evaluation and type inference, respectively. The `TypedAst` module also contains `annotate` and `strip` functions for converting between the two. * `Examples` and `Lambda` contain example OCalf expressions. `Examples` contains many small examples from this writeup. `Lambda` contains a large example. * `Printer` contains functions for printing out values of various types. * `Parse` contains functions to construct ASTs from strings. It relies on a lexer and parser implemented in `lexer.mll` and `parser.mly`. **Exercise:** Skim each of the `.mli` and `.ml` files in the release code, and plan to come back and read them in more detail later as necessary. **Compiling and running:** From the directory containing `main.ml`, run ``` $ cs3110 compile main.ml ``` to compile the interpreter. As the parser is compiled, it will produce a message `1 shift/reduce conflict`. This is expected behavior. To run the intepreter, we will use OCaml's own REPL. (For karma, you can build a REPL for OCalf.) So after compiling, load utop. You can use the OCalf interpreter by directly calling functions that implement it. For example, ``` "if true then 3110 else 0" |> parse_expr |> eval [] |> string_of_value ``` causes the string `"if true then 3110 else 0"` representing an OCalf expression to be parsed into an OCalf AST, evaluated to an OCalf value, then converted to an OCaml string suitable for printing. (But note that evaluation won't yet succeed with the release code, because `eval` is unimplemented.) Note that the reason functions like `eval` are available in utop is that we provide an `.ocamlinit` file in the release code.* When utop starts, it automatically `#use`s this file, which automatically `#load`s and `open`s many of the modules from the release code for your convenience in interactive testing. Whenever you find yourself typing the same thing into the REPL more than once, consider adding it to your `.ocamlinit`! \* Files whose names start with `.` are *hidden*; use `ls -A` to show a directory listing that includes hidden files. ## Part 1: Evaluation Implement the function ``` eval : Eval.environment -> Ast.expr -> Eval.value ``` in `eval.ml`. This function evaluates OCalf expressions in the big-step environment model semantics. For example, ``` eval [] (parse_expr "if false then 3 + 5 else 3 * 5"));; - : value = VInt 15 ``` (Note that we expose the representation of environments as association lists. Arguably this is poor design: all the client of an environment needs to know is that it is a dictionary, and the particular dictionary representation is irrelevant.) Whenever evaluation reaches a place where the semantics gets *stuck*, meaning that evaluation could not meaningfully proceed but the expression being evaluated is not yet a value, `eval` should produce the special value `VError`. For example, `3 + true` and `match 3 with | 1 -> false` should both evaluate to `VError`. Although you are free to tackle the implementation of `eval` in any way you see fit, you are strongly encouraged to follow the plan outlined below. Our test cases will proceed in the plan's order, so following the plan will maximize your chances of partial credit. **Step 0:** Remind yourself of the big-step semantics of OCaml, because you are essentially implementing that judgment. **Step 1: Implement eval without LetRec or Match.** Implement `eval` for <span style="color:green;">unit, integers, Booleans, strings,</span> `BinOp`, `If`, `Var`, `Fun`, `Pair`, `Variant`, `App`, and `Let`. **Step 2: Implement LetRec.** Implement `eval` for `LetRec`. This is tricky, because a binding is needed in the environment for the defined name before its definition can be evaluated. We'll solve this problem with a technique called *backpatching*. To evaluate `let rec f = e1 in e2` using backpatching, first evaluate `e1` to a value `v1` in an environment where `f` is bound to a *dummy value*, which may be any value at all. (If the value of `f` is used while evaluating `e1`, it is an error. This would occur, for example, if the programmer wrote `let rec x = 3 + x in ...`.) Then imperatively update the binding of `f` to `v1`. This "ties the knot," allowing `v1` to refer to itself. Finally evaluate `e2` in the environment where `f` is bound to `v1`. To support backpatching, the `environment` type contains `binding ref`'s instead of `binding`'s, thus enabling bindings to be mutated. **Step 3: Implement Match.** Implement `eval` for `Match`. You do not need to check whether pattern matching is exhaustive or whether there are unused match cases. Return `VError` if pattern matching fails at run time. ## Part 2: Test suite As part of developing `eval`, you naturally will be constructing test cases that demonstrate the (in)correctness of your implementation. Let's take that one step further. The course staff will develop many buggy implementations of `eval`. Your task is to construct a suite of test cases that finds all our bugs. Use `test_eval.ml` from the release code as a template. Add your test suite to the file. We will copy that file (and only that file) into each of our buggy interpreters, then run ``` $ cs3110 compile test_eval.ml $ cs3110 test test_eval.ml ``` We will examine the output to see whether your test suite correctly detects the bugs in our interpreters. ## Part 3: Type inference OCalf's type inference algorithm proceeds as follows: 1. Each node of the AST is *annotated* (aka *decorated*) with a unique type variable. 2. The AST is traversed to *collect* a set of equations between types that must hold for the expression to be well-typed. 3. Finally, those equations are solved to find an assignment of types to the original variables; this step is called *unification*. If the unification phase discovers that the system is unsatisfiable, then it is impossible to give a type to the expression, so a type error would be reported. If the system is underconstrained, then there will be some "leftover" variables after unification. The typechecker would give them user-friendly names like `'a` and `'b`. * * * **Example.** Consider `(fun x -> 3 + x)`. The annotation phase would add a new type variable to each subexpression: <p class="text-center"> <img src="annotating.png" alt="parsing" width="600"> </p> Next, the collection phase would collect equations from each node: * We know that `(fun x -> e) : t1 -> t2` if `e:t2 `under the assumption `x:t1`. Stated differently, `(fun (x:t1) -> e:t2) : t3` if and only if `t3 = t1 -> t2`. Therefore, the equation `'t05 = 't04 -> 't03` would be collected from `fun` node. * Similarly, the equations `'t03 = int`, `'t02 = int`, and `'t01 = int` would be collected from the `+` node. * The equation `'t02 = int` would be collected from the `3` node. * We know that if `x` had type `t` when it was bound then it has type `t` when it is used. Therefore, `'t01 = 't04` would be collected from the `x: 't01` node. Finally, the system of equations is solved in the unification phase, assigning `int` to `'t01` through `'t04`, and `int -> int` to `'t05`. * * * We provide the annotation and unification phases of type inference for you; your task is to implement the function ``` collect : variant_spec list -> annotated_expr -> equation list ``` in `infer.ml`. We strongly encourage you to follow this plan: **Step 1: Implement collect without Match or Variant.** Implement `collect` for all syntactic forms except `Variant` and `Match`. **Step 2: Partially implement collect for Match.** Implement `collect` for `Match`, but omit handling of `PVariant` patterns. Make sure the bindings from the patterns are available while checking the bodies of the cases. **Step 3: Implement collect for Variant.** Extend `collect` to handle variants. Deriving the correct constraints for variants is tricky. Consider this code: ``` type 'a list = Cons of 'a * 'a list | Nil of unit let x = Cons(1,Nil ()) in let y = Cons(true,Nil ()) in 42 ``` An overly-strict implementation of collection might report a type error, if collection generates constraints that force the separate occurrences of `Nil` and `Cons` in binding `x` and `y` to have the same types. A better implementation would generate constraints with distinct type variables, thus permitting the code above. ## Karma Implementing some of these karma problems might require changing the names or types that appear in the `.mil` files in the release code. Do so with care. Adding a new function to an interface file will not break our testing harness, but changing the name or type of a previously existing function could, especially `eval` and `collect`. Likewise, extending the AST types will not break our testing harness, but changing the name of an existing constructor or the data it carries would. If in doubt, please ask! <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **REPL:** Implement an OCalf REPL. Like utop, your REPL could provide various directives, syntax highlighting, tab completion, etc. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Pervasives:** In the same way that OCaml's `Pervasives` module is automatically opened, automatically populate the initial OCalf environment with useful functions. A good place to start would be functions for I/O (e.g., the `print_* and read_*` functions in OCaml's `Pervasives`). You will need to extend the OCalf notion of values to include functions that are provided by OCaml, rather than implemented in OCalf. In this, you could be inspired by OCaml's [external definitions][external]. Instead of externals naming C functions, though, OCalf externals could name OCaml functions. [external]: http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#external-declaration <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Lists:** Add syntactic support for lists to the OCalf lexer and parser, along with a built-in `list` type constructor. Implement a collection of functions similar to the OCaml `List` module in OCalf. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **let-polymorphism:** **Caution:** This karma problem is likely to cause you to reorganize your code. Make sure you make good use of source control, helper functions, and unit tests to ensure that you don't break your existing code while working on it! The type inference algorithm described above allows us to give expressions polymorphic types: `fun x -> x` will be given the type `'a -> 'a`. But it does not let us use them polymorphically. Consider the following expression: ``` let (any:t1) = (fun (x:t3) -> (x:t4)):t2 in (any 1, any "where") ``` OCaml will give this expression the type `int * string`, but our naive inference algorithm fails. The solution is *let-polymorphism*: Every time a variable defined by a `let` expression is used, we should create a new type variable corresponding to the use. We use the constraints generated while type checking the body of the `let` as a template for a new set of constraints on the new variable. In the above example, while type checking the body of the let expression, we will generate the constraints `t1 = t2`, `t2 = t3->t4`, and `t3 = t4`. While checking the expression `(any:t5) 1`, we can recreate these constraints with new type variables `t1'`, `t2'`, and so on. We will output the constraints `t1'=t2'`, `t2' = t3'->t4'` and `t3'=t4'`. We also will output the constraint `t5=t1'`. Because the type variables are cloned, they are free to vary independently. Because we have cloned their constraints, they must be used in a manner that is consistent with their definition. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> **Meta:** Implement an OCalf interpreter in OCalf. Start by defining an OCalf variant to represent OCalf ASTs. Then implement an `eval` function in OCalf to evaluate OCalf expressions. Since OCalf doesn't have imperative features, you won't be able to implement `let rec` with backpatching. Instead, you could use a technique called *recursive closures*, which is described in the [final two slides of a 3110 lecture from Spring 2015][rec-cl]. Note those slides use a different notation than we used this semester. [rec-cl]: http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/8/lec08.pdf