A4: OCalf

# A4: OCalf **Deadline:** Wednesday, 11/02/16, 11:59 pm *This assignment may be done as an individual or with one partner. Sharing of code is permitted only between you and your partner; sharing among larger groups is prohibited. Be careful that you do not end up working as a three or four person team with the rest of your final project group: we have had AI cases in the past resulting from that behavior.* A baby camel is called a *calf*. In this assignment you will implement an interpreter for OCalf, a functional language containing many of the core features of OCaml. ## Overview OCalf is intended to be a sub-language of OCaml: unless otherwise specified, OCalf programs should behave like OCaml programs. Here is a Backus-Naur form (BNF) grammar for the syntax of OCalf expressions, to give you a sense of what features are in the language: ``` e ::= (* expressions *) | () | i | b | s | x | e1 bop e2 | if e1 then e2 else e3 | let [rec] x = e1 in e2 | e1 e2 | fun x -> e | (e1,e2) | C e | match e with p1 -> e1 | ... | pn -> en bop ::= (* binary operators *) | + | - | * | > | < | = | >= | <= | <> | ^ p ::= (* patterns *) | () | i | b | s | x | C p | (p1,p2) i ::= integers b ::= booleans s ::= strings x ::= variable identifiers C ::= constructor identifiers ``` Some of the ways that OCalf differs from OCaml are the following: * OCalf source-file integers are non-negative, and there is no unary integer negation. Negative integers can nonetheless be created, e.g. by writing `0-10` to produce negative ten. * OCalf variants must be non-constant. * OCalf has pairs, but not \$k\$-tuples for \$k \geq 3\$. * OCalf does not have built-in lists. * OCalf does not have a module system. * OCalf has no imperative features. Parsing and lexing has been implemented for you, and the AST has been designed for you. You will implement the rest of the interpreter, including evaluation and type inference. Evaluation will be implemented using the big-step environment model. ## Objectives * Know the design of an interpreter. * Implement a big-step environment model semantics, including closures. * Implement type inference. * Better understand OCaml itself by implementing many of its main features. ## Recommended reading * [Lectures and recitations 13–17][web] [web]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/ [style]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/handouts/style.html ## Requirements * You must implement the evaluation as described below in Part 2. * You must implement type inference and checking as described below in Part 3. * You must submit an [overview document][overview]. * You must use Git as part of your development process. [overview]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/handouts/overview.html ## What we provide See below under Part 1 for a description of the release code. ## What to turn in Submit files with these names on [CMS][cms]: * `a4src.zip`, containing your solution code. See below for how you must construct this file. * `gitlog.txt` containing your git log. * `overview.pdf` or `overview.txt`, containing your overview document. Although you should most certainly build a test suite (and we provide you with the start of one in `test.ml`), we will not be grading your test suite as part of this assignment. [cms]: https://cms.csuglab.cornell.edu/ **To prepare `a4src.zip` for submission:** From the directory that contains `main.ml`, run `make zip`. This will produce a file named `a4src.zip` for you automatically containing all your source code located in that directory. Double check that you got all your source files as follows: ``` $ zipinfo -1 a4src.zip ast.ml eval.ml infer.ml main.ml parse.ml printer.ml test.ml typedAst.ml ast.mli eval.mli infer.mli parse.mli printer.mli typedAst.mli parser.mly lexer.mll ``` If the output you receive is not **identical** to the above, something is wrong. For example, output that included the following would be wrong: ``` a4/main.ml ``` because the `.ml` file is located in a subdirectory rather than being at the top level directory inside the zip file. Submissions that do not meet the above criteria regarding the output of `zipinfo` **will be penalized.** As usual, make sure you are satisfied with the version of the file you upload in CMS **before** the deadline. **To prepare `gitlog.txt` for submission:** Run the following command in the directory containing `main.ml`: ``` $ git log --stat > gitlog.txt ``` ## Git **Private repos are of the utmost importance.** A public repo would share your code with the entire world, including your classmates, thus violating the course policy on academic integrity. Therefore we require that you keep all your CS 3110 related code in private repos. ## Grading issues * **Late submissions:** Carefully review the course policies on [submission and late assignments][late]. Verify before the deadline on CMS that you have submitted the correct version. * **Environment, names, and types:** Your solution must compile and run under OCaml 4.03.0. You are required to adhere to the names and types provided in the released `.mli` files. Your solution must compile against the provided public test cases in `test.ml` without any changes to those tests. Otherwise, your solution will receive minimal credit. * **Code style:** Refer to the [CS 3110 style guide][style]. Ugly code that is functionally correct will nonetheless be penalized. Take extra time to think and find elegant solutions. [late]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/syllabus.php#late [style]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/handouts/style.html ## Permitted OCaml features Imperative features are now permitted. Side effects are not necessarily bad! But we urge you to think carefully before choosing to use imperative features. Your solutions may not require linking any additional libraries/packages beyond OUnit. ## Part 0: Sanity check We are not shipping a `make check` with this assignment. Instead, we stipulate the following: * Your submission must compile and run with OCaml 4.03.0 and OUnit 2.0.0. * You may not change the provided `.mli` files. * Your solution must compile against the provided test cases in `test.ml` without any changes to those test cases whatsoever. By now, you should be able to take responsibility for adhering to these requirements yourself, without needing automated help. ## Part 1: Understand the OCalf codebase **Release code:** Your preliminary task is to familiarize yourself with the structure of the code we have shipped to you. We provide the following compilation units, comprising both `.ml` and `.mli` files, in the release code: * `Eval` and `Infer` have skeleton code for the functions that implement evaluation and type inference. There are some unimplemented helper functions in these files that the course staff found helpful in their own solution. You are free to implement them, change them, or remove them entirely. * `Ast` and `TypedAst` contain the type definitions used in evaluation and type inference, respectively. The `TypedAst` module also contains `annotate` and `strip` functions for converting between the two. * `Printer` contains functions for printing out values of various types. * `Parse` contains functions to construct ASTs from strings. It relies on a lexer and parser implemented in `lexer.mll` and `parser.mly`. Skim each of the `.mli` and `.ml` files in the release code, and plan to come back and read them in more detail later as necessary. **Compiling and running:** From the directory containing `main.ml`, run `make` to compile the interpreter. As the parser is compiled, it will produce a message `1 shift/reduce conflict`. This is expected behavior. Note that `make` compiles the interpreter but does compile or run your test suite; use `make test` for that. To run the intepreter, we will use OCaml's own REPL. (For karma, you can build a REPL for OCalf.) So after compiling, load utop. We provide a `.ocamlinit` file that makes the OCalf interpreter's functions available. For example, ``` "if true then 3110 else 0" |> parse_expr |> eval [] |> string_of_value ``` causes the string `"if true then 3110 else 0"` representing an OCalf expression to be parsed into an OCalf AST, evaluated to an OCalf value, then converted to an OCaml string suitable for printing. But note that evaluation won't yet succeed with the release code, because `eval` is unimplemented. **Testing:** From the directory containing `main.ml`, run `make test` to compile and run the OUnit test suite in `test.ml`. Use that file as a basis for creating your own test suite. ## Part 2: Evaluation Implement the function ``` eval : Eval.environment -> Ast.expr -> Eval.value ``` in `eval.ml`. This function evaluates OCalf expressions in the big-step environment model semantics. For example, ``` eval [] (parse_expr "if false then 3 + 5 else 3 * 5"));; - : value = VInt 15 ``` Note that we expose the representation of environments as association lists. Arguably this is poor design: all the client of an environment needs to know is that it is a dictionary, and the particular dictionary representation is irrelevant. **Errors.** Whenever evaluation reaches a place where the semantics gets *stuck*, meaning that evaluation could not meaningfully proceed but the expression being evaluated is not yet a value, `eval` should produce the value `VError`. This will most often occur when `eval` is asked to evaluate an expression that would not type check. For example, `3 + true` and `match 3 with | 1 -> false` should both evaluate to `VError`. Whenever a necessary evaluation of a subexpression evaluates to `VError`, the expression that it is part of also evaluates to `VError`. For example, `snd (3+true, 5)`, where `snd` is `fun p -> match p with (x,y) -> y`, evaluates to an error, because `3+true` evaluates to an error, and evaluation of a pair necessarily requires evaluating both components of the pair, even though `snd` will eventually discard the erroneous first component. But `if true then 3110 else (3+true)` would not evaluate to `VError`, because evaluation of the `else` branch is not necessary according to the semantics. The `VError` constructor carries a string, and the value of that string is unspecified: use it to produce a helpful error message. When the autograder tests your interpreter, it will ignore the string entirely. *Hint: when your evaluator detects an error, don't try to return it all the way up through the call stack of recursive calls to `eval`. Instead, make use of OCaml exceptions.* **Advice.** Although you are free to tackle the implementation of `eval` in any way you see fit, you are strongly encouraged to follow the plan outlined below. Our test cases will proceed in the plan's order, so following the plan will maximize your chances of partial credit. **Step 0:** Remind yourself of the big-step semantics of OCaml, because you are essentially implementing that judgment. **Step 1: Implement eval without LetRec or Match.** Implement `eval` for unit, integers, Booleans, strings, `BinOp`, `If`, `Var`, `Fun`, `Pair`, `Variant`, `App`, and `Let`. The behavior of the OCalf binary comparison operators is specified only for *primitive* types (`unit`, `int`, `bool`, `string`), and should be the same as the OCaml operators on those types. On non-primitive types (e.g., pairs and variants), the behavior of these operators is unspecified, so your implementation may do whatever you like. **Step 2: Implement LetRec.** Implement `eval` for `LetRec`. This is tricky, because a binding is needed in the environment for the defined name before its definition can be evaluated. We'll solve this problem with a technique called *backpatching*. To evaluate `let rec f = e1 in e2` using backpatching, first evaluate `e1` to a value `v1` in an environment where `f` is bound to a *dummy value*, which may be any value at all. (If the value of `f` is used while evaluating `e1`, it is an error. This would occur, for example, if the programmer wrote `let rec x = 3 + x in ...`.) Then imperatively update the binding of `f` to `v1`. This "ties the knot," allowing `v1` to refer to itself. Finally evaluate `e2` in the environment where `f` is bound to `v1`. To support backpatching, the `environment` type contains `binding ref`'s instead of `binding`'s, thus enabling bindings to be mutated. **Step 3: Implement Match.** Implement `eval` for `Match`. <span style="color:green"> [CLARIFICATION 10/28/16] Because at this point you are implementing only dynamic evaluation—not static checking—of match expressions, you do not need to implement any of the static checking that OCaml would normally do of match expressions. Normally OCaml would do some extended static checking for errors that are not just type errors (e.g., unused match cases, multiple bindings of a variable in a pattern, or inexhaustive match cases) then either emit a warning or a compile-time error. You are not implementing that kind of static checking for OCalf. But it is possible for evaluation of OCalf and OCaml match expressions to get stuck if no pattern matches the value; consider how `match 0 with 1 -> true` raises `Match_failure` in OCaml. In that scenario, OCalf evaluation ought to return `VError`, because that is the specified behavior of OCalf when evaluation gets stuck.</span> ## Part 3: Type inference OCalf's type inference algorithm proceeds as follows: 1. Each node of the AST is *annotated* (aka *decorated*) with a unique type variable. 2. The AST is traversed to *collect* a set of equations between types that must hold for the expression to be well-typed. 3. Finally, those equations are solved to find an assignment of types to the original variables; this step is called *unification*. If the unification phase discovers that the system is unsatisfiable, then it is impossible to give a type to the expression, so a type error would be reported. If the system is underconstrained, then there will be some "leftover" variables after unification. The typechecker would give them user-friendly names like `'a` and `'b`. * * * **Example.** Consider `(fun x -> 3 + x)`. The annotation phase would add a new type variable to each subexpression: <p class="text-center"> <img src="annotating.png" alt="parsing" width="600"> </p> Next, the collection phase would collect equations from each node: * We know that `(fun x -> e) : t1 -> t2` if `e:t2 `under the assumption `x:t1`. Stated differently, `(fun (x:t1) -> e:t2) : t3` if and only if `t3 = t1 -> t2`. Therefore, the equation `'t05 = 't04 -> 't03` would be collected from `fun` node. * Similarly, the equations `'t03 = int`, `'t02 = int`, and `'t01 = int` would be collected from the `+` node. * The equation `'t02 = int` would be collected from the `3` node. * We know that if `x` had type `t` when it was bound then it has type `t` when it is used. Therefore, `'t01 = 't04` would be collected from the `x: 't01` node. Finally, the system of equations is solved in the unification phase, assigning `int` to `'t01` through `'t04`, and `int -> int` to `'t05`. * * * We provide the annotation and unification phases of type inference for you; your task is to implement the function ``` collect : variant_spec list -> annotated_expr -> equation list ``` in `infer.ml`. If `collect` itself discovers a type error that means that the expression cannot be typed, regardless of the equations that are generated, then `collect` may raise `Failure`. **Advice:** We strongly encourage you to follow this plan: **Step 1: Implement collect without Match or Variant.** Implement `collect` for all syntactic forms except `Variant` and `Match`. **Step 2: Partially implement collect for Match.** Implement `collect` for `Match`, but omit handling of `PVariant` patterns. Make sure the bindings from the patterns are available while checking the bodies of the cases. **Step 3: Implement collect for Variant.** Extend `collect` to handle variants. Deriving the correct constraints for variants is tricky. Consider this code: ``` type 'a list = Cons of 'a * 'a list | Nil of unit let x = Cons(1,Nil ()) in let y = Cons(true,Nil ()) in 42 ``` An overly-strict implementation of collection might report a type error, if collection generates constraints that force the separate occurrences of `Nil` and `Cons` in binding `x` and `y` to have the same types. A better implementation would generate constraints with distinct type variables, thus permitting the code above. ## Karma Implementing some of these karma problems might require changing the names or types that appear in the `.mli` files in the release code. Do so with care. Adding a new function to an interface file will not break our testing harness, but changing the name or type of a previously existing function could, especially `eval` and `collect`. Likewise, extending the AST types with new constructors will not break our testing harness, but changing the name of an existing constructor or the data it carries would. If in doubt, please ask! <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **REPL:** Implement an OCalf REPL. Like utop, your REPL could provide various directives, syntax highlighting, tab completion, etc. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Pervasives:** In the same way that OCaml's `Pervasives` module is automatically opened, automatically populate the initial OCalf environment with useful functions. A good place to start would be functions for I/O (e.g., the `print_* and read_*` functions in OCaml's `Pervasives`). You will need to extend the OCalf notion of values to include functions that are provided by OCaml, rather than implemented in OCalf. In this, you could be inspired by OCaml's [external definitions][external]. Instead of externals naming C functions, though, OCalf externals could name OCaml functions. [external]: http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#external-declaration <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Lists:** Add syntactic support for lists to the OCalf lexer and parser, along with a built-in `list` type constructor. Implement a collection of functions similar to the OCaml `List` module in OCalf. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **let-polymorphism:** **Caution:** This karma problem is likely to cause you to reorganize your code. Make sure you make good use of source control, helper functions, and unit tests to ensure that you don't break your existing code while working on it! The type inference algorithm described above allows us to give expressions polymorphic types: `fun x -> x` will be given the type `'a -> 'a`. But it does not let us use them polymorphically. Consider the following expression: ``` let (any:t1) = (fun (x:t3) -> (x:t4)):t2 in (any 1, any "where") ``` OCaml will give this expression the type `int * string`, but our naive inference algorithm fails. The solution is *let-polymorphism*: Every time a variable defined by a `let` expression is used, we should create a new type variable corresponding to the use. We use the constraints generated while type checking the body of the `let` as a template for a new set of constraints on the new variable. In the above example, while type checking the body of the let expression, we will generate the constraints `t1 = t2`, `t2 = t3->t4`, and `t3 = t4`. While checking the expression `(any:t5) 1`, we can recreate these constraints with new type variables `t1'`, `t2'`, and so on. We will output the constraints `t1'=t2'`, `t2' = t3'->t4'` and `t3'=t4'`. We also will output the constraint `t5=t1'`. Because the type variables are cloned, they are free to vary independently. Because we have cloned their constraints, they must be used in a manner that is consistent with their definition. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> **Meta:** Implement an OCalf interpreter in OCalf. Start by defining an OCalf variant to represent OCalf ASTs. Then implement an `eval` function in OCalf to evaluate OCalf expressions. Since OCalf doesn't have imperative features, you won't be able to implement `let rec` with backpatching. Instead, you could use a technique called *recursive closures*, which is described in the [final two slides of a 3110 lecture from Spring 2015][rec-cl]. Note those slides use a different notation than we used this semester. [rec-cl]: http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/8/lec08.pdf