# A4: OCalf **Soft deadline:** Thursday, 10/29/15, 11:59 pm **Hard deadline:** Saturday, 10/31/15, 11:59 pm *This assignment may be done as an individual or with one partner. Sharing of code is permitted only between you and your partner; sharing among larger groups is prohibited.* A baby camel is called a "calf." In this assignment you will implement an interpreter for OCalf, a functional language containing many of the core features of OCaml. ## Overview Here is a Backus-Naur form (BNF) grammar for the syntax of OCalf expressions, to give you a sense of what features are in the language:  e ::= (* expressions *) | () | i | b | s | x | e1 bop e2 | if e1 then e2 else e3 | let [rec] x = e1 in e2 | e1 e2 | fun x -> e | (e1,e2) | C e | match e with p1 -> e1 | ... | pn -> en bop ::= (* binary operators *) | + | - | * | > | < | = | >= | <= | <> | ^ p ::= (* patterns *) | () | i | b | s | x | C p | (p1,p2) i ::= integers b ::= booleans s ::= strings x ::= variable identifiers C ::= constructor identifiers  Some of the ways that OCalf differs from OCaml are the following: * All OCalf variants must be non-constant. * OCalf has pairs, but not \$$k\$$-tuples for \$$k \geq 3\$$. * OCalf does not have built-in (syntactic support for) lists. * OCalf does not have a module system. * OCalf has no imperative features. Some of the OCalf interpreter is already implemented for you. Your task is to implement evaluation and type inference, as well as to build a test suite that you will submit. You will need to understand the big-step environment model of OCaml to complete the interpreter. For karma, you can implement many extensions to the interpreter. ## Objectives * Know the design of an interpreter and REPL. * Implement a big-step environment model semantics, including closures. * Better understand OCaml itself by implementing many of its main features. * Comprehend the important building blocks of type inference, and implement one of them. * Formulate a test suite that can detect errors in implementations other than your own. ## Recommended reading * [Lectures and recitations 12&ndash;15][web] * [Recitation 10 (Testing)][web] * The [CS 3110 Style Guide][style] [web]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/ [style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html ## Requirements * You must implement the evaluation of OCalf expressions, as described below under Part 1. * You must submit a test suite for evaluation, as described below under Part 2. * You must implement the type inference and checking of OCalf expressions, as described below under Part 3. * We must be able to compile and run your interpreter as described below under "Compiling and running". * Your code must be written with good style and be well documented. * You must use Git (or another version control system) as part of your development process. ## What we provide In the release code you will find these files: * Many .mli and .ml files, which are described below under Part 0. * A template file a4.txt for submitting your written feedback. ## What to turn in Submit files with these names on [CMS][cms]: * a4src.zip, containing your solution code and your test suite. * vclog.txt containing your version control log. * a4.txt, containing your written feedback. [cms]: https://cms.csuglab.cornell.edu/ **To prepare a4src.zip for submission:** From the directory that contains main.ml, bundle all your source code and your test suite into a zip file with this command:  $zip a4src.zip *.ml{,i,y,l}  Do not include any compiled bytecode files, otherwise your submission might become too big to upload. Double check that you got all your files with this command: $ zipinfo -1 a4src.zip  **To prepare vclog.txt for submission:** Run the following command in the directory containing main.ml:  $git log --stat > vclog.txt  ## Git You are required to use [Git][git] (or another version control system) whether you are working as an individual or with a partner. Throughout your development of A4, commit your changes to a repo. Use those checkins to provide checkpoints, in case you need to restore your development to a previous point. Synch with a remote repo to communicate code between you and your partner, or simply to backup your development if you are working as an individual. **Private repos are of the utmost importance. A public repo would share your code with the entire world, including your classmates, thus violating the course policy on academic integrity. Therefore we require that you keep all your CS 3110 related code in private repos.** [git]: https://git-scm.com/ ## Grading issues * **Compiling and running:** Your interpreter must compile according to the instructions given below in Part 0 under "Compiling and running". You may not change the names and types appearing in .mli files. Solutions that do not obey these stipulations will receive minimal credit. * **Code style:** Refer to the [CS 3110 style guide][style]. Ugly code that is functionally correct will nonetheless be penalized. Take extra time to think and find elegant solutions. * **Late submissions:** Carefully review the [course policies][syllabus] on submission and late assignments. Verify before the deadline on CMS that you have submitted the correct version. * **Environment:** Your solution must function properly in the [3110 virtual machine][vm], which is the official grading environment for the course. [style]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/handouts/style.html [syllabus]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/syllabus.php [vm]: http://www.cs.cornell.edu/Courses/cs3110/2015fa/vm.html ## Permitted OCaml features Imperative data structures are now permitted. Side effects are not necessarily bad. But we urge you to think carefully before choosing to use imperative data structures. ## Part 0: Understand the OCalf codebase Your preliminary task is to familiarize yourself with the structure of the code we have shipped to you. We provide the following modules, comprising both .ml and .mli files, in the release code: * Eval and Infer have skeleton code for the functions that implement evaluation and type inference. There are some unimplemented helper functions in these files that the course staff found helpful in their own solution. You are free to implement them, change them, or remove them entirely. * Ast and TypedAst contain the type definitions used in evaluation and type inference, respectively. The TypedAst module also contains annotate and strip functions for converting between the two. * Examples and Lambda contain example OCalf expressions. Examples contains many small examples from this writeup. Lambda contains a large example. * Printer contains functions for printing out values of various types. * Parse contains functions to construct ASTs from strings. It relies on a lexer and parser implemented in lexer.mll and parser.mly. **Exercise:** Skim each of the .mli and .ml files in the release code, and plan to come back and read them in more detail later as necessary. **Compiling and running:** From the directory containing main.ml, run $ cs3110 compile main.ml  to compile the interpreter. As the parser is compiled, it will produce a message 1 shift/reduce conflict. This is expected behavior. To run the intepreter, we will use OCaml's own REPL. (For karma, you can build a REPL for OCalf.) So after compiling, load utop. You can use the OCalf interpreter by directly calling functions that implement it. For example,  "if true then 3110 else 0" |> parse_expr |> eval [] |> string_of_value  causes the string "if true then 3110 else 0" representing an OCalf expression to be parsed into an OCalf AST, evaluated to an OCalf value, then converted to an OCaml string suitable for printing. (But note that evaluation won't yet succeed with the release code, because eval is unimplemented.) Note that the reason functions like eval are available in utop is that we provide an .ocamlinit file in the release code.* When utop starts, it automatically #uses this file, which automatically #loads and opens many of the modules from the release code for your convenience in interactive testing. Whenever you find yourself typing the same thing into the REPL more than once, consider adding it to your .ocamlinit! \* Files whose names start with . are *hidden*; use ls -A to show a directory listing that includes hidden files. ## Part 1: Evaluation Implement the function  eval : Eval.environment -> Ast.expr -> Eval.value  in eval.ml. This function evaluates OCalf expressions in the big-step environment model semantics. For example,  eval [] (parse_expr "if false then 3 + 5 else 3 * 5"));; - : value = VInt 15  (Note that we expose the representation of environments as association lists. Arguably this is poor design: all the client of an environment needs to know is that it is a dictionary, and the particular dictionary representation is irrelevant.) Whenever evaluation reaches a place where the semantics gets *stuck*, meaning that evaluation could not meaningfully proceed but the expression being evaluated is not yet a value, eval should produce the special value VError. For example, 3 + true and match 3 with | 1 -> false should both evaluate to VError. Although you are free to tackle the implementation of eval in any way you see fit, you are strongly encouraged to follow the plan outlined below. Our test cases will proceed in the plan's order, so following the plan will maximize your chances of partial credit. **Step 0:** Remind yourself of the big-step semantics of OCaml, because you are essentially implementing that judgment. **Step 1: Implement eval without LetRec or Match.** Implement eval for <span style="color:green;">unit, integers, Booleans, strings,</span> BinOp, If, Var, Fun, Pair, Variant, App, and Let. **Step 2: Implement LetRec.** Implement eval for LetRec. This is tricky, because a binding is needed in the environment for the defined name before its definition can be evaluated. We'll solve this problem with a technique called *backpatching*. To evaluate let rec f = e1 in e2 using backpatching, first evaluate e1 to a value v1 in an environment where f is bound to a *dummy value*, which may be any value at all. (If the value of f is used while evaluating e1, it is an error. This would occur, for example, if the programmer wrote let rec x = 3 + x in ....) Then imperatively update the binding of f to v1. This "ties the knot," allowing v1 to refer to itself. Finally evaluate e2 in the environment where f is bound to v1. To support backpatching, the environment type contains binding ref's instead of binding's, thus enabling bindings to be mutated. **Step 3: Implement Match.** Implement eval for Match. You do not need to check whether pattern matching is exhaustive or whether there are unused match cases. Return VError if pattern matching fails at run time. ## Part 2: Test suite As part of developing eval, you naturally will be constructing test cases that demonstrate the (in)correctness of your implementation. Let's take that one step further. The course staff will develop many buggy implementations of eval. Your task is to construct a suite of test cases that finds all our bugs. Use test_eval.ml from the release code as a template. Add your test suite to the file. We will copy that file (and only that file) into each of our buggy interpreters, then run  $cs3110 compile test_eval.ml$ cs3110 test test_eval.ml  We will examine the output to see whether your test suite correctly detects the bugs in our interpreters. ## Part 3: Type inference OCalf's type inference algorithm proceeds as follows: 1. Each node of the AST is *annotated* (aka *decorated*) with a unique type variable. 2. The AST is traversed to *collect* a set of equations between types that must hold for the expression to be well-typed. 3. Finally, those equations are solved to find an assignment of types to the original variables; this step is called *unification*. If the unification phase discovers that the system is unsatisfiable, then it is impossible to give a type to the expression, so a type error would be reported. If the system is underconstrained, then there will be some "leftover" variables after unification. The typechecker would give them user-friendly names like 'a and 'b. * * * **Example.** Consider (fun x -> 3 + x). The annotation phase would add a new type variable to each subexpression: <p class="text-center"> <img src="annotating.png" alt="parsing" width="600"> </p> Next, the collection phase would collect equations from each node: * We know that (fun x -> e) : t1 -> t2 if e:t2 under the assumption x:t1. Stated differently, (fun (x:t1) -> e:t2) : t3 if and only if t3 = t1 -> t2. Therefore, the equation 't05 = 't04 -> 't03 would be collected from fun node. * Similarly, the equations 't03 = int, 't02 = int, and 't01 = int would be collected from the + node. * The equation 't02 = int would be collected from the 3 node. * We know that if x had type t when it was bound then it has type t when it is used. Therefore, 't01 = 't04 would be collected from the x: 't01 node. Finally, the system of equations is solved in the unification phase, assigning int to 't01 through 't04, and int -> int to 't05. * * * We provide the annotation and unification phases of type inference for you; your task is to implement the function  collect : variant_spec list -> annotated_expr -> equation list  in infer.ml. We strongly encourage you to follow this plan: **Step 1: Implement collect without Match or Variant.** Implement collect for all syntactic forms except Variant and Match. **Step 2: Partially implement collect for Match.** Implement collect for Match, but omit handling of PVariant patterns. Make sure the bindings from the patterns are available while checking the bodies of the cases. **Step 3: Implement collect for Variant.** Extend collect to handle variants. Deriving the correct constraints for variants is tricky. Consider this code:  type 'a list = Cons of 'a * 'a list | Nil of unit let x = Cons(1,Nil ()) in let y = Cons(true,Nil ()) in 42  An overly-strict implementation of collection might report a type error, if collection generates constraints that force the separate occurrences of Nil and Cons in binding x and y to have the same types. A better implementation would generate constraints with distinct type variables, thus permitting the code above. ## Karma Implementing some of these karma problems might require changing the names or types that appear in the .mil files in the release code. Do so with care. Adding a new function to an interface file will not break our testing harness, but changing the name or type of a previously existing function could, especially eval and collect. Likewise, extending the AST types will not break our testing harness, but changing the name of an existing constructor or the data it carries would. If in doubt, please ask! <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **REPL:** Implement an OCalf REPL. Like utop, your REPL could provide various directives, syntax highlighting, tab completion, etc. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Pervasives:** In the same way that OCaml's Pervasives module is automatically opened, automatically populate the initial OCalf environment with useful functions. A good place to start would be functions for I/O (e.g., the print_* and read_* functions in OCaml's Pervasives). You will need to extend the OCalf notion of values to include functions that are provided by OCaml, rather than implemented in OCalf. In this, you could be inspired by OCaml's [external definitions][external]. Instead of externals naming C functions, though, OCalf externals could name OCaml functions. [external]: http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#external-declaration <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Lists:** Add syntactic support for lists to the OCalf lexer and parser, along with a built-in list type constructor. Implement a collection of functions similar to the OCaml List module in OCalf. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **let-polymorphism:** **Caution:** This karma problem is likely to cause you to reorganize your code. Make sure you make good use of source control, helper functions, and unit tests to ensure that you don't break your existing code while working on it! The type inference algorithm described above allows us to give expressions polymorphic types: fun x -> x will be given the type 'a -> 'a. But it does not let us use them polymorphically. Consider the following expression:  let (any:t1) = (fun (x:t3) -> (x:t4)):t2 in (any 1, any "where")  OCaml will give this expression the type int * string, but our naive inference algorithm fails. The solution is *let-polymorphism*: Every time a variable defined by a let expression is used, we should create a new type variable corresponding to the use. We use the constraints generated while type checking the body of the let as a template for a new set of constraints on the new variable. In the above example, while type checking the body of the let expression, we will generate the constraints t1 = t2, t2 = t3->t4, and t3 = t4. While checking the expression (any:t5) 1, we can recreate these constraints with new type variables t1', t2', and so on. We will output the constraints t1'=t2', t2' = t3'->t4' and t3'=t4'. We also will output the constraint t5=t1'. Because the type variables are cloned, they are free to vary independently. Because we have cloned their constraints, they must be used in a manner that is consistent with their definition. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> **Meta:** Implement an OCalf interpreter in OCalf. Start by defining an OCalf variant to represent OCalf ASTs. Then implement an eval function in OCalf to evaluate OCalf expressions. Since OCalf doesn't have imperative features, you won't be able to implement let rec with backpatching. Instead, you could use a technique called *recursive closures*, which is described in the [final two slides of a 3110 lecture from Spring 2015][rec-cl]. Note those slides use a different notation than we used this semester. [rec-cl]: http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/8/lec08.pdf