# A4: OCalf **Deadline:** Wednesday, 11/02/16, 11:59 pm *This assignment may be done as an individual or with one partner. Sharing of code is permitted only between you and your partner; sharing among larger groups is prohibited. Be careful that you do not end up working as a three or four person team with the rest of your final project group: we have had AI cases in the past resulting from that behavior.* A baby camel is called a *calf*. In this assignment you will implement an interpreter for OCalf, a functional language containing many of the core features of OCaml. ## Overview OCalf is intended to be a sub-language of OCaml: unless otherwise specified, OCalf programs should behave like OCaml programs. Here is a Backus-Naur form (BNF) grammar for the syntax of OCalf expressions, to give you a sense of what features are in the language:  e ::= (* expressions *) | () | i | b | s | x | e1 bop e2 | if e1 then e2 else e3 | let [rec] x = e1 in e2 | e1 e2 | fun x -> e | (e1,e2) | C e | match e with p1 -> e1 | ... | pn -> en bop ::= (* binary operators *) | + | - | * | > | < | = | >= | <= | <> | ^ p ::= (* patterns *) | () | i | b | s | x | C p | (p1,p2) i ::= integers b ::= booleans s ::= strings x ::= variable identifiers C ::= constructor identifiers  Some of the ways that OCalf differs from OCaml are the following: * OCalf source-file integers are non-negative, and there is no unary integer negation. Negative integers can nonetheless be created, e.g. by writing 0-10 to produce negative ten. * OCalf variants must be non-constant. * OCalf has pairs, but not \$$k\$$-tuples for \$$k \geq 3\$$. * OCalf does not have built-in lists. * OCalf does not have a module system. * OCalf has no imperative features. Parsing and lexing has been implemented for you, and the AST has been designed for you. You will implement the rest of the interpreter, including evaluation and type inference. Evaluation will be implemented using the big-step environment model. ## Objectives * Know the design of an interpreter. * Implement a big-step environment model semantics, including closures. * Implement type inference. * Better understand OCaml itself by implementing many of its main features. ## Recommended reading * [Lectures and recitations 13&ndash;17][web] [web]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/ [style]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/handouts/style.html ## Requirements * You must implement the evaluation as described below in Part 2. * You must implement type inference and checking as described below in Part 3. * You must submit an [overview document][overview]. * You must use Git as part of your development process. [overview]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/handouts/overview.html ## What we provide See below under Part 1 for a description of the release code. ## What to turn in Submit files with these names on [CMS][cms]: * a4src.zip, containing your solution code. See below for how you must construct this file. * gitlog.txt containing your git log. * overview.pdf or overview.txt, containing your overview document. Although you should most certainly build a test suite (and we provide you with the start of one in test.ml), we will not be grading your test suite as part of this assignment. [cms]: https://cms.csuglab.cornell.edu/ **To prepare a4src.zip for submission:** From the directory that contains main.ml, run make zip. This will produce a file named a4src.zip for you automatically containing all your source code located in that directory. Double check that you got all your source files as follows:  $zipinfo -1 a4src.zip ast.ml eval.ml infer.ml main.ml parse.ml printer.ml test.ml typedAst.ml ast.mli eval.mli infer.mli parse.mli printer.mli typedAst.mli parser.mly lexer.mll  If the output you receive is not **identical** to the above, something is wrong. For example, output that included the following would be wrong:  a4/main.ml  because the .ml file is located in a subdirectory rather than being at the top level directory inside the zip file. Submissions that do not meet the above criteria regarding the output of zipinfo **will be penalized.** As usual, make sure you are satisfied with the version of the file you upload in CMS **before** the deadline. **To prepare gitlog.txt for submission:** Run the following command in the directory containing main.ml: $ git log --stat > gitlog.txt  ## Git **Private repos are of the utmost importance.** A public repo would share your code with the entire world, including your classmates, thus violating the course policy on academic integrity. Therefore we require that you keep all your CS 3110 related code in private repos. ## Grading issues * **Late submissions:** Carefully review the course policies on [submission and late assignments][late]. Verify before the deadline on CMS that you have submitted the correct version. * **Environment, names, and types:** Your solution must compile and run under OCaml 4.03.0. You are required to adhere to the names and types provided in the released .mli files. Your solution must compile against the provided public test cases in test.ml without any changes to those tests. Otherwise, your solution will receive minimal credit. * **Code style:** Refer to the [CS 3110 style guide][style]. Ugly code that is functionally correct will nonetheless be penalized. Take extra time to think and find elegant solutions. [late]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/syllabus.php#late [style]: http://www.cs.cornell.edu/Courses/cs3110/2016fa/handouts/style.html ## Permitted OCaml features Imperative features are now permitted. Side effects are not necessarily bad! But we urge you to think carefully before choosing to use imperative features. Your solutions may not require linking any additional libraries/packages beyond OUnit. ## Part 0: Sanity check We are not shipping a make check with this assignment. Instead, we stipulate the following: * Your submission must compile and run with OCaml 4.03.0 and OUnit 2.0.0. * You may not change the provided .mli files. * Your solution must compile against the provided test cases in test.ml without any changes to those test cases whatsoever. By now, you should be able to take responsibility for adhering to these requirements yourself, without needing automated help. ## Part 1: Understand the OCalf codebase **Release code:** Your preliminary task is to familiarize yourself with the structure of the code we have shipped to you. We provide the following compilation units, comprising both .ml and .mli files, in the release code: * Eval and Infer have skeleton code for the functions that implement evaluation and type inference. There are some unimplemented helper functions in these files that the course staff found helpful in their own solution. You are free to implement them, change them, or remove them entirely. * Ast and TypedAst contain the type definitions used in evaluation and type inference, respectively. The TypedAst module also contains annotate and strip functions for converting between the two. * Printer contains functions for printing out values of various types. * Parse contains functions to construct ASTs from strings. It relies on a lexer and parser implemented in lexer.mll and parser.mly. Skim each of the .mli and .ml files in the release code, and plan to come back and read them in more detail later as necessary. **Compiling and running:** From the directory containing main.ml, run make to compile the interpreter. As the parser is compiled, it will produce a message 1 shift/reduce conflict. This is expected behavior. Note that make compiles the interpreter but does compile or run your test suite; use make test for that. To run the intepreter, we will use OCaml's own REPL. (For karma, you can build a REPL for OCalf.) So after compiling, load utop. We provide a .ocamlinit file that makes the OCalf interpreter's functions available. For example,  "if true then 3110 else 0" |> parse_expr |> eval [] |> string_of_value  causes the string "if true then 3110 else 0" representing an OCalf expression to be parsed into an OCalf AST, evaluated to an OCalf value, then converted to an OCaml string suitable for printing. But note that evaluation won't yet succeed with the release code, because eval is unimplemented. **Testing:** From the directory containing main.ml, run make test to compile and run the OUnit test suite in test.ml. Use that file as a basis for creating your own test suite. ## Part 2: Evaluation Implement the function  eval : Eval.environment -> Ast.expr -> Eval.value  in eval.ml. This function evaluates OCalf expressions in the big-step environment model semantics. For example,  eval [] (parse_expr "if false then 3 + 5 else 3 * 5"));; - : value = VInt 15  Note that we expose the representation of environments as association lists. Arguably this is poor design: all the client of an environment needs to know is that it is a dictionary, and the particular dictionary representation is irrelevant. **Errors.** Whenever evaluation reaches a place where the semantics gets *stuck*, meaning that evaluation could not meaningfully proceed but the expression being evaluated is not yet a value, eval should produce the value VError. This will most often occur when eval is asked to evaluate an expression that would not type check. For example, 3 + true and match 3 with | 1 -> false should both evaluate to VError. Whenever a necessary evaluation of a subexpression evaluates to VError, the expression that it is part of also evaluates to VError. For example, snd (3+true, 5), where snd is fun p -> match p with (x,y) -> y, evaluates to an error, because 3+true evaluates to an error, and evaluation of a pair necessarily requires evaluating both components of the pair, even though snd will eventually discard the erroneous first component. But if true then 3110 else (3+true) would not evaluate to VError, because evaluation of the else branch is not necessary according to the semantics. The VError constructor carries a string, and the value of that string is unspecified: use it to produce a helpful error message. When the autograder tests your interpreter, it will ignore the string entirely. *Hint: when your evaluator detects an error, don't try to return it all the way up through the call stack of recursive calls to eval. Instead, make use of OCaml exceptions.* **Advice.** Although you are free to tackle the implementation of eval in any way you see fit, you are strongly encouraged to follow the plan outlined below. Our test cases will proceed in the plan's order, so following the plan will maximize your chances of partial credit. **Step 0:** Remind yourself of the big-step semantics of OCaml, because you are essentially implementing that judgment. **Step 1: Implement eval without LetRec or Match.** Implement eval for unit, integers, Booleans, strings, BinOp, If, Var, Fun, Pair, Variant, App, and Let. The behavior of the OCalf binary comparison operators is specified only for *primitive* types (unit, int, bool, string), and should be the same as the OCaml operators on those types. On non-primitive types (e.g., pairs and variants), the behavior of these operators is unspecified, so your implementation may do whatever you like. **Step 2: Implement LetRec.** Implement eval for LetRec. This is tricky, because a binding is needed in the environment for the defined name before its definition can be evaluated. We'll solve this problem with a technique called *backpatching*. To evaluate let rec f = e1 in e2 using backpatching, first evaluate e1 to a value v1 in an environment where f is bound to a *dummy value*, which may be any value at all. (If the value of f is used while evaluating e1, it is an error. This would occur, for example, if the programmer wrote let rec x = 3 + x in ....) Then imperatively update the binding of f to v1. This "ties the knot," allowing v1 to refer to itself. Finally evaluate e2 in the environment where f is bound to v1. To support backpatching, the environment type contains binding ref's instead of binding's, thus enabling bindings to be mutated. **Step 3: Implement Match.** Implement eval for Match. <span style="color:green"> [CLARIFICATION 10/28/16] Because at this point you are implementing only dynamic evaluation&mdash;not static checking&mdash;of match expressions, you do not need to implement any of the static checking that OCaml would normally do of match expressions. Normally OCaml would do some extended static checking for errors that are not just type errors (e.g., unused match cases, multiple bindings of a variable in a pattern, or inexhaustive match cases) then either emit a warning or a compile-time error. You are not implementing that kind of static checking for OCalf. But it is possible for evaluation of OCalf and OCaml match expressions to get stuck if no pattern matches the value; consider how match 0 with 1 -> true raises Match_failure in OCaml. In that scenario, OCalf evaluation ought to return VError, because that is the specified behavior of OCalf when evaluation gets stuck.</span> ## Part 3: Type inference OCalf's type inference algorithm proceeds as follows: 1. Each node of the AST is *annotated* (aka *decorated*) with a unique type variable. 2. The AST is traversed to *collect* a set of equations between types that must hold for the expression to be well-typed. 3. Finally, those equations are solved to find an assignment of types to the original variables; this step is called *unification*. If the unification phase discovers that the system is unsatisfiable, then it is impossible to give a type to the expression, so a type error would be reported. If the system is underconstrained, then there will be some "leftover" variables after unification. The typechecker would give them user-friendly names like 'a and 'b. * * * **Example.** Consider (fun x -> 3 + x). The annotation phase would add a new type variable to each subexpression: <p class="text-center"> <img src="annotating.png" alt="parsing" width="600"> </p> Next, the collection phase would collect equations from each node: * We know that (fun x -> e) : t1 -> t2 if e:t2 under the assumption x:t1. Stated differently, (fun (x:t1) -> e:t2) : t3 if and only if t3 = t1 -> t2. Therefore, the equation 't05 = 't04 -> 't03 would be collected from fun node. * Similarly, the equations 't03 = int, 't02 = int, and 't01 = int would be collected from the + node. * The equation 't02 = int would be collected from the 3 node. * We know that if x had type t when it was bound then it has type t when it is used. Therefore, 't01 = 't04 would be collected from the x: 't01 node. Finally, the system of equations is solved in the unification phase, assigning int to 't01 through 't04, and int -> int to 't05. * * * We provide the annotation and unification phases of type inference for you; your task is to implement the function  collect : variant_spec list -> annotated_expr -> equation list  in infer.ml. If collect itself discovers a type error that means that the expression cannot be typed, regardless of the equations that are generated, then collect may raise Failure. **Advice:** We strongly encourage you to follow this plan: **Step 1: Implement collect without Match or Variant.** Implement collect for all syntactic forms except Variant and Match. **Step 2: Partially implement collect for Match.** Implement collect for Match, but omit handling of PVariant patterns. Make sure the bindings from the patterns are available while checking the bodies of the cases. **Step 3: Implement collect for Variant.** Extend collect to handle variants. Deriving the correct constraints for variants is tricky. Consider this code:  type 'a list = Cons of 'a * 'a list | Nil of unit let x = Cons(1,Nil ()) in let y = Cons(true,Nil ()) in 42  An overly-strict implementation of collection might report a type error, if collection generates constraints that force the separate occurrences of Nil and Cons in binding x and y to have the same types. A better implementation would generate constraints with distinct type variables, thus permitting the code above. ## Karma Implementing some of these karma problems might require changing the names or types that appear in the .mli files in the release code. Do so with care. Adding a new function to an interface file will not break our testing harness, but changing the name or type of a previously existing function could, especially eval and collect. Likewise, extending the AST types with new constructors will not break our testing harness, but changing the name of an existing constructor or the data it carries would. If in doubt, please ask! <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **REPL:** Implement an OCalf REPL. Like utop, your REPL could provide various directives, syntax highlighting, tab completion, etc. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Pervasives:** In the same way that OCaml's Pervasives module is automatically opened, automatically populate the initial OCalf environment with useful functions. A good place to start would be functions for I/O (e.g., the print_* and read_* functions in OCaml's Pervasives). You will need to extend the OCalf notion of values to include functions that are provided by OCaml, rather than implemented in OCalf. In this, you could be inspired by OCaml's [external definitions][external]. Instead of externals naming C functions, though, OCalf externals could name OCaml functions. [external]: http://caml.inria.fr/pub/docs/manual-ocaml/intfc.html#external-declaration <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **Lists:** Add syntactic support for lists to the OCalf lexer and parser, along with a built-in list type constructor. Implement a collection of functions similar to the OCaml List module in OCalf. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_black.png" height="40" width="40"> **let-polymorphism:** **Caution:** This karma problem is likely to cause you to reorganize your code. Make sure you make good use of source control, helper functions, and unit tests to ensure that you don't break your existing code while working on it! The type inference algorithm described above allows us to give expressions polymorphic types: fun x -> x will be given the type 'a -> 'a. But it does not let us use them polymorphically. Consider the following expression:  let (any:t1) = (fun (x:t3) -> (x:t4)):t2 in (any 1, any "where")  OCaml will give this expression the type int * string, but our naive inference algorithm fails. The solution is *let-polymorphism*: Every time a variable defined by a let expression is used, we should create a new type variable corresponding to the use. We use the constraints generated while type checking the body of the let as a template for a new set of constraints on the new variable. In the above example, while type checking the body of the let expression, we will generate the constraints t1 = t2, t2 = t3->t4, and t3 = t4. While checking the expression (any:t5) 1, we can recreate these constraints with new type variables t1', t2', and so on. We will output the constraints t1'=t2', t2' = t3'->t4' and t3'=t4'. We also will output the constraint t5=t1'. Because the type variables are cloned, they are free to vary independently. Because we have cloned their constraints, they must be used in a manner that is consistent with their definition. <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> <img src="camel_orange.png" height="40" width="40"> **Meta:** Implement an OCalf interpreter in OCalf. Start by defining an OCalf variant to represent OCalf ASTs. Then implement an eval function in OCalf to evaluate OCalf expressions. Since OCalf doesn't have imperative features, you won't be able to implement let rec with backpatching. Instead, you could use a technique called *recursive closures*, which is described in the [final two slides of a 3110 lecture from Spring 2015][rec-cl]. Note those slides use a different notation than we used this semester. [rec-cl]: http://www.cs.cornell.edu/Courses/cs3110/2015sp/lectures/8/lec08.pdf