# JoCalf Language Manual JoCalf is an language that is most similar to OCaml, but also inherits features from JavaScript and Racket. From OCaml, JoCalf gets its basic syntax, higher-order functions, and references. From Racket, an untyped functional language in the Scheme family, JoCalf gets functions that truly take multiple arguments (unlike OCaml functions, which take just a single argument, despite syntactic sugar that makes it appear otherwise). From JavaScript, JoCalf gets operators that convert their arguments' run-time types, exceptions that can be throw and caught, and objects with fields that can be mutated. This informal semantics describes how evaluation occurs, but it mostly elides the details of how exceptions propagate and how changes to the state are threaded throughout evaluation. Those details can be found in the [formal semantics][semantics]. [semantics]: semantics.txt **Table of Contents:** * [Tutorial](#tutorial) - [Basics](#basics) - [Functions](#functions) - [Imperative features](#imperative) - [Objects](#objects) * [Reference](#reference) - [Syntax](#syntax) - [Dynamic semantics](#semantics) - [Let definitions](#letdefn) - [Constants](#constants) - [Parenthesized expressions](#parens) - [Variables and let expressions](#vars_letexpr) - [Functions and application](#fn_app) - [Control flow: if, sequence, while](#control) - [Operators](#operators) - [References](#refs) - [Exceptions](#exns) - [Objects](#objs) * [External function library](#externs) <a name="tutorial"></a> ## Tutorial This brief tutorial shows off some features of JoCalf using its toplevel (aka REPL). Like the OCaml toplevel, JoCalf's toplevel accepts *phrases*, which are either expressions or let definitions. Terminating a phrase with double semicolon is optional. <a name="basics"></a> #### Basics JoCalf's operators are more flexible than OCaml's, allowing arguments of various types to be combined. But sometimes the result is a special value `undefined`. ``` # 1 + 1 2 # "1" + "1" "11" # 31 + "10" "3110" # 1 * "zzz" undefined ``` Let expressions and definitions are much the same as OCaml. ``` # let x = 1+1 in x+x 4 # let x = 1 1 # x 1 # y Exception: "Unbound variable" ``` If expressions are more flexible, allowing non-Boolean guard expressions, missing else branches, and branches of differing types. The short-circuit conjunction and disjunction operators return values that might not be Boolean. ``` # if true then 42 else "forty two" 42 # if 3110 then "yay" else "boo" "yay" # if 0 then "yay" undefined # true && 1 1 # 1 && true true # "cool cool" || false "cool cool" ``` <a name="functions"></a> #### Functions and external functions Functions are always anonymous and permit multiple arguments. Moreover, functions may not be partially applied; all arguments must be supplied when the function is applied. ``` # let add = fun (x y) -> x + y <closure> # add 2 3 5 # add 1 Exception: "Application: wrong number of arguments" ``` The parentheses around the arguments in a function expression are mandatory. ``` # let add = fun x y -> x + y Syntax error, line 1, characters 14-15: x ``` Let bindings may be used to create recursive functions. ``` # let rec fact (n) = if n = 0 then 1 else n * (fact (n-1)) <closure> # fact 5 120 ``` External functions are functions that can be used in JoCalf but are actually implemented in OCaml&mdash;similarly to how OCaml provides some functions in its standard library that are actually implemented in C. Externals are needed when the function would be impossible to implement directly in JoCalf. ``` # length "hello" 5 # is_int 42 42 # is_int "42" false ``` <a name="imperative"></a> #### Imperative features References, sequencing, and while loops are similar to OCaml. Assignment returns the value of its right hand side. ``` # let inc = fun (r) -> r := !r + 1 <closure> # let x = ref 0 <location> # x := 10 10 # inc x; inc x; inc x 13 # !x 13 # while !x > 0 do x := !x-1 done undefined # !x 0 ``` Exceptions are an amalgam of JavaScript and OCaml: ``` # throw 42 Exception: 42 # try throw "oops" catch exc handle exc + " caught" "oops caught" # try throw 1 catch x handle throw 3 finally throw 2 Exception: 2 ``` <a name="objects"></a> #### Objects Finally, JoCalf has objects. The object system is relatively immature (as befits a young language): it lacks methods, inheritance, and the `this` keyword (implicit or explicit). Hence objects are mostly just (functional) records. Their fields can be accessed with a couple different syntactic forms. Field names are always strings, but the field to be accessed can be computed at run time. ``` # let o = {"x": 1, "1": 42, "dbl": fun (z) -> 2*z} <object> # o["x"] 1 # o.x 1 # o["1"] 42 # o[3-2] 42 # o["d"+"bl"] 10 20 # let o' = {"x": 1, "f" : fun (y) -> x+y} <object> # o'.g undefined (* because g is not a field of o' *) # o'.f 2 Exception: "Unbound variable" (* because x is not bound in the function *) ``` <a name="reference"></a> ## Reference Next, we give a reference for each syntactic form of JoCalf, explaining how each form evaluates. <a name="syntax"></a> #### Syntax The abstract syntax of JoCalf is as follows: ``` (* expressions *) e ::= | i | b | s | undefined (* constants *) | (e) | begin e end (* nesting of expressions *) | x (* variables *) | let x = e1 in e2 (* let expressions *) | let rec f (x1 ... xn) = e1 in e2 (* recursive let expressions *) | fun (x1 ... xn) -> e (* functions *) | e0 ... en (* application *) | if e1 then e2 [else e3] (* control flow: if, sequence, while *) | e1; ...; en | while e1 do e2 done | uop e | e1 bop e2 (* unary and binary operators *) | e1 && e2 | e1 || e2 | ref e | !e | e1 := e2 (* references *) | throw e (* exceptions *) | try e1 catch x handle e2 [finally e3] | {s1:e1, ..., sn:en} (* objects *) | e1[e2] | e1.e2 | e1[e2] <- e3 | delete e1[e2] (* definitions *) d ::= | let x = e (* let definition *) | let rec f (x1 ... xn) = e1 in e2 (* recursive let definition *) (* unary operators *) uop ::= | not | - | typeof (* binary operators *) bop ::= | + | - | * | / | mod | < | <= | > | >= | = | != | == | !== i ::= integers b ::= booleans s ::= strings x ::= identifiers ``` <a name="semantics"></a> #### Dynamic semantics **Values.** JoCalf values are defined as follows: ``` (* values *) v ::= | i | s | b | undefined (* constant values *) | {s1:v1, ..., sn:vn} (* object values *) | loc | cl | extern (* unexpressible values *) loc ::= memory locations cl ::= closures extern ::= external functions ``` The final three kinds of values&mdash;locations, closures, and external functions&mdash;cannot be directly expressed in JoCalf, but result from evaluation. *External functions* are functions that are available for use by JoCalf programs but are not implemented in JoCalf. Instead, they are implemented in OCaml as part of the interpreter itself. **Conversion of values.** JoCalf allows run-time conversion between different types of values. * To convert a value to a Boolean: - The values `true`, any integer except `0`, any string except the empty string `""`, and any object, location, extern, or closure all convert to the Boolean `true`. We call these values **truthy**. - The values `false`, the integer `0`, the empty string `""`, and the `undefined` value all convert to the Boolean `false`. We call these values **falsy**. * To convert a value to an integer: - An integer `i` converts to itself. - The Boolean `true` converts to `1`, and `false` converts to `0`. - A string `s` converts to `Pervasives.int_of_string s` if that OCaml function returns an integer. But if that OCaml function raises `Failure`, then the string converts to `undefined`. - A location, object, extern, or closure converts to `undefined`. - The value `undefined` remains `undefined`; it does not become an integer. * To convert a value to a string: - **Note that this algorithm is *not* how the REPL nor how the `interp` function in `Main` convert values to strings to display them to the user.** - A string `s` converts to itself. - An integer `i` converts to `Pervasives.string_of_int i`. - The Boolean `true` converts to `"true"`, and `false` converts to `"false"`. - A location, object, extern, or closure converts to `"undefined"`. - The value `undefined` converts to `"undefined"`. * To convert a value to a *primitive* value: - Integer, Boolean, string, and `undefined` are already primitive values. They convert to themselves. - Locations, objects, externs, and closures all convert to the primitive value `undefined`. **Expressions.** Evaluation of a JoCalf expression requires two additional pieces of information: the *environment*, giving a mapping from free variables to their values, and the *state*, giving a mapping from memory locations to their values. Evaluation produces a *result*, which is either a value or an exception, as well as an updated state. ``` (* results *) r ::= | v (* value *) | exn v (* exception carrying a value *) ``` **Definitions.** Evaluation of a let definition in the toplevel is much like evaluation of an expression, but it additionally introduces new bindings into the environment. **Phrases.** A phrase is either a definition or an expression, and evaluates accordingly. <a name="letdefn"></a> #### Let definitions The let definition ``` let x = e ``` evaluates `e` to a value `v`, and binds `x` to `v` in the environment. The recursive let definition ``` let rec f (x1 ... xn) = e ``` evaluates to a closure whose environment binds `f` to the same closure. *Implementation note:* The description above is intentially circular. Indeed, exploiting circularity is one simple way to implement recursion. (Another common alternative is to use a fixpoint combinator such as the famous Y combinator). To create the circular structure described above, you can use OCaml's references. First create a closure containing a (reference to) a dummy environment, then use assignment to "backpatch" the environment with one where `f` maps to the very same closure. <a name="constants"></a> #### Constants **Integers.** JoCalf integers are mostly the same as OCaml's. On a 64-bit implementation, that means they range from \\(−2^{62}\\) to \\(2^{62}−1\\). Integer literals may be written in decimal, hexadecimal, octal, and binary: ``` # 42 42 # 0x2a 42 # 0o52 42 # 0b101010 42 ``` *Implementation note:* the JoCalf parser produces an OCaml `string`, not an `int`, for integers. It is up to your interpreter to instead produce an integer. There are three problems to solve here. 1. The parser needs your help to recognize negative integer literals. For example, given the JoCalf expression `-1`, the parser produces a unary negation applied to `"1"`. A minus sign `-` followed in the JoCalf source code by an integer literal should itself be an integer literal, not a unary negation operation. So your AST factory should detect this situation, eliminate the unary negation, and tranform the string to `"-1"`. 2. Sometime before or during evaluation, those strings need to become an OCaml `int`. You could simpy use OCaml's `int_of_string` to transform the `string` to an `int`. 3. The parser needs your help to recognize whether integer literals are between the bounds of `min_int` and `max_int`, inclusive. Assuming you have solved the two problems above correctly, `int_of_string` will take care of this for you. **Strings.** JoCalf strings are the same as OCaml's, but do not support Unicode. There is some support for escaping provided by the parser: ``` # "\052" + "\050" "42" # "\n" "\n" ``` **Booleans.** JoCalf's Booleans are `true` and `false`. **Undefined.** JoCalf has a special value written `undefined` that is the only value of its type. JoCalf uses `undefined` to represent the result of an operation or language construct that does not produce a meaningful value. <a name="parens"></a> #### Parenthesized expressions Expressions can be parenthesized, as in OCaml, with `( ... )` or `begin ... end`. *Implementation note:* You don't have to do anything to deal with parenthesized expressions. The parser automatically handles them for you. Remember that parentheses are not explicitly represented in the AST; rather, they are implicitly represented by the shape of the tree. <a name="vars_letexpr"></a> #### Variables and let expressions Let expressions bind variables to values. JoCalf variable identifiers are a subset of OCaml identifiers: the first character must be a lowercase letter or an underscore, and any remaining letters must be a letter (of any case), a digit, an underscore, or a single quote. Evaluation of a variable produces the value to which that variable is bound in the environment. If the variable is unbound, evaluation of it produces the exception `"Unbound variable"`. The let expression ``` let x = e1 in e2 ``` evaluates `e1` to a value `v`, then evaluates `e2` in an environment in which `x` is bound to `v`. <a name="fn_app"></a> #### Functions and application JoCalf functions are values, just as OCaml functions are. But unlike OCaml, JoCalf functions are truly multi-argument, and all arguments must be supplied at the point where the function is applied; partial application is not permitted in JoCalf. The anonymous function ``` fun (x1 ... xn) -> e ``` is a function value that takes `n` arguments named `x1`..`xn` and has a body `e` in which those arguments are bound. An anonymous function evaluates to a closure, which captures the environment where the function was defined. It is a syntax error for a function to have zero arguments, or to have arguments with duplicate names. *Implementation note:* The parser already detects those syntax errors for you. Application is written, as in OCaml, by juxtaposing the function and its arguments: ``` e0 e1 ... en ``` It is evaluated left-to-right. First `e0` is evaluated, and that must produce either a closure or an external function. If it does not, evaluation of the function application is done, and the exception `"Application: not a function"` is produced. If `e0` does produce a function (either a closure or an external), then `e1` ... `en` are evaluated left-to-right. If all of `e1` ... `en` evaluate to values `v1` ... `vn`, and if the function takes exactly `n` arguments, then the function is applied to those arguments, producing a result. If the function takes a different number of arguments than were provided, then the exception `"Application: wrong number of arguments"` is produced. The check for the number of arguments occurs before the arguments are themselves evaluated. <a name="control"></a> #### Control flow: if, sequence, while The if expression ``` if e1 then e2 else e3 ``` evaluates the guard `e1`, then conditionally evaluates a branch `e2` or `e3` based on the result of the guard. First, the guard `e1` is evaluated. If the guard is truthy, then `e2` is evaluated. If the guard is falsy, then `e3` is evaluated. It is also possible to omit the else branch: ``` if e1 then e2 ``` That syntactic form evaluates equivalently to ``` if e1 then e2 else undefined ``` The sequence of expressions ``` e1; e2 ``` evaluates left to right. First `e1` is evaluated, then `e2`. We then have two values, `v1` and `v2`. The result of the sequence is `v2`. The other value, `v1`, is ignored. Likewise, a longer sequence of expressions ``` e1; e2; ...; en ``` is evaluated left to right, resulting in the final value `vn` corresponding to `en`. The while loop ``` while e1 do e2 done ``` is evaluated by instead evaluating the longer expression ``` if e1 then (e2; while e1 do e2 done) ``` and returning whatever result that longer expression produces. Note how that recursively *unrolls* the while loop during interpretation for as many iterations as are necessary. <a name="operators"></a> #### Operators *If any of this section seems unnecessarily complicated, reflect on the fact that it is designed to be as faithful as possible to JavaScript, which is an immensely popular language. Also, for five minutes of PL-nerdy entertainment, watch the [Wat talk][wat] on even weider definitions in dynamically typed languages like Ruby and JavaScript.* [wat]: https://www.destroyallsoftware.com/talks/wat **Unary operators.** Application of a unary operator ``` uop e ``` first evaluates `e` to a value `v`. The result depends on the operator: * `not`: If `v` is truthy, then `not v` is `false`. If `v` is falsy, then `not v` is `true`. * `-`: Application of unary negation `- v` first converts `v` to an integer. That might result in an integer, or the `undefined` value. In the former case, the result of the operation is the result of applying OCaml's unary negation operator to the integer. The result of applying unary negation to `undefined` is `undefined`. *Implementation note*: A minus sign `-` followed in the JoCalf source code by an integer literal is itself an integer literal, not a unary negation operation. See the implementation note above on integers. * `typeof`: The typeof operator always produces a string. The type of `undefined` is `"undefined"`. The type of a Boolean is `"bool"`. The type of an integer is `"int"`. The type of a string is `"string"`. The type of an object is `"object"`. The type of a location is `"location"`. The type of a closure or an extern is `"closure"`&mdash;that is, typeof cannot distinguish between functions coded in JoCalf and functions coded in OCaml. **Binary operators.** Application of a binary operator ``` e1 bop e2 ``` proceeds left to right. First, it evaluates `e1`, then `e2`. We now have two values `v1` and `v2`, respectively, and the result depends on the operator: * `+`: First convert `v1` and `v2` to primitive values. If either one is a string, then convert the other to a string too, and the result is the application of OCaml's string append operator to those strings. Otherwise, further convert the primitives to integers, and the result is the application of OCaml's integer addition operator to those integers. But if conversion to integers causes either operand to convert to `undefined`, then the result of the `+` operation is likewise `undefined`. * `-`, `*`, `/`, `mod`: Convert `v1` and `v2` to integers. The result of the binary operation is the application of OCaml's same binary operator to those integers, with the following deviations: - If conversion to integers causes either operand to convert to `undefined`, then the result of the binary operation is likewise the `undefined` value. - If the binary operation is `/` or `mod` and `v2` is the integer `0`, then the result is the JoCalf exception `"Division by zero"`. * `<`, `<=`, `>`, `>=`: Convert `v1` and `v2` to primitives. If they are both strings, then the result is the application of OCaml's same comparison operator to those strings. (Recall, OCaml uses lexicographic comparison on strings.) Otherwise, further convert the primitives to integers, and the result is the application of OCaml's same comparison operator to those integers. But if conversion to integers causes either operand to convert to `undefined`, then the result of the comparison operation is `false`. Note that this is different behavior than `+` on undefined operands. * `=`: Follow these steps, in order. - If `v1` and `v2` both `undefined`, return `true`. - If `v1` and `v2` are both Booleans, or are both integers, or are both strings, then return the result of OCaml's `=` operator on those values. - If `v1` and `v2` are both locations, then dereference both locations, and recurse to determine whether the resulting values are equal. - If `v1` and `v2` are both objects, then return `true` if both objects have the same set of fields, and those fields are recursively equal in both objects; otherwise, return `false`. - If `v1` and `v2` are both closures or both externs, then return `false`. - If either of the two values is an integer, and the other value is either a string or a Boolean, then convert the other value to an integer (which might actually produce `undefined`), and recurse to compare them. - Otherwise, return `false`. * `!=`: Compare `v1` to `v2` using `=`, then apply JoCalf's unary `not` to that result. * `==`: Follow these steps. No type conversions are performed at any point during this operation. - If `v1` and `v2` are both `undefined`, return `true`. - If `v1` and `v2` are both an integer, or are both strings, or are both Booleans, then return whether they are equal according OCaml's `=` operator. - If `v1` and `v2` are both locations, then return whether they are the same location. Do not dereference the locations to check their contents. - If `v1` and `v2` are both objects, then return `true` if both objects have the same set of fields, and those fields are recursively equal (in the `==` sense) in both objects; otherwise, return `false`. - If `v1` and `v2` are both closures, return `false`. Do not check their code or environment; it doesn't matter whether they are equal or not. - Otherwise, return `false`. * `!==`: Compare `v1` to `v2` using `==`, then apply JoCalf's unary `not` to that result. **Short-circuit operators.** Application of a short-circuit operator ``` e1 && e2 e1 || e2 ``` works differently than application of a binary operator, because the right-hand operand might never be evaluated, depending on the result of the left-hand operand. Start by evaluating `e1` to a value `v1`. Evaluation then proceeds as follows: * `&&`: If `v1` is falsy, return `v1`. Do not evaluate `e2`. If `v1` is truthy, continue to evaluate `e2`. * `||`: If `v1` is truthy, return `v1`. Do not evaluate `e2`. If `v1` is falsy, continue to evaluate `e2`. If evaluation did not already return, now evaluate `e2` to a value `v2`. Return `v2` as the result of the short-circuit operation. Note that in all cases, the expression that finally determined whether the operator as a whole is truthy or falsy is what gets returned. Unlike OCaml's short-circuit operators, that expression need not itself be a Boolean. <a name="refs"></a> #### References Evaluation of a reference creation ``` ref e ``` first evaluates `e` to a value `v`. Then it allocates a new location `loc` in the store, and initializes that location to have value `v`. Evaluation of a dereference ``` !e ``` first evaluates `e` to a value `v`. If that value is not a location, the result of the dereference is the `undefined` value. But if `v` is a location, return the value stored at that location. If the location is not actually allocated in the store, then the result is unspecified (and in fact this case should be impossible.) Evaluation of an assignment ``` e1 := e2 ``` first evaluates `e1`, then evaluates `e2`. We now have two values, `v1` and `v2`. If `v1` is not a location, the result of the assignment is the exception `"Assignment to non-location"`. If `v1` is a location but is currently unbound in the store, then the result is unspecified (and in fact this case should be impossible). Otherwise, store `v2` at location `v1`, and return `v2` as the result of the assignment. <a name="exns"></a> #### Exceptions The throw expression ``` throw e ``` first evaluates `e`. If that evaluation produces an exception `exn v'`, then `exn v'` is the result of evaluating `throw e`. That is, the inner exception thrown by `e` itself is what actually propagates. But if `e` evaluates to a value `v`, then `throw e` evaluates to the exception `exn v`. The try expression ``` try e1 catch x handle e2 ``` first evaluates `e1`. If that produces a value `v`, then `v` is the result of evaluating the try expression. But if evaluation of `e1` produces an exception `exn v`, then evaluation continues by binding `x` to `v` in the environment, evaluating `e2` in that extended environment, and returning the result of that evaluation. The try-finally expression ``` try e1 catch x handle e2 finally e3 ``` first evaluates `try e1 catch x handle e2`, obtaining a result `r`. It then evaluates `e3`. If evaluation of `e3` produces an exception, then that exception is the result of the try-finally and `r` is ignored. Otherwise, `r` is the result and the value of `e3` is ignored. <a name="objs"></a> #### Objects The object expression ``` {s1:e1, ... sn:en} ``` evaluates its field expressions `e1`...`en` left to right, producing values `v1`...`vn`. The result of the object expression is the object value ``` {s1:v1, ..., sn:vn} ``` The get-field expression ``` e1[e2] ``` first evaluates `e1` and `e2` to values `v1` and `v2`. It then converts `v2` to a primitive, then converts that primitive to a string `s`. It then checks whether `v1` is an object; if not, the result of the get-field expression is the `undefined` value. But if `v1` is an object, then the field `s` is found in that object. If there is no such field, then the result of the get-field expression is the `undefined` value. If there is a field `s` and it is bound to a value `v`, then `v` is the result of the get-field expression. *Implementation note:* The alternative form `e.x` of the get-field expression is already handled for you by the parser, which transforms it into `e["x"]` for you. So you don't need to do anything extra to interpret this alternative form. The update-field expression ``` e1[e2] <- e3 ``` first evaluates `e1`, `e2`, and `e3` to values `v1`, `v2`, and `v3`. It then converts `v2` to a primitive, then converts that primitive to a string `s`. It then checks whether `v1` is an object; if not, the result of the update-field expression is `v3`. But if `v1` is an object, then its `s` field is bound to `v3`, and the updated object (not `v3`) is the result of the update-field expression. It doesn't matter whether `v1` already had an `s` field or not: if it didn't, the field is created; if it did, the field is updated. The delete-field expression ``` delete e1[e2] ``` first evaluates `e1` and `e2` to values `v1` and `v2`. It then converts `v2` to a primitive, then converts that primitive to a string `s`. It then checks whether `v1` is an object; if not, the result of the delete-field expression is `v1`. But if `v1` is an object, then its `s` field is removed, and the updated object is the result of the delete-field expression. It doesn't matter whether `v1` had an `s` field or not: if it didn't, the object doesn't change; if it did, the field is now deleted. <a name="externs"></a> ## External function library There are seven external functions that must be bound in the initial environment of every JoCalf program. They comprise the "pervasives" of JoCalf. * `is_int` takes one argument. If that argument is an integer, the function returns that integer. Otherwise, the function returns false. * `is_bool` takes one argument. If that argument is a Boolean, the function returns that Boolean. Otherwise, the function returns false. * `is_string` takes one argument. If that argument is a string, the function returns that string. Otherwise, the function returns false. * `is_defined` takes one argument. If that argument is the `undefined` value, the function returns false. Otherwise, the function returns the argument. * `is_prim` takes one argument. If that argument is an integer, string, Boolean, or the `undefined` value, the function returns that value. Otherwise, the function returns false. * `length` takes one argument. If that argument is a string, the function returns the length of that string, which is the number of characters in it. If the argument is not a string, the function returns `undefined`. * `has_field` takes two arguments. The first must be an object, and the second must be a string; if either of those fails to hold, the function returns `undefined`. Otherwise, the function returns true if the object has a field with that string as its name, and false if it does not. None of these external functions modifies the JoCalf state, though of course evaluation of their arguments could do so.