# JoCalf Language Manual
JoCalf is an language that is most similar to OCaml, but also inherits
features from JavaScript and Racket. From OCaml, JoCalf gets its
basic syntax, higher-order functions, and references. From Racket, an
untyped functional language in the Scheme family, JoCalf gets
functions that truly take multiple arguments (unlike OCaml functions,
which take just a single argument, despite syntactic sugar that makes
it appear otherwise). From JavaScript, JoCalf gets operators that
convert their arguments' run-time types, exceptions that can be throw
and caught, and objects with fields that can be mutated.
This informal semantics describes how evaluation occurs, but it mostly
elides the details of how exceptions propagate and how changes to the
state are threaded throughout evaluation. Those details can be found
in the [formal semantics][semantics].
[semantics]: semantics.txt
**Table of Contents:**
* [Tutorial](#tutorial)
- [Basics](#basics)
- [Functions](#functions)
- [Imperative features](#imperative)
- [Objects](#objects)
* [Reference](#reference)
- [Syntax](#syntax)
- [Dynamic semantics](#semantics)
- [Let definitions](#letdefn)
- [Constants](#constants)
- [Parenthesized expressions](#parens)
- [Variables and let expressions](#vars_letexpr)
- [Functions and application](#fn_app)
- [Control flow: if, sequence, while](#control)
- [Operators](#operators)
- [References](#refs)
- [Exceptions](#exns)
- [Objects](#objs)
* [External function library](#externs)
## Tutorial
This brief tutorial shows off some features of JoCalf using its
toplevel (aka REPL). Like the OCaml toplevel, JoCalf's toplevel
accepts *phrases*, which are either expressions or let definitions.
Terminating a phrase with double semicolon is optional.
#### Basics
JoCalf's operators are more flexible than OCaml's, allowing arguments
of various types to be combined. But sometimes the result is a
special value `undefined`.
```
# 1 + 1
2
# "1" + "1"
"11"
# 31 + "10"
"3110"
# 1 * "zzz"
undefined
```
Let expressions and definitions are much the same as OCaml.
```
# let x = 1+1 in x+x
4
# let x = 1
1
# x
1
# y
Exception: "Unbound variable"
```
If expressions are more flexible, allowing non-Boolean guard
expressions, missing else branches, and branches of differing types.
The short-circuit conjunction and disjunction operators return values
that might not be Boolean.
```
# if true then 42 else "forty two"
42
# if 3110 then "yay" else "boo"
"yay"
# if 0 then "yay"
undefined
# true && 1
1
# 1 && true
true
# "cool cool" || false
"cool cool"
```
#### Functions and external functions
Functions are always anonymous and permit multiple arguments.
Moreover, functions may not be partially applied; all arguments must
be supplied when the function is applied.
```
# let add = fun (x y) -> x + y
# add 2 3
5
# add 1
Exception: "Application: wrong number of arguments"
```
The parentheses around the arguments in a function expression
are mandatory.
```
# let add = fun x y -> x + y
Syntax error, line 1, characters 14-15: x
```
Let bindings may be used to create recursive functions.
```
# let rec fact (n) = if n = 0 then 1 else n * (fact (n-1))
# fact 5
120
```
External functions are functions that can be used in JoCalf but are
actually implemented in OCaml—similarly to how OCaml provides
some functions in its standard library that are actually implemented
in C. Externals are needed when the function would be impossible to
implement directly in JoCalf.
```
# length "hello"
5
# is_int 42
42
# is_int "42"
false
```
#### Imperative features
References, sequencing, and while loops are similar to OCaml.
Assignment returns the value of its right hand side.
```
# let inc = fun (r) -> r := !r + 1
# let x = ref 0
# x := 10
10
# inc x; inc x; inc x
13
# !x
13
# while !x > 0 do x := !x-1 done
undefined
# !x
0
```
Exceptions are an amalgam of JavaScript and OCaml:
```
# throw 42
Exception: 42
# try throw "oops" catch exc handle exc + " caught"
"oops caught"
# try throw 1 catch x handle throw 3 finally throw 2
Exception: 2
```
#### Objects
Finally, JoCalf has objects. The object system is relatively immature
(as befits a young language): it lacks methods, inheritance, and the
`this` keyword (implicit or explicit). Hence objects are mostly just
(functional) records. Their fields can be accessed with a couple different
syntactic forms. Field names are always strings, but the field to be
accessed can be computed at run time.
```
# let o = {"x": 1, "1": 42, "dbl": fun (z) -> 2*z}
# o["x"]
1
# o.x
1
# o["1"]
42
# o[3-2]
42
# o["d"+"bl"] 10
20
# let o' = {"x": 1, "f" : fun (y) -> x+y}
# o'.g
undefined (* because g is not a field of o' *)
# o'.f 2
Exception: "Unbound variable" (* because x is not bound in the function *)
```
## Reference
Next, we give a reference for each syntactic form of JoCalf,
explaining how each form evaluates.
#### Syntax
The abstract syntax of JoCalf is as follows:
```
(* expressions *)
e ::=
| i | b | s | undefined (* constants *)
| (e) | begin e end (* nesting of expressions *)
| x (* variables *)
| let x = e1 in e2 (* let expressions *)
| let rec f (x1 ... xn) = e1 in e2 (* recursive let expressions *)
| fun (x1 ... xn) -> e (* functions *)
| e0 ... en (* application *)
| if e1 then e2 [else e3] (* control flow: if, sequence, while *)
| e1; ...; en
| while e1 do e2 done
| uop e | e1 bop e2 (* unary and binary operators *)
| e1 && e2 | e1 || e2
| ref e | !e | e1 := e2 (* references *)
| throw e (* exceptions *)
| try e1 catch x
handle e2 [finally e3]
| {s1:e1, ..., sn:en} (* objects *)
| e1[e2] | e1.e2
| e1[e2] <- e3
| delete e1[e2]
(* definitions *)
d ::=
| let x = e (* let definition *)
| let rec f (x1 ... xn) = e1 in e2 (* recursive let definition *)
(* unary operators *)
uop ::=
| not | - | typeof
(* binary operators *)
bop ::=
| + | - | * | / | mod
| < | <= | > | >=
| = | != | == | !==
i ::= integers
b ::= booleans
s ::= strings
x ::= identifiers
```
#### Dynamic semantics
**Values.** JoCalf values are defined as follows:
```
(* values *)
v ::=
| i | s | b | undefined (* constant values *)
| {s1:v1, ..., sn:vn} (* object values *)
| loc | cl | extern (* unexpressible values *)
loc ::= memory locations
cl ::= closures
extern ::= external functions
```
The final three kinds of values—locations, closures, and
external functions—cannot be directly expressed in JoCalf, but
result from evaluation. *External functions* are functions
that are available for use by JoCalf programs but are not implemented
in JoCalf. Instead, they are implemented in OCaml as part of the
interpreter itself.
**Conversion of values.** JoCalf allows run-time conversion between
different types of values.
* To convert a value to a Boolean:
- The values `true`, any integer except `0`, any string except
the empty string `""`, and any object, location, extern, or closure all
convert to the Boolean `true`. We call these values **truthy**.
- The values `false`, the integer `0`, the empty string `""`, and
the `undefined` value all convert to the Boolean `false`. We call
these values **falsy**.
* To convert a value to an integer:
- An integer `i` converts to itself.
- The Boolean `true` converts to `1`, and `false` converts to `0`.
- A string `s` converts to `Pervasives.int_of_string s` if that
OCaml function returns an integer. But if that OCaml function
raises `Failure`, then the string converts to `undefined`.
- A location, object, extern, or closure converts to `undefined`.
- The value `undefined` remains `undefined`; it does not become
an integer.
* To convert a value to a string:
- **Note that this algorithm is *not* how the REPL nor how the
`interp` function in `Main` convert values to strings to
display them to the user.**
- A string `s` converts to itself.
- An integer `i` converts to `Pervasives.string_of_int i`.
- The Boolean `true` converts to `"true"`, and `false` converts
to `"false"`.
- A location, object, extern, or closure converts to `"undefined"`.
- The value `undefined` converts to `"undefined"`.
* To convert a value to a *primitive* value:
- Integer, Boolean, string, and `undefined` are already primitive values.
They convert to themselves.
- Locations, objects, externs, and closures all convert to the
primitive value `undefined`.
**Expressions.** Evaluation of a JoCalf expression requires two additional
pieces of information: the *environment*, giving a mapping
from free variables to their values, and the *state*, giving
a mapping from memory locations to their values.
Evaluation produces a *result*, which is either a value
or an exception, as well as an updated state.
```
(* results *)
r ::=
| v (* value *)
| exn v (* exception carrying a value *)
```
**Definitions.** Evaluation of a let definition in the toplevel is
much like evaluation of an expression, but it additionally introduces
new bindings into the environment.
**Phrases.** A phrase is either a definition or an expression, and
evaluates accordingly.
#### Let definitions
The let definition
```
let x = e
```
evaluates `e` to a value `v`, and binds `x` to `v` in the
environment.
The recursive let definition
```
let rec f (x1 ... xn) = e
```
evaluates to a closure whose environment binds `f` to the same
closure.
*Implementation note:* The description above is intentially
circular. Indeed, exploiting circularity is one simple way to
implement recursion. (Another common alternative is to use a fixpoint
combinator such as the famous Y combinator). To create the circular
structure described above, you can use OCaml's references. First
create a closure containing a (reference to) a dummy environment, then
use assignment to "backpatch" the environment with one where `f` maps
to the very same closure.
#### Constants
**Integers.**
JoCalf integers are mostly the same as OCaml's. On a 64-bit implementation,
that means they range from \\(−2^{62}\\) to \\(2^{62}−1\\). Integer
literals may be written in decimal, hexadecimal, octal, and binary:
```
# 42
42
# 0x2a
42
# 0o52
42
# 0b101010
42
```
*Implementation note:* the JoCalf parser produces an OCaml `string`,
not an `int`, for integers. It is up to your interpreter to instead
produce an integer. There are three problems to solve here.
1. The parser needs your help to recognize negative integer literals.
For example, given the JoCalf expression `-1`, the parser produces a
unary negation applied to `"1"`. A minus sign `-` followed in the JoCalf
source code by an integer literal should itself be an integer literal,
not a unary negation operation. So your AST factory should detect this
situation, eliminate the unary negation, and tranform the string to
`"-1"`.
2. Sometime before or during evaluation, those strings need to become
an OCaml `int`. You could simpy use OCaml's `int_of_string` to
transform the `string` to an `int`.
3. The parser needs your help to recognize whether integer literals
are between the bounds of `min_int` and `max_int`, inclusive.
Assuming you have solved the two problems above correctly,
`int_of_string` will take care of this for you.
**Strings.**
JoCalf strings are the same as OCaml's, but do not support Unicode.
There is some support for escaping provided by the parser:
```
# "\052" + "\050"
"42"
# "\n"
"\n"
```
**Booleans.**
JoCalf's Booleans are `true` and `false`.
**Undefined.**
JoCalf has a special value written `undefined` that is the only value
of its type. JoCalf uses `undefined` to represent the result of an
operation or language construct that does not produce a meaningful
value.
#### Parenthesized expressions
Expressions can be parenthesized, as in OCaml, with `( ... )`
or `begin ... end`.
*Implementation note:* You don't have to do anything to deal with
parenthesized expressions. The parser automatically handles them for
you. Remember that parentheses are not explicitly represented in the
AST; rather, they are implicitly represented by the shape of the tree.
#### Variables and let expressions
Let expressions bind variables to values. JoCalf
variable identifiers are a subset of OCaml identifiers:
the first character must be a lowercase letter or an underscore,
and any remaining letters must be a letter (of any case), a digit,
an underscore, or a single quote.
Evaluation of a variable produces the value to which that variable
is bound in the environment. If the variable is unbound, evaluation
of it produces the exception `"Unbound variable"`.
The let expression
```
let x = e1 in e2
```
evaluates `e1` to a value `v`, then evaluates `e2` in an environment
in which `x` is bound to `v`.
#### Functions and application
JoCalf functions are values, just as OCaml functions are.
But unlike OCaml, JoCalf functions are truly multi-argument,
and all arguments must be supplied at the point where the
function is applied; partial application is not permitted in JoCalf.
The anonymous function
```
fun (x1 ... xn) -> e
```
is a function value that takes `n` arguments named `x1`..`xn` and
has a body `e` in which those arguments are bound. An anonymous
function evaluates to a closure, which captures the environment
where the function was defined. It is a syntax error for a function
to have zero arguments, or to have arguments with duplicate names.
*Implementation note:* The parser already detects those syntax
errors for you.
Application is written, as in OCaml, by juxtaposing the function
and its arguments:
```
e0 e1 ... en
```
It is evaluated left-to-right. First `e0` is evaluated, and that
must produce either a closure or an external function. If it does not,
evaluation of the function application is done, and the exception
`"Application: not a function"` is produced.
If `e0` does produce a function (either a closure or an external),
then `e1` ... `en` are evaluated left-to-right.
If all of `e1` ... `en` evaluate to values `v1` ... `vn`,
and if the function takes exactly `n` arguments, then the function
is applied to those arguments, producing a result.
If the function takes a different number of arguments than were
provided, then the exception `"Application: wrong number of arguments"`
is produced. The check for the number of arguments occurs before
the arguments are themselves evaluated.
#### Control flow: if, sequence, while
The if expression
```
if e1 then e2 else e3
```
evaluates the guard `e1`, then conditionally evaluates a branch `e2` or `e3`
based on the result of the guard.
First, the guard `e1` is evaluated. If the guard is truthy,
then `e2` is evaluated. If the guard is falsy, then `e3` is evaluated.
It is also possible to omit the else branch:
```
if e1 then e2
```
That syntactic form evaluates equivalently to
```
if e1 then e2 else undefined
```
The sequence of expressions
```
e1; e2
```
evaluates left to right. First `e1` is evaluated, then `e2`.
We then have two values, `v1` and `v2`. The result
of the sequence is `v2`. The other value, `v1`, is ignored.
Likewise, a longer sequence of expressions
```
e1; e2; ...; en
```
is evaluated left to right, resulting in the final value `vn` corresponding
to `en`.
The while loop
```
while e1 do e2 done
```
is evaluated by instead evaluating the longer expression
```
if e1 then (e2; while e1 do e2 done)
```
and returning whatever result that longer expression produces. Note
how that recursively *unrolls* the while loop during interpretation
for as many iterations as are necessary.
#### Operators
*If any of this section seems unnecessarily complicated, reflect on
the fact that it is designed to be as faithful as possible to
JavaScript, which is an immensely popular language. Also, for five
minutes of PL-nerdy entertainment, watch the [Wat talk][wat] on even
weider definitions in dynamically typed languages like Ruby and
JavaScript.*
[wat]: https://www.destroyallsoftware.com/talks/wat
**Unary operators.** Application of a unary operator
```
uop e
```
first evaluates `e` to a value `v`. The result depends on the operator:
* `not`: If `v` is truthy, then `not v` is `false`. If `v` is
falsy, then `not v` is `true`.
* `-`: Application of unary negation `- v` first converts `v`
to an integer. That might result in an integer, or the `undefined`
value. In the former case, the result of the operation is
the result of applying OCaml's unary negation operator to the
integer. The result of applying unary negation to `undefined`
is `undefined`.
*Implementation note*: A minus sign `-` followed in the JoCalf source
code by an integer literal is itself an integer literal, not a unary
negation operation. See the implementation note above on integers.
* `typeof`: The typeof operator always produces a string.
The type of `undefined` is `"undefined"`. The type of a Boolean
is `"bool"`. The type of an integer is `"int"`. The type of
a string is `"string"`. The type of an object is `"object"`.
The type of a location is `"location"`. The type of
a closure or an extern is `"closure"`—that is,
typeof cannot distinguish between functions coded in JoCalf
and functions coded in OCaml.
**Binary operators.**
Application of a binary operator
```
e1 bop e2
```
proceeds left to right. First, it evaluates `e1`, then `e2`.
We now have two values `v1` and `v2`, respectively,
and the result depends on the operator:
* `+`: First convert `v1` and `v2` to primitive values. If either
one is a string, then convert the other to a string too, and the
result is the application of OCaml's string append operator to those
strings. Otherwise, further convert the primitives to integers, and
the result is the application of OCaml's integer addition operator to
those integers. But if conversion to integers causes either operand
to convert to `undefined`, then the result of the `+` operation is
likewise `undefined`.
* `-`, `*`, `/`, `mod`: Convert `v1` and `v2` to integers.
The result of the binary operation is the application of
OCaml's same binary operator to those integers, with the following
deviations:
- If conversion to integers causes either operand to convert
to `undefined`, then the result of the binary operation is
likewise the `undefined` value.
- If the binary operation is `/` or `mod` and `v2` is the integer
`0`, then the result is the JoCalf exception `"Division by zero"`.
* `<`, `<=`, `>`, `>=`: Convert `v1` and `v2` to primitives.
If they are both strings, then the result is the application of
OCaml's same comparison operator to those strings. (Recall,
OCaml uses lexicographic comparison on strings.) Otherwise,
further convert the primitives to integers, and the result is the
application of OCaml's same comparison operator to those integers.
But if conversion to integers causes either operand to convert to
`undefined`, then the result of the comparison operation is `false`.
Note that this is different behavior than `+` on undefined operands.
* `=`: Follow these steps, in order.
- If `v1` and `v2` both `undefined`, return `true`.
- If `v1` and `v2` are both Booleans, or are both integers, or
are both strings, then return the result of OCaml's `=` operator
on those values.
- If `v1` and `v2` are both locations, then dereference both locations,
and recurse to determine whether the resulting values are equal.
- If `v1` and `v2` are both objects, then return `true` if both
objects have the same set of fields, and those fields are recursively
equal in both objects; otherwise, return `false`.
- If `v1` and `v2` are both closures or both externs, then return `false`.
- If either of the two values is an integer, and the other value is
either a string or a Boolean, then convert the other value to an
integer (which might actually produce `undefined`), and recurse to
compare them.
- Otherwise, return `false`.
* `!=`: Compare `v1` to `v2` using `=`, then apply JoCalf's unary
`not` to that result.
* `==`: Follow these steps. No type conversions are performed at
any point during this operation.
- If `v1` and `v2` are both `undefined`, return `true`.
- If `v1` and `v2` are both an integer, or are both strings, or
are both Booleans, then return whether they are equal according
OCaml's `=` operator.
- If `v1` and `v2` are both locations, then return whether
they are the same location. Do not dereference the locations
to check their contents.
- If `v1` and `v2` are both objects, then return `true` if both
objects have the same set of fields, and those fields are
recursively equal (in the `==` sense) in both objects; otherwise,
return `false`.
- If `v1` and `v2` are both closures, return `false`. Do not
check their code or environment; it doesn't matter whether
they are equal or not.
- Otherwise, return `false`.
* `!==`: Compare `v1` to `v2` using `==`, then apply JoCalf's unary
`not` to that result.
**Short-circuit operators.**
Application of a short-circuit operator
```
e1 && e2
e1 || e2
```
works differently than application of a binary operator, because
the right-hand operand might never be evaluated, depending on the
result of the left-hand operand.
Start by evaluating `e1` to a value `v1`. Evaluation then proceeds
as follows:
* `&&`: If `v1` is falsy, return `v1`. Do not evaluate `e2`.
If `v1` is truthy, continue to evaluate `e2`.
* `||`: If `v1` is truthy, return `v1`. Do not evaluate `e2`.
If `v1` is falsy, continue to evaluate `e2`.
If evaluation did not already return, now evaluate `e2` to a value `v2`.
Return `v2` as the result of the short-circuit operation.
Note that in all cases, the expression that finally determined
whether the operator as a whole is truthy or falsy is what
gets returned. Unlike OCaml's short-circuit operators, that
expression need not itself be a Boolean.
#### References
Evaluation of a reference creation
```
ref e
```
first evaluates `e` to a value `v`. Then it allocates a new location
`loc` in the store, and initializes that location to have value `v`.
Evaluation of a dereference
```
!e
```
first evaluates `e` to a value `v`. If that value is not a location, the
result of the dereference is the `undefined` value. But if `v` is a
location, return the value stored at that location. If the location is
not actually allocated in the store, then the result is unspecified (and
in fact this case should be impossible.)
Evaluation of an assignment
```
e1 := e2
```
first evaluates `e1`, then evaluates `e2`. We now have two values, `v1`
and `v2`. If `v1` is not a location, the result of the assignment is
the exception `"Assignment to non-location"`. If `v1` is a location but
is currently unbound in the store, then the result is unspecified (and
in fact this case should be impossible). Otherwise, store `v2` at
location `v1`, and return `v2` as the result of the assignment.
#### Exceptions
The throw expression
```
throw e
```
first evaluates `e`. If that evaluation produces an exception `exn v'`,
then `exn v'` is the result of evaluating `throw e`. That is, the inner
exception thrown by `e` itself is what actually propagates. But if
`e` evaluates to a value `v`, then `throw e` evaluates to the
exception `exn v`.
The try expression
```
try e1 catch x handle e2
```
first evaluates `e1`. If that produces a value `v`, then `v` is
the result of evaluating the try expression. But if evaluation of
`e1` produces an exception `exn v`, then evaluation continues
by binding `x` to `v` in the environment, evaluating `e2` in
that extended environment, and returning the result of that evaluation.
The try-finally expression
```
try e1 catch x handle e2 finally e3
```
first evaluates `try e1 catch x handle e2`, obtaining a result `r`. It
then evaluates `e3`. If evaluation of `e3` produces an exception, then
that exception is the result of the try-finally and `r` is ignored.
Otherwise, `r` is the result and the value of `e3` is ignored.
#### Objects
The object expression
```
{s1:e1, ... sn:en}
```
evaluates its field expressions `e1`...`en` left to right,
producing values `v1`...`vn`. The result of the object
expression is the object value
```
{s1:v1, ..., sn:vn}
```
The get-field expression
```
e1[e2]
```
first evaluates `e1` and `e2` to values `v1` and `v2`. It then
converts `v2` to a primitive, then converts that primitive to a string
`s`. It then checks whether `v1` is an object; if not, the result of
the get-field expression is the `undefined` value. But if `v1` is an
object, then the field `s` is found in that object. If there is no
such field, then the result of the get-field expression is the
`undefined` value. If there is a field `s` and it is bound to a value
`v`, then `v` is the result of the get-field expression.
*Implementation note:* The alternative form `e.x`
of the get-field expression is already handled for you by the
parser, which transforms it into `e["x"]` for you. So you don't
need to do anything extra to interpret this alternative form.
The update-field expression
```
e1[e2] <- e3
```
first evaluates `e1`, `e2`, and `e3` to values `v1`, `v2`, and
`v3`. It then converts `v2` to a primitive, then converts that
primitive to a string `s`. It then checks whether `v1` is an object;
if not, the result of the update-field expression is `v3`. But if `v1`
is an object, then its `s` field is bound to `v3`, and the updated
object (not `v3`) is the result of the update-field expression. It
doesn't matter whether `v1` already had an `s` field or not: if it
didn't, the field is created; if it did, the field is updated.
The delete-field expression
```
delete e1[e2]
```
first evaluates `e1` and `e2` to values `v1` and `v2`. It then converts
`v2` to a primitive, then converts that primitive to a string `s`. It
then checks whether `v1` is an object; if not, the result of the
delete-field expression is `v1`. But if `v1` is an object, then its `s`
field is removed, and the updated object is the result of the
delete-field expression. It doesn't matter whether `v1` had an `s`
field or not: if it didn't, the object doesn't change; if it did, the
field is now deleted.
## External function library
There are seven external functions that must be bound in the initial
environment of every JoCalf program. They comprise the "pervasives"
of JoCalf.
* `is_int` takes one argument. If that argument is an integer,
the function returns that integer. Otherwise, the function returns false.
* `is_bool` takes one argument. If that argument is a Boolean,
the function returns that Boolean. Otherwise, the function returns false.
* `is_string` takes one argument. If that argument is a string,
the function returns that string. Otherwise, the function returns false.
* `is_defined` takes one argument. If that argument is the `undefined`
value, the function returns false. Otherwise, the function
returns the argument.
* `is_prim` takes one argument. If that argument is an integer, string,
Boolean, or the `undefined` value, the function returns that value.
Otherwise, the function returns false.
* `length` takes one argument. If that argument is a string,
the function returns the length of that string, which is the number
of characters in it. If the argument is not a string, the function
returns `undefined`.
* `has_field` takes two arguments. The first must be an object, and the
second must be a string; if either of those fails to hold, the function
returns `undefined`. Otherwise, the function returns true if the
object has a field with that string as its name, and false if it does not.
None of these external functions modifies the JoCalf state, though of course
evaluation of their arguments could do so.