JoCalf Language Manual

JoCalf is an language that is most similar to OCaml, but also inherits features from JavaScript and Racket. From OCaml, JoCalf gets its basic syntax, higher-order functions, and references. From Racket, an untyped functional language in the Scheme family, JoCalf gets functions that truly take multiple arguments (unlike OCaml functions, which take just a single argument, despite syntactic sugar that makes it appear otherwise). From JavaScript, JoCalf gets operators that convert their arguments’ run-time types, exceptions that can be thrown and caught, and objects with fields.

This manual informally describes evaluation, but it mostly elides the details of how exceptions and changes to the state propagate. Those details can be found in the formal semantics.

Table of Contents:

Tutorial

This brief tutorial shows off some features of JoCalf using its toplevel (aka REPL). Like the OCaml toplevel, JoCalf’s toplevel accepts phrases, which are either expressions or let definitions. Terminating a phrase with double semicolon is optional.

Basics

JoCalf’s operators are more flexible than OCaml’s, allowing arguments of various types to be combined. But sometimes the result is a special value undefined.

# 1 + 1
2

# "1" + "1"
"11"

# 31 + "10"
"3110"

# 1 * "zzz"
undefined

Let expressions and definitions are much the same as OCaml.

# let x = 1+1 in x+x
4

# let x = 1
1

# x
1

# y
Exception: "Unbound variable"

If expressions are more flexible, allowing non-Boolean guard expressions, missing else branches, and branches of differing types. The short-circuit conjunction and disjunction operators return values that might not be Boolean.

# if true then 42 else "forty two"
42

# if 3110 then "yay" else "boo"
"yay"

# if 0 then "yay"
undefined

# true && 1
1

# 1 && true
true

# "cool cool" || false
"cool cool"

Functions and external functions

Functions are always anonymous and permit multiple arguments. Moreover, functions may not be partially applied; all arguments must be supplied when the function is applied.

# let add = fun (x y) -> x + y
<closure>

# add 2 3
5

# add 1
Exception: "Application: wrong number of arguments"

The parentheses around the arguments in a function expression are mandatory.

# let add = fun x y -> x + y
Syntax error, line 1, characters 14-15: x

Let bindings may be used to create recursive functions.

# let rec fact (n) = if n = 0 then 1 else n * (fact (n-1))
<closure>

# fact 5
120

External functions are functions that can be used in JoCalf but are actually implemented in OCaml—similarly to how OCaml provides some functions in its standard library that are actually implemented in C. Externals are needed when the function would be impossible to implement directly in JoCalf.

# length "hello"
5

# is_int 42
42

# is_int "42"
false

Imperative features

References, sequencing, and while loops are similar to OCaml. Assignment returns the value of its right hand side.

# let inc = fun (r) -> r := !r + 1
<closure>

# let x = ref 0
<location>

# x := 10
10

# inc x; inc x; inc x
13

# !x
13

# while !x > 0 do x := !x-1 done
undefined

# !x
0

Exceptions are an amalgam of JavaScript and OCaml:

# throw 42
Exception: 42

# try throw "oops" catch exc handle exc + " caught"
"oops caught"

# try throw 1 catch x handle throw 3 finally throw 2
Exception: 2

Objects

Finally, JoCalf has objects. The object system is relatively immature (as befits a young language): it lacks methods, inheritance, and the this keyword (implicit or explicit). Hence objects are mostly just (functional) records. Their fields can be accessed with a couple different syntactic forms. Field names are always strings, but the field to be accessed can be computed at run time.

# let o = {"x": 1, "1": 42, "dbl": fun (z) -> 2*z}
<object>

# o["x"]
1

# o.x
1

# o["1"]
42

# o[3-2]
42

# o["d"+"bl"] 10
20

# let o' = {"x": 1, "f" : fun (y) -> x+y}
<object>

# o'.g
undefined  (* because g is not a field of o' *)

# o'.f 2
Exception: "Unbound variable"  (* because x is not bound in the function *)

Syntax

The abstract syntax of JoCalf is as follows:

(* expressions *)
e ::=
    | i | b | s | undefined   (* constants *)
    | (e) | begin e end       (* parenthesized expressions *)
    | x                       (* variables *)
    | let x = e1 in e2        (* let expressions *)
    | let rec f (x1 ... xn) = e1 in e2  (* recursive let expressions *)
    | fun (x1 ... xn) -> e    (* functions *)
    | e0 ... en               (* application *)
    | if e1 then e2 [else e3] (* control flow: if, sequence, while *)
    | e1; ...; en
    | while e1 do e2 done
    | uop e | e1 bop e2       (* unary and binary operators *)
    | e1 && e2 | e1 || e2
    | ref e | !e | e1 := e2   (* references *)
    | throw e                 (* exceptions *)
    | try e1 catch x
      handle e2 [finally e3]
    | {s1:e1, ..., sn:en}     (* objects *)
    | e1[e2] | e1.e2
    | e1[e2] <- e3
    | delete e1[e2]

(* definitions *)
d ::=
    | let x = e         (* let definition *)
    | let rec f (x1 ... xn) = e  (* recursive let definition *)

(* unary operators *)
uop ::=
    | not | - | typeof

(* binary operators *)
bop ::=
    | + | - | * | / | mod
    | < | <= | > | >=
    | = | != | == | !==

i ::= integers
b ::= booleans
s ::= strings
x ::= identifiers

Values

JoCalf values are defined as follows:

(* values *)
v ::=
    | i | s | b | undefined   (* constant values *)
    | {s1:v1, ..., sn:vn}     (* object values *)
    | loc | cl | extern       (* inexpressible values *)

loc    ::= memory locations
cl     ::= closures
extern ::= external functions

The final three kinds of values—locations, closures, and external functions—cannot be directly expressed in JoCalf, but result from evaluation. External functions are functions that are available for use by JoCalf programs but are not implemented in JoCalf. Instead, they are implemented in OCaml as part of the interpreter itself.

Conversion of values. JoCalf allows run-time conversion between different types of values.

Evaluation

There are three kinds of evaluation: evaluation of expressions, evaluation of definitions, and evaluation of phrases.

Expressions. Evaluation of a JoCalf expression requires two additional pieces of information: the environment, giving a mapping from free variables to their values, and the state, giving a mapping from memory locations to their values. Evaluation produces a result, which is either a value or an exception, as well as an updated state.

(* results *)
r ::=
    | v      (* value *)
    | exn v  (* exception carrying a value *)

Definitions. Evaluation of a let definition in the toplevel is much like evaluation of an expression, but it additionally introduces new bindings into the environment.

Phrases. A phrase is either a definition or an expression, and evaluates accordingly.

Constants

Integers. JoCalf integers are mostly the same as OCaml’s. On a 64-bit implementation, that means they range from \(−2^{62}\) to \(2^{62}−1\). Integer literals may be written in decimal, hexadecimal, octal, and binary:

# 42
42

# 0x2a
42

# 0o52
42

# 0b101010
42

Implementation note: the JoCalf parser produces an OCaml string, not an int, for integers. It is up to your interpreter to instead produce an integer. There are three problems to solve here.

  1. The parser needs your help to recognize negative integer literals. For example, given the JoCalf expression -1, the parser produces a unary negation applied to "1". A minus sign - followed in the JoCalf source code by an integer literal should itself be an integer literal, not a unary negation operation. So your AST factory should detect this situation, eliminate the unary negation, and tranform the string to "-1".

  2. Sometime before or during evaluation, those strings need to become an OCaml int. You could simpy use OCaml’s int_of_string to transform the string to an int.

  3. The parser needs your help to recognize whether integer literals are between the bounds of min_int and max_int, inclusive. Assuming you have solved the two problems above correctly, int_of_string will take care of this for you.

Strings. JoCalf strings are the same as OCaml’s, but do not support Unicode. There is some support for escaping provided by the parser:

# "\052" + "\050"
"42"

# "\n"
"\n"

Booleans. JoCalf’s Booleans are true and false.

Undefined. JoCalf has a special value written undefined that is the only value of its type. JoCalf uses undefined to represent the result of an operation or language construct that does not produce a meaningful value.

Parenthesized Expressions

Expressions can be parenthesized, as in OCaml, with ( ... ) or begin ... end.

Implementation note: You don’t have to do anything to deal with parenthesized expressions. The parser automatically handles them for you. Remember that parentheses are not explicitly represented in the AST; rather, they are implicitly represented by the shape of the tree.

Variables, Let Expressions, and Let Definitions

Let expressions bind variables to values. JoCalf variable identifiers are a subset of OCaml identifiers: the first character must be a lowercase letter or an underscore, and any remaining letters must be a letter (of any case), a digit, an underscore, or a single quote.

Evaluation of a variable produces the value to which that variable is bound in the environment. If the variable is unbound, evaluation of it produces the exception "Unbound variable".

The let expression

let x = e1 in e2

evaluates e1 to a value v, then evaluates e2 in an environment in which x is bound to v.

The recursive let expression

let rec f (x1 ... xn) = e1 in e2

creates a closure to represent the recursive function f, then evaluates e2 in an environment in which f is bound to that closure.

The let definition

let x = e

evaluates e to a value v, and binds x to v in the environment.

The recursive let definition

let rec f (x1 ... xn) = e

creates a closure to represent f, and binds f to that closure in the environment.

Implementation note: To create the circular structures needed by recursive functions, you can use OCaml’s references. First create a closure containing a (reference to) a dummy environment, then use assignment to change the environment into one where f maps to the very same closure. This technique is called backpatching.

Functions and Application

JoCalf functions are values, just as OCaml functions are. But unlike OCaml, JoCalf functions are truly multi-argument, and all arguments must be supplied at the point where the function is applied; partial application is not permitted in JoCalf.

The anonymous function

fun (x1 ... xn) -> e

is a function value that takes n arguments named x1..xn and has a body e in which those arguments are bound. An anonymous function evaluates to a closure, which captures the environment where the function was defined. It is a syntax error for a function to have zero arguments, or to have arguments with duplicate names.

Implementation note: The parser already detects those syntax errors for you.

Application is written, as in OCaml, by juxtaposing the function and its arguments:

e0 e1 ... en

It is evaluated left-to-right. First e0 is evaluated, and that must produce either a closure or an external function. If it does not, evaluation of the function application is done, and the exception "Application: not a function" is produced.

If e0 does produce a function (either a closure or an external), then e1en are evaluated left-to-right. If all of e1en evaluate to values v1vn, and if the function takes exactly n arguments, then the function is applied to those arguments, producing a result.

If the function takes a different number of arguments than were provided, then the exception "Application: wrong number of arguments" is produced. The check for the number of arguments occurs before the arguments are themselves evaluated.

Control Flow: If, Sequence, While

The if expression

if e1 then e2 else e3

evaluates the guard e1, then conditionally evaluates a branch e2 or e3 based on the result of the guard.

First, the guard e1 is evaluated. If the guard is truthy, then e2 is evaluated. If the guard is falsy, then e3 is evaluated.

It is also possible to omit the else branch:

if e1 then e2

That syntactic form evaluates equivalently to

if e1 then e2 else undefined

The sequence of expressions

e1; e2

evaluates left to right. First e1 is evaluated, then e2. We then have two values, v1 and v2. The result of the sequence is v2. The other value, v1, is ignored.

Likewise, a longer sequence of expressions

e1; e2; ...; en

is evaluated left to right, resulting in the final value vn corresponding to en.

The while loop

while e1 do e2 done

is evaluated by instead evaluating the longer expression

if e1 then (e2; while e1 do e2 done)

and returning whatever result that longer expression produces. Note how that recursively unrolls the while loop during interpretation for as many iterations as are necessary.

Operators

If any of this section seems unnecessarily complicated, reflect on the fact that it is designed to be as faithful as possible to JavaScript, which is an immensely popular language. Also, for five minutes of PL-nerdy entertainment, watch the Wat talk on even weider definitions in dynamically typed languages like Ruby and JavaScript.

Unary operators. Application of a unary operator

uop e

first evaluates e to a value v. The result depends on the operator:

Binary operators. Application of a binary operator

e1 bop e2

proceeds left to right. First, it evaluates e1, then e2. We now have two values v1 and v2, respectively, and the result depends on the operator:

Short-circuit operators. Application of a short-circuit operator

e1 && e2
e1 || e2

works differently than application of a binary operator, because the right-hand operand might never be evaluated, depending on the result of the left-hand operand.

Start by evaluating e1 to a value v1. Evaluation then proceeds as follows:

If evaluation did not already return, now evaluate e2 to a value v2. Return v2 as the result of the short-circuit operation.

Note that in all cases, the expression that finally determined whether the operator as a whole is truthy or falsy is what gets returned. Unlike OCaml’s short-circuit operators, that expression need not itself be a Boolean.

References

Evaluation of a reference creation

ref e

first evaluates e to a value v. Then it allocates a new location loc in the store, and initializes that location to have value v.

Evaluation of a dereference

!e

first evaluates e to a value v. If that value is not a location, the result of the dereference is the undefined value. But if v is a location, return the value stored at that location. If the location is not actually allocated in the store, then the result is unspecified (and in fact this case should be impossible.)

Evaluation of an assignment

e1 := e2

first evaluates e1, then evaluates e2. We now have two values, v1 and v2. If v1 is not a location, the result of the assignment is the exception "Assignment to non-location". If v1 is a location but is currently unbound in the store, then the result is unspecified (and in fact this case should be impossible). Otherwise, store v2 at location v1, and return v2 as the result of the assignment.

Exceptions

The throw expression

throw e

first evaluates e. If that evaluation produces an exception exn v', then exn v' is the result of evaluating throw e. That is, the inner exception thrown by e itself is what actually propagates. But if e evaluates to a value v, then throw e evaluates to the exception exn v.

The try expression

try e1 catch x handle e2

first evaluates e1. If that produces a value v, then v is the result of evaluating the try expression. But if evaluation of e1 produces an exception exn v, then evaluation continues by binding x to v in the environment, evaluating e2 in that extended environment, and returning the result of that evaluation.

The try-finally expression

try e1 catch x handle e2 finally e3

first evaluates try e1 catch x handle e2, obtaining a result r. It then evaluates e3. If evaluation of e3 produces an exception, then that exception is the result of the try-finally and r is ignored. Otherwise, r is the result and the value of e3 is ignored.

Objects

The object expression

{s1:e1, ... sn:en}

evaluates its field expressions e1en left to right, producing values v1vn. The result of the object expression is the object value

{s1:v1, ..., sn:vn}

The get-field expression

e1[e2]

first evaluates e1 and e2 to values v1 and v2. It then converts v2 to a primitive, then converts that primitive to a string s. It then checks whether v1 is an object; if not, the result of the get-field expression is the undefined value. But if v1 is an object, then the field s is found in that object. If there is no such field, then the result of the get-field expression is the undefined value. If there is a field s and it is bound to a value v, then v is the result of the get-field expression.

Implementation note: The alternative form e.x of the get-field expression is already handled for you by the parser, which transforms it into e["x"] for you. So you don’t need to do anything extra to interpret this alternative form.

The update-field expression

e1[e2] <- e3

first evaluates e1, e2, and e3 to values v1, v2, and v3. It then converts v2 to a primitive, then converts that primitive to a string s. It then checks whether v1 is an object; if not, the result of the update-field expression is v3. But if v1 is an object, then its s field is bound to v3, and the updated object (not v3) is the result of the update-field expression. It doesn’t matter whether v1 already had an s field or not: if it didn’t, the field is created; if it did, the field is updated.

The delete-field expression

delete e1[e2]

first evaluates e1 and e2 to values v1 and v2. It then converts v2 to a primitive, then converts that primitive to a string s. It then checks whether v1 is an object; if not, the result of the delete-field expression is v1. But if v1 is an object, then its s field is removed, and the updated object is the result of the delete-field expression. It doesn’t matter whether v1 had an s field or not: if it didn’t, the object doesn’t change; if it did, the field is now deleted.

External function library

There are seven external functions that must be bound in the initial environment of every JoCalf program. They comprise the “pervasives” of JoCalf.

None of these external functions modifies the JoCalf state, though of course evaluation of their arguments could do so.