The CS 6120 Course Blog

Bril Extension for Algebraic Data Types

by Will Smith

The result of my final project was the addition of support for product types (i.e., tuples) and sum types (i.e., variants) to Bril. I completed each of the following:

Additionally, I added support for the new instructions to the TypeScript, OCaml, and Rust libraries as required to do the above.

The code is available on the algebraic-types branch of my fork of the Bril repo.

The remainder of this blog post is devoted to documentation, design choices, and evaluation.

Design: algebraic types

A product type is a type that acts as a block containing zero or more other types (a Cartesian product). Commonly, the two case is called a pair and the zero case is called unit.

A sum type is a type that at any given time may contain any of one or more types, where which type a given value has may be inspected at run time (a tagged union). Each type in a sum type is referred to as a constructor. Note that the zero case is uninstantiable, so we do not allow it.

Product and sum types are represented in Bril as new kinds of parameterized types. In JSON, they are as follows:

In text, they are as follows:

I chose to make the sum types anonymous, as recommended by Prof. Sampson. Many high-level languages with sum types, like OCaml, have named variants with named constructors rather than anonymous sum types with anonymous constuctors, but adding named types directly to Bril would have been perhaps unnecessary and unnecessarily complicated.

I originally planned to support recursive types, like the following OCaml definition of a list type:

type intlist =
  | Nil
  | Cons of int * intlist

Unfortunately, it is not clear to me how to easily support recursive types without named sum types, let alone potential other complications with recursive types that I did not consider.

Here are some examples of types that are supported by my extension:

Note that while sum types with duplicate constructors (e.g., sum<int, int>) can be declared with the above type definitions, as we will see, it is not possible to differentiate between different constructors of the same type at run time, so such sum types are not useful.

I considered adding a subtyping rule for sum types that would permit using, for example, sum<int, bool> in places that require sum<int, bool, product<>>. I decided to leave this out because it seemed excessive for an IR and difficult to implement.

Design: new instructions

The basic idea was to add a pair of new instructions for both product and sum types; one for creating values of the type, and one for examining those values.

Product types

pack

The pack instruction is used to create a tuple. It takes in an arbitrary number of values to pack into the tuple.

The destination type of this instruction must be a product type with an arity equal to the number of arguments, and if the destination type is product<t1, ..., tn>, then the ith argument must be of type ti.

The JSON format of the pack instruction is a follows:

{
    "op": "pack",
    "args": ["a1", ..., "an"],
    "dest": "d",
    "type": {"product": [t1, ..., tn]}
}

The textual format of the pack instruction is as follows:

d: product<t1, ..., tn> = pack a1 ... an

unpack

Conversely, the unpack instruction is used to extract an individual value from a product-typed value. It takes two arguments: a variable (which must be a tuple), and the (zero-indexed) index of the value from the tuple to extract.

Given that the input tuple has type product<t1, ..., tn> and the index is i, the destination must have type ti.

The JSON format of the unpack instruction is as follows:

{
    "op": "unpack",
    "args": ["a", "i"],
    "dest": "d",
    "type": "t"
}

The text format of the unpack instruction is as follows:

d: t = unpack a i

Example

Here is a sample Bril program that uses tuples:

@main() {
  one: int = const 1;
  two: int = const 2;
  tuple: product<int, int> = pack one two;
  one_again = unpack tuple 0;
}

Sum types

construct

The construct instruction is used to construct sum types. It takes as its sole argument the value to use. The semantics of the instruction dictate that constructor to use is determined automatically by examining the run-time type of the argument.

The destination must have a sum type and the type of the argument must correspond to one of the constructors of that sum type.

The JSON format of the construct instruction is as follows:

{
    "op": "construct",
    "args": ["a"],
    "dest": "d",
    "type": {"sum": [t1, ..., tn]}
}

The text format of the construct instruction is as follows:

d: sum<t1, ..., tn> = construct a

In theory, it could have been possible to omit construct entirely if there were a sum type subtyping rule in place, but such a rule would probably have made things more, not less, confusing.

destruct

The destruct instruction is used to execute different code based on the constructor used to create a sum-typed value. It takes a sum-typed value as an argument and a number of labels, one for each constructor in the sum type.

The destination of the instruction is a variable name with a sum type. The semantics of the instruction dictate that the control flow jumps to the correct branch based on the argument and that the destination value gets assigned the value within the constructor.

The destination will in fact not have the same sum type as the argument; it will have the type carried by one of the constructors, which is different for each label. This is confusing because the destination is annotated with a type that it cannot have, but I chose this approach because the Bril interpreter needs to know the available constructors in order to select the correct label to jump to.

The JSON format of the destruct instruction is as follows:

{
    "op": "destruct",
    "args": ["a"],
    "labels": ["l1", ..., "ln"],
    "dest": "d",
    "type": {"product": [t1, ..., tn]}
}

The text format of the destruct instruction is as follows:

d: sum<t1, ..., tn> = destruct v .l1 ... .ln

A common use case is some way of representing an empty, or unit, value. For this it is recommended to use the empty product type, i.e., product<>. For example, to represent an optional int, one could use the following type: sum<int, product<>>.

I debated whether to even include a powerful instruction like destruct at all. I could have instead offered an instruction to get the index of the constructor used for a value and another instruction for casting a sum type down to one of the constructor-carried types, using a series of branch instructions to replicate the behavior of destruct.

The reason that I opted for the destruct approach is that it makes it possible, and not terribly difficult, to type-check the usage of sum types in a Bril program. With the alternative approach, it is not clear to me that it is even possible to perform type-checking of things like the casting instruction described above. destruct does seem rather large and clunky for an IR, especially compared to the other instructions (it's like a whole bunch of br instructions in one!), but I think the ability to perform type-checking makes it worthwhile.

Example

Here is a sample Bril program that uses sum types:

@main() {
  one: int = const 1;
  sum: sum<int, bool> = construct one
  val: sum<int, bool> = destruct sum .int .bool
.int:
  res: int = add val one
  jmp .end
.bool:
  res: int = const -1
.end
}

Implementation: type checker

The type checker is an OCaml program that accepts a JSON-formatted Bril program on standard input. It exits cleanly if the program is well-typed and crashes with a (possibly-helpful) error message if it is not well-typed.

The program can be built with make in the algebraic-types/type-check directory and run with ./main.byte from that directory.

The type checker does not support any extensions aside from algebraic types, but it is fully functional with all of core Bril.

Algorithm

The type checker algorithm is a dataflow analysis. I worked off of the dataflow analysis framework that I made for lesson 4.

The work set of the dataflow analysis is a mapping from variables to types. The meet operator reverts to knowing nothing about a variable in the event of conflicting types and the transfer function operates straightforwardly based on the semantics of each instruction.

The tricky part is handling the destruct instruction. The analysis works by storing a bit of extra information in the work set if the previous instruction was a destruct and then assigning the correct type to the variable based on the label that occurs next. This approach seems reliable with only minimal changes needed to the dataflow analysis framework.

The type checker does not support one label/basic block being the target of different match constructors with different type payloads. The type checker rejects all programs that attempt to do this because it was deemed not worth adding unnecessary things to the fancy destruct handling. In practice, real programs should never do this because it is not a good idea, so I do not consider this to be a problem.

Implementation: Rust compiler

The Rust compiler is a Rust program that accepts a Rust source file on standard input and outputs a JSON-formatted Bril program on standard output that implements the input Rust code. It supports a small but useful subset of the Rust language syntax.

The program can be built with cargo build from the bril-rs directory and run with cargo run from that directory.

Critically, the compiler makes no attempt to follow the semantics of Rust; for example, there is no type checker or borrow checker. The compiler just leverages Rust syntax to make it easier to write Bril programs.

The compiler is akin to the TypeScript compiler. I decided to make a new compiler in Rust rather than expanding on the existing TypeScript one because I wanted to compile a language that had named variant types. (I also wanted to write the compiler in a language that I preferred.)

The compiler uses the Rust syn package to parse the source code.

Supported Rust features

The Rust compiler supports the following Rust types:

Note that the compiler does not support variants with multiple constructors that have the same payload type. For example, here is an enum that is not permitted:

enum Bad {
    FirstConstructor(i32),
    SecondConstructor(i32),
}

The reason for this restriction is that the underlying Bril representation is an anonymous sum type and the interpreter can't tell the difference between two constructors unless their payloads have different types.

The Rust compiler supports the following Rust syntactic constructs:

Expressions can be arbitrarily nested because the compiler performs a primitive form of tiling, generating arbitrary temporary variables as necessary. The resulting Bril code is very bloated as a result and could benefit tremendously from some basic optimization passes.

Let bindings must have type annotations attached, for example:

let x: i32 = 42;

This restriction comes from the fact that all Bril value instructions require a type. It would be possible to add type inference to the compiler, but that would have been a lot more work.

Another effect of this restriction is that arbitrary sub-expressions are not allowed anywhere that the type cannot be determined (only a variable is permitted). This occurs with an expression that is being matched on, so the compiler does not support matching on arbitrary expressions. Note also that only single-level matching is supported (no patterns beyond constructors). The matching support is essentially a thin wrapper around the destruct instruction.

The compiler also does not support arbitrary expressions in println! macros, but that is because syn does not fully parse macro contents. Also note that println! macros do not use format strings; they aren't actually Rust println! macros, behind the scenes they just transform something like println!(a, b, b) into print a b c in Bril.

Sample

Here is a sample Rust program (safe_division.rs) that can be compiled to Bril:

fn main(x: i32, y: i32) {
    let res: Res = safe_divide(x, y);
    match res {
        Res::Res(res) => println!(res),
        Res::DivideByZero => {
            let pr: i32 = -1;
            println!(pr)
        }
    }
}

enum Res {
    Res(i32),
    DivideByZero,
}

fn safe_divide(x: i32, y: i32) -> Res {
    if y == 0 {
        return Res::DivideByZero;
    } else {
        return Res::Res(x / y);
    }
}

Challenges

The design for the new instructions seemed fairly obvious to me, so I formalized them but otherwise didn't spend much time working on them.

Adding brili support was also fairly straightforward, but I was frequently frustrated by TypeScript because it was new to me and I found some of the typing rules rather non-intuitive.

The type checker was also not terribly difficult, although it was a bit tedious because I had to add rules for every Bril instruction. Figuring out how to type-check match instructions within the dataflow analysis framework was a bit challenging.

The Rust compiler was by far the most difficult and time-consuming part of the project. First, I spent a fair amount of time trying to figure out how to parse Rust source code before settling on the syn package. Figuring out how to use the parsed AST was also an involved process because it's a full AST for a "real" language, not just a toy language like Bril. I also spent quite a bit of time figuring out how to recursively tile expressions and how to keep track of variant types. It's also worth mentioning that I'm still a little new to Rust, so I was occasionally roadblocked by things like borrowing issues, although I am definitely getting better.

Evaluation

Correctness testing

I created three sample Bril programs using algebraic types in the algebraic-types/samples directory as well as two sample Rust programs in the algebraic-types/samples-rs directory.

I tested the correctness of the brili additions by running the sample Bril programs and observing the output. The samples provide complete coverage of the added instructions, which isn't terribly difficult because there are only four instructions.

I tested the correctness of the type checker by running it on some of the provided Bril benchmarks as well as the Bril samples for testing algebraic types. I also ran it on modified versions of these programs that were designed to fail the type checker to verify that it failed accordingly. I probably have not achieved complete coverage with respect to the core Bril instructions, but I have tested extensively with the algebraic type instructions and definitely have achieved complete coverage there.

I tested the Rust compiler in several ways. I only tested it with two sample programs, but the two programs together cover all of the supported Rust syntax. I converted the output to the textual Bril format and examined the code to verify its correctess; I then ran it through the type checker to verify that the output programs have valid types; and finally, I ran it through brili to ascertain that the compiled programs output the expected values.

Reflection

I think that using a powerful match instruction was the right decision. Having the ability to perform type-checking made me a lot more confident in the correctness of the Rust compiler.

I do think that it would have been preferable to make the construct instruction require explicitly specifying the index of the constructor to construct. This approach would have been better for a number of reasons: it would have eliminated the dependency on run-time knowledge of types; it would have made the Rust compiler's job easier; it would have made the code more explicit and readable; and it would have made it possible to use variants with multiple constructors that carry the same payload type. When I realized this shortcoming, I didn't have enough time to go back and change the project, unfortunately. That being said, the current implementation is not horribly wrong, I think it is just sub-optimal.