Bril MLIR Dialect
This blog post is about Bril2MLIR, a project which translates between Bril JSON and a custom Bril MLIR dialect.
Code:
Overview
In our original project proposal, we wanted to implement and compare two different register allocation approaches in Bril: the linear scan register allocation algorithm and an SSA-based register allocation described in this paper. Without too much effort, we familiarized ourselves with linear scan and produced our own implementation, building on our existing dataflow analysis and handling spilled variables by inserting id instructions to reload them before they were operated on. However, the SSA-based register allocation described in the paper turned out to be significantly harder to implement, and we ultimately scrapped the idea as a whole.
We decided to pivot our project to translating SSA-form Bril JSON programs into MLIR and vice versa. MLIR is a powerful, extensible compiler infrastructure that supports custom dialects and transformations across different levels of abstraction. This pivot let us explore the transformation from the lightweight Bril IR to the more involved MLIR form. To do this, we created a dialect for Bril consisting of custom operations for instructions such as id, nop, and print, and we built on existing MLIR dialects such as arith for common operations like add and sub. In doing so, we were able to represent all of the core Bril instructions, including all arithmetic and control flow. The first direction of our translation was from Bril JSON to MLIR, which involved parsing the input program into a traversable representation, constructing the corresponding MLIR operations, and emitting a completed ModuleOp with correctly assembled functions and basic blocks.
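To illustrate the first of those steps, here is a minimal Python sketch (not our actual C++ front end; names here are ours) of splitting a parsed Bril function into basic blocks, starting a new block at each label and ending one after each terminator:

```python
TERMINATORS = {"br", "jmp", "ret"}

def form_blocks(fn):
    """Split a Bril function dict (as produced by json.loads on Bril
    JSON) into basic blocks: a label starts a new block, and a
    terminator instruction ends the current one."""
    blocks, cur = [], []
    for instr in fn.get("instrs", []):
        if "label" in instr:            # a label opens a new block
            if cur:
                blocks.append(cur)
            cur = [instr]
        else:
            cur.append(instr)
            if instr.get("op") in TERMINATORS:
                blocks.append(cur)      # terminator closes the block
                cur = []
    if cur:                             # trailing fall-through block
        blocks.append(cur)
    return blocks
```

Each resulting block then maps naturally to one MLIR Block when building the function body.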
To organize our project, we created a fork of the llvm-project which was placed inside the bril repository in a new directory called bril-mlir. This structure allowed us to build our implementation with the tools provided by the LLVM Project in combination with the existing bril benchmarks and commands. We created a Dialect class to register our custom Bril MLIR operations, with definitions in a .td file. We used the MLIR C++ API to develop the MLIR generator and translator.
Bril to MLIR
The MLIR Toy tutorial provides a walkthrough of building a simple language with MLIR, including parsing and AST construction. Using this as a starting point, we adapted a similar structure to create a dialect for Bril: defining Bril-specific operations in TableGen, implementing the dialect in C++, and extending the MLIR generation logic to emit those custom operations. For many of the core operations, like arithmetic (add, mul, etc.), control flow, and comparisons, we leveraged existing MLIR dialects like arith and cf. These dialects already provided the correct semantics and functionality for our use cases, letting us focus our custom dialect on features unique to Bril. Defining custom operations for every Bril instruction and then lowering them to the standard dialects, rather than translating directly from Bril to standard-dialect operations, might have been more idiomatic MLIR, but the direct approach had no significant impact on MLIR construction or on the translation back to Bril. Here is an example of the transformation of hamming.bril, which we chose as an example due to its use of multiple functions and phi nodes in SSA form:
@lsb (val : int) : int {
  two : int = const 2;
  div : int = div val two;
  mul : int = mul div two;
  lsb : int = sub val mul;
  ret lsb;
}
@hamm_lsb (a : int, b : int) : int {
  zero : int = const 0;
  one : int = const 1;
  a_lsb : int = call @lsb a;
  b_lsb : int = call @lsb b;
  xor : int = add a_lsb b_lsb;
  guard : bool = eq xor one;
  br guard .if .else;
.if:
  ret one;
.else:
  ret zero;
}
@main (a : int, b : int) {
  zero : int = const 0;
  two : int = const 2;
  dist : int = const 0;
.while:
  a_done : bool = eq a zero;
  b_done : bool = eq b zero;
  done : bool = and a_done b_done;
  br done .end .body;
.body:
  hamm : int = call @hamm_lsb a b;
  dist : int = add dist hamm;
  a : int = div a two;
  b : int = div b two;
  jmp .while;
.end:
  print dist;
  ret;
}
We first transform the program into SSA form and run a TDCE+ pass:
@lsb(val: int): int {
.b1:
  two.0: int = const 2;
  div.0: int = div val two.0;
  mul.0: int = mul div.0 two.0;
  lsb.0: int = sub val mul.0;
  ret lsb.0;
}
@hamm_lsb(a: int, b: int): int {
.b1:
  zero.0: int = const 0;
  one.0: int = const 1;
  a_lsb.0: int = call @lsb a;
  b_lsb.0: int = call @lsb b;
  xor.0: int = add a_lsb.0 b_lsb.0;
  guard.0: bool = eq xor.0 one.0;
  br guard.0 .if .else;
.if:
  ret one.0;
.else:
  ret zero.0;
}
@main(a: int, b: int) {
.b1:
  zero.0: int = const 0;
  two.0: int = const 2;
  dist.0: int = const 0;
  jmp .while;
.while:
  dist.1: int = phi dist.0 dist.2 .b1 .body;
  b.0: int = phi b b.1 .b1 .body;
  a.0: int = phi a a.1 .b1 .body;
  a_done.1: bool = eq a.0 zero.0;
  b_done.1: bool = eq b.0 zero.0;
  done.1: bool = and a_done.1 b_done.1;
  br done.1 .end .body;
.body:
  hamm.1: int = call @hamm_lsb a.0 b.0;
  dist.2: int = add dist.1 hamm.1;
  a.1: int = div a.0 two.0;
  b.1: int = div b.0 two.0;
  jmp .while;
.end:
  print dist.1;
  ret;
}
We then transform this SSA-form Bril into our MLIR dialect:
module {
  func.func @lsb(%arg0: i32) -> i32 {
    cf.br ^bb1
  ^bb1:  // pred: ^bb0
    bril.nop
    %c2_i32 = arith.constant 2 : i32
    %0 = arith.divsi %arg0, %c2_i32 : i32
    %1 = arith.muli %0, %c2_i32 : i32
    %2 = arith.subi %arg0, %1 : i32
    return %2 : i32
  }
  func.func @hamm_lsb(%arg0: i32, %arg1: i32) -> i32 {
    cf.br ^bb1
  ^bb1:  // pred: ^bb0
    bril.nop
    %c0_i32 = arith.constant 0 : i32
    %c1_i32 = arith.constant 1 : i32
    %0 = call @lsb(%arg0) : (i32) -> i32
    %1 = call @lsb(%arg1) : (i32) -> i32
    %2 = arith.addi %0, %1 : i32
    %3 = arith.cmpi eq, %2, %c1_i32 : i32
    cf.cond_br %3, ^bb2, ^bb3
  ^bb2:  // pred: ^bb1
    bril.nop
    return %c1_i32 : i32
  ^bb3:  // pred: ^bb1
    bril.nop
    return %c0_i32 : i32
  }
  func.func @main(%arg0: i32, %arg1: i32) {
    cf.br ^bb1
  ^bb1:  // pred: ^bb0
    bril.nop
    %c0_i32 = arith.constant 0 : i32
    %c2_i32 = arith.constant 2 : i32
    %c0_i32_0 = arith.constant 0 : i32
    cf.br ^bb2(%c0_i32_0, %arg1, %arg0 : i32, i32, i32)
  ^bb2(%0: i32, %1: i32, %2: i32):  // 2 preds: ^bb1, ^bb3
    bril.nop
    %3 = arith.cmpi eq, %2, %c0_i32 : i32
    %4 = arith.cmpi eq, %1, %c0_i32 : i32
    %5 = arith.andi %3, %4 : i1
    cf.cond_br %5, ^bb4, ^bb3
  ^bb3:  // pred: ^bb2
    bril.nop
    %6 = call @hamm_lsb(%2, %1) : (i32, i32) -> i32
    %7 = arith.addi %0, %6 : i32
    %8 = arith.divsi %2, %c2_i32 : i32
    %9 = arith.divsi %1, %c2_i32 : i32
    cf.br ^bb2(%7, %9, %8 : i32, i32, i32)
  ^bb4:  // pred: ^bb2
    bril.nop
    bril.print %0 : i32
    return
  }
}
As can be seen, there are some inefficiencies: every function begins with an entry basic block that just jumps to the next block, and every basic block starts with a nop instruction (inserted to prevent empty blocks). However, the semantics of the program are preserved.
MLIR to Bril
To convert MLIR back to Bril JSON, we implemented a custom MLIRToJSONConverter that traverses the MLIR IR and reconstructs equivalent Bril instructions and structure. Starting from the ModuleOp, the converter iterates over each func, mapping MLIR blocks to Bril-style labels (“entry”, “block1”, etc.) and generating a consistent naming scheme for SSA values. This ensures that each variable and control-flow target is uniquely represented.
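That naming scheme can be sketched in a few lines of Python (illustrative; our C++ converter keeps the equivalent maps keyed on mlir::Value and mlir::Block pointers):

```python
class NameMap:
    """Assign stable Bril names to MLIR SSA values and blocks.
    Values become v0, v1, ...; blocks become entry, block1, ..."""
    def __init__(self):
        self.values = {}   # MLIR value -> "v0", "v1", ...
        self.blocks = {}   # MLIR block -> "entry", "block1", ...

    def value(self, v):
        # First sighting of a value reserves the next vN name.
        return self.values.setdefault(v, f"v{len(self.values)}")

    def block(self, b):
        if b not in self.blocks:
            self.blocks[b] = "entry" if not self.blocks else f"block{len(self.blocks)}"
        return self.blocks[b]
```

Because every lookup goes through the same map, repeated uses of a value or branch target always get the same name.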
Within each function, operations are visited and translated one by one. Standard MLIR operations from dialects like arith, func, and cf are mapped to corresponding Bril instructions: for example, arith.addi becomes add, cmpi slt becomes lt, and cf.br becomes jmp. For control flow, the converter reconstructs Bril phi nodes by analyzing block arguments and tracking which values are passed via branches, a transformation necessary because MLIR uses block arguments instead of explicit phi instructions.
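The phi reconstruction can be sketched like this (a simplified Python stand-in for the C++ logic; the data layout is an assumption made for the sketch):

```python
def phis_from_block_args(arg_names, arg_types, preds):
    """Turn a block's MLIR-style block arguments back into Bril phi
    instructions. `preds` maps each predecessor label to the list of
    values its branch passes for this block's arguments, in order."""
    phis = []
    for i, (name, ty) in enumerate(zip(arg_names, arg_types)):
        phis.append({
            "op": "phi", "dest": name, "type": ty,
            # The i-th operand of each predecessor's branch feeds
            # the i-th block argument.
            "args": [passed[i] for passed in preds.values()],
            "labels": list(preds.keys()),
        })
    return phis
```

This mirrors the ^bb2 block of @main above, where each cf.br into the loop header supplies one value per block argument.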
Custom Bril dialect operations like bril.id, bril.print, and bril.nop are directly mapped back to their Bril JSON equivalents, preserving semantics that aren’t natively represented in MLIR’s core dialects. Function arguments and return values are also converted, with types translated from MLIR’s type system into Bril’s “int”, “bool”, and “float” strings.
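The type translation itself is a small lookup; a sketch, assuming the widths shown in the MLIR dump above (i32 for Bril ints, i1 for bools, f64 for floats):

```python
def bril_type(mlir_type):
    """Map an MLIR type string back to a Bril type name.
    The widths here are the ones our dialect happens to use."""
    return {"i32": "int", "i1": "bool", "f64": "float"}[mlir_type]
```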
The final output is a complete Bril JSON program with labeled blocks, typed instructions, and SSA-correct variable naming. This approach ensures fidelity to the original semantics and enables round-tripping from Bril → MLIR → Bril for debugging, testing, or further analysis.
Continuing the above example with hamming.bril, here is the result of the transformation from MLIR back into SSA Bril:
@lsb(arg0: int): int {
  arg0: int = phi;
  jmp ..block0;
..block0:
  nop;
  v0: int = const 2;
  v1: int = div arg0 v0;
  v2: int = mul v1 v0;
  v3: int = sub arg0 v2;
  ret v3;
}
@hamm_lsb(arg0: int, arg1: int): int {
  arg0: int = phi;
  arg1: int = phi;
  jmp ..block0;
..block0:
  nop;
  v0: int = const 0;
  v1: int = const 1;
  v2: int = call @lsb arg0;
  v3: int = call @lsb arg1;
  v4: int = add v2 v3;
  v5: bool = eq v4 v1;
  br v5 ..block1 ..block2;
..block1:
  nop;
  ret v1;
..block2:
  nop;
  ret v0;
}
@main(arg0: int, arg1: int) {
  arg0: int = phi;
  arg1: int = phi;
  jmp ..block0;
..block0:
  nop;
  v0: int = const 0;
  v1: int = const 2;
  v2: int = const 0;
  jmp ..block1;
..block1:
  v3: int = phi v2 v4 ..block0 ..block2;
  v5: int = phi arg1 v6 ..block0 ..block2;
  v7: int = phi arg0 v8 ..block0 ..block2;
  nop;
  v9: bool = eq v7 v0;
  v10: bool = eq v5 v0;
  v11: bool = and v9 v10;
  br v11 ..block3 ..block2;
..block2:
  nop;
  v12: int = call @hamm_lsb v7 v5;
  v4: int = add v3 v12;
  v8: int = div v7 v1;
  v6: int = div v5 v1;
  jmp ..block1;
..block3:
  nop;
  print v3;
  ret;
}
And here is our final non-SSA Bril output, which can be directly run in the brili interpreter:
@lsb(arg0: int): int {
.b1:
  jmp ..block0;
..block0:
  nop;
  v0: int = const 2;
  v1: int = div arg0 v0;
  v2: int = mul v1 v0;
  v3: int = sub arg0 v2;
  ret v3;
}
@hamm_lsb(arg0: int, arg1: int): int {
.b1:
  jmp ..block0;
..block0:
  nop;
  v0: int = const 0;
  v1: int = const 1;
  v2: int = call @lsb arg0;
  v3: int = call @lsb arg1;
  v4: int = add v2 v3;
  v5: bool = eq v4 v1;
  br v5 ..block1 ..block2;
..block1:
  nop;
  ret v1;
..block2:
  nop;
  ret v0;
}
@main(arg0: int, arg1: int) {
.b1:
  jmp ..block0;
..block0:
  nop;
  v0: int = const 0;
  v1: int = const 2;
  v2: int = const 0;
  v3: int = id v2;
  v5: int = id arg1;
  v7: int = id arg0;
  jmp ..block1;
..block1:
  nop;
  v9: bool = eq v7 v0;
  v10: bool = eq v5 v0;
  v11: bool = and v9 v10;
  br v11 ..block3 ..block2;
..block2:
  nop;
  v12: int = call @hamm_lsb v7 v5;
  v4: int = add v3 v12;
  v8: int = div v7 v1;
  v6: int = div v5 v1;
  v3: int = id v4;
  v5: int = id v6;
  v7: int = id v8;
  jmp ..block1;
..block3:
  nop;
  print v3;
  ret;
}
Challenges
One difficult aspect of this project was properly converting phi nodes in Bril into SSA form in MLIR. In particular, we used the older version of Bril SSA, which uses phi instructions rather than get and set. A phi instruction lists a source label and variable for each predecessor, but in MLIR this functionality is represented using block arguments: each predecessor provides a value for every block argument before entering the block containing the phi node. To correctly represent this behavior, we needed to add block arguments to MLIR blocks containing phi nodes, and to identify the predecessor branches into which to insert the corresponding operands.
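A simplified Python version of that pre-pass (our generator does the equivalent with mlir::Block arguments and branch operands; this sketch works on Bril instruction dicts):

```python
def block_args_for_phis(blocks):
    """For each block, collect the block arguments its phi nodes imply,
    and record which value each predecessor must pass on its branch.
    `blocks` maps a label to that block's list of Bril instructions."""
    block_args = {}       # label -> [(dest, type), ...]
    branch_operands = {}  # (pred_label, succ_label) -> [value, ...]
    for label, instrs in blocks.items():
        for instr in instrs:
            if instr.get("op") != "phi":
                continue
            idx = len(block_args.setdefault(label, []))
            block_args[label].append((instr["dest"], instr["type"]))
            # Each (value, source label) pair of the phi becomes the
            # idx-th operand of that predecessor's branch into `label`.
            for val, pred in zip(instr["args"], instr["labels"]):
                ops = branch_operands.setdefault((pred, label), [])
                ops.extend([None] * (idx + 1 - len(ops)))
                ops[idx] = val
    return block_args, branch_operands
```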
Another difficult aspect was correctly implementing control flow translation from Bril JSON into the MLIR block representation. To do this, we created a map from label names to MLIR block pointers, which we then used as the targets for jumps and branches. We also needed to ensure that the targets passed to MLIR cf.br and cf.cond_br were correct; we initially had trouble identifying the right labels because we had not properly constructed the label-to-block mapping.
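The two-pass structure of the fix can be sketched as follows (Python stand-in; the real map goes from labels to mlir::Block*):

```python
def resolve_branch_targets(blocks):
    """Pass 1 registers every label (as MLIR Block creation would);
    pass 2 resolves branch targets against the completed map, so
    forward branches never miss. `blocks` maps a label to that
    block's Bril instruction list."""
    label_to_block = {}
    for label in blocks:                       # pass 1: create all blocks
        label_to_block[label] = f"^{label}"    # stand-in for mlir::Block*
    resolved = {}
    for label, instrs in blocks.items():       # pass 2: resolve branches
        term = instrs[-1]
        if term.get("op") in ("jmp", "br"):
            resolved[label] = [label_to_block[t] for t in term["labels"]]
    return resolved
```

Resolving targets only after every block exists is what a single-pass version gets wrong whenever a branch jumps forward.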
We also ran into a variety of build errors. Building MLIR for the first time took an incredibly long time, and it took a while to fully configure the CMake system to incorporate our dependencies and project structure. At first, we tried to colocate all of the Bril MLIR files within the Bril repository, but we could not set up the projects so that the necessary parts of LLVM and MLIR would be compiled together with the source files for our transformations. We therefore split our codebase between a fork of the Bril repository and the LLVM monorepo. Even after settling on a build process, due to the large scale of the project, even type errors that should have been trivial to fix took a long time to resolve because we had to read a great deal of documentation every time we hit a new issue.
In our initial implementation, we correctly ran around 20 out of the 40 test programs after a round-trip transformation. There were various errors in our implementation, such as:
- Incoming edges into basic blocks not being correctly preserved in the MLIR to Bril transformation, causing Phi nodes to be incomplete
- Functions being assumed to have a return type, which caused a heap buffer overflow on the second function when it didn't have one
- Variables being encountered in the Bril to MLIR transformation before they are defined by a phi node in a later basic block, which was fixed by prepopulating our symbol table with all phi node destination variables
- The print instruction initially being assumed to only take one operand
After fixing these issues, all 40 test programs ran correctly.
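The third fix, prepopulating the symbol table with phi destinations, can be sketched as (illustrative Python; our generator does this over MLIR values):

```python
def prepopulate_phi_dests(instrs, symbols):
    """Before emitting any instruction, scan the whole function for phi
    destinations and reserve a placeholder entry for each, so that a
    use in an earlier block resolves instead of failing a lookup."""
    for instr in instrs:
        if instr.get("op") == "phi":
            symbols.setdefault(instr["dest"], ("placeholder", instr["type"]))
    return symbols
```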
Evaluation/Correctness
We checked the correctness of our implementation by running all of the core benchmarks in the Bril repository. We compared the number of programs that ran correctly on the provided benchmark inputs between a Bril-SSA-Bril round trip and a Bril-SSA-MLIR-SSA-Bril round trip, with TDCE+ running after the into-SSA transformation. Note that the TDCE+ pass is required to correctly transform an SSA Bril program into MLIR; without it, the transformation crashes. We used the SSA round trip as a baseline instead of a plain round trip because the old phi-based SSA form was known to be flawed, with some programs crashing as a result of the round trip alone, even with dead code elimination passes. Of the 53 programs in the core benchmark suite, 40 survived the SSA round trip and produced a result. Of those 40, 39 survived the SSA+MLIR round trip and produced the same result as the SSA round trip.
recfact.bril, the one program that did not produce the same output, failed with a runtime error (a null pointer error). After further analysis, we determined it was due to the SSA program generated by the sample to_ssa.py being malformed: the SSA form of the program contained the phi node .endif.0: v4.0: int = phi __undefined v4.0 v4.1 .else.0 .b2 .b2;. This was odd because the phi node referenced itself under the wrong label and listed the same label twice. We hypothesized this was because the original program had a jump instruction immediately following a return instruction (ret v4; jmp .endif.0;), which confused the transformation into SSA. After removing this unreachable jump instruction, this benchmark ran correctly, and all 40 SSA-compatible core benchmarks passed.
Future Work
As things stand, we are very happy with the results of our transformation to and from the Bril MLIR dialect. We believe the main next steps are writing an interpreter that operates directly on the MLIR dialect, creating a full Bril dialect that no longer relies on preexisting dialects like arith and func, and supporting Bril extensions such as memory operations. Once the Bril MLIR dialect is complete, supporting lowering to dialects such as LLVM would be a natural step, and would hopefully produce a more robust alternative to the current bril-llvm transpiler. It would also be nice to have a more robust transformation that doesn't require TDCE+ to be run first. Finally, a nice quality-of-life change would be defining our brilc program and MLIR dialect as a standalone project in the bril repository that can be built out-of-tree. Currently, our conversion code is modeled on the Toy tutorial in the MLIR documentation, which unfortunately requires all of LLVM to be cloned and built, not just the Bril-specific parts. There is a standalone example project in the MLIR examples folder that could serve as a nice starting point for an out-of-tree dialect, though the project structure is different enough to make the effort non-trivial.