MLIR for Lox: Part 6 — The Complete Project — Every Dialect Operation in One Place
Five parts. A garbage collector. Root tracking. MLIR integration. Closures. Now you need to see how they fit together — not as individual chapters, but as a single running system. This part is the assembled view: file structure, object layout, API reference, compilation pipeline, debugging guide, and performance notes.
Read it straight through for the big picture. Jump to a section when you need a detail.
Complete File Structure
lox-mlir/
├── ast/ # AST type definitions
├── lexer/ # Tokenization
├── parser/ # Parsing
├── analysis/ # Closure capture analysis
├── codegen/ # MLIR code generation, Lox dialect, lowering
├── runtime/ # Object header, garbage collector, shadow stack
├── examples/ # Example Lox programs (simple, closures, GC stress)
└── tests/ # Integration tests
Complete Object Layout
Every object in Lox has this layout:
┌─────────────────────────────────────────────────────┐
│ ObjHeader (16 bytes) │
├─────────────────────────────────────────────────────┤
│ Field │ Offset │ Size │ Description │
├───────────────┼────────┼──────┼────────────────────┤
│ marked │ 0 │ 1 │ GC mark flag │
│ obj_type │ 1 │ 1 │ ObjType enum │
│ padding │ 2 │ 2 │ Alignment padding │
│ size │ 4 │ 4 │ Data size in bytes │
│ next │ 8 │ 8 │ Ptr to next obj │
└─────────────────────────────────────────────────────┘
Total header size: 16 bytes
Data follows immediately after header (offset +16).
Type-Specific Layouts
Number (ObjType::Number)
┌──────────────────────────────┐
│ Header (16 bytes) │
├──────────────────────────────┤
│ value: f64 (8 bytes) │
└──────────────────────────────┘
Total: 24 bytes
String (ObjType::String)
┌──────────────────────────────┐
│ Header (16 bytes) │
├──────────────────────────────┤
│ length: usize (8 bytes) │
│ hash: u64 (8 bytes) │
│ chars: [u8; length] │
└──────────────────────────────┘
Total: 32 + length bytes
Environment (ObjType::Environment)
┌──────────────────────────────┐
│ Header (16 bytes) │
├──────────────────────────────┤
│ enclosing: *mut Env (8) │
│ slot_count: usize (8) │
│ slots: [*mut u8; slot_count] │
└──────────────────────────────┘
Total: 32 + (8 × slot_count) bytes
Closure (ObjType::Closure)
┌──────────────────────────────┐
│ Header (16 bytes) │
├──────────────────────────────┤
│ function_index: usize (8) │
│ environment: *mut Env (8) │
└──────────────────────────────┘
Total: 32 bytes
Instance (ObjType::Instance)
┌──────────────────────────────┐
│ Header (16 bytes) │
├──────────────────────────────┤
│ class: *mut Class (8) │
│ field_count: usize (8) │
│ fields: [(*str, *val); n] │
└──────────────────────────────┘
Total: 32 + (16 × field_count) bytes
Complete GC API Reference
Runtime Functions (C ABI)
The C ABI convention here puts obj_type before size. If you worked through Part 5, you’ll notice this is the opposite order — Part 5 calls the Rust runtime directly with (size, obj_type), matching the Rust function signature. The extern "C" wrapper registered with the JIT reorders the arguments: the MLIR call site passes (obj_type, size), the C wrapper receives them in that order, then calls the Rust alloc(size, obj_type) with the arguments swapped back to the Rust function’s expected order. Both sections are internally consistent — Part 5 matches the Rust signature, and Part 7 matches the C ABI convention.
// Allocation
void* lox_runtime_alloc(uint8_t obj_type, size_t size);
void* lox_runtime_alloc_environment(size_t slot_count, void* enclosing);
void* lox_runtime_alloc_closure(size_t function_index, void* environment);
// Environment access
void lox_runtime_env_set(void* env, size_t index, void* value);
void lox_runtime_env_set_number(void* env, size_t index, double value);
void* lox_runtime_env_get(void* env, size_t index);
// Shadow stack
void** gc_push_frame(size_t root_count);
void gc_pop_frame(void);
void gc_set_root(void* frame, size_t index, void* value);
// Collection
void gc_collect(void);
size_t gc_object_count(void);
MLIR Operations
lox.alloc :: (obj_type: i8, size: i64) -> !llvm.ptr
Allocates a heap object with the given type and data size (not including header).
lox.push_frame :: (root_count: i32) -> !llvm.ptr
Pushes a shadow stack frame with the given number of root slots.
Returns pointer to roots array.
lox.pop_frame :: () -> ()
Pops the current shadow stack frame.
lox.set_root :: (index: i32, value: !llvm.ptr) -> ()
Sets a root in the current frame.
lox.store :: (obj: !llvm.ptr, value: f64) -> ()
Stores a number into a heap object's data field.
lox.load :: (obj: !llvm.ptr) -> f64
Loads a number from a heap object's data field.
lox.print :: (value: f64) -> ()
Prints a value via the runtime.
lox.alloc_environment :: (slot_count: i64, enclosing: !llvm.ptr) -> !llvm.ptr
Allocates an environment with the given number of variable slots.
`enclosing` is a null pointer for top-level environments, or a pointer to
the parent environment for nested closures.
lox.alloc_closure :: (function_index: i64, environment: !llvm.ptr) -> !llvm.ptr
Allocates a closure struct: a function index (position in the compiler's
function table) and a pointer to the closure's environment.
lox.env_set :: (env: !llvm.ptr, index: i64, value: !llvm.ptr) -> ()
Sets a slot in an environment. For the numbers-only model, use lox.env_set_number instead
(passing an f64 through a void* parameter violates the C calling convention — the callee
would read the wrong register).
lox.env_set_number :: (env: !llvm.ptr, index: i64, value: f64) -> ()
Sets a slot in an environment, boxing the f64 into a heap-allocated Number.
Numbers-only model only. The generic env_set takes a void* for the tagged-union model.
lox.env_get :: (env: !llvm.ptr, index: i64) -> !llvm.ptr
Gets a slot from an environment.
The Compilation Pipeline
┌─────────────────────────────────────────────────────────────┐
│ 1. LEXING │
│ "var x = 1 + 2;" → [VAR, IDENT, EQUAL, NUMBER, ...] │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 2. PARSING │
│ Tokens → AST │
│ VarStmt { name: "x", init: BinaryExpr { ... } } │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 3. ANALYSIS │
│ Find captured variables, compute root counts │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 4. CODE GENERATION │
│ AST → MLIR (Lox dialect) │
│ func.func @main() { │
│ %frame = lox.push_frame root_count = 1 │
│ %x = lox.alloc obj_type = 0, size = 8 │
│ lox.set_root index = 0, %x │
│ ... │
│ } │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 5. LOWERING │
│ MLIR (Lox dialect) → MLIR (LLVM dialect) │
│ lox.alloc → func.call @lox_runtime_alloc │
│ lox.push_frame → func.call @gc_push_frame │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 6. LLVM IR GENERATION │
│ MLIR → LLVM IR │
│ define ptr @lox_runtime_alloc(...) { ... } │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 7. NATIVE CODE GENERATION │
│ LLVM IR → Machine code │
│ Link with liblox_runtime.a │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ 8. EXECUTION │
│ ./program │
│ Runtime GC manages memory │
└─────────────────────────────────────────────────────────────┘
IR at Each Stage: A Concrete Example
The pipeline diagram above shows the flow, but it doesn’t show what the IR actually looks like at each stage. Let’s trace a concrete Lox program through the entire compilation pipeline.
The program:
var x = 1 + 2;
print x;
Trivial, but it exercises every stage. Here’s what happens:
Stage 1–2: Lexing and Parsing
The parser produces an AST like:
Program
VarDecl "x" = BinaryExpr(Num(1), Plus, Num(2))
PrintStmt Name("x")
Stage 3: Analysis
No closures or captured variables in this program. The analysis pass computes root counts: x is a local variable, so it needs one GC root.
Stage 4: Code Generation (Lox Dialect MLIR)
The code generator walks the AST and emits MLIR using our custom Lox dialect operations. This example uses the GC-aware representation (heap-allocated objects, shadow stack roots) from Parts 3–6 — not the simpler numbers-only model from Part 1. The difference shows up in the function signature: func.func @main() returns nothing here, while Part 1’s func.func @main() -> f64 returns the result of the last expression. Both are valid; the GC-aware version is what a real Lox compiler would produce.
module {
func.func @main() {
%frame = lox.push_frame root_count = 1 : i32
%one = arith.constant 1.0 : f64
%two = arith.constant 2.0 : f64
%sum = arith.addf %one, %two : f64
%x = lox.alloc obj_type = 0, size = 8 : i64
lox.set_root index = 0, %x : i32
lox.store %x, %sum : f64
%loaded = lox.load %x : f64
lox.print %loaded : f64
lox.pop_frame
func.return
}
}
Notice the mix: arith.addf is a standard MLIR arithmetic operation, but lox.push_frame, lox.alloc, lox.set_root, and lox.print are our custom dialect. At this stage, the IR is close to the source — the Lox-level operations are still visible.
Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)
The lowering pass replaces each lox.* operation with standard MLIR and runtime calls:
module {
func.func @main() {
%root_count = arith.constant 1 : i32
%frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
%one = arith.constant 1.0 : f64
%two = arith.constant 2.0 : f64
%sum = arith.addf %one, %two : f64
%obj_type = arith.constant 0 : i8 // ObjType::Number
%size = arith.constant 8 : i64 // data size only (f64 = 8 bytes; lox.alloc adds header)
%x = func.call @lox_runtime_alloc(%obj_type, %size) : (i8, i64) -> !llvm.ptr
%index = arith.constant 0 : i32
func.call @gc_set_root(%frame, %index, %x) : (!llvm.ptr, i32, !llvm.ptr) -> ()
// Store the number into the object's data field
// %x is already the data pointer — alloc() returns ptr past the header
llvm.store %sum, %x : f64, !llvm.ptr
%loaded = llvm.load %x : !llvm.ptr -> f64
func.call @lox_print(%loaded) : (f64) -> ()
func.call @gc_pop_frame() : () -> ()
func.return
}
}
The Lox dialect is gone. Every lox.* operation has been replaced by a func.call to a runtime function or a standard LLVM operation. The scf.if and scf.while operations (from control flow in more complex programs) would be lowered to cf (control flow) dialect operations at this stage.
This is a simplification — the actual lowered IR has more detail around pointer arithmetic and type casting. But the pattern is: custom dialect operations become runtime calls and standard MLIR operations.
Note: The
llvm.storeandllvm.loadsyntax in Stage 5 examples is simplified. With opaque pointers (!llvm.ptr), storing anf64into a!llvm.ptrslot requires a two-step conversion (bitcastf64→i64, theninttoptr i64→!llvm.ptr), as shown in Part 4. We simplify the syntax here for readability — the real MLIR LLVM dialect requires the two-step conversion. The key point is the data flow: values are stored into and loaded from the right memory locations. Stage 6 (LLVM IR) uses the valid native syntax.
Stage 6: LLVM IR
After the MLIR-to-LLVM conversion pass, the IR becomes LLVM IR:
define void @main() {
entry:
%frame = call ptr @gc_push_frame(i32 1)
%x = call ptr @lox_runtime_alloc(i32 0, i64 8)
call void @gc_set_root(ptr %frame, i32 0, ptr %x)
store f64 3.0, ptr %x
%loaded = load f64, ptr %x
call void @lox_print(f64 %loaded)
call void @gc_pop_frame()
ret void
}
Notice that 1.0 + 2.0 has been folded into the constant 3.0 — this is a free optimization from MLIR’s arith dialect constant folding.
Stage 7–8: Native Code and Execution
LLVM compiles the IR to native machine code, links it with the runtime (liblox_runtime.a), and produces an executable. When you run it:
3
Why This Matters
Looking at the IR at each stage reveals something important: the Lox dialect is the only stage that’s specific to our language. Everything after Stage 5 is generic — standard MLIR operations, standard LLVM IR, standard machine code. The Lox dialect is the narrow waist of the hourglass.
This is the whole point of MLIR. You define a dialect that captures what makes your language unique (Lox’s GC-aware operations, in our case). Then you lower to standard dialects and let the framework handle the rest. You write the interesting part; MLIR handles the boring part.
IR at Each Stage: A Program with Control Flow
The var x = 1 + 2; print x; example only shows straight-line code. Let’s trace a program with an if statement to see how control flow appears in the IR.
var x = 10;
if (x > 5) {
print x;
} else {
print 0;
}
This introduces a comparison, a conditional branch, and two alternative paths. Here’s what the IR looks like at the key stages:
Stage 4: Code Generation (Lox Dialect MLIR)
module {
func.func @main() {
%frame = lox.push_frame root_count = 1 : i32
%ten = arith.constant 10.0 : f64
%x = lox.alloc obj_type = 0, size = 8 : i64
lox.set_root index = 0, %x : i32
lox.store %x, %ten : f64
%loaded = lox.load %x : f64
%five = arith.constant 5.0 : f64
%cond = arith.cmpf ogt, %loaded, %five : f64
scf.if %cond {
// then-block
%val = lox.load %x : f64
lox.print %val : f64
} else {
// else-block
%zero = arith.constant 0.0 : f64
lox.print %zero : f64
}
lox.pop_frame
func.return
}
}
Two new things appear: arith.cmpf (a floating-point comparison) and scf.if (structured control flow). The scf dialect is MLIR’s standard way to represent conditionals — it has a then-region and an else-region, each in their own { } block. Unlike LLVM’s basic-block branches, scf.if is structured: the then and else regions can’t be jumped into from outside. This makes the IR easier to analyze and transform.
Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)
module {
func.func @main() {
%root_count = arith.constant 1 : i32
%frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
%ten = arith.constant 10.0 : f64
%obj_type = arith.constant 0 : i8
%size = arith.constant 8 : i64
%x = func.call @lox_runtime_alloc(%obj_type, %size) : (i8, i64) -> !llvm.ptr
%idx0 = arith.constant 0 : i32
func.call @gc_set_root(%frame, %idx0, %x) : (!llvm.ptr, i32, !llvm.ptr) -> ()
llvm.store %ten, %x : f64, !llvm.ptr
%loaded = llvm.load %x : !llvm.ptr -> f64
%five = arith.constant 5.0 : f64
%cond = arith.cmpf ogt, %loaded, %five : f64
scf.if %cond {
%val = llvm.load %x : !llvm.ptr -> f64
func.call @lox_print(%val) : (f64) -> ()
} else {
%zero = arith.constant 0.0 : f64
func.call @lox_print(%zero) : (f64) -> ()
}
func.call @gc_pop_frame() : () -> ()
func.return
}
}
The scf.if is still here — it doesn’t get lowered until the scf-to-cf conversion pass. The lox.* operations have been replaced by runtime calls and LLVM memory operations, but the structured control flow remains. This is by design: keeping scf.if as long as possible lets MLIR’s transformation passes reason about the control flow structure.
Later, the scf-to-cf pass will lower scf.if to cf.cond_br (a conditional branch between basic blocks), and the cf-to-llvm pass will lower that to LLVM’s br and cond_br instructions. But that happens after all the high-level transformations are done.
What’s different from straight-line code?
The scf.if operation creates regions — separate blocks of IR for the then and else paths. Each region can define its own SSA values. The arith.cmpf produces an i1 (boolean) value that drives the branch.
The Lox dialect doesn’t have its own if operation — we use MLIR’s standard scf.if directly. This is a design choice: some languages define their own conditional operation in their dialect, then lower it to scf.if. We skip that step because scf.if already says everything we need.
GC roots are shared across both branches. The %frame and %x values are available in both the then-region and else-region — MLIR regions can see values from their enclosing scope. This means we don’t need to push/pop roots for each branch; the pre-if roots cover both paths.
IR at Each Stage: A Program with Closures
Functions and closures are where the GC-aware model really shows its value. Let’s trace a program that defines a function, captures a variable, and calls the result:
fun makeAdder(n) {
fun add(x) {
return x + n;
}
return add;
}
var add5 = makeAdder(5);
print add5(3);
This program creates a closure (add) that captures n from its enclosing scope. The closure outlives makeAdder’s stack frame — that’s the whole reason we need heap-allocated closure environments and GC roots.
Stage 4: Code Generation (Lox Dialect MLIR)
module {
func.func @main() {
// One root: the closure returned by makeAdder. makeAdder itself is a
// top-level function — we call it directly, so it doesn't need a closure
// object. Only the returned closure (which captures n) needs to be rooted.
%frame = lox.push_frame root_count = 1 : i32
// Call makeAdder(5) — direct call, no closure allocation needed
%five = arith.constant 5.0 : f64
%add5_closure = func.call @makeAdder(%five) : (f64) -> !llvm.ptr
lox.set_root index = 0, %add5_closure : !llvm.ptr
// Call add5(3) — extract the environment from the closure
// (matches the calling convention from Part 5: closure functions
// receive the environment pointer, not the closure pointer)
%three = arith.constant 3.0 : f64
%env_ptr = llvm.load %add5_closure[8] : !llvm.ptr -> !llvm.ptr // shorthand (see note below)
%result = func.call @add(%three, %env_ptr) : (f64, !llvm.ptr) -> f64
lox.print %result : f64
lox.pop_frame
func.return
}
func.func @makeAdder(%n: f64) -> !llvm.ptr {
%frame = lox.push_frame root_count = 1 : i32
// Allocate environment with 1 slot for the captured variable 'n'
%null_env = llvm.mlir.nullptr : !llvm.ptr // no enclosing environment
%env = lox.alloc_environment slot_count = 1, enclosing = %null_env : !llvm.ptr
lox.set_root index = 0, %env : !llvm.ptr
// Store n into environment slot 0 (env_set_number boxes the f64 into a heap Number)
lox.env_set_number %env, index = 0, %n : !llvm.ptr, i64, f64
// Allocate closure for 'add', pointing to the environment
// The compiler assigns function indices in compilation order.
// @main and @makeAdder are called directly (no closure table entry
// needed). @add is the first closure — index 1. (Index 0 would be
// used if the first closure-creating function in compilation order
// needed a table entry.)
%c1 = arith.constant 1 : i64 // function_index for @add
%add_closure = lox.alloc_closure function_index = %c1, environment = %env : !llvm.ptr
lox.pop_frame
func.return %add_closure : !llvm.ptr
}
func.func @add(%x: f64, %env: !llvm.ptr) -> f64 {
// Load captured variable 'n' from the environment (slot 0)
// The environment pointer is received directly — the caller extracts
// it from the closure before calling us (see the calling convention
// in Part 5's "How Does the Runtime Call a Closure?" section).
%n_ptr = lox.env_get %env, index = 0 : (!llvm.ptr, i64) -> !llvm.ptr
%n = llvm.load %n_ptr : !llvm.ptr -> f64
%sum = arith.addf %x, %n : f64
func.return %sum : f64
}
}
%env_ptr = llvm.load %add5_closure[8] is a shorthand. The [8] isn’t valid MLIR syntax — it stands for llvm.getelementptr (to compute the address of the environment field, 8 bytes after the start of the closure’s data area) followed by llvm.load (to read the pointer at that address). Stage 5 shows the real two-step version. The shorthand here is for readability — the Lox dialect is our notation, and this operation maps to the two-step sequence during lowering.
lox.alloc_environment and lox.alloc_closure are separate heap objects. The environment holds the captured variables; the closure holds a function index and a pointer to the environment. This matches the runtime API from Part 5: alloc_environment creates a slot array, alloc_closure creates a closure struct.
lox.env_set_number and lox.env_get handle captured variable access through the environment API. env_set_number stores an f64 into a slot (the runtime boxes it into a heap Number internally). env_get returns a pointer to the slot’s contents. We use env_set_number instead of the generic env_set (which takes void*) because passing an f64 through a void* parameter violates the C calling convention — on x86-64, f64 is passed in an XMM register while void* goes in a general-purpose register.
The environment pointer is passed as a parameter. @add takes %env: !llvm.ptr as its second argument. The caller extracts the environment pointer from the closure (at data offset +8, after the function_index field) before calling @add. This matches the calling convention established in Part 5: when a closure is called, the runtime reads the environment field from the closure object and passes it to the function. The closure function receives the environment pointer directly — it doesn’t need the function_index because it already knows which function it is.
Why pass the environment pointer instead of the whole closure? The closure function doesn’t need its own
function_index— it already knows which function it is. The only thing it needs from the closure is the environment pointer, so that’s all the calling convention passes. A production compiler might pass the whole closure pointer and have the callee extract the environment, but the environment-only convention is simpler and matches how indirect calls work (the runtime reads the environment from the closure before the call).
Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)
module {
func.func @main() {
%root_count = arith.constant 1 : i32
%frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
// Call makeAdder directly — no closure allocation needed for top-level functions
%five = arith.constant 5.0 : f64
%add5_closure = func.call @makeAdder(%five) : (f64) -> !llvm.ptr
%idx0 = arith.constant 0 : i32
func.call @gc_set_root(%frame, %idx0, %add5_closure) : (!llvm.ptr, i32, !llvm.ptr) -> ()
%three = arith.constant 3.0 : f64
// Extract the environment from the closure before calling @add
// (matches the calling convention from Part 5)
%env_ptr_addr = llvm.getelementptr %add5_closure[8] : (!llvm.ptr) -> !llvm.ptr
%add5_env = llvm.load %env_ptr_addr : !llvm.ptr -> !llvm.ptr
%result = func.call @add(%three, %add5_env) : (f64, !llvm.ptr) -> f64
func.call @lox_print(%result) : (f64) -> ()
func.call @gc_pop_frame() : () -> ()
func.return
}
func.func @makeAdder(%n: f64) -> !llvm.ptr {
%root_count = arith.constant 1 : i32
%frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
// Allocate environment with 1 slot for captured variable 'n'
%slot_count = arith.constant 1 : i64
%null_env = llvm.mlir.nullptr : !llvm.ptr // no enclosing environment
%env = func.call @lox_runtime_alloc_environment(%slot_count, %null_env) : (i64, !llvm.ptr) -> !llvm.ptr
%idx0 = arith.constant 0 : i32
func.call @gc_set_root(%frame, %idx0, %env) : (!llvm.ptr, i32, !llvm.ptr) -> ()
// Store n into environment slot 0 (env_set_number boxes f64 into heap Number)
%slot_idx = arith.constant 0 : i64
func.call @lox_runtime_env_set_number(%env, %slot_idx, %n) : (!llvm.ptr, i64, f64) -> ()
// Allocate closure for 'add', pointing to the environment
%func_idx = arith.constant 1 : i64 // function_index for @add
%add_closure = func.call @lox_runtime_alloc_closure(%func_idx, %env) : (i64, !llvm.ptr) -> !llvm.ptr
func.call @gc_pop_frame() : () -> ()
func.return %add_closure : !llvm.ptr
}
func.func @add(%x: f64, %env: !llvm.ptr) -> f64 {
// The environment pointer is passed directly (see calling convention in Part 5)
// Load captured n from environment slot 0
%slot_idx = arith.constant 0 : i64
%n_ptr = func.call @lox_runtime_env_get(%env, %slot_idx) : (!llvm.ptr, i64) -> !llvm.ptr
%n = llvm.load %n_ptr : !llvm.ptr -> f64
%sum = arith.addf %x, %n : f64
func.return %sum : f64
}
}
The lox.* operations are gone, replaced by func.call to runtime functions and LLVM memory operations. Notice that the closure allocation now properly separates the environment (alloc_environment) from the closure struct (alloc_closure), matching the runtime API from Part 5. Captured variables go through env_set_number / env_get — not raw llvm.store into the closure’s data area.
Stage 6: LLVM IR
define ptr @makeAdder(double %n) {
entry:
%frame = call ptr @gc_push_frame(i32 1)
; Allocate environment with 1 slot for captured variable 'n'
%env = call ptr @lox_runtime_alloc_environment(i64 1, ptr null) ; 1 slot, no enclosing env
call void @gc_set_root(ptr %frame, i32 0, ptr %env)
; Store n into environment slot 0 (boxes f64 into heap Number)
call void @lox_runtime_env_set_number(ptr %env, i64 0, double %n)
; Allocate closure pointing to the environment
%closure = call ptr @lox_runtime_alloc_closure(i64 1, ptr %env) ; function_index=1 for @add
call void @gc_pop_frame()
ret ptr %closure
}
define double @add(double %x, ptr %env) {
entry:
; The environment pointer is passed directly (see calling convention in Part 5)
; Load captured n from environment slot 0
%n_ptr = call ptr @lox_runtime_env_get(ptr %env, i64 0)
%n = load double, ptr %n_ptr
%sum = fadd double %x, %n
ret double %sum
}
Notice how the closure becomes a simple pointer in LLVM IR. The %env argument to @add is a ptr — LLVM doesn’t know or care that it points to a GC-managed closure environment. The GC semantics are entirely in the runtime (gc_push_frame, gc_set_root, gc_pop_frame). LLVM sees a pointer, does pointer arithmetic, and moves on.
The LLVM IR now properly separates the environment from the closure. makeAdder allocates an environment, stores the captured variable into it, and then allocates a closure that points to the environment. In @add, the captured variable is accessed through env_get — not by writing directly into the closure’s data area. This matches the runtime API and avoids out-of-bounds writes: the closure struct only has two fields (function_index at offset +0 and environment at offset +8), and captured variables live in the separate environment object.
What’s different from the control flow example?
The closure example has three functions instead of one — each gets its own func.func, its own GC frame, and its own root set. makeAdder allocates an environment (rooting it before any further allocations), stores the captured variable into it, allocates a closure pointing to that environment, and returns the closure. The caller (main) roots the returned closure so the GC doesn’t collect it after makeAdder’s frame is popped, then extracts the environment pointer from the closure before calling @add — this matches the calling convention from Part 5, where the runtime reads the environment field from the closure object and passes it directly. The @add function receives the environment pointer as its second argument and uses env_get to access captured variables. Each function pushes its own GC frame and pops it before returning — roots are scoped to the function that registered them. makeAdder’s roots are popped when it returns; main must root the returned closure in its own frame.
Debugging Your GC
Common Bugs
1. Use After Free
Symptom: Random crashes, corrupted data
Cause: Object freed while still referenced
Fix: Check that all references are registered as roots
2. Memory Leaks
Symptom: Memory grows unbounded
Cause: Objects not being freed
Fix: Check mark phase visits all reachable objects
3. Premature Collection
Symptom: Object disappears mid-function
Cause: Root not registered in shadow stack
Fix: Ensure gc_set_root is called for all local references
Debugging Tools
// Pseudocode — debug utilities implemented in C, matching the runtime
// (liblox_runtime.a). These walk the same global data structures the GC
// uses: the all-objects linked list and the shadow stack.
/* Print all objects in the heap (for debugging) */
void gc_debug_print_heap(void) {
ObjHeader *current = all_objects; // global linked list head
printf("=== HEAP CONTENTS ===\n");
while (current != NULL) {
printf(" %p: type=%d, marked=%d, size=%zu\n",
(void*)current,
current->obj_type,
current->marked,
current->size);
current = current->next;
}
}
/* Print the shadow stack (for debugging) */
void gc_debug_print_stack(void) {
ShadowFrame *current = shadow_stack_head; // global stack head
int frame_num = 0;
printf("=== SHADOW STACK ===\n");
while (current != NULL) {
printf(" Frame %d (%zu roots):\n", frame_num, current->root_count);
for (size_t i = 0; i < current->root_count; i++) {
printf(" [%zu] = %p\n", i, (void*)current->roots[i]);
}
current = current->next;
frame_num++;
}
}
Performance Considerations
Allocation Overhead
| Operation | Time | Notes |
|---|---|---|
alloc() | O(1) | Plus possible GC trigger |
gc_push_frame() | O(1) | Allocate and link |
gc_pop_frame() | O(1) | Unlink and free |
gc_collect() | O(live objects) | Mark + sweep |
Tuning Parameters
#![allow(unused)]
fn main() {
// Adjust these based on your workload
/// Trigger GC after this many allocations
const GC_THRESHOLD: usize = 1024;
/// Maximum objects before forced collection
const GC_MAX_OBJECTS: usize = 10000;
/// Enable debug output
const GC_DEBUG: bool = false;
}
Optimization Opportunities
- Generational GC: Most objects die young. Only scan recent allocations.
- Incremental GC: Don’t pause for full collection. Do it in small chunks.
- Parallel GC: Use multiple threads for mark/sweep.
- Escape Analysis: If an object doesn’t escape a function, put it on the stack.
What’s Next?
You now have a complete garbage collector. The immediate path forward is getting it running — compile the runtime, write a simple test program, and verify the GC handles the edge cases. Then add more object types: arrays, maps, and classes (which the next part covers in detail).
Beyond that, there are four directions for the collector itself. Generational collection splits the heap into a young generation (frequent, small collections) and an old generation (rare, large collections) — most objects die young, so collecting only the young generation is fast. Incremental collection breaks mark/sweep into small chunks to reduce pause times. Compacting collection moves objects to eliminate fragmentation — this requires precise GC, which we already have. Concurrent collection runs the GC in parallel with program execution, which is complex but powerful.
For further reading, Crafting Interpreters covers the GC chapter we skipped, the GC Handbook is the definitive reference, and MLIR’s documentation covers advanced dialect features. If you want relocating GC, LLVM’s statepoints documentation is the place to start.
Appendix A: Quick Reference Card
┌──────────────────────────────────────────────────────────────┐
│ GC QUICK REFERENCE │
├──────────────────────────────────────────────────────────────┤
│ │
│ MARK-SWEEP ALGORITHM │
│ ───────────────────── │
│ 1. Mark all roots (shadow stack) │
│ 2. Recursively mark referenced objects │
│ 3. Sweep: free all unmarked objects │
│ │
│ ROOTS │
│ ───── │
│ • Global variables │
│ • Local variables (in shadow stack frames) │
│ • Temporaries (also in shadow stack) │
│ │
│ OBJECT LIFECYCLE │
│ ───────────────── │
│ 1. alloc() → Object on heap │
│ 2. gc_set_root() → Register in shadow stack │
│ 3. Use object │
│ 4. gc_pop_frame() → Remove from shadow stack │
│ 5. gc_collect() → Free if unreachable │
│ │
│ CLOSURES │
│ ──────── │
│ • Captured vars → Environment (heap) │
│ • Closure = function pointer + environment pointer │
│ • Mark closure → Mark environment → Mark captured vars │
│ │
│ MLIR OPS (conceptual) │
│ ──────────────────── │
│ lox.alloc → Allocate object │
│ lox.push_frame → Push shadow stack frame │
│ lox.pop_frame → Pop shadow stack frame │
│ lox.set_root → Register root │
│ │
│ These are the Lox dialect operations (defined in │
│ lox_dialect.rs). Each lowers to a func.call to a runtime │
│ function (e.g., lox.alloc → @lox_runtime_alloc) during │
│ the lowering pass. │
└──────────────────────────────────────────────────────────────┘
Appendix B: Error Messages
| Error | Cause | Fix |
|---|---|---|
No frame to pop! | gc_pop_frame called without matching gc_push_frame | Check function prologue/epilogue pairing |
Out of memory! | Allocation failed | Check for memory leaks, tune GC threshold |
Variable not found: x | Capture analysis failed | Check scope tracking in analyzer |
Object not marked during sweep | Internal GC assertion (not user-facing) | Check mark phase — all reachable objects should be marked before sweep |
Appendix C: Study Roadmap
Use this checklist to track your progress through the GC tutorial:
Phase 1: Understanding (Read-Only)
- Part 2: Understand the GC problem, mark-sweep algorithm, and object allocation
- Part 3: Understand shadow stacks, root tracking, and the full mark/sweep cycle
- Part 4: Understand the MLIR dialect, lowering, and function code generation
- Part 5: Understand closure capture, environments, and closure code generation
Phase 2: Implementation (Hands-On)
- Create the runtime module structure
- Implement
ObjHeaderandObjType - Implement
alloc()function - Implement shadow stack (
StackFrame,gc_push_frame,gc_pop_frame) - Implement mark phase (
gc_mark,mark_object,mark_references) - Implement sweep phase (
gc_sweep) - Implement
gc_collect()entry point - Test allocation without GC
- Test mark phase with simple objects
- Test full GC cycle
- Add debug utilities (
gc_debug_print_heap, etc.) - Implement environment allocation
- Implement closure allocation
- Test closure GC
Phase 3: MLIR Integration
- Set up Melior dependency
- Define Lox dialect operations
- Implement lowering pass
- Generate MLIR for simple functions
- Lower to LLVM IR
- Link with runtime
- Test end-to-end execution
Phase 4: Advanced Features
- Add string type
- Add instance type
- Implement nested closure environments
- Add generational GC (optional)
- Add incremental collection (optional)
Appendix D: Glossary
| Term | Definition |
|---|---|
| Heap | The pool of memory where objects are allocated |
| Stack | Memory for function calls and local variables |
| Root | A pointer that the GC treats as always live (globals, locals) |
| Reachable | An object that can be found by following pointers from roots |
| Mark | To flag an object as “still live” during GC |
| Sweep | To free all objects that weren’t marked |
| Shadow Stack | A separate stack that tracks GC roots for each function call |
| Environment | A heap-allocated structure holding captured variables for closures |
| Closure | A function together with its captured environment |
| Dialect | A set of operations in MLIR for a specific domain |
| Lowering | Converting high-level IR to lower-level IR |
Appendix E: Further Reading
Books
- “Crafting Interpreters” by Bob Nystrom - Chapter on garbage collection
- “The Garbage Collection Handbook” by Jones et al. - Comprehensive GC reference
- “Engineering a Compiler” by Cooper & Torczon - GC in the context of compilers
Papers
- “Uniprocessor Garbage Collection Techniques” - Wilson (1992) - Classic survey
- “A Unified Theory of Garbage Collection” - Bacon et al. (2004) - Theoretical foundation
Code
- Lua 5.4 source - Simple, readable GC implementation
- MicroPython - Mark-sweep GC for embedded systems
- mruby - Lightweight Ruby with simple GC
MLIR/LLVM
- MLIR Documentation - https://mlir.llvm.org/docs/
- Melior Repository - Rust bindings for MLIR
- LLVM GC Documentation - https://llvm.org/docs/GarbageCollection.html
Conclusion
The GC’s algorithm is three steps: track what’s reachable, mark everything reachable, free everything else. Everything else — generational collection, incremental collection, compacting collectors — is optimization on top of those three steps.
But the GC is only one piece. This chapter showed the whole system: source text becomes an AST, the AST becomes Lox-dialect MLIR, the Lox dialect lowers to runtime calls and standard MLIR, standard MLIR lowers to LLVM IR, and LLVM IR becomes machine code. The Lox dialect is the narrow waist — the only part that knows about Lox’s semantics. Everything below it is generic infrastructure. That’s the payoff for the layered approach: you write the interesting part, and MLIR handles the boring part.
File Summary
| File | Content |
|---|---|
| Part 2 | GC concepts, object headers, allocation |
| Part 3 | Shadow stack, mark phase, sweep phase |
| Part 4 | MLIR dialect, lowering to LLVM |
| Part 5 | Closures and environments |
You now have the full picture. Time to add the types that make Lox more than a calculator.
Next: Part 7 — Classes and Instances — Every value in our compiler is an f64. That works for numbers, but Lox has classes, instances, and strings — and they can’t be represented as a single float. It’s time to graduate to tagged unions: an (i8, i64) struct where the tag says what the value is and the payload holds the bits. Every arith.addf becomes tag-check-extract-compute-repack — but only when the tag says it’s a number.