Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

MLIR for Lox: Part 6 — The Complete Project — Every Dialect Operation in One Place

Five parts. A garbage collector. Root tracking. MLIR integration. Closures. Now you need to see how they fit together — not as individual chapters, but as a single running system. This part is the assembled view: file structure, object layout, API reference, compilation pipeline, debugging guide, and performance notes.

Read it straight through for the big picture. Jump to a section when you need a detail.

Complete File Structure

lox-mlir/
├── ast/           # AST type definitions
├── lexer/         # Tokenization
├── parser/        # Parsing
├── analysis/      # Closure capture analysis
├── codegen/       # MLIR code generation, Lox dialect, lowering
├── runtime/       # Object header, garbage collector, shadow stack
├── examples/      # Example Lox programs (simple, closures, GC stress)
└── tests/         # Integration tests

Complete Object Layout

Every object in Lox has this layout:

┌─────────────────────────────────────────────────────┐
│ ObjHeader (16 bytes)                                │
├─────────────────────────────────────────────────────┤
│ Field         │ Offset │ Size │ Description        │
├───────────────┼────────┼──────┼────────────────────┤
│ marked        │ 0      │ 1    │ GC mark flag       │
│ obj_type      │ 1      │ 1    │ ObjType enum       │
│ padding       │ 2      │ 2    │ Alignment padding  │
│ size          │ 4      │ 4    │ Data size in bytes │
│ next          │ 8      │ 8    │ Ptr to next obj   │
└─────────────────────────────────────────────────────┘

Total header size: 16 bytes
Data follows immediately after header (offset +16).

Type-Specific Layouts

Number (ObjType::Number)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ value: f64 (8 bytes)         │
└──────────────────────────────┘
Total: 24 bytes

String (ObjType::String)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ length: usize (8 bytes)      │
│ hash: u64 (8 bytes)          │
│ chars: [u8; length]          │
└──────────────────────────────┘
Total: 32 + length bytes

Environment (ObjType::Environment)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ enclosing: *mut Env (8)      │
│ slot_count: usize (8)        │
│ slots: [*mut u8; slot_count] │
└──────────────────────────────┘
Total: 32 + (8 × slot_count) bytes

Closure (ObjType::Closure)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ function_index: usize (8)    │
│ environment: *mut Env (8)    │
└──────────────────────────────┘
Total: 32 bytes

Instance (ObjType::Instance)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ class: *mut Class (8)        │
│ field_count: usize (8)       │
│ fields: [(*str, *val); n]    │
└──────────────────────────────┘
Total: 32 + (16 × field_count) bytes

Complete GC API Reference

Runtime Functions (C ABI)

The C ABI convention here puts obj_type before size. If you worked through Part 5, you’ll notice this is the opposite order — Part 5 calls the Rust runtime directly with (size, obj_type), matching the Rust function signature. The extern "C" wrapper registered with the JIT reorders the arguments: the MLIR call site passes (obj_type, size), the C wrapper receives them in that order, then calls the Rust alloc(size, obj_type) with the arguments swapped back to the Rust function’s expected order. Both sections are internally consistent — Part 5 matches the Rust signature, and Part 7 matches the C ABI convention.

// Allocation
void* lox_runtime_alloc(uint8_t obj_type, size_t size);
void* lox_runtime_alloc_environment(size_t slot_count, void* enclosing);
void* lox_runtime_alloc_closure(size_t function_index, void* environment);

// Environment access
void lox_runtime_env_set(void* env, size_t index, void* value);
void lox_runtime_env_set_number(void* env, size_t index, double value);
void* lox_runtime_env_get(void* env, size_t index);

// Shadow stack
void** gc_push_frame(size_t root_count);
void gc_pop_frame(void);
void gc_set_root(void* frame, size_t index, void* value);

// Collection
void gc_collect(void);
size_t gc_object_count(void);

MLIR Operations

lox.alloc :: (obj_type: i8, size: i64) -> !llvm.ptr
  Allocates a heap object with the given type and data size (not including header).

lox.push_frame :: (root_count: i32) -> !llvm.ptr
  Pushes a shadow stack frame with the given number of root slots.
  Returns pointer to roots array.

lox.pop_frame :: () -> ()
  Pops the current shadow stack frame.

lox.set_root :: (index: i32, value: !llvm.ptr) -> ()
  Sets a root in the current frame.

lox.store :: (obj: !llvm.ptr, value: f64) -> ()
  Stores a number into a heap object's data field.

lox.load :: (obj: !llvm.ptr) -> f64
  Loads a number from a heap object's data field.

lox.print :: (value: f64) -> ()
  Prints a value via the runtime.

lox.alloc_environment :: (slot_count: i64, enclosing: !llvm.ptr) -> !llvm.ptr
  Allocates an environment with the given number of variable slots.
  `enclosing` is a null pointer for top-level environments, or a pointer to
  the parent environment for nested closures.

lox.alloc_closure :: (function_index: i64, environment: !llvm.ptr) -> !llvm.ptr
  Allocates a closure struct: a function index (position in the compiler's
  function table) and a pointer to the closure's environment.

lox.env_set :: (env: !llvm.ptr, index: i64, value: !llvm.ptr) -> ()
  Sets a slot in an environment. For the numbers-only model, use lox.env_set_number instead
  (passing an f64 through a void* parameter violates the C calling convention — the callee
  would read the wrong register).

lox.env_set_number :: (env: !llvm.ptr, index: i64, value: f64) -> ()
  Sets a slot in an environment, boxing the f64 into a heap-allocated Number.
  Numbers-only model only. The generic env_set takes a void* for the tagged-union model.

lox.env_get :: (env: !llvm.ptr, index: i64) -> !llvm.ptr
  Gets a slot from an environment.

The Compilation Pipeline

┌─────────────────────────────────────────────────────────────┐
│ 1. LEXING                                                   │
│    "var x = 1 + 2;" → [VAR, IDENT, EQUAL, NUMBER, ...]      │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 2. PARSING                                                  │
│    Tokens → AST                                             │
│    VarStmt { name: "x", init: BinaryExpr { ... } }          │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 3. ANALYSIS                                                 │
│    Find captured variables, compute root counts             │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 4. CODE GENERATION                                          │
│    AST → MLIR (Lox dialect)                                 │
│    func.func @main() {                                      │
│      %frame = lox.push_frame root_count = 1                 │
│      %x = lox.alloc obj_type = 0, size = 8                  │
│      lox.set_root index = 0, %x                             │
│      ...                                                    │
│    }                                                        │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 5. LOWERING                                                 │
│    MLIR (Lox dialect) → MLIR (LLVM dialect)                 │
│    lox.alloc → func.call @lox_runtime_alloc                 │
│    lox.push_frame → func.call @gc_push_frame                │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 6. LLVM IR GENERATION                                       │
│    MLIR → LLVM IR                                           │
│    define ptr @lox_runtime_alloc(...) { ... }               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 7. NATIVE CODE GENERATION                                   │
│    LLVM IR → Machine code                                   │
│    Link with liblox_runtime.a                               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 8. EXECUTION                                                │
│    ./program                                                │
│    Runtime GC manages memory                                │
└─────────────────────────────────────────────────────────────┘

IR at Each Stage: A Concrete Example

The pipeline diagram above shows the flow, but it doesn’t show what the IR actually looks like at each stage. Let’s trace a concrete Lox program through the entire compilation pipeline.

The program:

var x = 1 + 2;
print x;

Trivial, but it exercises every stage. Here’s what happens:

Stage 1–2: Lexing and Parsing

The parser produces an AST like:

Program
  VarDecl "x" = BinaryExpr(Num(1), Plus, Num(2))
  PrintStmt Name("x")

Stage 3: Analysis

No closures or captured variables in this program. The analysis pass computes root counts: x is a local variable, so it needs one GC root.

Stage 4: Code Generation (Lox Dialect MLIR)

The code generator walks the AST and emits MLIR using our custom Lox dialect operations. This example uses the GC-aware representation (heap-allocated objects, shadow stack roots) from Parts 3–6 — not the simpler numbers-only model from Part 1. The difference shows up in the function signature: func.func @main() returns nothing here, while Part 1’s func.func @main() -> f64 returns the result of the last expression. Both are valid; the GC-aware version is what a real Lox compiler would produce.

module {
  func.func @main() {
    %frame = lox.push_frame root_count = 1 : i32
    %one = arith.constant 1.0 : f64
    %two = arith.constant 2.0 : f64
    %sum = arith.addf %one, %two : f64
    %x = lox.alloc obj_type = 0, size = 8 : i64
    lox.set_root index = 0, %x : i32
    lox.store %x, %sum : f64
    %loaded = lox.load %x : f64
    lox.print %loaded : f64
    lox.pop_frame
    func.return
  }
}

Notice the mix: arith.addf is a standard MLIR arithmetic operation, but lox.push_frame, lox.alloc, lox.set_root, and lox.print are our custom dialect. At this stage, the IR is close to the source — the Lox-level operations are still visible.

Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)

The lowering pass replaces each lox.* operation with standard MLIR and runtime calls:

module {
  func.func @main() {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
    %one = arith.constant 1.0 : f64
    %two = arith.constant 2.0 : f64
    %sum = arith.addf %one, %two : f64
    %obj_type = arith.constant 0 : i8    // ObjType::Number
    %size = arith.constant 8 : i64      // data size only (f64 = 8 bytes; lox.alloc adds header)
    %x = func.call @lox_runtime_alloc(%obj_type, %size) : (i8, i64) -> !llvm.ptr
    %index = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %index, %x) : (!llvm.ptr, i32, !llvm.ptr) -> ()
    // Store the number into the object's data field
    // %x is already the data pointer — alloc() returns ptr past the header
    llvm.store %sum, %x : f64, !llvm.ptr
    %loaded = llvm.load %x : !llvm.ptr -> f64
    func.call @lox_print(%loaded) : (f64) -> ()
    func.call @gc_pop_frame() : () -> ()
    func.return
  }
}

The Lox dialect is gone. Every lox.* operation has been replaced by a func.call to a runtime function or a standard LLVM operation. The scf.if and scf.while operations (from control flow in more complex programs) would be lowered to cf (control flow) dialect operations at this stage.

This is a simplification — the actual lowered IR has more detail around pointer arithmetic and type casting. But the pattern is: custom dialect operations become runtime calls and standard MLIR operations.

Note: The llvm.store and llvm.load syntax in Stage 5 examples is simplified. With opaque pointers (!llvm.ptr), storing an f64 into a !llvm.ptr slot requires a two-step conversion (bitcast f64i64, then inttoptr i64!llvm.ptr), as shown in Part 4. We simplify the syntax here for readability — the real MLIR LLVM dialect requires the two-step conversion. The key point is the data flow: values are stored into and loaded from the right memory locations. Stage 6 (LLVM IR) uses the valid native syntax.

Stage 6: LLVM IR

After the MLIR-to-LLVM conversion pass, the IR becomes LLVM IR:

define void @main() {
entry:
  %frame = call ptr @gc_push_frame(i32 1)
  %x = call ptr @lox_runtime_alloc(i32 0, i64 8)
  call void @gc_set_root(ptr %frame, i32 0, ptr %x)
  store f64 3.0, ptr %x
  %loaded = load f64, ptr %x
  call void @lox_print(f64 %loaded)
  call void @gc_pop_frame()
  ret void
}

Notice that 1.0 + 2.0 has been folded into the constant 3.0 — this is a free optimization from MLIR’s arith dialect constant folding.

Stage 7–8: Native Code and Execution

LLVM compiles the IR to native machine code, links it with the runtime (liblox_runtime.a), and produces an executable. When you run it:

3

Why This Matters

Looking at the IR at each stage reveals something important: the Lox dialect is the only stage that’s specific to our language. Everything after Stage 5 is generic — standard MLIR operations, standard LLVM IR, standard machine code. The Lox dialect is the narrow waist of the hourglass.

This is the whole point of MLIR. You define a dialect that captures what makes your language unique (Lox’s GC-aware operations, in our case). Then you lower to standard dialects and let the framework handle the rest. You write the interesting part; MLIR handles the boring part.

IR at Each Stage: A Program with Control Flow

The var x = 1 + 2; print x; example only shows straight-line code. Let’s trace a program with an if statement to see how control flow appears in the IR.

var x = 10;
if (x > 5) {
  print x;
} else {
  print 0;
}

This introduces a comparison, a conditional branch, and two alternative paths. Here’s what the IR looks like at the key stages:

Stage 4: Code Generation (Lox Dialect MLIR)

module {
  func.func @main() {
    %frame = lox.push_frame root_count = 1 : i32
    %ten = arith.constant 10.0 : f64
    %x = lox.alloc obj_type = 0, size = 8 : i64
    lox.set_root index = 0, %x : i32
    lox.store %x, %ten : f64

    %loaded = lox.load %x : f64
    %five = arith.constant 5.0 : f64
    %cond = arith.cmpf ogt, %loaded, %five : f64

    scf.if %cond {
      // then-block
      %val = lox.load %x : f64
      lox.print %val : f64
    } else {
      // else-block
      %zero = arith.constant 0.0 : f64
      lox.print %zero : f64
    }

    lox.pop_frame
    func.return
  }
}

Two new things appear: arith.cmpf (a floating-point comparison) and scf.if (structured control flow). The scf dialect is MLIR’s standard way to represent conditionals — it has a then-region and an else-region, each in their own { } block. Unlike LLVM’s basic-block branches, scf.if is structured: the then and else regions can’t be jumped into from outside. This makes the IR easier to analyze and transform.

Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)

module {
  func.func @main() {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
    %ten = arith.constant 10.0 : f64
    %obj_type = arith.constant 0 : i8
    %size = arith.constant 8 : i64
    %x = func.call @lox_runtime_alloc(%obj_type, %size) : (i8, i64) -> !llvm.ptr
    %idx0 = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %idx0, %x) : (!llvm.ptr, i32, !llvm.ptr) -> ()
    llvm.store %ten, %x : f64, !llvm.ptr
    %loaded = llvm.load %x : !llvm.ptr -> f64
    %five = arith.constant 5.0 : f64
    %cond = arith.cmpf ogt, %loaded, %five : f64

    scf.if %cond {
      %val = llvm.load %x : !llvm.ptr -> f64
      func.call @lox_print(%val) : (f64) -> ()
    } else {
      %zero = arith.constant 0.0 : f64
      func.call @lox_print(%zero) : (f64) -> ()
    }

    func.call @gc_pop_frame() : () -> ()
    func.return
  }
}

The scf.if is still here — it doesn’t get lowered until the scf-to-cf conversion pass. The lox.* operations have been replaced by runtime calls and LLVM memory operations, but the structured control flow remains. This is by design: keeping scf.if as long as possible lets MLIR’s transformation passes reason about the control flow structure.

Later, the scf-to-cf pass will lower scf.if to cf.cond_br (a conditional branch between basic blocks), and the cf-to-llvm pass will lower that to LLVM’s br and cond_br instructions. But that happens after all the high-level transformations are done.

What’s different from straight-line code?

The scf.if operation creates regions — separate blocks of IR for the then and else paths. Each region can define its own SSA values. The arith.cmpf produces an i1 (boolean) value that drives the branch.

The Lox dialect doesn’t have its own if operation — we use MLIR’s standard scf.if directly. This is a design choice: some languages define their own conditional operation in their dialect, then lower it to scf.if. We skip that step because scf.if already says everything we need.

GC roots are shared across both branches. The %frame and %x values are available in both the then-region and else-region — MLIR regions can see values from their enclosing scope. This means we don’t need to push/pop roots for each branch; the pre-if roots cover both paths.


IR at Each Stage: A Program with Closures

Functions and closures are where the GC-aware model really shows its value. Let’s trace a program that defines a function, captures a variable, and calls the result:

fun makeAdder(n) {
  fun add(x) {
    return x + n;
  }
  return add;
}

var add5 = makeAdder(5);
print add5(3);

This program creates a closure (add) that captures n from its enclosing scope. The closure outlives makeAdder’s stack frame — that’s the whole reason we need heap-allocated closure environments and GC roots.

Stage 4: Code Generation (Lox Dialect MLIR)

module {
  func.func @main() {
    // One root: the closure returned by makeAdder. makeAdder itself is a
    // top-level function — we call it directly, so it doesn't need a closure
    // object. Only the returned closure (which captures n) needs to be rooted.
    %frame = lox.push_frame root_count = 1 : i32

    // Call makeAdder(5) — direct call, no closure allocation needed
    %five = arith.constant 5.0 : f64
    %add5_closure = func.call @makeAdder(%five) : (f64) -> !llvm.ptr
    lox.set_root index = 0, %add5_closure : !llvm.ptr

    // Call add5(3) — extract the environment from the closure
    // (matches the calling convention from Part 5: closure functions
    // receive the environment pointer, not the closure pointer)
    %three = arith.constant 3.0 : f64
    %env_ptr = llvm.load %add5_closure[8] : !llvm.ptr -> !llvm.ptr  // shorthand (see note below)
    %result = func.call @add(%three, %env_ptr) : (f64, !llvm.ptr) -> f64
    lox.print %result : f64

    lox.pop_frame
    func.return
  }

  func.func @makeAdder(%n: f64) -> !llvm.ptr {
    %frame = lox.push_frame root_count = 1 : i32

    // Allocate environment with 1 slot for the captured variable 'n'
    %null_env = llvm.mlir.nullptr : !llvm.ptr  // no enclosing environment
    %env = lox.alloc_environment slot_count = 1, enclosing = %null_env : !llvm.ptr
    lox.set_root index = 0, %env : !llvm.ptr

    // Store n into environment slot 0 (env_set_number boxes the f64 into a heap Number)
    lox.env_set_number %env, index = 0, %n : !llvm.ptr, i64, f64

    // Allocate closure for 'add', pointing to the environment
    // The compiler assigns function indices in compilation order.
    // @main and @makeAdder are called directly (no closure table entry
    // needed). @add is the first closure — index 1. (Index 0 would be
    // used if the first closure-creating function in compilation order
    // needed a table entry.)
    %c1 = arith.constant 1 : i64  // function_index for @add
    %add_closure = lox.alloc_closure function_index = %c1, environment = %env : !llvm.ptr

    lox.pop_frame
    func.return %add_closure : !llvm.ptr
  }

  func.func @add(%x: f64, %env: !llvm.ptr) -> f64 {
    // Load captured variable 'n' from the environment (slot 0)
    // The environment pointer is received directly — the caller extracts
    // it from the closure before calling us (see the calling convention
    // in Part 5's "How Does the Runtime Call a Closure?" section).
    %n_ptr = lox.env_get %env, index = 0 : (!llvm.ptr, i64) -> !llvm.ptr
    %n = llvm.load %n_ptr : !llvm.ptr -> f64
    %sum = arith.addf %x, %n : f64
    func.return %sum : f64
  }
}

%env_ptr = llvm.load %add5_closure[8] is a shorthand. The [8] isn’t valid MLIR syntax — it stands for llvm.getelementptr (to compute the address of the environment field, 8 bytes after the start of the closure’s data area) followed by llvm.load (to read the pointer at that address). Stage 5 shows the real two-step version. The shorthand here is for readability — the Lox dialect is our notation, and this operation maps to the two-step sequence during lowering.

lox.alloc_environment and lox.alloc_closure are separate heap objects. The environment holds the captured variables; the closure holds a function index and a pointer to the environment. This matches the runtime API from Part 5: alloc_environment creates a slot array, alloc_closure creates a closure struct.

lox.env_set_number and lox.env_get handle captured variable access through the environment API. env_set_number stores an f64 into a slot (the runtime boxes it into a heap Number internally). env_get returns a pointer to the slot’s contents. We use env_set_number instead of the generic env_set (which takes void*) because passing an f64 through a void* parameter violates the C calling convention — on x86-64, f64 is passed in an XMM register while void* goes in a general-purpose register.

The environment pointer is passed as a parameter. @add takes %env: !llvm.ptr as its second argument. The caller extracts the environment pointer from the closure (at data offset +8, after the function_index field) before calling @add. This matches the calling convention established in Part 5: when a closure is called, the runtime reads the environment field from the closure object and passes it to the function. The closure function receives the environment pointer directly — it doesn’t need the function_index because it already knows which function it is.

Why pass the environment pointer instead of the whole closure? The closure function doesn’t need its own function_index — it already knows which function it is. The only thing it needs from the closure is the environment pointer, so that’s all the calling convention passes. A production compiler might pass the whole closure pointer and have the callee extract the environment, but the environment-only convention is simpler and matches how indirect calls work (the runtime reads the environment from the closure before the call).

Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)

module {
  func.func @main() {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr

    // Call makeAdder directly — no closure allocation needed for top-level functions
    %five = arith.constant 5.0 : f64
    %add5_closure = func.call @makeAdder(%five) : (f64) -> !llvm.ptr
    %idx0 = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %idx0, %add5_closure) : (!llvm.ptr, i32, !llvm.ptr) -> ()

    %three = arith.constant 3.0 : f64
    // Extract the environment from the closure before calling @add
    // (matches the calling convention from Part 5)
    %env_ptr_addr = llvm.getelementptr %add5_closure[8] : (!llvm.ptr) -> !llvm.ptr
    %add5_env = llvm.load %env_ptr_addr : !llvm.ptr -> !llvm.ptr
    %result = func.call @add(%three, %add5_env) : (f64, !llvm.ptr) -> f64
    func.call @lox_print(%result) : (f64) -> ()

    func.call @gc_pop_frame() : () -> ()
    func.return
  }

  func.func @makeAdder(%n: f64) -> !llvm.ptr {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr

    // Allocate environment with 1 slot for captured variable 'n'
    %slot_count = arith.constant 1 : i64
    %null_env = llvm.mlir.nullptr : !llvm.ptr  // no enclosing environment
    %env = func.call @lox_runtime_alloc_environment(%slot_count, %null_env) : (i64, !llvm.ptr) -> !llvm.ptr
    %idx0 = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %idx0, %env) : (!llvm.ptr, i32, !llvm.ptr) -> ()

    // Store n into environment slot 0 (env_set_number boxes f64 into heap Number)
    %slot_idx = arith.constant 0 : i64
    func.call @lox_runtime_env_set_number(%env, %slot_idx, %n) : (!llvm.ptr, i64, f64) -> ()

    // Allocate closure for 'add', pointing to the environment
    %func_idx = arith.constant 1 : i64  // function_index for @add
    %add_closure = func.call @lox_runtime_alloc_closure(%func_idx, %env) : (i64, !llvm.ptr) -> !llvm.ptr

    func.call @gc_pop_frame() : () -> ()
    func.return %add_closure : !llvm.ptr
  }

  func.func @add(%x: f64, %env: !llvm.ptr) -> f64 {
    // The environment pointer is passed directly (see calling convention in Part 5)
    // Load captured n from environment slot 0
    %slot_idx = arith.constant 0 : i64
    %n_ptr = func.call @lox_runtime_env_get(%env, %slot_idx) : (!llvm.ptr, i64) -> !llvm.ptr
    %n = llvm.load %n_ptr : !llvm.ptr -> f64
    %sum = arith.addf %x, %n : f64
    func.return %sum : f64
  }
}

The lox.* operations are gone, replaced by func.call to runtime functions and LLVM memory operations. Notice that the closure allocation now properly separates the environment (alloc_environment) from the closure struct (alloc_closure), matching the runtime API from Part 5. Captured variables go through env_set_number / env_get — not raw llvm.store into the closure’s data area.

Stage 6: LLVM IR

define ptr @makeAdder(double %n) {
entry:
  %frame = call ptr @gc_push_frame(i32 1)
  ; Allocate environment with 1 slot for captured variable 'n'
  %env = call ptr @lox_runtime_alloc_environment(i64 1, ptr null)  ; 1 slot, no enclosing env
  call void @gc_set_root(ptr %frame, i32 0, ptr %env)
  ; Store n into environment slot 0 (boxes f64 into heap Number)
  call void @lox_runtime_env_set_number(ptr %env, i64 0, double %n)
  ; Allocate closure pointing to the environment
  %closure = call ptr @lox_runtime_alloc_closure(i64 1, ptr %env)  ; function_index=1 for @add
  call void @gc_pop_frame()
  ret ptr %closure
}

define double @add(double %x, ptr %env) {
entry:
  ; The environment pointer is passed directly (see calling convention in Part 5)
  ; Load captured n from environment slot 0
  %n_ptr = call ptr @lox_runtime_env_get(ptr %env, i64 0)
  %n = load double, ptr %n_ptr
  %sum = fadd double %x, %n
  ret double %sum
}

Notice how the closure becomes a simple pointer in LLVM IR. The %env argument to @add is a ptr — LLVM doesn’t know or care that it points to a GC-managed closure environment. The GC semantics are entirely in the runtime (gc_push_frame, gc_set_root, gc_pop_frame). LLVM sees a pointer, does pointer arithmetic, and moves on.

The LLVM IR now properly separates the environment from the closure. makeAdder allocates an environment, stores the captured variable into it, and then allocates a closure that points to the environment. In @add, the captured variable is accessed through env_get — not by writing directly into the closure’s data area. This matches the runtime API and avoids out-of-bounds writes: the closure struct only has two fields (function_index at offset +0 and environment at offset +8), and captured variables live in the separate environment object.

What’s different from the control flow example?

The closure example has three functions instead of one — each gets its own func.func, its own GC frame, and its own root set. makeAdder allocates an environment (rooting it before any further allocations), stores the captured variable into it, allocates a closure pointing to that environment, and returns the closure. The caller (main) roots the returned closure so the GC doesn’t collect it after makeAdder’s frame is popped, then extracts the environment pointer from the closure before calling @add — this matches the calling convention from Part 5, where the runtime reads the environment field from the closure object and passes it directly. The @add function receives the environment pointer as its second argument and uses env_get to access captured variables. Each function pushes its own GC frame and pops it before returning — roots are scoped to the function that registered them. makeAdder’s roots are popped when it returns; main must root the returned closure in its own frame.


Debugging Your GC

Common Bugs

1. Use After Free

Symptom: Random crashes, corrupted data
Cause: Object freed while still referenced
Fix: Check that all references are registered as roots

2. Memory Leaks

Symptom: Memory grows unbounded
Cause: Objects not being freed
Fix: Check mark phase visits all reachable objects

3. Premature Collection

Symptom: Object disappears mid-function
Cause: Root not registered in shadow stack
Fix: Ensure gc_set_root is called for all local references

Debugging Tools

// Pseudocode — debug utilities implemented in C, matching the runtime
// (liblox_runtime.a). These walk the same global data structures the GC
// uses: the all-objects linked list and the shadow stack.

/* Print all objects in the heap (for debugging) */
void gc_debug_print_heap(void) {
    ObjHeader *current = all_objects;  // global linked list head
    printf("=== HEAP CONTENTS ===\n");
    while (current != NULL) {
        printf("  %p: type=%d, marked=%d, size=%zu\n",
               (void*)current,
               current->obj_type,
               current->marked,
               current->size);
        current = current->next;
    }
}

/* Print the shadow stack (for debugging) */
void gc_debug_print_stack(void) {
    ShadowFrame *current = shadow_stack_head;  // global stack head
    int frame_num = 0;
    printf("=== SHADOW STACK ===\n");
    while (current != NULL) {
        printf("  Frame %d (%zu roots):\n", frame_num, current->root_count);
        for (size_t i = 0; i < current->root_count; i++) {
            printf("    [%zu] = %p\n", i, (void*)current->roots[i]);
        }
        current = current->next;
        frame_num++;
    }
}

Performance Considerations

Allocation Overhead

OperationTimeNotes
alloc()O(1)Plus possible GC trigger
gc_push_frame()O(1)Allocate and link
gc_pop_frame()O(1)Unlink and free
gc_collect()O(live objects)Mark + sweep

Tuning Parameters

#![allow(unused)]
fn main() {
// Adjust these based on your workload

/// Trigger GC after this many allocations
const GC_THRESHOLD: usize = 1024;

/// Maximum objects before forced collection
const GC_MAX_OBJECTS: usize = 10000;

/// Enable debug output
const GC_DEBUG: bool = false;
}

Optimization Opportunities

  1. Generational GC: Most objects die young. Only scan recent allocations.
  2. Incremental GC: Don’t pause for full collection. Do it in small chunks.
  3. Parallel GC: Use multiple threads for mark/sweep.
  4. Escape Analysis: If an object doesn’t escape a function, put it on the stack.

What’s Next?

You now have a complete garbage collector. The immediate path forward is getting it running — compile the runtime, write a simple test program, and verify the GC handles the edge cases. Then add more object types: arrays, maps, and classes (which the next part covers in detail).

Beyond that, there are four directions for the collector itself. Generational collection splits the heap into a young generation (frequent, small collections) and an old generation (rare, large collections) — most objects die young, so collecting only the young generation is fast. Incremental collection breaks mark/sweep into small chunks to reduce pause times. Compacting collection moves objects to eliminate fragmentation — this requires precise GC, which we already have. Concurrent collection runs the GC in parallel with program execution, which is complex but powerful.

For further reading, Crafting Interpreters covers the GC chapter we skipped, the GC Handbook is the definitive reference, and MLIR’s documentation covers advanced dialect features. If you want relocating GC, LLVM’s statepoints documentation is the place to start.


Appendix A: Quick Reference Card

┌──────────────────────────────────────────────────────────────┐
│                    GC QUICK REFERENCE                        │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  MARK-SWEEP ALGORITHM                                        │
│  ─────────────────────                                       │
│  1. Mark all roots (shadow stack)                            │
│  2. Recursively mark referenced objects                      │
│  3. Sweep: free all unmarked objects                         │
│                                                              │
│  ROOTS                                                       │
│  ─────                                                       │
│  • Global variables                                          │
│  • Local variables (in shadow stack frames)                  │
│  • Temporaries (also in shadow stack)                        │
│                                                              │
│  OBJECT LIFECYCLE                                            │
│  ─────────────────                                           │
│  1. alloc() → Object on heap                                 │
│  2. gc_set_root() → Register in shadow stack                 │
│  3. Use object                                               │
│  4. gc_pop_frame() → Remove from shadow stack                │
│  5. gc_collect() → Free if unreachable                       │
│                                                              │
│  CLOSURES                                                    │
│  ────────                                                    │
│  • Captured vars → Environment (heap)                        │
│  • Closure = function pointer + environment pointer          │
│  • Mark closure → Mark environment → Mark captured vars      │
│                                                              │
│  MLIR OPS (conceptual)                                       │
│  ────────────────────                                        │
│  lox.alloc         → Allocate object                         │
│  lox.push_frame    → Push shadow stack frame                 │
│  lox.pop_frame     → Pop shadow stack frame                  │
│  lox.set_root      → Register root                           │
│                                                              │
│  These are the Lox dialect operations (defined in            │
│  lox_dialect.rs). Each lowers to a func.call to a runtime   │
│  function (e.g., lox.alloc → @lox_runtime_alloc) during      │
│  the lowering pass.                                           │
└──────────────────────────────────────────────────────────────┘

Appendix B: Error Messages

ErrorCauseFix
No frame to pop!gc_pop_frame called without matching gc_push_frameCheck function prologue/epilogue pairing
Out of memory!Allocation failedCheck for memory leaks, tune GC threshold
Variable not found: xCapture analysis failedCheck scope tracking in analyzer
Object not marked during sweepInternal GC assertion (not user-facing)Check mark phase — all reachable objects should be marked before sweep

Appendix C: Study Roadmap

Use this checklist to track your progress through the GC tutorial:

Phase 1: Understanding (Read-Only)

  • Part 2: Understand the GC problem, mark-sweep algorithm, and object allocation
  • Part 3: Understand shadow stacks, root tracking, and the full mark/sweep cycle
  • Part 4: Understand the MLIR dialect, lowering, and function code generation
  • Part 5: Understand closure capture, environments, and closure code generation

Phase 2: Implementation (Hands-On)

  • Create the runtime module structure
  • Implement ObjHeader and ObjType
  • Implement alloc() function
  • Implement shadow stack (StackFrame, gc_push_frame, gc_pop_frame)
  • Implement mark phase (gc_mark, mark_object, mark_references)
  • Implement sweep phase (gc_sweep)
  • Implement gc_collect() entry point
  • Test allocation without GC
  • Test mark phase with simple objects
  • Test full GC cycle
  • Add debug utilities (gc_debug_print_heap, etc.)
  • Implement environment allocation
  • Implement closure allocation
  • Test closure GC

Phase 3: MLIR Integration

  • Set up Melior dependency
  • Define Lox dialect operations
  • Implement lowering pass
  • Generate MLIR for simple functions
  • Lower to LLVM IR
  • Link with runtime
  • Test end-to-end execution

Phase 4: Advanced Features

  • Add string type
  • Add instance type
  • Implement nested closure environments
  • Add generational GC (optional)
  • Add incremental collection (optional)

Appendix D: Glossary

TermDefinition
HeapThe pool of memory where objects are allocated
StackMemory for function calls and local variables
RootA pointer that the GC treats as always live (globals, locals)
ReachableAn object that can be found by following pointers from roots
MarkTo flag an object as “still live” during GC
SweepTo free all objects that weren’t marked
Shadow StackA separate stack that tracks GC roots for each function call
EnvironmentA heap-allocated structure holding captured variables for closures
ClosureA function together with its captured environment
DialectA set of operations in MLIR for a specific domain
LoweringConverting high-level IR to lower-level IR

Appendix E: Further Reading

Books

  • “Crafting Interpreters” by Bob Nystrom - Chapter on garbage collection
  • “The Garbage Collection Handbook” by Jones et al. - Comprehensive GC reference
  • “Engineering a Compiler” by Cooper & Torczon - GC in the context of compilers

Papers

  • “Uniprocessor Garbage Collection Techniques” - Wilson (1992) - Classic survey
  • “A Unified Theory of Garbage Collection” - Bacon et al. (2004) - Theoretical foundation

Code

  • Lua 5.4 source - Simple, readable GC implementation
  • MicroPython - Mark-sweep GC for embedded systems
  • mruby - Lightweight Ruby with simple GC

MLIR/LLVM

  • MLIR Documentation - https://mlir.llvm.org/docs/
  • Melior Repository - Rust bindings for MLIR
  • LLVM GC Documentation - https://llvm.org/docs/GarbageCollection.html

Conclusion

The GC’s algorithm is three steps: track what’s reachable, mark everything reachable, free everything else. Everything else — generational collection, incremental collection, compacting collectors — is optimization on top of those three steps.

But the GC is only one piece. This chapter showed the whole system: source text becomes an AST, the AST becomes Lox-dialect MLIR, the Lox dialect lowers to runtime calls and standard MLIR, standard MLIR lowers to LLVM IR, and LLVM IR becomes machine code. The Lox dialect is the narrow waist — the only part that knows about Lox’s semantics. Everything below it is generic infrastructure. That’s the payoff for the layered approach: you write the interesting part, and MLIR handles the boring part.


File Summary

FileContent
Part 2GC concepts, object headers, allocation
Part 3Shadow stack, mark phase, sweep phase
Part 4MLIR dialect, lowering to LLVM
Part 5Closures and environments

You now have the full picture. Time to add the types that make Lox more than a calculator.


Next: Part 7 — Classes and Instances — Every value in our compiler is an f64. That works for numbers, but Lox has classes, instances, and strings — and they can’t be represented as a single float. It’s time to graduate to tagged unions: an (i8, i64) struct where the tag says what the value is and the payload holds the bits. Every arith.addf becomes tag-check-extract-compute-repack — but only when the tag says it’s a number.