MLIR for Lox: Part 4 — Wiring the GC into MLIR — Dialect Operations That Manage Memory for You

You’ve built a garbage collector from scratch. It allocates objects, tracks roots via a shadow stack, and frees unreachable memory. It works — but it’s conceptual code in Rust. The Lox compiler doesn’t know about it yet.

This part bridges the gap. We’ll define a custom MLIR dialect with operations like lox.alloc and lox.set_root, wire those operations into the code generator from Part 1, and lower them to calls into the GC runtime. By the end, the MLIR code generator will produce IR that automatically manages memory through our GC — no manual malloc or free in sight.

The Runtime Module

First, let’s organize our runtime into a proper Rust module:

#![allow(unused)]
fn main() {
// src/runtime/mod.rs

pub mod object;
pub mod shadow_stack;
pub mod gc;

// Re-export the public API
pub use object::{ObjHeader, ObjType};
pub use gc::{alloc, gc_collect};
pub use shadow_stack::{gc_push_frame, gc_pop_frame, gc_set_root};
}

And the Cargo.toml needs to build it as a static library:

# Cargo.toml

[package]
name = "lox-runtime"
version = "0.1.0"
edition = "2021"

[lib]
name = "lox_runtime"
crate-type = ["staticlib", "cdylib"]

Why both? The static library (.a) is what the linker uses when we compile the final binary — clang output.o -llox_runtime. The dynamic library (.so) is needed if you want to load the runtime at runtime with dlopen, which some JIT setups require. We build both so the same compiled runtime works for either linking approach.

The Lox MLIR Dialect

We need MLIR operations that correspond to our GC operations:

Operation	Meaning	LLVM Lowering
`lox.alloc`	Allocate a heap object	Call `lox_runtime::alloc`
`lox.set_root`	Register a value as a GC root	Store in shadow stack slot
`lox.push_frame`	Push a stack frame	Call `gc_push_frame`
`lox.pop_frame`	Pop a stack frame	Call `gc_pop_frame`

Defining Operations in Melior

Melior doesn’t use TableGen like C++ MLIR. We define operations directly in Rust:

#![allow(unused)]
fn main() {
// src/codegen/lox_dialect.rs

use melior::{
    Context, Location,
    dialect::Dialect,
    ir::{Operation, OperationBuilder, OperationLike, Region, RegionLike, Type, Value},
};

/// The Lox dialect namespace
pub const DIALECT_NAME: &str = "lox";

/// Operation names
pub mod ops {
    // These four are used in this part:
    pub const ALLOC: &str = "lox.alloc";
    pub const PUSH_FRAME: &str = "lox.push_frame";
    pub const POP_FRAME: &str = "lox.pop_frame";
    pub const SET_ROOT: &str = "lox.set_root";
    // These two are defined here for completeness.
    // Part 6 (Complete Reference) shows them in action — they
    // load and store values from heap-allocated environments
    // and instance fields, which don't exist yet in the
    // numbers-only model.
    pub const LOAD: &str = "lox.load";
    pub const STORE: &str = "lox.store";
}

/// Create a lox.alloc operation
/// 
/// This allocates a heap object of the given type.
/// Returns a pointer to the object data (after the header).
pub fn create_alloc<'c>(
    context: &'c Context,
    obj_type: u8,      // ObjType enum value
    size: usize,       // Size of object data in bytes
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    // The lox.alloc operation takes:
    //   - type: i8 (the ObjType enum)
    //   - size: i64 (allocation size)
    // And returns:
    //   - ptr: !llvm.ptr (pointer to object data)
    
    OperationBuilder::new(ops::ALLOC, location)
        .add_attributes(&[
            (melior::ir::Identifier::new(context, "obj_type"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i8").unwrap(), obj_type as i64).into()),
            (melior::ir::Identifier::new(context, "size"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i64").unwrap(), size as i64).into()),
        ])
        .add_results(&[Type::parse(context, "!llvm.ptr").unwrap()])
        .build()
        .unwrap()
}

/// Create a lox.push_frame operation
/// 
/// This pushes a new shadow stack frame with the given number of root slots.
/// Returns a pointer to the roots array.
pub fn create_push_frame<'c>(
    context: &'c Context,
    root_count: usize,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new(ops::PUSH_FRAME, location)
        .add_attributes(&[
            (melior::ir::Identifier::new(context, "root_count"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i64").unwrap(), root_count as i64).into()),
        ])
        .add_results(&[Type::parse(context, "!llvm.ptr").unwrap()])
        .build()
        .unwrap()
}

/// Create a lox.pop_frame operation
pub fn create_pop_frame<'c>(
    context: &'c Context,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new(ops::POP_FRAME, location)
        .build()
        .unwrap()
}

/// Create a lox.set_root operation
/// 
/// Sets a root in the current shadow stack frame.
pub fn create_set_root<'c>(
    context: &'c Context,
    root_index: usize,
    value: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new(ops::SET_ROOT, location)
        .add_attributes(&[
            (melior::ir::Identifier::new(context, "index"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i64").unwrap(), root_index as i64).into()),
        ])
        .add_operands(&[value])
        .build()
        .unwrap()
}
}

Lowering to LLVM

Our lox.* operations can’t be executed directly. MLIR doesn’t know what lox.alloc means — it’s a symbol we invented. To produce runnable code, every lox.* operation has to become something LLVM understands: function calls, memory operations, and LLVM dialect instructions.

This is called lowering, and it happens in stages:

Lox dialect          Standard MLIR dialects        LLVM dialect
─────────────    →    ──────────────────────    →    ────────────
lox.alloc              func.call @lox_alloc          llvm.call @lox_alloc
lox.push_frame         func.call @gc_push_frame      llvm.call @gc_push_frame
lox.set_root           func.call @gc_set_root        llvm.call @gc_set_root
arith.addf             arith.addf                    llvm.fadd
scf.if                 scf.if                        cf.cond_br + llvm.blockaddr
scf.if                 cf.cond_br (after scf→cf)     cf.cond_br (after cf→llvm)

Each arrow is a conversion pass. The Lox dialect lowers first (our custom pass). Then the standard MLIR dialects lower in sequence: scf → cf, cf → llvm, arith → llvm, func → llvm. The order matters — scf.if can’t lower directly to LLVM. It first becomes cf.cond_br (a branch), and then cf.cond_br becomes an LLVM branch.

Why not lower everything in one pass? Because MLIR dialects are designed to compose. The arith.addf inside a scf.if is the same operation regardless of whether it’s inside a conditional. By lowering one dialect at a time, each pass only needs to know about two things: the source dialect and the target dialect. A single monolithic pass would need to understand every dialect’s semantics — and would break the moment anyone adds a new dialect.

Now let’s see the code that does it:

#![allow(unused)]
fn main() {
// src/codegen/lowering.rs

use melior::{
    Context, Location, PassManager,
    ir::{Block, BlockLike, Module, Operation, OperationLike, Region, RegionLike, Value},
    dialect::{arith, func, llvm},
    pass::Pass,
};

// ========================================================================
// Constant helpers
// ========================================================================
// These create `arith.constant` operations for i8 and i64 values. They
// extract the result immediately without appending the operation to a
// block — the resulting Value is used as an *argument* to `func::call`
// operations that the lowering functions append later. This works in
// Melior because the Operation is an owned value and its results are
// accessible right away, even before the operation is placed in a block.
// (The `func::call` operations in the lowering functions below *are*
// appended to the block first, then their results are extracted — that's
// the append-then-extract pattern. The constant helpers skip the append
// step because they produce inputs, not standalone operations.)

/// Create an `arith.constant` with an i8 value.
fn create_const_i8<'c>(context: &'c Context, value: i8) -> melior::ir::Value<'c, 'c> {
    let location = Location::unknown(context);
    let op = arith::constant(
        context,
        melior::ir::attribute::IntegerAttribute::new(
            melior::ir::Type::parse(context, "i8").unwrap(),
            value as i64,
        ).into(),
        location,
    );
    op.result(0).unwrap().into()
}

/// Create an `arith.constant` with an i64 value.
fn create_const_i64<'c>(context: &'c Context, value: i64) -> melior::ir::Value<'c, 'c> {
    let location = Location::unknown(context);
    let op = arith::constant(
        context,
        melior::ir::attribute::IntegerAttribute::new(
            melior::ir::Type::parse(context, "i64").unwrap(),
            value,
        ).into(),
        location,
    );
    op.result(0).unwrap().into()
}

/// Lower lox.alloc to a runtime call
fn lower_alloc(op: &Operation, block: &mut Block, context: &Context) {
    let location = op.location();
    
    // Get attributes
    // In melior 0.27, use IntegerAttribute::try_from to convert the
    // generic Attribute, then call .value() to get the i64.
    let obj_type = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("obj_type").unwrap()
    ).unwrap().value() as i64;
    let size = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("size").unwrap()
    ).unwrap().value() as i64;
    
    // Create constants for arguments
    let obj_type_val = create_const_i8(context, obj_type as i8);
    let size_val = create_const_i64(context, size);
    
    // Call lox_runtime_alloc(size, obj_type) — size first, obj_type second
    // This matches the Rust runtime's parameter order: alloc(size: usize, obj_type: ObjType)
    let call = block.append_operation(func::call(
        context,
        melior::ir::attribute::FlatSymbolRefAttribute::new(context, "lox_runtime_alloc"),
        &[size_val, obj_type_val],
        &[Type::parse(context, "!llvm.ptr").unwrap()],
        location,
    ));
    
    // Replace uses of the original result with the call's result
    let new_result = call.result(0).unwrap();
    
    // (In real code, we'd track and replace all uses of the lox.alloc result)
}

/// Lower lox.push_frame to a runtime call
fn lower_push_frame(op: &Operation, block: &mut Block, context: &Context) {
    let location = op.location();
    
    let root_count = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("root_count").unwrap()
    ).unwrap().value() as i64;
    
    let count_val = create_const_i64(context, root_count);
    
    // Call gc_push_frame(root_count)
    let call = block.append_operation(func::call(
        context,
        melior::ir::attribute::FlatSymbolRefAttribute::new(context, "gc_push_frame"),
        &[count_val],
        &[Type::parse(context, "!llvm.ptr").unwrap()],
        location,
    ));
    
    // Replace uses of the original lox.push_frame result
    let new_result = call.result(0).unwrap();
}

/// Lower lox.pop_frame to a runtime call
fn lower_pop_frame(op: &Operation, block: &mut Block, context: &Context) {
    let location = op.location();
    
    // Call gc_pop_frame()
    block.append_operation(func::call(
        context,
        melior::ir::attribute::FlatSymbolRefAttribute::new(context, "gc_pop_frame"),
        &[],
        &[],
        location,
    ));
}

/// Lower lox.set_root to a store instruction
fn lower_set_root(op: &Operation, block: &mut Block, context: &Context, frame_ptr: Value<'c, 'c>) {
    let location = op.location();
    
    let root_index = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("index").unwrap()
    ).unwrap().value() as i64;
    let value = op.operand(0).unwrap();
    
    // Compute the address: frame_ptr[root_index]
    //
    // The MLIR-level lox.set_root can take either f64 or !llvm.ptr values
    // (our representation stores both numbers and heap pointers as f64).
    // At the LLVM level, the shadow stack is an array of pointers (!llvm.ptr),
    // so the element type for GEP is !llvm.ptr. When the root value is f64,
    // the conversion is a two-step process: bitcast f64 → i64 (reinterpret
    // the bits as an integer), then inttoptr i64 → !llvm.ptr (convert the
    // integer to a pointer). LLVM's bitcast cannot convert f64 → ptr
    // directly. On a 64-bit target the byte offsets are the same either
    // way (8 bytes per pointer, 8 bytes per f64), but the semantics
    // matter for 32-bit targets or if the value type ever changes.
    //
    // Note: the Rust code below calls `llvm::store(value, ...)` without
    // an explicit bitcast or inttoptr. Whether this works depends on
    // the Melior version — some versions of `llvm::store` accept
    // mismatched operand types and inject the conversion during
    // lowering, while others require the caller to produce the correct
    // type. The MLIR assembly above shows the explicit two-step
    // conversion that works in all versions. If `llvm::store` in your
    // version rejects the type mismatch, add the bitcast and inttoptr
    // before the store, following the pattern in the MLIR assembly.
    // In opaque-pointer mode (MLIR 17+), GEP carries the element type as
    // a parameter — no bitcast needed for the GEP itself.
    let index_const = create_const_i64(context, root_index);
    let root_ptr = block.append_operation(llvm::getelementptr(
        context,
        Type::parse(context, "!llvm.ptr").unwrap(), // pointer type
        Type::parse(context, "!llvm.ptr").unwrap(),  // element type — shadow stack holds pointers
        frame_ptr,
        &[index_const],
        location,
    ));
    let root_ptr_val = root_ptr.result(0).unwrap().into();
    
    // Store the value at the computed address
    block.append_operation(llvm::store(value, root_ptr_val, location));
}
}

Code Generation for Functions

Now we modify our function code generator to use the shadow stack:

#![allow(unused)]
fn main() {
// src/codegen/generator.rs (modified)
use std::collections::HashMap;

impl<'c> CodeGenerator<'c> {
    
    /// Compile a function with GC support
    fn compile_function(&self, func: &FunctionStmt, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = self.loc(func.location);
        
        // === STEP 1: Count roots ===
        // Roots = parameters + local variables
        let root_count = self.count_roots(func);
        
        // === STEP 2: Create function type ===
        let float_type = Type::float64(self.context);
        let param_types: Vec<Type> = func.params.iter().map(|_| float_type).collect();
        let function_type = FunctionType::new(self.context, &param_types, &[float_type]);
        
        // === STEP 3: Create function body ===
        let block = Block::new(
            &param_types.iter().map(|&t| (t, location)).collect::<Vec<_>>()
        );
        
        // === STEP 4: Push shadow stack frame ===
        let push_frame = block.append_operation(
            create_push_frame(self.context, root_count, location)
        );
        
        // The result is a pointer to the roots array
        let roots_ptr = push_frame.result(0).unwrap().into();
        
        // === STEP 5: Store parameters as roots ===
        for (i, param_name) in func.params.iter().enumerate() {
            let arg = block.argument(i).unwrap();
            
            // Store the parameter value in roots[i]
            // (In a real implementation, we'd handle this based on type)
            self.set_root(&block, i, arg.into(), location);
            
            // Also track in our local variables map
            // (Note: variables are passed as a parameter, not stored in self.
            //  See Part 1's "Why Separate Context from State?" section.)
            variables.insert(param_name.clone(), arg.into());
        }
        
        // === STEP 6: Compile function body ===
        // Note: we pass the block as a parameter (not stored in self),
        // following the block-ownership pattern from Part 1.
        // Note: In a complete compiler, each var declaration in the body
        // would consume the next root slot (starting from params.len()).
        // For now, the root_count passed to push_frame accounts for
        // parameters + all local variables, so the frame is large enough.
        
        for stmt in &func.body {
            self.compile_statement(stmt, &block, variables);
        }
        
        // === STEP 7: Add implicit return if needed ===
        // ...
        
        // === STEP 8: Pop shadow stack frame ===
        let pop_frame = create_pop_frame(self.context, location);
        block.append_operation(pop_frame);
        
        // === STEP 9: Create the function ===
        let region = Region::new();
        region.append_block(block);  // block is consumed here
        self.module.body().append_operation(func::func(
            self.context,
            StringAttribute::new(self.context, &func.name),
            TypeAttribute::new(function_type.into()),
            region,
            &[],
            location,
        ));
    }
    
    /// Count the total number of roots needed for a function
    fn count_roots(&self, func: &FunctionStmt) -> usize {
        // Parameters are roots
        let mut count = func.params.len();
        
        // Local variables are roots
        for stmt in &func.body {
            count += self.count_roots_in_stmt(stmt);
        }
        
        count
    }
    
    /// Count roots introduced by a statement
    fn count_roots_in_stmt(&self, stmt: &Stmt) -> usize {
        match stmt {
            Stmt::Var(v) => 1,  // Each var declaration is a root
            Stmt::Block(b) => b.statements.iter()
                .map(|s| self.count_roots_in_stmt(s))
                .sum(),
            Stmt::If(i) => {
                i.then_branch.iter().map(|s| self.count_roots_in_stmt(s)).sum::<usize>() +
                i.else_branch.iter().map(|s| self.count_roots_in_stmt(s)).sum::<usize>()
            }
            Stmt::While(w) => w.body.iter().map(|s| self.count_roots_in_stmt(s)).sum(),
            // Stmt::Function is handled as 0 here because closures
            // aren't in the numbers-only model yet. When Part 5 adds
            // closures, a Function declaration creates a closure value
            // that needs a root slot — the count becomes 1 for each
            // nested function.
            _ => 0,
        }
    }
    
    /// Set a root in the shadow stack
    fn set_root(&self, block: &Block<'c>, index: usize, value: Value<'c, 'c>, location: Location<'c>) {
        let set_root_op = create_set_root(self.context, index, value, location);
        block.append_operation(set_root_op);
    }
}
}

The Lowering Pass

We need a pass that converts lox.* operations to LLVM calls. This pass has two parts: a pass manager that runs conversions in the right order, and a custom Lox-to-LLVM pass that handles our dialect-specific operations.

There’s a design tension in Melior 0.27 that shows up here. The individual lowering functions we wrote earlier — lower_alloc, lower_push_frame, etc. — need two things: the Block to append new operations to, and the Context to create types and attributes. The lowering functions take both as parameters. But Operation::walk() gives us each operation in isolation. In Melior 0.27, there’s no op.parent_block() or op.context() — you can’t navigate from an operation back to its block or context inside the walk closure.

The fix is to walk blocks, not operations. Each Block knows its parent Region, and each Region knows its parent Operation. From a block, you have access to the Context (via the block’s parent chain), and you can iterate the block’s operations directly. This is the pattern that compiles and runs:

#![allow(unused)]
fn main() {
// src/codegen/lowering_pass.rs

use melior::{
    Context, Module, PassManager,
    pass::Pass,
};

// Standard Melior conversion passes — these come from melior::pass::conversion, not our local module.
use melior::pass::conversion::{
    create_scf_to_control_flow,
    create_control_flow_to_llvm,
    create_arith_to_llvm,
    create_func_to_llvm,
};

/// Create the lowering pass manager
pub fn create_lowering_pass_manager(context: &Context) -> PassManager {
    let pm = PassManager::new(context);
    
    // Lower Lox dialect to LLVM (our custom pass, defined below)
    pm.add_pass(convert_lox_to_llvm());
    
    // Lower standard dialects to LLVM (Melior built-in passes)
    pm.add_pass(create_scf_to_control_flow());
    pm.add_pass(create_control_flow_to_llvm());
    pm.add_pass(create_arith_to_llvm());
    pm.add_pass(create_func_to_llvm());
    
    pm
}

/// Our custom Lox-to-LLVM pass
mod pass {
    use super::*;
    
    pub fn convert_lox_to_llvm() -> Pass {
        Pass::from_info("lox-to-llvm", |module: &Module| {
            let context = module.context();
            
            // Walk each block in the module. Walking blocks (instead of
            // operations) gives us access to the Block and Context that the
            // lowering functions need. Each block is mutable — we can append
            // replacement operations directly.
            for region in module.as_operation().regions() {
                for mut block in region.blocks() {
                    let mut current_frame_ptr: Option<Value> = None;
                    let mut ops_to_process: Vec<Operation> = Vec::new();
                    
                    // Collect Lox operations first, then process them.
                    // We can't modify the block while iterating it, so we
                    // gather the operations first and handle them after.
                    for op in block.operations() {
                        let name = op.name().to_string();
                        if name.starts_with("lox.") {
                            ops_to_process.push(op.clone());
                        }
                    }
                    
                    for op in ops_to_process {
                        match op.name() {
                            "lox.alloc" => {
                                lower_alloc(&op, &mut block, &context);
                            }
                            "lox.push_frame" => {
                                lower_push_frame(&op, &mut block, &context);
                                // ⚠️ Simplification: op.result(0) returns the
                                // result of the *original* lox.push_frame operation,
                                // which lower_push_frame replaces with a func.call
                                // to @gc_push_frame. In a production compiler,
                                // current_frame_ptr should point to the *new*
                                // call's result — but since our gc_push_frame
                                // returns the frame pointer in the same way,
                                // the values are equivalent here.
                                current_frame_ptr = Some(op.result(0).unwrap().into());
                            }
                            "lox.pop_frame" => {
                                lower_pop_frame(&op, &mut block, &context);
                            }
                            "lox.set_root" => {
                                let frame_ptr = current_frame_ptr
                                    .expect("set_root without a preceding push_frame");
                                lower_set_root(&op, &mut block, &context, frame_ptr);
                            }
                            _ => {}
                        }
                    }
                }
            }
        })
    }
}
}

Linking Everything Together

Now we need to link the MLIR-generated code with our Rust runtime:

Step 1: Compile the Runtime

cd lox-mlir
cargo build --release

This produces target/release/liblox_runtime.a (static lib) and liblox_runtime.so (dynamic lib).

Step 2: Generate MLIR

#![allow(unused)]
fn main() {
// Compile Lox source to MLIR
let mlir = compile_to_mlir(source)?;
println!("{}", mlir);
}

Output:

module {
  func.func @example() -> f64 {
    %0 = lox.push_frame root_count = 3 : !llvm.ptr
    
    // Allocate a string object
    %1 = lox.alloc obj_type = 1, size = 5 : !llvm.ptr
    
    // Store as root 0
    lox.set_root index = 0, %1 : !llvm.ptr
    
    // ... function body ...
    
    lox.pop_frame
    func.return %result : f64
  }
}

Step 3: Lower to LLVM IR

# Lower MLIR to LLVM IR
mlir-translate output.mlir --mlir-to-llvmir -o output.ll

Step 4: Compile to Object File

# Compile LLVM IR to object file
llc output.ll -filetype=obj -o output.o

# Link with runtime
clang output.o -L./target/release -llox_runtime -o output

Step 5: Run

./output

Why We Root Numbers

You’ll notice something odd in the MLIR examples below: we register f64 values as GC roots, even though numbers aren’t heap objects. Why?

In the numbers-only model we’re using right now, every value is an f64 — no heap objects exist, and the GC has nothing to trace. So registering numbers as roots doesn’t do anything useful yet. But we do it anyway for two reasons:

The code generator doesn’t know which variables will hold heap objects. Right now, every variable is a number. But when we add strings, closures, and instances in Part 7, some variables will hold f64-sized heap pointers instead of numbers. The lox.set_root operation treats both identically — it stores the raw bits in the shadow stack. The GC will need to distinguish them later, but the code generator doesn’t need to.

It keeps the code generator simple now and correct later. If we skip rooting for number variables, we’d need to go back and add it for every variable type that might hold a pointer. That’s error-prone — miss one and the GC frees a live object. By rooting every variable uniformly, we get correctness by default. The cost is that the GC does an unnecessary type check on number roots at collection time. For a tutorial interpreter, that tradeoff is fine.

When Part 7 introduces the tagged union model, the GC walks the shadow stack and checks each root: is this a plain number (skip it) or a heap pointer (trace it)? The shadow stack doesn’t know in advance — it stores raw f64 bits — so the GC distinguishes them at runtime using the tagged union’s type tag. (Part 9 discusses an alternative that separates pointers from numbers at the type level, so the GC can skip roots that are provably not pointers.)

With that out of the way, let’s see the full pipeline in action.

Complete MLIR Example

Let’s trace through the complete compilation of a simple Lox function:

Input: Lox Source

fun add(a, b) {
    return a + b;
}

print add(1, 2);

Stage 1: MLIR (Lox Dialect)

module {
  // The 'add' function
  func.func @add(%arg0: f64, %arg1: f64) -> f64 
      attributes {sym_name = "add"} 
  {
    // Push shadow stack frame with 2 roots (for parameters)
    %frame = lox.push_frame root_count = 2 : !llvm.ptr
    
    // Register parameters as roots
    lox.set_root index = 0, %arg0 : f64
    lox.set_root index = 1, %arg1 : f64
    
    // Every Lox value is f64 — numbers and heap pointers alike.
    // The GC checks at runtime whether each root is a pointer or a plain
    // number. (See "Why We Root Numbers" above for the full explanation.)
    
    // The addition
    %sum = arith.addf %arg0, %arg1 : f64
    
    // Pop frame before return
    lox.pop_frame
    
    // Return
    func.return %sum : f64
  }
  
  // The main entry point
  func.func @main() -> i32 {
    %frame = lox.push_frame root_count = 0 : !llvm.ptr
    
    // Call add(1, 2)
    %one = arith.constant 1.0 : f64
    %two = arith.constant 2.0 : f64
    %result = func.call @add(%one, %two) : (f64, f64) -> f64
    
    // Print the result
    // (simplified - would call a print runtime function)
    
    lox.pop_frame
    %zero = arith.constant 0 : i32
    func.return %zero : i32
  }
}

Stage 2: After Lowering (LLVM Dialect)

module {
  llvm.func @add(%arg0: f64, %arg1: f64) -> f64 {
    // Constants for root count and slot indices
    %c2_i64 = llvm.mlir.constant(2 : i64) : i64
    %c0_i64 = llvm.mlir.constant(0 : i64) : i64
    %c1_i64 = llvm.mlir.constant(1 : i64) : i64

    // gc_push_frame(2)
    %frame = llvm.call @gc_push_frame(%c2_i64) : (i64) -> !llvm.ptr
    
    // gc_set_root(0, arg0) - converted to bitcast + inttoptr + store
    // The frame is an opaque !llvm.ptr. We use getelementptr to compute
    // the address of each root slot. In opaque-pointer mode (MLIR 17+),
    // GEP carries the element type as a parameter — no bitcast needed
    // for the GEP itself. But the shadow stack stores pointers, and arg0
    // is an f64. LLVM's bitcast cannot convert f64 → ptr directly (it only
    // works between pointer types, or vectors of the same size). Instead,
    // we do a two-step conversion: bitcast f64 → i64 (reinterpret the bits
    // as an integer, which is valid since both are 8-byte first-class
    // types), then inttoptr i64 → !llvm.ptr (convert the integer to a
    // pointer). The bits are the same through both steps; the type changes.
    // (When the root value is already !llvm.ptr — a heap object, not a
    // number — no conversion is needed.)
    %root0_ptr = llvm.getelementptr %frame[%c0_i64] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.ptr
    %arg0_as_i64 = llvm.bitcast %arg0 : f64 to i64
    %arg0_as_ptr = llvm.inttoptr %arg0_as_i64 : i64 to !llvm.ptr
    llvm.store %arg0_as_ptr, %root0_ptr : !llvm.ptr, !llvm.ptr
    
    // gc_set_root(1, arg1)
    %root1_ptr = llvm.getelementptr %frame[%c1_i64] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.ptr
    %arg1_as_i64 = llvm.bitcast %arg1 : f64 to i64
    %arg1_as_ptr = llvm.inttoptr %arg1_as_i64 : i64 to !llvm.ptr
    llvm.store %arg1_as_ptr, %root1_ptr : !llvm.ptr, !llvm.ptr
    
    // Addition
    %sum = llvm.fadd %arg0, %arg1 : f64
    
    // gc_pop_frame()
    llvm.call @gc_pop_frame() : () -> ()
    
    llvm.return %sum : f64
  }
  
  llvm.func @main() -> i32 {
    // ... similar lowering ...
    llvm.return %c0_i32 : i32
  }
  
  // External declarations for runtime
  llvm.func @gc_push_frame(i64) -> !llvm.ptr
  llvm.func @gc_pop_frame() -> ()
  llvm.func @lox_runtime_alloc(i64, i8) -> !llvm.ptr
}

Stage 3: LLVM IR

define double @add(double %0, double %1) {
entry:
  ; Push shadow stack frame
  %frame = call ptr @gc_push_frame(i64 2)
  
  ; Store parameters as roots
  ; With opaque pointers (LLVM 15+), LLVM IR allows storing a double
  ; directly into a ptr-typed slot — the store instruction specifies
  ; the value type, and the 8-byte double fits in the 8-byte pointer
  ; slot. The MLIR LLVM dialect is stricter: it requires matching types
  ; for the store operand, so a two-step conversion is needed at the
  ; MLIR level (bitcast f64 → i64, then inttoptr i64 → !llvm.ptr).
  ; LLVM IR handles the type coercion implicitly in the store. The
  ; conversion steps from the MLIR stage vanish here because LLVM's
  ; opaque-pointer model (LLVM 15+) treats store as a typed operation
  ; — the value type is carried by the instruction, not the pointer
  ; — so the store doesn't need the pointer type to match the value
  ; type. This is the same underlying mechanism; the MLIR LLVM
  ; dialect doesn't relax the type rules as far as LLVM IR
  ; does.
  %root0_ptr = getelementptr ptr, ptr %frame, i64 0
  store double %0, ptr %root0_ptr
  
  %root1_ptr = getelementptr ptr, ptr %frame, i64 1
  store double %1, ptr %root1_ptr
  
  ; Addition
  %sum = fadd double %0, %1
  
  ; Pop frame
  call void @gc_pop_frame()
  
  ret double %sum
}

define i32 @main() {
entry:
  %result = call double @add(double 1.0, double 2.0)
  ; ... print result ...
  ret i32 0
}

; External runtime functions
declare ptr @gc_push_frame(i64)
declare void @gc_pop_frame()
declare ptr @lox_runtime_alloc(i64, i8)

Stage 4: Assembly (x86-64, simplified)

add:
    push   rbp
    mov    rbp, rsp
    
    ; gc_push_frame(2)
    mov    rdi, 2
    call   gc_push_frame
    ; gc_push_frame returns the frame pointer in rax
    ; (the call already placed it there — no move needed)
    
    ; Store roots (simplified)
    movsd  [rax], xmm0       ; store %0
    movsd  [rax + 8], xmm1   ; store %1
    
    ; Addition
    addsd  xmm0, xmm1        ; %sum = %0 + %1
    
    ; gc_pop_frame()
    call   gc_pop_frame
    ; NOTE: In a real compiler, gc_pop_frame is a call that may clobber
    ; xmm0 (a caller-saved register on x86-64). Since addsd already put the
    ; result in xmm0, the call would destroy it. The fix: either save the
    ; result to a callee-saved register before the call, or reorder the
    ; call before the addition. The reorder is safe here because addsd on
    ; f64 values can't trigger GC — no allocation occurs, so the roots
    ; in the popped frame are already past their last use. A production
    ; compiler's register allocator would handle this automatically.
    
    ; Return
    pop    rbp
    ret

main:
    ; ...
    movsd  xmm0, 1.0
    movsd  xmm1, 2.0
    call   add
    ; ...

Handling Different Object Types

Part 2 defined ObjType discriminants: String = 1, Environment = 2, Closure = 3. The lox.alloc operation takes an obj_type attribute matching one of these values. Let’s see how we generate code for different object types:

String Allocation

var s = "hello";

Generated MLIR:

// Allocate string object (ObjType.String = 1, size = 5)
%str = lox.alloc obj_type = 1, size = 5 : !llvm.ptr

// Initialize string data
// (would fill in length, hash, and characters)

// Store as root
lox.set_root index = 0, %str : !llvm.ptr

Closure Allocation

fun makeCounter() {
    var count = 0;
    fun counter() {
        count = count + 1;
        return count;
    }
    return counter;
}

Generated MLIR (simplified):

func.func @makeCounter() -> !llvm.ptr {
    // 1. Allocate environment for captured 'count'
    %env = lox.alloc obj_type = 2, size = 16 : !llvm.ptr
    // Environment layout: [enclosing, count_slots...]
    
    // 2. Initialize count = 0 in environment
    // (store at offset)
    
    // 3. Allocate closure
    %closure = lox.alloc obj_type = 3, size = 16 : !llvm.ptr
    // Closure layout: [function_index, environment_ptr]
    
    // 4. Link closure to environment
    // (store function index and env pointer)
    
    func.return %closure : !llvm.ptr
}

Practice Exercises

Exercise 1: Generate MLIR for a Function

Write the MLIR (Lox dialect) for this Lox function:

fun multiply(x, y) {
    var result = x * y;
    return result;
}

Click to reveal answer

func.func @multiply(%arg0: f64, %arg1: f64) -> f64 {
    // 3 roots: x, y, result
    %frame = lox.push_frame root_count = 3 : !llvm.ptr
    
    // Register parameters
    lox.set_root index = 0, %arg0 : f64
    lox.set_root index = 1, %arg1 : f64
    
    // (See "Why We Root Numbers" above for the full explanation.)
    
    // Compute x * y
    %product = arith.mulf %arg0, %arg1 : f64
    
    // Register 'result' as a root (it's a plain number in the
    // numbers-only model — no heap allocation needed, but we still
    // root it so the GC knows about it)
    lox.set_root index = 2, %product : f64
    
    lox.pop_frame
    func.return %product : f64
}

Note: In the numbers-only model, local variables that hold numbers don’t need heap allocation — we root the f64 value directly in the shadow stack. Object-typed variables (strings, closures) would need lox.alloc. (See “Why We Root Numbers” above for why this works.)

Exercise 2: Trace the Compilation Pipeline

Given this Lox code:

var x = 1;
var y = 2;
print x + y;

What does each stage produce?

Click to reveal answer

MLIR (Lox Dialect):

func.func @main() -> i32 {
    %frame = lox.push_frame root_count = 2 : !llvm.ptr
    
    // var x = 1
    %x_val = arith.constant 1.0 : f64
    lox.set_root index = 0, %x_val : f64
    
    // var y = 2
    %y_val = arith.constant 2.0 : f64
    lox.set_root index = 1, %y_val : f64
    
    // (See "Why We Root Numbers" above for the full explanation.)
    
    // x + y
    %sum = arith.addf %x_val, %y_val : f64
    
    // print (simplified)
    // call print_runtime(%sum)
    
    lox.pop_frame
    func.return %c0_i32 : i32
}

After Lowering: The lox.push_frame becomes func.call @gc_push_frame, etc.

LLVM IR: Standard LLVM with calls to runtime functions.

Assembly: Native x86-64 or ARM code.

Exercise 3: Why Separate Dialects?

Why do we have a separate lox dialect instead of generating LLVM IR directly?

Click to reveal answer

Abstraction Level: Lox dialect captures Lox semantics (allocation, GC roots) at a high level. We can optimize at this level before lowering.
Target Independence: MLIR can target WebAssembly, GPUs, or other backends. We don’t lock ourselves into LLVM.
Debugging: We can inspect the IR at each stage (Lox dialect → LLVM dialect → LLVM IR).
Custom Optimizations: We can write passes that understand Lox semantics (e.g., eliminate unnecessary allocations).
Incremental Lowering: We can do some optimizations in the Lox dialect, then lower to LLVM for target-specific work.

Next: Part 5 — Closures — Our GC can allocate and collect objects. Our code generator can emit MLIR for arithmetic and control flow. But there’s a Lox feature we’ve been ignoring: functions that capture variables from their enclosing scope. Closures need heap-allocated environments and an indirect call pattern — and they change everything about how the GC finds roots.

Keyboard shortcuts

MLIR for Lox: A Compiler Tutorial