MLIR for Lox

Building a compiler is hard. Building a compiler correctly is harder. You need to generate code, manage memory, handle types, support closures — and each of these problems interacts with the others.

MLIR is a compiler framework that lets you define multiple intermediate representations — each one simpler, each one independently verifiable. Instead of jumping straight from your source language to machine code, you lower through these representations. At every step, MLIR gives you tools to verify, optimize, and debug your compiler.

This tutorial series builds a Lox compiler using MLIR via the Melior crate (Rust bindings for MLIR). We go all the way from parsing to a working runtime with garbage collection, closures, and classes.

What You’ll Find Here

Part	Topic	The Hard Problem
Part 1	Setup, AST, parser, code generation	Getting MLIR to emit working code
Part 2	Garbage collection from scratch	Managing memory without a runtime
Part 3	Finding roots	Tracking live objects through the stack
Part 4	MLIR integration	Connecting GC to the generated code
Part 5	Closures	Capturing variables that outlive their scope
Part 6	Complete reference	Everything in one place
Part 7	Classes and instances	Object-oriented features on tagged unions
Part 8	What we built, what we skipped, and why	Knowing what to simplify — and being honest about it
Part 9	Standard library and runtime	The `print` and `clock` built-ins
Part 10	Error reporting and debugging	Source locations, runtime errors, diagnostics
Part 11	Cross-module linking	Connecting functions across multiple Lox files

Each part builds on the last.

Version Note

This tutorial uses melior 0.27 (LLVM 22). Melior’s Rust API changes significantly between versions — the same MLIR concepts work across all versions, but method signatures, type names, and module locations shift. If you’re on a different LLVM version, see the version table in Part 1 for the corresponding melior version.

Next: MLIR for Lox: Part 1 — From Lox Source to MLIR Dialect — We start from scratch: parse Lox source, build an AST, and emit it as an MLIR dialect. No prior MLIR experience required.

MLIR for Lox: Part 1 — From Lox Source to MLIR Dialect

Here’s the problem with compiling a dynamic language to LLVM: LLVM doesn’t know what a Lox value is. It knows about 64-bit floats, integers, and pointers. But a + b in Lox could be number addition, string concatenation, or a runtime type error — and you can’t express that in a single fadd instruction. You need to check the type tag, extract the payload, dispatch to the right operation, and re-box the result. In LLVM, that’s 15 instructions where you wanted one. And if you ever change how Lox values are represented — say, from a (i8, i64) struct to NaN-boxed floats — you rewrite every one of those 15-instruction sequences.
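To make that cost concrete, here is a minimal Rust sketch of what a single Lox + has to do at run time. The Value enum and the lox_add function are hypothetical names used only for illustration; they are not the representation this series ends up with.

```rust
/// A boxed Lox value: a type tag plus a payload.
/// (Illustrative only; not the compiler's real value representation.)
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Nil,
    Bool(bool),
    Number(f64),
    Str(String),
}

/// What `a + b` means in Lox: check both tags, unbox, compute, re-box.
fn lox_add(a: &Value, b: &Value) -> Result<Value, String> {
    match (a, b) {
        // Both numbers: floating-point addition.
        (Value::Number(x), Value::Number(y)) => Ok(Value::Number(x + y)),
        // Both strings: concatenation.
        (Value::Str(x), Value::Str(y)) => Ok(Value::Str(format!("{x}{y}"))),
        // Anything else is a runtime type error.
        _ => Err("Operands must be two numbers or two strings.".to_string()),
    }
}
```

Every arm of that match is work the generated code has to spell out instruction by instruction once the operation reaches LLVM.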

MLIR solves this by letting you define a dialect that represents Lox semantics directly. lox.add means “add two Lox values, with tag checking and dispatch”: a single operation that captures the semantics, not the implementation. Then you lower that dialect through progressively simpler representations until you reach LLVM IR. Each level is independently verifiable, and a change to the value representation only affects the lowering pass, not every operation that uses values. This is how MLIR-based language compilers work: Flang, LLVM’s Fortran front end, uses MLIR this way in production, Mojo is built on MLIR, and there are research efforts to bring MLIR to Julia as well. Define your semantics, then lower them.
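The lowering idea can be sketched without MLIR at all. In the toy model below, HighOp, LowOp, and lower are hypothetical names standing in for a high-level dialect, a lower-level dialect, and a lowering pass; the point is only that the boxing details live in one function.

```rust
/// One operation in a hypothetical high-level "Lox" IR.
#[derive(Debug, Clone, PartialEq)]
enum HighOp {
    Constant(f64),
    Add,
}

/// One operation in a hypothetical lower-level IR that only knows
/// about tags and raw floats.
#[derive(Debug, Clone, PartialEq)]
enum LowOp {
    PushNumber(f64),
    CheckBothNumbers,
    UnboxBoth,
    FAdd,
    BoxNumber,
}

/// The lowering pass: the only place that knows how values are boxed.
fn lower(ops: &[HighOp]) -> Vec<LowOp> {
    let mut out = Vec::new();
    for op in ops {
        match op {
            HighOp::Constant(n) => out.push(LowOp::PushNumber(*n)),
            // One high-level `Add` expands into the whole check/unbox/compute/box sequence.
            HighOp::Add => out.extend([
                LowOp::CheckBothNumbers,
                LowOp::UnboxBoth,
                LowOp::FAdd,
                LowOp::BoxNumber,
            ]),
        }
    }
    out
}
```

Switching to a different value representation would mean editing the HighOp::Add arm of lower, while every producer of HighOp::Add stays untouched.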

Our compiler takes a pragmatic approach: we use MLIR’s built-in arith and func dialects for the numbers-only model, and define custom lox.* operations only when the built-in dialects can’t express what we need (garbage collection, heap allocation, root tracking). This is the same tradeoff real compilers make — use standard dialects when you can, custom dialects when you must.

This guide builds a Lox compiler using Rust and the Melior crate. If you know Crafting Interpreters, this should feel familiar — with MLIR instead of tree-walk interpretation.

Why Rust and Melior? Memory safety without a garbage collector, which matters when you’re building one. Pattern matching for clean AST traversal. And no TableGen: Melior builds MLIR dialects directly in Rust, which means one fewer language to learn.

Setup

Dependencies

# Cargo.toml
[package]
name = "lox-mlir"
version = "0.1.0"
edition = "2021"

[dependencies]
melior = "0.27"
anyhow = "1"
# anyhow provides flexible error handling — we use Result<T, anyhow::Error>
# throughout the codebase instead of String-based errors

Melior and LLVM Versions

Melior binds to a specific LLVM version through the mlir-sys crate. For recent releases the version numbers line up directly: mlir-sys 220.x requires LLVM 22 and 210.x requires LLVM 21. Older releases use 0.x version numbers, as the table below shows. You can’t mix versions: mlir-sys 220.x won’t find an LLVM 20 installation no matter what environment variables you set.

melior	mlir-sys	LLVM required	`LLVM_SYS_*_PREFIX` variable
0.27	220.x	LLVM 22	`LLVM_SYS_220_PREFIX`
0.26	210.x	LLVM 21	`LLVM_SYS_210_PREFIX`
0.23–0.25	0.5.x	LLVM 20	`LLVM_SYS_200_PREFIX`
0.20–0.22	0.4.x	LLVM 19	`LLVM_SYS_190_PREFIX`
0.19	0.3.x	LLVM 18	`LLVM_SYS_180_PREFIX`

If you can’t install LLVM 22 on your system, pick the melior version that matches the LLVM you have. The MLIR operations and lowering passes are the same across versions — but the Melior Rust API changes significantly between versions. This tutorial’s code examples require melior 0.27.

Install MLIR

macOS

brew install llvm@22

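# Add to your shell config (~/.zshrc or ~/.bashrc).
# `brew --prefix` resolves to /opt/homebrew/opt/llvm@22 on Apple Silicon
# and /usr/local/opt/llvm@22 on Intel Macs.
export LLVM_SYS_220_PREFIX="$(brew --prefix llvm@22)"
export PATH="$LLVM_SYS_220_PREFIX/bin:$PATH"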

Linux (Ubuntu/Debian)

Ubuntu’s default repos ship LLVM 20 (on 24.04/24.10). LLVM 22 is available through the official LLVM apt repository at apt.llvm.org.

Option A: LLVM 22 via apt.llvm.org (recommended, matches melior 0.27)

# Use the official LLVM install script
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 22 all

# Install MLIR development headers and libraries
sudo apt install llvm-22-dev libmlir-22-dev clang-22 libpolly-22-dev libzstd-dev

# Add to your shell config (~/.bashrc or ~/.zshrc):
export LLVM_SYS_220_PREFIX=/usr/lib/llvm-22
export PATH=/usr/lib/llvm-22/bin:$PATH
export BINDGEN_EXTRA_CLANG_ARGS="-I/usr/lib/llvm-22/lib/clang/22/include"

That BINDGEN_EXTRA_CLANG_ARGS variable tells bindgen where to find Clang’s built-in headers. Without it, you’ll get compilation errors about missing stddef.h or stdarg.h when mlir-sys generates Rust bindings from C headers.

Option B: LLVM 20 from Ubuntu repos — not recommended for this tutorial

Ubuntu’s default repos ship LLVM 20, which works with melior 0.23–0.25. However, this tutorial’s code is written against the melior 0.27 API, and older melior versions have significant API differences (different method signatures, renamed types, moved modules). Downgrading produces dozens of compilation errors.

If you want to use LLVM 20 anyway — perhaps for a different project — here are the packages and variables you’d need:

sudo apt install llvm-20-dev libmlir-20-dev clang-20 libpolly-20-dev libzstd-dev

# In Cargo.toml, change to:
#   melior = "0.25"

export LLVM_SYS_200_PREFIX=/usr/lib/llvm-20
export PATH=/usr/lib/llvm-20/bin:$PATH
export BINDGEN_EXTRA_CLANG_ARGS="-I/usr/lib/llvm-20/lib/clang/20/include"

For this tutorial, use Option A (LLVM 22 via apt.llvm.org) instead.

Verify Your Setup

# Check that mlir-opt is available and reports the right version
mlir-opt --version
# Output should include: LLVM version 22.x.x (if using LLVM 22)

# Check that the environment variable is set
echo $LLVM_SYS_220_PREFIX
# Should print: /usr/lib/llvm-22  (or /opt/homebrew/opt/llvm@22 on macOS)

If mlir-opt --version reports a different LLVM than what your melior version expects, cargo build will fail with linking errors. Double-check that the right LLVM_SYS_*_PREFIX is set and that mlir-opt comes from the same LLVM installation.
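If you’d rather check the setup from Rust, the naming pattern in the version table can be sketched as a small helper. prefix_var and check_setup are hypothetical names, not part of melior or mlir-sys, and the check only looks for bin/mlir-opt under the prefix; it does not verify the reported version.

```rust
use std::env;
use std::path::PathBuf;

/// Name of the prefix variable for a given LLVM major version,
/// following the pattern in the version table (22 -> LLVM_SYS_220_PREFIX).
fn prefix_var(llvm_major: u32) -> String {
    format!("LLVM_SYS_{llvm_major}0_PREFIX")
}

/// Returns the path where `mlir-opt` is expected, or an explanation of
/// what is missing. Hypothetical helper for illustration only.
fn check_setup(llvm_major: u32) -> Result<PathBuf, String> {
    let var = prefix_var(llvm_major);
    let prefix = env::var(&var).map_err(|_| format!("{var} is not set"))?;
    let mlir_opt = PathBuf::from(prefix).join("bin").join("mlir-opt");
    if mlir_opt.exists() {
        Ok(mlir_opt)
    } else {
        Err(format!("{} does not exist", mlir_opt.display()))
    }
}
```

Calling check_setup(22) before a long cargo build gives a quicker and more readable failure than a linker error.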

Common Build Errors

could not find native static library mlir — You’re missing libmlir-XX-dev. Install it with sudo apt install libmlir-22-dev (replace 22 with your LLVM version).

fatal error: 'stddef.h' file not found during mlir-sys build — Set BINDGEN_EXTRA_CLANG_ARGS as shown above. This tells bindgen where Clang’s built-in headers live.

linking with cc failed: exit code: 1 with undefined MLIR symbols — Your LLVM_SYS_*_PREFIX doesn’t match the LLVM version your melior expects. Check the version table above and make sure they align.

The Lox AST

The AST is hand-written Rust, not generated, and it is essentially what you’d write following Crafting Interpreters. If you’ve read the book, you can skim this section: the important parts are the LoxValue enum and the Expr/Stmt sum types. The details of each variant don’t matter until we compile them in the MLIR Code Generation section.

Design choice: Why separate BinaryExpr, UnaryExpr, etc. instead of an enum with inline variants? Because each variant needs named fields and location tracking. A tuple variant would look like Binary(Box<Expr>, BinaryOp, Box<Expr>, Location), and you’d have to remember which field is which. Rust’s struct-like variants would name the fields, but a variant is not a type of its own, so a helper could not take just a binary expression as its parameter. Named structs are self-documenting and can be passed around individually.
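A side-by-side sketch of the two styles, trimmed to two variants. TupleExpr and describe_binary are hypothetical names for this comparison only, and Number stands in for the tutorial’s Literal variant.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Location {
    line: usize,
    column: usize,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    Add,
    Mul,
}

// Tuple-variant style: fields are positional, and there is no type that
// means "just a binary expression".
#[derive(Debug, Clone)]
enum TupleExpr {
    Number(f64),
    Binary(Box<TupleExpr>, BinaryOp, Box<TupleExpr>, Location),
}

// Struct-per-variant style (the one this tutorial uses, trimmed down).
#[derive(Debug, Clone)]
enum Expr {
    Number(f64),
    Binary(BinaryExpr),
}

#[derive(Debug, Clone)]
struct BinaryExpr {
    location: Location,
    left: Box<Expr>,
    op: BinaryOp,
    right: Box<Expr>,
}

/// A helper can demand exactly a BinaryExpr. With tuple variants it would
/// have to accept any TupleExpr and match on it again.
fn describe_binary(e: &BinaryExpr) -> String {
    format!("{:?} at {}:{}", e.op, e.location.line, e.location.column)
}
```

The payoff shows up in the code generator, where a method that handles binary expressions can take &BinaryExpr and never has to consider the other variants.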

// src/ast.rs
/// Source location for error messages
#[derive(Debug, Clone, Copy)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

/// A Lox value (dynamically typed)
#[derive(Debug, Clone)]
pub enum LoxValue {
    Nil,
    Bool(bool),
    Number(f64),
    String(String), // Heap-allocated: fine for the AST, but a compiler
                    // generating MLIR would use interned strings instead.
                    // We'll deal with this below in "String Constants."
}

// ========================================================================
// Expressions
// ========================================================================

#[derive(Debug, Clone)]
pub enum Expr {
    Binary(BinaryExpr),
    Unary(UnaryExpr),
    Literal(LiteralExpr),
    Grouping(GroupingExpr),
    Variable(VariableExpr),
    Assign(AssignExpr),
    Call(CallExpr),
    Logical(LogicalExpr),
}

impl Expr {
    pub fn location(&self) -> Location {
        match self {
            Expr::Binary(e) => e.location,
            Expr::Unary(e) => e.location,
            Expr::Literal(e) => e.location,
            Expr::Grouping(e) => e.location,
            Expr::Variable(e) => e.location,
            Expr::Assign(e) => e.location,
            Expr::Call(e) => e.location,
            Expr::Logical(e) => e.location,
        }
    }
}

#[derive(Debug, Clone)]
pub struct BinaryExpr {
    pub location: Location,
    pub left: Box<Expr>,
    pub op: BinaryOp,
    pub right: Box<Expr>,
}

#[derive(Debug, Clone, Copy)]
pub enum BinaryOp {
    Add, Sub, Mul, Div,
    Less, LessEqual, Greater, GreaterEqual,
    Equal, NotEqual,
}

#[derive(Debug, Clone)]
pub struct UnaryExpr {
    pub location: Location,
    pub op: UnaryOp,
    pub right: Box<Expr>,
}

#[derive(Debug, Clone, Copy)]
pub enum UnaryOp {
    Negate, Not,
}

#[derive(Debug, Clone)]
pub struct LiteralExpr {
    pub location: Location,
    pub value: LoxValue,
}

#[derive(Debug, Clone)]
pub struct GroupingExpr {
    pub location: Location,
    pub expr: Box<Expr>,
}

#[derive(Debug, Clone)]
pub struct VariableExpr {
    pub location: Location,
    pub name: String,
}

#[derive(Debug, Clone)]
pub struct AssignExpr {
    pub location: Location,
    pub name: String,
    pub value: Box<Expr>,
}

#[derive(Debug, Clone)]
pub struct CallExpr {
    pub location: Location,
    pub callee: Box<Expr>,
    pub arguments: Vec<Expr>,
}

#[derive(Debug, Clone)]
pub struct LogicalExpr {
    pub location: Location,
    pub left: Box<Expr>,
    pub op: LogicalOp,
    pub right: Box<Expr>,
}

#[derive(Debug, Clone, Copy)]
pub enum LogicalOp {
    And, Or,
}

// ========================================================================
// Statements
// ========================================================================

#[derive(Debug, Clone)]
pub enum Stmt {
    Function(FunctionStmt),
    Return(ReturnStmt),
    Var(VarStmt),
    If(IfStmt),
    While(WhileStmt),
    Print(PrintStmt),
    Block(BlockStmt),
    Expression(ExpressionStmt),
}

#[derive(Debug, Clone)]
pub struct FunctionStmt {
    pub location: Location,  // Location of 'fun' keyword
    pub name: String,
    pub name_location: Location,  // Location of the function name
    pub params: Vec<String>,
    pub param_locations: Vec<Location>,  // Location of each parameter name
    pub body: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct ReturnStmt {
    pub location: Location,  // Location of 'return' keyword
    pub value: Option<Expr>,
}

#[derive(Debug, Clone)]
pub struct VarStmt {
    pub location: Location,  // Location of 'var' keyword
    pub name: String,
    pub name_location: Location,  // Location of the variable name
    pub init: Expr,
}

#[derive(Debug, Clone)]
pub struct IfStmt {
    pub location: Location,  // Location of 'if' keyword
    pub condition: Expr,
    pub then_branch: Vec<Stmt>,
    pub else_branch: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct WhileStmt {
    pub location: Location,  // Location of 'while' keyword
    pub condition: Expr,
    pub body: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct PrintStmt {
    pub location: Location,  // Location of 'print' keyword
    pub value: Expr,
}

#[derive(Debug, Clone)]
pub struct BlockStmt {
    pub location: Location,  // Location of opening '{'
    pub statements: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct ExpressionStmt {
    pub location: Location,  // Start of the expression
    pub expr: Expr,
}

// ========================================================================
// Program
// ========================================================================

/// A Lox program is a list of top-level statements
#[derive(Debug, Clone)]
pub struct Program {
    pub statements: Vec<Stmt>,
}

// ========================================================================
// Helper method for getting statement locations
// ========================================================================

impl Stmt {
    /// Get the primary location of this statement
    pub fn location(&self) -> Location {
        match self {
            Stmt::Function(f) => f.location,
            Stmt::Return(r) => r.location,
            Stmt::Var(v) => v.location,
            Stmt::If(i) => i.location,
            Stmt::While(w) => w.location,
            Stmt::Print(p) => p.location,
            Stmt::Block(b) => b.location,
            Stmt::Expression(e) => e.location,
        }
    }
}
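To see the shape in use, here is a self-contained miniature that builds the tree for 1 + 2 * 3 by hand and walks it with a match. It is a sketch under simplifying assumptions: the types are cut down to three variants, literals hold a bare f64 instead of a LoxValue, and number, binary, evaluate, and sample are hypothetical helpers. The code generator later uses the same one-arm-per-variant structure, but emits operations instead of computing a result.

```rust
#[derive(Debug, Clone, Copy)]
struct Location {
    line: usize,
    column: usize,
}

#[derive(Debug, Clone, Copy)]
enum BinaryOp {
    Add,
    Sub,
    Mul,
    Div,
}

#[derive(Debug, Clone)]
enum Expr {
    Literal(LiteralExpr),
    Grouping(GroupingExpr),
    Binary(BinaryExpr),
}

#[derive(Debug, Clone)]
struct LiteralExpr {
    location: Location,
    value: f64,
}

#[derive(Debug, Clone)]
struct GroupingExpr {
    location: Location,
    expr: Box<Expr>,
}

#[derive(Debug, Clone)]
struct BinaryExpr {
    location: Location,
    left: Box<Expr>,
    op: BinaryOp,
    right: Box<Expr>,
}

/// Shorthand constructors so the tree below stays readable.
fn number(value: f64, column: usize) -> Expr {
    Expr::Literal(LiteralExpr { location: Location { line: 1, column }, value })
}

fn binary(left: Expr, op: BinaryOp, right: Expr, column: usize) -> Expr {
    Expr::Binary(BinaryExpr {
        location: Location { line: 1, column },
        left: Box::new(left),
        op,
        right: Box::new(right),
    })
}

/// Walk the tree with one `match` arm per variant.
fn evaluate(expr: &Expr) -> f64 {
    match expr {
        Expr::Literal(lit) => lit.value,
        Expr::Grouping(group) => evaluate(&group.expr),
        Expr::Binary(bin) => {
            let (l, r) = (evaluate(&bin.left), evaluate(&bin.right));
            match bin.op {
                BinaryOp::Add => l + r,
                BinaryOp::Sub => l - r,
                BinaryOp::Mul => l * r,
                BinaryOp::Div => l / r,
            }
        }
    }
}

/// The AST for `1 + 2 * 3`: the multiplication is the right child of the addition.
fn sample() -> Expr {
    let product = binary(number(2.0, 5), BinaryOp::Mul, number(3.0, 9), 7);
    binary(number(1.0, 1), BinaryOp::Add, product, 3)
}
```

evaluate(&sample()) returns 7.0, because the tree shape already encodes that the multiplication happens first.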

The Parser

The parser is a standard recursive-descent parser with precedence climbing. It follows Crafting Interpreters closely — if you’ve read that book, this will feel familiar. The scanner (lexer) is also straight out of Crafting Interpreters Chapter 4.

What matters for this tutorial is the output: the AST we defined above. The parser is how we get there, but it’s not what we’re here to study. Here’s the shape of the thing.

Tokens

The scanner turns source text into a Vec<Token>. Each token carries its type, the raw lexeme, an optional literal value, and a source location:

// src/lexer.rs
use anyhow::Result;
use crate::ast::Location;

#[derive(Debug, Clone)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,
    pub literal: Option<LexValue>,
    pub location: Location,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenType {
    // Single-character tokens
    LeftParen, RightParen, LeftBrace, RightBrace,
    Comma, Dot, Minus, Plus, Semicolon, Slash, Star,
    // One or two character tokens
    Bang, BangEqual, Equal, EqualEqual, Greater, GreaterEqual,
    Less, LessEqual,
    // Literals
    Identifier, String, Number,
    // Keywords
    And, Class, Else, False, True, Fun, For, If, Nil, Or,
    Print, Return, Super, This, Var, While,
    // Special
    Eof,
}

#[derive(Debug, Clone)]
pub enum LexValue {
    Boolean(bool),
    F64(f64),
    Str(String),
}

The scanner itself is ~300 lines of character-by-character processing. It’s not the focus of this tutorial — see Crafting Interpreters Chapter 4 for a complete implementation. The important thing is the output: a Vec<Token> that the parser consumes.
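For a feel of what the character-by-character processing looks like, here is a heavily trimmed sketch that handles only numbers, the four arithmetic operators, and parentheses. It is not the tutorial’s scanner: the Token here stores line and column directly instead of a Location, has no literal field, and scan is a hypothetical free function.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TokenType {
    LeftParen, RightParen, Minus, Plus, Slash, Star, Number, Eof,
}

#[derive(Debug, Clone, PartialEq)]
struct Token {
    token_type: TokenType,
    lexeme: String,
    line: usize,
    column: usize,
}

fn scan(source: &str) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let (mut i, mut line, mut column) = (0usize, 1usize, 1usize);
    while i < chars.len() {
        let c = chars[i];
        let start = i;
        let token_type = match c {
            // Whitespace produces no token; it only moves the position.
            '\n' => { line += 1; column = 1; i += 1; continue; }
            c if c.is_whitespace() => { column += 1; i += 1; continue; }
            '(' => TokenType::LeftParen,
            ')' => TokenType::RightParen,
            '-' => TokenType::Minus,
            '+' => TokenType::Plus,
            '/' => TokenType::Slash,
            '*' => TokenType::Star,
            c if c.is_ascii_digit() => {
                // Consume the remaining digits, then an optional fractional part.
                while i + 1 < chars.len() && chars[i + 1].is_ascii_digit() { i += 1; }
                if i + 2 < chars.len() && chars[i + 1] == '.' && chars[i + 2].is_ascii_digit() {
                    i += 1; // the '.'
                    while i + 1 < chars.len() && chars[i + 1].is_ascii_digit() { i += 1; }
                }
                TokenType::Number
            }
            other => return Err(format!("{line}:{column}: unexpected character '{other}'")),
        };
        i += 1; // step past the last character of this token
        let lexeme: String = chars[start..i].iter().collect();
        let width = i - start;
        tokens.push(Token { token_type, lexeme, line, column });
        column += width;
    }
    tokens.push(Token { token_type: TokenType::Eof, lexeme: String::new(), line, column });
    Ok(tokens)
}
```

The full scanner adds identifiers, keywords, strings, comments, and two-character operators, but each of those is another arm in the same loop.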

The Parser Structure

The parser walks the token stream and builds the AST. Two things to understand about its structure:

Statements are dispatched by keyword. fun → function declaration, var → variable declaration, print → print statement, if → if statement, and so on. Each statement parser consumes tokens until it has a complete Stmt node.

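Expressions use precedence climbing. Each binary operator level has its own method: equality() handles == and !=, comparison() handles >, >=, <, and <=, term() handles + and -, and factor() handles * and /. Each method first parses an operand at the next-tighter level, then loops while it sees one of its own operators. Calling the tighter level first is what gives * priority over +: 1 + 2 * 3 parses as Binary(1, Add, Binary(2, Mul, 3)). The loop is what makes the operators left-associative: 1 - 2 - 3 parses as Binary(Binary(1, Sub, 2), Sub, 3).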

The precedence ladder, from loosest to tightest:

assignment  →  or  →  and  →  equality  →  comparison  →  term  →  factor  →  unary  →  primary

Here’s the skeleton of the parser — the parts that matter for understanding what the code generator receives:

// src/parser.rs
use crate::ast::*;
use crate::lexer::{Token, TokenType, LexValue};

#[derive(Debug)]
pub struct ParseError {
    pub message: String,
    pub location: Location,
}

pub struct Parser {
    tokens: Vec<Token>,
    current: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Self { tokens, current: 0 }
    }

    pub fn parse(&mut self) -> Result<Program, ParseError> {
        let mut statements = Vec::new();
        while !self.is_at_end() {
            statements.push(self.declaration()?);
        }
        Ok(Program { statements })
    }

    fn declaration(&mut self) -> Result<Stmt, ParseError> {
        if self.match_token(TokenType::Fun) {
            return self.function_declaration();
        }
        if self.match_token(TokenType::Var) {
            return self.var_declaration();
        }
        self.statement()
    }

    // ... statement parsers (if, while, print, return, block, expression)
    // Each one consumes tokens and produces the corresponding Stmt variant.
    // The pattern is the same: match the keyword, parse the sub-expressions,
    // build the Stmt node.

    fn expression(&mut self) -> Result<Expr, ParseError> {
        self.assignment()
    }

    // Each binary operator level follows the same pattern:
    //   1. Parse the next-tighter level
    //   2. While the current token is one of our operators, consume it,
    //      parse the right operand, and build a BinaryExpr node
    //
    // For example, term() handles + and -:
    fn term(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.factor()?;  // next-tighter level
        while self.match_any(&[TokenType::Plus, TokenType::Minus]) {
            let location = self.previous().location;
            let op = match self.previous().token_type {
                TokenType::Plus => BinaryOp::Add,
                TokenType::Minus => BinaryOp::Sub,
                _ => unreachable!(),
            };
            let right = self.factor()?;
            expr = Expr::Binary(BinaryExpr {
                location,
                left: Box::new(expr),
                op,
                right: Box::new(right),
            });
        }
        Ok(expr)
    }

    // ... equality, comparison, and factor follow the same loop pattern;
    //     unary and primary sit below them on the ladder
    // ... assignment is right-associative; `or` and `and` build LogicalExpr
    //     nodes so the code generator can short-circuit them later

    // Helper methods: match_token, match_any, check, advance, peek, previous, consume
    // Standard recursive-descent utilities; see Crafting Interpreters Chapter 6
}

The full parser is about 500 lines of straightforward recursive-descent code. It’s not reproduced here in full because it’s not the focus — the code generator doesn’t care how the AST was built, only what shape it is. If you’re building along, follow Crafting Interpreters Chapter 6 and adapt the AST nodes to match the ones we defined above. The key difference from the book’s Java implementation: our AST carries Location fields on every node, because MLIR’s error diagnostics need source positions.
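If you want to watch the ladder work in isolation, the two tightest binary levels fit in a self-contained miniature. This is a sketch with deliberately simplified types: Tok, MiniParser, show, and parse_and_show are hypothetical names, the Expr here uses a plain char for the operator, and there are no locations or error recovery.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Tok { Num(f64), Plus, Minus, Star, Slash }

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(f64),
    Binary(Box<Expr>, char, Box<Expr>),
}

struct MiniParser { tokens: Vec<Tok>, current: usize }

impl MiniParser {
    fn new(tokens: Vec<Tok>) -> Self { Self { tokens, current: 0 } }

    fn peek(&self) -> Option<Tok> { self.tokens.get(self.current).copied() }

    // term -> factor ( ("+" | "-") factor )*
    fn term(&mut self) -> Result<Expr, String> {
        let mut expr = self.factor()?; // next-tighter level first
        while let Some(tok @ (Tok::Plus | Tok::Minus)) = self.peek() {
            self.current += 1;
            let op = if tok == Tok::Plus { '+' } else { '-' };
            let right = self.factor()?;
            // The tree built so far becomes the LEFT child: left-associativity.
            expr = Expr::Binary(Box::new(expr), op, Box::new(right));
        }
        Ok(expr)
    }

    // factor -> primary ( ("*" | "/") primary )*
    fn factor(&mut self) -> Result<Expr, String> {
        let mut expr = self.primary()?;
        while let Some(tok @ (Tok::Star | Tok::Slash)) = self.peek() {
            self.current += 1;
            let op = if tok == Tok::Star { '*' } else { '/' };
            let right = self.primary()?;
            expr = Expr::Binary(Box::new(expr), op, Box::new(right));
        }
        Ok(expr)
    }

    fn primary(&mut self) -> Result<Expr, String> {
        match self.peek() {
            Some(Tok::Num(n)) => { self.current += 1; Ok(Expr::Num(n)) }
            other => Err(format!("expected a number, found {other:?}")),
        }
    }
}

/// Render the tree with explicit parentheses so its shape is visible.
fn show(expr: &Expr) -> String {
    match expr {
        Expr::Num(n) => format!("{n}"),
        Expr::Binary(l, op, r) => format!("({} {op} {})", show(l), show(r)),
    }
}

fn parse_and_show(tokens: Vec<Tok>) -> Result<String, String> {
    let mut parser = MiniParser::new(tokens);
    let expr = parser.term()?;
    Ok(show(&expr))
}
```

Feeding it the tokens for 1 - 2 - 3 yields ((1 - 2) - 3), and the tokens for 1 + 2 * 3 yield (1 + (2 * 3)).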

MLIR Code Generation with Melior

The core of the compiler — walks the AST and emits MLIR.

Scope note: In this part, we compile a subset of Lox that only supports numbers and arithmetic. This isn’t a limitation of MLIR — it’s a pedagogical choice. Dynamic typing with tagged unions adds 3-4x more code for every operation (check the tag, dispatch, unbox, compute, re-box). The “Dynamic Typing with Tagged Unions” subsection below explains the representation conceptually. Part 7 (Classes and Instances) shows the working implementation — every arith.addf becomes tag-check-extract-compute-repack. For now, every value is an f64.

What this means in practice:

All values are f64 — true is 1.0, false is 0.0, nil is 0.0

Arithmetic operations use arith.addf, arith.mulf, etc.

Comparisons use arith.cmpf

Logical operators use scf.if for short-circuit evaluation

print calls a runtime function (lox_print in the examples below) that prints a float

This is a simplification. A production Lox compiler would use tagged unions. But starting with “everything is a float” lets us focus on MLIR concepts without drowning in type-tag boilerplate.
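The literal mapping from the list above can be written down directly. This is a sketch: literal_to_f64 is a hypothetical helper, and rejecting string literals is an assumption made here to keep the example small, not a statement about how the tutorial treats strings.

```rust
/// Same shape as the `LoxValue` enum from src/ast.rs.
#[derive(Debug, Clone)]
enum LoxValue {
    Nil,
    Bool(bool),
    Number(f64),
    String(String),
}

/// Map a literal to the f64 used in the numbers-only model.
fn literal_to_f64(value: &LoxValue) -> Result<f64, String> {
    match value {
        LoxValue::Nil => Ok(0.0),
        LoxValue::Bool(true) => Ok(1.0),
        LoxValue::Bool(false) => Ok(0.0),
        LoxValue::Number(n) => Ok(*n),
        // Assumption for this sketch: strings have no f64 encoding, so reject them.
        LoxValue::String(s) => Err(format!(
            "string literal {s:?} is not supported in the numbers-only model"
        )),
    }
}
```

Note what the mapping loses: nil, false, and the number 0 all become 0.0, so the compiled program cannot tell them apart. That is exactly the information a type tag restores.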

How Melior’s Block Ownership Works

Before we write any codegen, there’s one thing you need to understand about Melior’s ownership model — it’s different from what you might expect.

In Melior, Block<'c> is an owned value. When you call region.append_block(block), the block is consumed — moved into the region. After that, you can’t use the Block anymore. Instead, append_block returns a BlockRef<'c, '_> — a borrowed reference to the block inside the region.

This means you can’t store a Block in a struct field, compile some operations into it, and then append it to a region later. Rust won’t let you move a block out of a field you only have borrowed, and working around that with Option::take leaves the field empty for every method that runs afterwards.

Block lifecycle:

  Create              Append ops              Move into region
  ──────              ──────────              ─────────────────
  let block =         block.append_operation  region.append_block(block)
    Block::new(...);    (some_op);           // block is consumed — gone
                       block.append_operation  // you get a BlockRef back,
                         (another_op);        // but usually don't need it

  You own it.         Still own it.           Region owns it now.
  Add anything         No restrictions.        Can't modify the Block
  you want.                                    directly anymore.

  ❌ Can't do this:
  self.current_block = Some(block);   // store it
  region.append_block(block);          // move it — DOUBLE USE

  ✅ Instead: pass blocks as parameters.
  Each method takes &Block, emits operations,
  and returns. No struct fields, no double-moves.

The pattern that works: Create a Block, append operations to it (it’s not in any region yet, so you own it freely), then move it into a Region. After that, the Block is gone — you get a BlockRef back, but you usually don’t need it because the region is complete.

// ✅ Works: build the block, then move it into a region
let block = Block::new(&[(float_type, location)]);
block.append_operation(some_op);
block.append_operation(another_op);

let region = Region::new();
region.append_block(block);  // block is consumed here

For our code generator, this means we pass the current block as a parameter to compilation methods, rather than storing it in a struct. When a statement needs to create a new region (like scf.if), it builds the inner blocks inline, moves them into a region, and then creates the operation — all without touching the outer block.
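The same build-then-move discipline can be tried out with plain Rust stand-ins, no MLIR required. Block, Region, emit_add, and build_region below are hypothetical types and functions that only imitate the ownership shape described above. One deliberate difference: Melior’s append_operation takes &self because the mutation happens on the MLIR side, whereas this stand-in needs &mut self.

```rust
/// Stand-in for an MLIR block: an ordered list of operation names.
#[derive(Debug)]
struct Block {
    ops: Vec<String>,
}

impl Block {
    fn new() -> Self {
        Block { ops: Vec::new() }
    }

    fn append_operation(&mut self, name: &str) {
        self.ops.push(name.to_string());
    }
}

/// Stand-in for an MLIR region: it owns its blocks.
#[derive(Debug, Default)]
struct Region {
    blocks: Vec<Block>,
}

impl Region {
    /// Consumes the block and hands back a reference to it inside the region.
    fn append_block(&mut self, block: Block) -> &Block {
        self.blocks.push(block);
        self.blocks.last().expect("a block was just pushed")
    }
}

/// Codegen-style helper: borrows the block it should emit into.
fn emit_add(block: &mut Block) {
    block.append_operation("arith.constant");
    block.append_operation("arith.constant");
    block.append_operation("arith.addf");
}

fn build_region() -> Region {
    let mut block = Block::new();
    emit_add(&mut block); // the block is still owned here
    let mut region = Region::default();
    region.append_block(block); // moved: `block` is unusable from here on
    // block.append_operation("arith.mulf"); // would not compile: use of moved value
    region
}
```

Uncommenting the last append_operation call makes the compiler reject the function, which is the behavior the text describes for Melior’s Block.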

We also don’t store the variable map in the struct — for a different reason. See the next section for why.

Why Separate Context from State?

You might notice something unusual about our CodeGenerator: it doesn’t have a variables field. Most compiler tutorials store the variable map in the code generator struct. We pass it as a separate &mut HashMap parameter instead.

Here’s why: in Melior 0.27, Value has two lifetime parameters — Value<'c, 'a>. The 'c is the context lifetime, and 'a is tied to the operation that produced the value. When you store Value<'c, 'a> in a struct and then call a method that takes &mut self, the returned Value keeps &mut self borrowed — and then the same method can’t access self.context or do anything else with self without conflicting with that borrow.

This isn’t a workaround. It’s a real consequence of how Rust’s borrow checker works with MLIR’s ownership model. The fix is to recognize that our code generator has two kinds of state:

Immutable context: the MLIR Context and Module. The struct never needs &mut access to them; even appending operations to the module goes through shared references in Melior.

Mutable compilation state — the variable map, which grows as we declare variables.

By putting immutable context in the struct (accessed via &self) and passing mutable state as a parameter (accessed via &mut HashMap), we make the borrow structure explicit. compile_expression takes &self because it only needs self.context. The variables HashMap gets passed separately because it needs mutable access.

This is actually cleaner than the alternative. It forces each method to declare exactly what it needs — and the borrow checker enforces that declaration.
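The resulting method signatures look roughly like the sketch below. It uses stand-in types (Context, Value, and Generator are hypothetical, not Melior’s), so it shows the shape of the split rather than reproducing the lifetime conflict itself.

```rust
use std::collections::HashMap;

/// Stand-in for the MLIR context.
struct Context {
    name: String,
}

/// Stand-in for an SSA value: just an index.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Value(usize);

/// Only the shared, read-only part lives in the struct.
struct Generator<'c> {
    context: &'c Context,
}

impl<'c> Generator<'c> {
    /// Takes `&self`; the only thing mutated is the map passed in.
    fn declare(&self, variables: &mut HashMap<String, Value>, name: &str) -> Value {
        let value = Value(variables.len());
        variables.insert(name.to_string(), value);
        value
    }

    /// Lookups need only shared access to both the generator and the map.
    fn lookup(&self, variables: &HashMap<String, Value>, name: &str) -> Result<Value, String> {
        variables
            .get(name)
            .copied()
            .ok_or_else(|| format!("undefined variable '{name}' in {}", self.context.name))
    }
}
```

Because every method takes &self, a value returned from one call never blocks the next call, and the caller decides how long each variable map lives (one per function, for example).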

Basic Code Generator

Version note: The code examples below use Melior 0.27’s API. Melior’s API changes frequently between minor versions — helper function availability and signatures differ between releases. In melior 0.27.0, the melior::dialect::arith module only exports constant, cmpf, cmpi, and select as Rust helper functions. There are no addf, subf, mulf, divf, or negf helpers in that module. The code below defines local helper functions that use OperationBuilder to construct arithmetic operations — the same approach melior itself uses for the operations it does export (see the source of arith::constant and arith::cmpf). The concepts (which operations to use, how regions work) are stable across versions — only the Rust API surface changes. If something doesn’t compile, check Melior’s docs for the exact API (replace latest with your version in the URL if needed).

ODS alternative: melior::dialect::ods::arith auto-generates helpers from MLIR’s TableGen definitions, including addi, andi, ori, addf, subf, mulf, divf, negf, and many more. These work, but the ODS module is marked experimental and its generated signatures may change between releases. If you prefer the ODS helpers, replace the local helpers with calls like ods::arith::addf(lhs, rhs, location) and add use melior::dialect::ods; to your imports. Note that andi (used in Part 10) is only available via this ODS path — dialect::arith doesn’t export it.

Version-resilient alternative: If you’re working with a different Melior version and the helper functions don’t compile, you can use OperationBuilder instead. This constructs the MLIR operation directly by name, bypassing the version-specific Rust wrappers entirely. We already use this approach for scf.if, scf.while, scf.condition, and scf.yield (which don’t have stable helpers across versions). The same pattern works for func.func and func.call:

// Fragment: assumes `context`, `module`, `block`, `location`, `func`,
// `function_type`, `region`, `args`, and `result_types` are already in scope,
// and that `melior::ir::Identifier` is imported alongside the imports
// shown in generator.rs below.

// Instead of func::func(context, name_attr, type_attr, region, attrs, location):
let func_op = OperationBuilder::new("func.func", location)
    .add_attributes(&[
        (
            Identifier::new(context, "sym_name"),
            StringAttribute::new(context, &func.name).into(),
        ),
        (
            Identifier::new(context, "function_type"),
            TypeAttribute::new(function_type.into()).into(),
        ),
    ])
    .add_regions([region])
    .build()
    .expect("valid func.func");
module.body().append_operation(func_op);

// Instead of func::call(context, symbol_ref, args, result_types, location):
let call_op = OperationBuilder::new("func.call", location)
    .add_attributes(&[(
        Identifier::new(context, "callee"),
        FlatSymbolRefAttribute::new(context, "lox_print").into(),
    )])
    .add_operands(&args)
    .add_results(&result_types)
    .build()
    .expect("valid func.call");
block.append_operation(call_op);

The OperationBuilder approach is more verbose but works across Melior versions because it targets the stable MLIR operation names ("func.func", "func.call") rather than the version-specific Rust function signatures. Later parts of this tutorial use the helper functions (func::func, func::call) for readability, but if you hit a compilation error, substitute the OperationBuilder version above.

// src/codegen/mod.rs
mod generator;

pub use generator::generate_module;

#![allow(unused)]
fn main() {
// src/codegen/generator.rs
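// The glob import also brings in ast::Location, but the explicit
// melior::ir::Location import below takes precedence, so `Location` in this
// file means MLIR's location type. Refer to the AST one as
// crate::ast::Location if you need it here.
use crate::ast::*;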
use melior::{
    Context,
    dialect::{arith, func, DialectRegistry},
    ir::{
        attribute::{StringAttribute, TypeAttribute, FloatAttribute, FlatSymbolRefAttribute},
        r#type::FunctionType,
        Location, Module, Region, Block, Type, Value, ValueLike,
        operation::{OperationBuilder, OperationLike},
        block::BlockLike, RegionLike,
    },
    utility::register_all_dialects,
};
use std::collections::HashMap;

// ========================================================================
// Arith helper functions
// ========================================================================
// Melior 0.27's arith module only provides helpers for `constant`, `cmpf`,
// `cmpi`, and `select`. The arithmetic operations (addf, subf, mulf, divf,
// negf) must be built with OperationBuilder. These helpers match the
// signature style of the built-in helpers for consistency.

/// Create an `arith.addf` operation.
fn arith_addf<'c>(
    lhs: Value<'c, 'c>,
    rhs: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new("arith.addf", location)
        .add_operands(&[lhs, rhs])
        .add_results(&[lhs.r#type()])
        .build()
        .expect("valid arith.addf")
}

/// Create an `arith.subf` operation.
fn arith_subf<'c>(
    lhs: Value<'c, 'c>,
    rhs: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new("arith.subf", location)
        .add_operands(&[lhs, rhs])
        .add_results(&[lhs.r#type()])
        .build()
        .expect("valid arith.subf")
}

/// Create an `arith.mulf` operation.
fn arith_mulf<'c>(
    lhs: Value<'c, 'c>,
    rhs: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new("arith.mulf", location)
        .add_operands(&[lhs, rhs])
        .add_results(&[lhs.r#type()])
        .build()
        .expect("valid arith.mulf")
}

/// Create an `arith.divf` operation.
fn arith_divf<'c>(
    lhs: Value<'c, 'c>,
    rhs: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new("arith.divf", location)
        .add_operands(&[lhs, rhs])
        .add_results(&[lhs.r#type()])
        .build()
        .expect("valid arith.divf")
}

/// Create an `arith.negf` operation.
fn arith_negf<'c>(
    operand: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new("arith.negf", location)
        .add_operands(&[operand])
        .add_results(&[operand.r#type()])
        .build()
        .expect("valid arith.negf")
}

/// State for code generation.
///
/// Note: we do NOT store the variables HashMap or the "current block"
/// as struct fields. Both are passed as parameters instead.
///
/// In Melior 0.27+, `Value` has two lifetime parameters (`'c` and `'a`).
/// Storing `Value<'c, 'a>` in a struct field and then calling methods that
/// take `&mut self` creates a borrow conflict — the returned `Value` keeps
/// `&mut self` alive, and then the same method can't access `self.context`
/// or other struct fields without conflicting with that borrow.
///
/// The fix: separate immutable context from mutable compilation state.
/// `compile_expression` only needs `&self` (for `self.context`). The
/// variables HashMap and current block are passed as separate parameters.
/// This is actually a cleaner design — it makes the borrow structure
/// explicit rather than hiding it in the struct.
pub struct CodeGenerator<'c> {
    context: &'c Context,
    module: Module<'c>,
}

impl<'c> CodeGenerator<'c> {
    pub fn new(context: &'c Context) -> Self {
        let location = Location::unknown(context);
        let module = Module::new(location);
        Self {
            context,
            module,
        }
    }

    // ========================================================================
    // Entry point
    // ========================================================================

    pub fn generate(self, program: &Program) -> Module<'c> {
        // Each function gets its own variable scope.
        // We don't store variables in the struct — see the note above.
        //
        // NOTE: We only compile top-level functions here. Top-level
        // statements (var, print, if, while, expression statements) are
        // handled by wrapping them in a synthetic @main function — see
        // the "Top-Level Statements" section below.
        for stmt in &program.statements {
            if let Stmt::Function(f) = stmt {
                self.compile_function(f);
            }
        }
        self.module
    }

    // A complete implementation would add a method like
    // `generate_main(program, ...)` that wraps top-level non-function
    // statements into a synthetic @main function. The walkthrough
    // examples below show the @main output you'd get. The pattern is
    // straightforward: create a func.func @main, then compile each
    // non-function statement into its body. We leave the implementation
    // as an exercise to keep the code generator focused on the core
    // compilation logic.

    // ========================================================================
    // Statement compilation
    // ========================================================================

    fn compile_statement(&self, stmt: &Stmt, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        match stmt {
            Stmt::Function(f) => self.compile_function(f),
            Stmt::Return(r) => self.compile_return(r, block, variables),
            Stmt::Var(v) => self.compile_var(v, block, variables),
            Stmt::If(i) => self.compile_if(i, block, variables),
            Stmt::While(w) => self.compile_while(w, block, variables),
            Stmt::Print(p) => self.compile_print(p, block, variables),
            Stmt::Block(b) => self.compile_block_stmt(b, block, variables),
            Stmt::Expression(e) => { self.compile_expression(&e.expr, block, variables); }
        }
    }

    fn compile_function(&self, func: &FunctionStmt) {
        let location = Location::unknown(self.context);
        let float_type = Type::float64(self.context);

        // Create parameter types (all f64 for now — dynamic typing)
        let param_types: Vec<Type> = func.params.iter().map(|_| float_type).collect();
        let return_type = float_type;

        // Create the function type
        let function_type = FunctionType::new(self.context, &param_types, &[return_type]);

        // Create the function body block (not yet in a region)
        let block = Block::new(
            &param_types.iter().map(|&t| (t, location)).collect::<Vec<_>>()
        );

        // Each function gets its own variable scope.
        // We don't store variables in the struct — see the note above.
        let mut variables: HashMap<String, Value<'c, 'c>> = HashMap::new();

        // Store parameters as variables
        for (i, param_name) in func.params.iter().enumerate() {
            let arg: Value = block.argument(i).unwrap().into();
            variables.insert(param_name.clone(), arg);
        }

        // Compile the function body into the block
        for stmt in &func.body {
            self.compile_statement(stmt, &block, &mut variables);
        }

        // Add implicit return nil ONLY if the last statement wasn't a return.
        // Without this guard, a function like `fun f() { return 1; }` would
        // have two return operations in the same block — MLIR rejects operations
        // after a terminator.
        let needs_implicit_return = func.body.last().map_or(true, |stmt| {
            !matches!(stmt, Stmt::Return(_))
        });
        if needs_implicit_return {
            let nil_value = self.compile_nil(&block);
            block.append_operation(func::r#return(&[nil_value], location));
        }

        // Move the block into a region
        let region = Region::new();
        region.append_block(block);

        // Add function to module
        self.module.body().append_operation(func::func(
            self.context,
            StringAttribute::new(self.context, &func.name),
            TypeAttribute::new(function_type.into()),
            region,
            &[],
            location,
        ));

        // Variables are dropped automatically — no need to clear
    }

    fn compile_return(&self, ret: &ReturnStmt, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = Location::unknown(self.context);
        let value = match &ret.value {
            Some(expr) => self.compile_expression(expr, block, variables),
            None => self.compile_nil(block),
        };

        block.append_operation(func::r#return(&[value], location));
    }

    fn compile_var(&self, var: &VarStmt, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let value = self.compile_expression(&var.init, block, variables);
        variables.insert(var.name.clone(), value);
    }

    fn compile_if(&self, if_stmt: &IfStmt, current_block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = Location::unknown(self.context);

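        // Compile the condition into the current (outer) block.
        // NOTE: scf.if requires an i1 condition. Comparisons (arith.cmpf)
        // already produce i1; a bare f64 would first need a comparison
        // against 0.0, as compile_logical does for its left operand.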
        let condition = self.compile_expression(&if_stmt.condition, current_block, variables);

        // Build then region: create a block, compile into it, move into region
        //
        // CAUTION: We pass the same `variables` HashMap to both branches.
        // Variable declarations in one branch leak into the other, but the
        // SSA value lives in the wrong block — using it produces invalid MLIR.
        // A production compiler would track scope depth or use scf.if with
        // results. For this tutorial, don't declare variables inside if-branches
        // and use them outside.
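        let then_block = Block::new(&[]);
        for stmt in &if_stmt.then_branch {
            self.compile_statement(stmt, &then_block, variables);
        }
        // Terminate the block explicitly. The textual MLIR syntax lets you
        // omit scf.yield when scf.if has no results, but an op assembled
        // generically with OperationBuilder does not get that terminator
        // inserted for it, and the verifier rejects a block without one.
        then_block.append_operation(
            OperationBuilder::new("scf.yield", location)
                .build()
                .expect("valid scf.yield"),
        );
        let then_region = Region::new();
        then_region.append_block(then_block);

        // Build else region (even if empty: scf.if requires both regions).
        // It gets the same explicit, operand-free scf.yield. (scf.if *with*
        // result types yields values instead; see the logical operator
        // compilation below for an example.)
        let else_block = Block::new(&[]);
        for stmt in &if_stmt.else_branch {
            self.compile_statement(stmt, &else_block, variables);
        }
        else_block.append_operation(
            OperationBuilder::new("scf.yield", location)
                .build()
                .expect("valid scf.yield"),
        );
        let else_region = Region::new();
        else_region.append_block(else_block);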

        // Create scf.if and append to the outer block
        let if_op = OperationBuilder::new("scf.if", location)
            .add_operands(&[condition])
            .add_regions([then_region, else_region])
            .build()
            .expect("valid scf.if operation");
        current_block.append_operation(if_op);
    }

    fn compile_while(&self, while_stmt: &WhileStmt, current_block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = Location::unknown(self.context);

        // For scf.while, we need:
        // 1. A "before" region that computes the condition and calls scf.condition
        // 2. An "after" region that is the loop body and calls scf.yield
        //
        // CAUTION: Same variable-scoping caveat as compile_if — we pass the same
        // `variables` HashMap to both the before and after regions. This causes
        // two problems:
        //
        // 1. Variable declarations inside the loop body leak into the enclosing
        //    scope, but the SSA value lives in the wrong block. Don't declare
        //    variables inside while-loops and use them outside.
        //
        // 2. Re-assigning a variable inside the loop (e.g., `i = i + 1`) updates
        //    the HashMap with a Value from the after region. On the next iteration,
        //    the before region tries to use that Value — but the after region
        //    doesn't dominate the before region. This produces invalid MLIR.
        //    The example in the "A Third Example: while and Back-Edges" section
        //    shows the fix: thread loop-modified variables as scf.while operands
        //    instead of tracking them through the HashMap.
        //
        // A production compiler would thread loop-carried values through
        // scf.while's operand list and scf.yield, which keeps the SSA values
        // in the correct dominance relationship.

        // Build the before region (condition check)
        let before_block = Block::new(&[]);
        let condition = self.compile_expression(&while_stmt.condition, &before_block, variables);

        // scf.condition takes the condition value and any loop-carried values
        let condition_op = OperationBuilder::new("scf.condition", location)
            .add_operands(&[condition])
            .build()
            .expect("valid scf.condition");
        before_block.append_operation(condition_op);

        let before_region = Region::new();
        before_region.append_block(before_block);

        // Build the after region (loop body)
        let after_block = Block::new(&[]);
        for stmt in &while_stmt.body {
            self.compile_statement(stmt, &after_block, variables);
        }
        // scf.while's after region must end with scf.yield
        let yield_op = OperationBuilder::new("scf.yield", location)
            .build()
            .expect("valid scf.yield");
        after_block.append_operation(yield_op);

        let after_region = Region::new();
        after_region.append_block(after_block);

        // Create scf.while and append to the outer block
        let while_op = OperationBuilder::new("scf.while", location)
            .add_regions([before_region, after_region])
            .build()
            .expect("valid scf.while");
        current_block.append_operation(while_op);
    }

    fn compile_print(&self, print: &PrintStmt, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = Location::unknown(self.context);
        let value = self.compile_expression(&print.value, block, variables);

        // Call a runtime print function: func.call @lox_print(value)
        let print_op = func::call(
            self.context,
            FlatSymbolRefAttribute::new(self.context, "lox_print"),
            &[value],
            &[],
            location,
        );
        block.append_operation(print_op);
    }

    fn compile_block_stmt(&self, block_stmt: &BlockStmt, current_block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        for stmt in &block_stmt.statements {
            self.compile_statement(stmt, current_block, variables);
        }
    }

    // ========================================================================
    // Expression compilation
    // ========================================================================

    fn compile_expression(&self, expr: &Expr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        match expr {
            Expr::Binary(b) => self.compile_binary(b, block, variables),
            Expr::Unary(u) => self.compile_unary(u, block, variables),
            Expr::Literal(l) => self.compile_literal(l, block, variables),
            Expr::Grouping(g) => self.compile_expression(&g.expr, block, variables),
            Expr::Variable(v) => self.compile_variable(v, variables),
            Expr::Assign(a) => self.compile_assign(a, block, variables),
            Expr::Call(c) => self.compile_call(c, block, variables),
            Expr::Logical(l) => self.compile_logical(l, block, variables),
        }
    }

    fn compile_binary(&self, binary: &BinaryExpr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = Location::unknown(self.context);

        let lhs = self.compile_expression(&binary.left, block, variables);
        let rhs = self.compile_expression(&binary.right, block, variables);

        let op = match binary.op {
            BinaryOp::Add => arith_addf(lhs, rhs, location),
            BinaryOp::Sub => arith_subf(lhs, rhs, location),
            BinaryOp::Mul => arith_mulf(lhs, rhs, location),
            BinaryOp::Div => arith_divf(lhs, rhs, location),
            BinaryOp::Less => arith::cmpf(self.context, arith::CmpfPredicate::Olt, lhs, rhs, location),
            BinaryOp::LessEqual => arith::cmpf(self.context, arith::CmpfPredicate::Ole, lhs, rhs, location),
            BinaryOp::Greater => arith::cmpf(self.context, arith::CmpfPredicate::Ogt, lhs, rhs, location),
            BinaryOp::GreaterEqual => arith::cmpf(self.context, arith::CmpfPredicate::Oge, lhs, rhs, location),
            BinaryOp::Equal => arith::cmpf(self.context, arith::CmpfPredicate::Oeq, lhs, rhs, location),
            BinaryOp::NotEqual => arith::cmpf(self.context, arith::CmpfPredicate::Une, lhs, rhs, location),
        };

        let result_ref = block.append_operation(op);
        result_ref.result(0).unwrap().into()
    }

    fn compile_unary(&self, unary: &UnaryExpr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = Location::unknown(self.context);

        match unary.op {
            UnaryOp::Negate => {
                let operand = self.compile_expression(&unary.right, block, variables);
                let op = arith_negf(operand, location);
                let result = block.append_operation(op);
                result.result(0).unwrap().into()
            }
            UnaryOp::Not => {
                // In our "numbers only" model, `!x` is:
                //   if x == 0.0 { 1.0 } else { 0.0 }
                // We use scf.if since there's no direct float-to-boolean negation.
                //
                // The generated MLIR looks like this:
                //   %zero = arith.constant 0.0 : f64
                //   %is_zero = arith.cmpf oeq, %x, %zero : f64
                //   %result = scf.if %is_zero -> (f64) {
                //     %one = arith.constant 1.0 : f64
                //     scf.yield %one : f64
                //   } else {
                //     %zero2 = arith.constant 0.0 : f64
                //     scf.yield %zero2 : f64
                //   }
                //
                // `!0.0` → 1.0 (falsy becomes truthy)
                // `!5.0` → 0.0 (truthy becomes falsy)

                let operand = self.compile_expression(&unary.right, block, variables);

                // Compare operand to 0.0
                let zero_val = block.append_operation(arith::constant(
                    self.context,
                    FloatAttribute::new(self.context, Type::float64(self.context), 0.0).into(),
                    location,
                ));
                let is_zero = block.append_operation(arith::cmpf(
                    self.context,
                    arith::CmpfPredicate::Oeq,
                    operand,
                    zero_val.result(0).unwrap().into(),
                    location,
                ));

                // Build then region (x == 0.0 → result = 1.0)
                let then_region = {
                    let then_block = Block::new(&[]);
                    let one_val = then_block.append_operation(arith::constant(
                        self.context,
                        FloatAttribute::new(self.context, Type::float64(self.context), 1.0).into(),
                        location,
                    ));
                    then_block.append_operation(
                        OperationBuilder::new("scf.yield", location)
                            .add_operands(&[one_val.result(0).unwrap().into()])
                            .build()
                            .expect("valid scf.yield")
                    );
                    let r = Region::new();
                    r.append_block(then_block);
                    r
                };

                // Build else region (x != 0.0 → result = 0.0)
                let else_region = {
                    let else_block = Block::new(&[]);
                    let zero_val = else_block.append_operation(arith::constant(
                        self.context,
                        FloatAttribute::new(self.context, Type::float64(self.context), 0.0).into(),
                        location,
                    ));
                    else_block.append_operation(
                        OperationBuilder::new("scf.yield", location)
                            .add_operands(&[zero_val.result(0).unwrap().into()])
                            .build()
                            .expect("valid scf.yield")
                    );
                    let r = Region::new();
                    r.append_block(else_block);
                    r
                };

                // scf.if(is_zero) { then } { else } -> f64
                let if_op = OperationBuilder::new("scf.if", location)
                    .add_operands(&[is_zero.result(0).unwrap().into()])
                    .add_results(&[Type::float64(self.context)])
                    .add_regions([then_region, else_region])
                    .build()
                    .expect("valid scf.if");
                let result = block.append_operation(if_op);
                result.result(0).unwrap().into()
            }
        }
    }

    fn compile_literal(&self, literal: &LiteralExpr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = Location::unknown(self.context);

        match &literal.value {
            LoxValue::Nil => self.compile_nil(block),
            LoxValue::Bool(b) => {
                // In our "numbers only" model, booleans are 1.0 and 0.0
                let op = arith::constant(
                    self.context,
                    FloatAttribute::new(self.context, Type::float64(self.context), if *b { 1.0 } else { 0.0 }).into(),
                    location,
                );
                let result = block.append_operation(op);
                result.result(0).unwrap().into()
            }
            LoxValue::Number(n) => {
                let op = arith::constant(
                    self.context,
                    FloatAttribute::new(self.context, Type::float64(self.context), *n).into(),
                    location,
                );
                let result = block.append_operation(op);
                result.result(0).unwrap().into()
            }
            LoxValue::String(_) => {
                // String constants can't be represented as a single f64 — they
                // need the tagged union layout (tag + data). Our values are still
                // f64, so we compile strings to nil for now. This means `print "hello"`
                // prints `0` in the numbers-only model, which is wrong but honest —
                // the tagged union model in Part 7 fixes this.
                //
                // The `create_string_constant` function below builds the
                // `llvm.mlir.global constant` operation, but it only makes
                // sense once we switch to the tagged union representation.
                self.compile_nil(block)
            }
        }
    }

    fn compile_nil(&self, block: &Block<'c>) -> Value<'c, 'c> {
        // In our "numbers only" subset, nil is represented as 0.0 f64.
        // This is consistent with the simplified typing model.
        // The tagged union model in Part 7 uses (TAG_NIL, 0) instead,
        // which distinguishes nil from false and 0.0.
        let location = Location::unknown(self.context);
        let op = arith::constant(
            self.context,
            FloatAttribute::new(self.context, Type::float64(self.context), 0.0).into(),
            location,
        );
        let result = block.append_operation(op);
        result.result(0).unwrap().into()
    }

    fn compile_variable(&self, var: &VariableExpr, variables: &HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        // Look up the variable in the current scope
        variables.get(&var.name)
            .copied()
            .unwrap_or_else(|| panic!("Undefined variable: {}", var.name))
    }

    fn compile_assign(&self, assign: &AssignExpr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let value = self.compile_expression(&assign.value, block, variables);
        variables.insert(assign.name.clone(), value);
        value
    }

    fn compile_call(&self, call: &CallExpr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = Location::unknown(self.context);

        // Compile arguments
        let args: Vec<Value> = call.arguments.iter()
            .map(|arg| self.compile_expression(arg, block, variables))
            .collect();

        // For now, assume callee is a direct function call
        if let Expr::Variable(var) = call.callee.as_ref() {
            let call_op = func::call(
                self.context,
                FlatSymbolRefAttribute::new(self.context, &var.name),
                &args,
                &[Type::float64(self.context)],
                location,
            );

            let result_ref = block.append_operation(call_op);
            return result_ref.result(0).unwrap().into();
        }

        // Indirect call (first-class function) - not implemented
        unimplemented!("Indirect function calls not yet supported")
    }
}
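
The CAUTION comments in compile_if and compile_while point out that sharing one `variables` map lets declarations leak from one branch into another. One way to contain that is a stack of maps, pushed on entering a branch and popped on leaving it. The sketch below is a minimal illustration under stated assumptions: `ScopeStack` and `ValueId` are hypothetical names, and `ValueId` is a plain integer standing in for Melior's `Value<'c, 'c>`. It covers declaration and lookup only, not assignment to outer variables.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an MLIR SSA value; a real code generator
/// would store `Value<'c, 'c>` here instead.
type ValueId = u32;

/// A stack of variable maps, innermost scope last.
struct ScopeStack {
    scopes: Vec<HashMap<String, ValueId>>,
}

impl ScopeStack {
    fn new() -> Self {
        Self { scopes: vec![HashMap::new()] }
    }

    /// Enter a nested scope (an if-branch or a loop body).
    fn push(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Leave the innermost scope, dropping everything declared in it.
    /// The outermost (function-level) scope is never popped.
    fn pop(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Declare a variable in the innermost scope.
    fn declare(&mut self, name: &str, value: ValueId) {
        self.scopes
            .last_mut()
            .expect("at least one scope")
            .insert(name.to_string(), value);
    }

    /// Look a name up from the innermost scope outward.
    fn lookup(&self, name: &str) -> Option<ValueId> {
        self.scopes.iter().rev().find_map(|s| s.get(name).copied())
    }
}

fn main() {
    let mut vars = ScopeStack::new();
    vars.declare("x", 1);

    vars.push(); // then-branch
    vars.declare("y", 2);
    assert_eq!(vars.lookup("x"), Some(1)); // outer variable is visible
    assert_eq!(vars.lookup("y"), Some(2));
    vars.pop();

    vars.push(); // else-branch
    assert_eq!(vars.lookup("y"), None); // nothing leaked from the then-branch
    vars.pop();

    println!("x = {:?}, y = {:?}", vars.lookup("x"), vars.lookup("y"));
}
```

With this shape, a lookup of a branch-local name after the branch fails at compile time of the Lox program instead of producing an SSA value that lives in the wrong block.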

Logical Operators: Why `scf.if` Is Mandatory

Lox’s `and` and `or` differ from the logical operators in languages such as C or Java. They don’t return a boolean; they return one of their operands:

`5 and 3` returns `3` (left is truthy, so return the right)
`nil and 3` returns `nil` (left is falsy, so return the left)
`5 or 3` returns `5` (left is truthy, so return the left)
`nil or 3` returns `3` (left is falsy, so return the right)

We can’t use bitwise operations. `arith.andi` and `arith.ori` operate on bits, not control flow. Both operands would already have to be computed before the operation runs, so there is no short-circuiting. In Lox, `nil and crash()` must never call `crash()` because `nil` is falsy.

The result comes from a branch, not a computation. `a and b` returns either `a` or `b`: the actual value that flowed through the branch. That’s what `scf.if` with results gives us: each branch yields a value, and the `scf.if` produces whichever one was taken.

The translation pattern:

`a and b` → `scf.if a { yield b } else { yield a }`
`a or b` → `scf.if a { yield a } else { yield b }`

In the numbers-only model, yielding `a` when it’s falsy is the same as yielding `0.0`, but we yield `a` directly because it’s semantically correct and avoids creating a redundant constant. (The tagged union model in Part 7 uses the same pattern with `(tag, payload)` pairs instead of bare `f64` values.)
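
Before reading the Melior version, it can help to pin the intended semantics down in plain Rust. The sketch below models the numbers-only representation with `f64` and passes the right operand as a closure, so it only runs in the branch that needs it. The names `is_truthy`, `lox_and`, and `lox_or` are hypothetical helpers for illustration; they are not part of the tutorial's code generator.

```rust
/// Truthiness in the numbers-only model: any nonzero number is truthy.
/// NaN counts as falsy, mirroring the ordered `one` comparison.
fn is_truthy(x: f64) -> bool {
    x != 0.0 && !x.is_nan()
}

/// `a and b`: yield `b` if `a` is truthy, otherwise yield `a` itself.
fn lox_and(a: f64, b: impl FnOnce() -> f64) -> f64 {
    if is_truthy(a) { b() } else { a }
}

/// `a or b`: yield `a` if it is truthy, otherwise yield `b`.
fn lox_or(a: f64, b: impl FnOnce() -> f64) -> f64 {
    if is_truthy(a) { a } else { b() }
}

fn main() {
    // nil is 0.0 in the numbers-only model.
    let nil = 0.0;
    assert_eq!(lox_and(5.0, || 3.0), 3.0);
    assert_eq!(lox_and(nil, || 3.0), nil);
    assert_eq!(lox_or(5.0, || 3.0), 5.0);
    assert_eq!(lox_or(nil, || 3.0), 3.0);

    // Short-circuit: the right side must not run when the left decides.
    let mut called = false;
    let result = lox_and(nil, || {
        called = true;
        1.0
    });
    assert_eq!(result, nil);
    assert!(!called);
    println!("all logical-operator checks passed");
}
```

The closure plays the role of the `scf.if` region: code placed inside it only executes when that branch is taken.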

// compile_logical is another method on CodeGenerator, shown here in
// its own impl block.
impl<'c> CodeGenerator<'c> {
    fn compile_logical(&self, logical: &LogicalExpr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = Location::unknown(self.context);

        let left = self.compile_expression(&logical.left, block, variables);

        // Convert left to i1 for the condition (nonzero = true)
        let zero_val = block.append_operation(arith::constant(
            self.context,
            FloatAttribute::new(self.context, Type::float64(self.context), 0.0).into(),
            location,
        ));
        let cond = block.append_operation(arith::cmpf(
            self.context,
            arith::CmpfPredicate::One,  // ordered not-equal (nonzero = true)
            left,
            zero_val.result(0).unwrap().into(),
            location,
        ));

        match logical.op {
            LogicalOp::And => {
                // `a and b` → if a { yield b } else { yield a }

                // Then region: evaluate right (left is truthy)
                let then_region = {
                    let then_block = Block::new(&[]);
                    let right = self.compile_expression(&logical.right, &then_block, variables);
                    then_block.append_operation(
                        OperationBuilder::new("scf.yield", location)
                            .add_operands(&[right])
                            .build()
                            .expect("valid scf.yield")
                    );
                    let r = Region::new();
                    r.append_block(then_block);
                    r
                };

                // Else region: return left (it's falsy, so pass it through)
                let else_region = {
                    let else_block = Block::new(&[]);
                    // `left` is defined in the enclosing scope, so it's
                    // visible inside the scf.if region. We yield it directly
                    // instead of creating a constant 0.0.
                    else_block.append_operation(
                        OperationBuilder::new("scf.yield", location)
                            .add_operands(&[left])
                            .build()
                            .expect("valid scf.yield")
                    );
                    let r = Region::new();
                    r.append_block(else_block);
                    r
                };

                let if_op = OperationBuilder::new("scf.if", location)
                    .add_operands(&[cond.result(0).unwrap().into()])
                    .add_results(&[Type::float64(self.context)])
                    .add_regions([then_region, else_region])
                    .build()
                    .expect("valid scf.if");
                let result = block.append_operation(if_op);
                result.result(0).unwrap().into()
            }
            LogicalOp::Or => {
                // `a or b` → if a { yield a } else { yield b }

                // Then region: return left (it's truthy, so pass it through)
                let then_region = {
                    let then_block = Block::new(&[]);
                    // `left` is defined in the enclosing scope, so it's
                    // visible inside the scf.if region. We yield it directly
                    // instead of creating a constant 1.0.
                    then_block.append_operation(
                        OperationBuilder::new("scf.yield", location)
                            .add_operands(&[left])
                            .build()
                            .expect("valid scf.yield")
                    );
                    let r = Region::new();
                    r.append_block(then_block);
                    r
                };

                // Else region: evaluate right
                let else_region = {
                    let else_block = Block::new(&[]);
                    let right = self.compile_expression(&logical.right, &else_block, variables);
                    else_block.append_operation(
                        OperationBuilder::new("scf.yield", location)
                            .add_operands(&[right])
                            .build()
                            .expect("valid scf.yield")
                    );
                    let r = Region::new();
                    r.append_block(else_block);
                    r
                };

                let if_op = OperationBuilder::new("scf.if", location)
                    .add_operands(&[cond.result(0).unwrap().into()])
                    .add_results(&[Type::float64(self.context)])
                    .add_regions([then_region, else_region])
                    .build()
                    .expect("valid scf.if");
                let result = block.append_operation(if_op);
                result.result(0).unwrap().into()
            }
        }
    }
}

/// Main entry point for code generation
pub fn generate_module<'c>(context: &'c Context, program: &Program) -> Module<'c> {
    let generator = CodeGenerator::new(context);
    generator.generate(program)
}

Why `left` Is Visible Inside `scf.if` Regions

The compile_logical code uses left — a value computed in the outer block — inside the then and else regions of an scf.if. That might look wrong: we created a new Block for each region, so how can a value from a different block be visible?

It’s not wrong; it’s how MLIR’s dominance rules work. A value defined in a block dominates every later operation in that block, and that includes everything nested inside those operations’ regions. `left` was added to the outer block before the `scf.if` was created, and the `scf.if` operation sits in that same outer block. Any region inside the `scf.if` can therefore see `left`, because control flow must pass through its definition to reach the inner region. (Operations marked IsolatedFromAbove, such as `func.func`, are the exception: their regions cannot reference values from outside.)

This is the same rule that makes outer variable scoping work in most programming languages. A variable declared before an if statement is visible inside both branches, because you can’t reach a branch without passing the declaration. MLIR formalizes this as dominance, and the verifier enforces it: if you use a value where its definition doesn’t dominate the use (say, a value defined in the then region but used in the else region), the verifier rejects the IR.
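
The visibility rule can be modelled in a few lines of plain Rust. This is a toy checker for illustration only, not MLIR's verifier: `Op`, `op`, and `verify` are hypothetical names, each region is assumed to hold a single block, and values are identified by name.

```rust
/// A toy operation: it may define one value, use several, and hold
/// nested regions (each region is a single block of ops).
struct Op {
    defines: Option<&'static str>,
    uses: Vec<&'static str>,
    regions: Vec<Vec<Op>>,
}

fn op(defines: Option<&'static str>, uses: &[&'static str], regions: Vec<Vec<Op>>) -> Op {
    Op { defines, uses: uses.to_vec(), regions }
}

/// Check that every use refers to a value defined earlier in the same
/// block or in an enclosing block.
fn verify(block: &[Op], outer: &[&'static str]) -> Result<(), String> {
    // Start from what the enclosing block has defined so far.
    let mut visible: Vec<&'static str> = outer.to_vec();
    for o in block {
        for u in &o.uses {
            if !visible.contains(u) {
                return Err(format!("use of %{} is not dominated by its definition", u));
            }
        }
        // Nested regions see everything visible at this point, but
        // whatever they define stays local to them.
        for region in &o.regions {
            verify(region, &visible)?;
        }
        if let Some(d) = o.defines {
            visible.push(d);
        }
    }
    Ok(())
}

fn main() {
    // %left = ...; %result = if { %right = ...; yield %right } else { yield %left }
    let ok = vec![
        op(Some("left"), &[], vec![]),
        op(Some("result"), &[], vec![
            vec![op(Some("right"), &[], vec![]), op(None, &["right"], vec![])],
            vec![op(None, &["left"], vec![])],
        ]),
    ];
    assert!(verify(&ok, &[]).is_ok());

    // Invalid: the else region uses %right, defined only in the then region.
    let bad = vec![
        op(Some("left"), &[], vec![]),
        op(Some("result"), &[], vec![
            vec![op(Some("right"), &[], vec![])],
            vec![op(None, &["right"], vec![])],
        ]),
    ];
    assert!(verify(&bad, &[]).is_err());
}
```

The first program mirrors the shape of `a and b`: the else region yields `left` from the enclosing block, which the checker accepts. The second tries to share a value between sibling regions, which it rejects.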

In Melior’s API, this shows up as a Value<'c, 'c> being passable to operations in any nested region. The two lifetime parameters on Value<'c, 'a> are 'c (the context) and 'a (the parent operation). These are separate concerns: 'c ties the value to the Context that owns the entire IR, while 'a ties it to its defining operation. In our code, both lifetimes resolve to 'c because the context outlives both the value and any region that uses it — Rust’s borrow checker sees a single shared lifetime and allows the reference. The actual dominance check happens later, at verification time — Melior doesn’t enforce it at Rust compile time.

String Constants (No Allocation Needed!)

String literals are constants, not heap allocations. They live in the binary’s data section, the same way string constants work in C.

Note: Melior 0.27 doesn’t provide a direct helper for llvm.mlir.global constant. You’d build it with OperationBuilder. Here’s the conceptual approach — the exact API details may vary with your MLIR version.

#![allow(unused)]
fn main() {
// src/codegen/strings.rs
use melior::{
    Context, Location,
    ir::{
        attribute::{StringAttribute, TypeAttribute, UnitAttribute}, Identifier, Type, Module,
        operation::OperationBuilder,
    },
};

/// Create a global string constant using OperationBuilder.
///
/// This creates something like:
///   llvm.mlir.global constant @str_0("hello")
///
/// No heap allocation — the string lives in the data section.
pub fn create_string_constant(
    module: &Module,
    context: &Context,
    name: &str,        // e.g., "str_0"
    value: &str,
    location: Location,
) {
    let array_type = Type::parse(
        context,
        &format!("!llvm.array<{} x i8>", value.len()),
    ).expect("valid array type");

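    // In the LLVM dialect the type attribute is named "global_type",
    // "linkage" is a required attribute, and the op carries one
    // (possibly empty) initializer region.
    let op = OperationBuilder::new("llvm.mlir.global", location)
        .add_attributes(&[
            (Identifier::new(context, "sym_name"), StringAttribute::new(context, name).into()),
            (Identifier::new(context, "global_type"), TypeAttribute::new(array_type.into()).into()),
            (
                Identifier::new(context, "linkage"),
                melior::ir::Attribute::parse(context, "#llvm.linkage<internal>")
                    .expect("valid linkage attribute"),
            ),
            (Identifier::new(context, "constant"), UnitAttribute::new(context).into()),
            (Identifier::new(context, "value"), StringAttribute::new(context, value).into()),
        ])
        .add_regions([melior::ir::Region::new()])
        .build()
        .expect("valid llvm.mlir.global");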

    module.body().append_operation(op);
}
}

The constant attribute on llvm.mlir.global is a unit attribute — it has no value. Its presence means “constant” and its absence means “mutable.” We use UnitAttribute::new(context), not StringAttribute::new(context, "true").

The value attribute uses StringAttribute with the string content. MLIR accepts a StringAttr directly for !llvm.array<N x i8> globals — the bytes are embedded in the data section. For non-string globals (integers, structs), you’d use a DenseElementsAttr or an IntegerAttr instead.
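
One detail worth checking in isolation is the array length. The sketch below uses only the standard library; array_type_for and c_array_type_for are hypothetical helper names. It shows that str::len counts UTF-8 bytes (what N x i8 needs), not characters, and what the type would look like if you chose to reserve a byte for a trailing NUL so the constant can be handed to C functions. Whether to NUL-terminate is a design choice this chapter does not make; if you do, the value attribute must carry the extra byte too.

```rust
/// Build the LLVM array type string for a string literal's bytes.
/// `str::len` returns the UTF-8 byte count, which is what `N x i8` needs.
fn array_type_for(value: &str) -> String {
    format!("!llvm.array<{} x i8>", value.len())
}

/// Variant that reserves one extra byte for a trailing NUL (an assumption:
/// only needed if the runtime treats the constant as a C string).
fn c_array_type_for(value: &str) -> String {
    format!("!llvm.array<{} x i8>", value.len() + 1)
}

fn main() {
    println!("{}", array_type_for("hello"));
    println!("{}", array_type_for("héllo")); // 'é' takes two bytes in UTF-8
    println!("{}", c_array_type_for("hello"));
}
```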

Why This Works

Approach	Memory Location	Allocation?
`llvm.mlir.global constant`	Data section	No (static)
Heap allocation (malloc)	Heap	Yes (runtime)
Stack allocation (alloca)	Stack	No, but per-call

String literals are static data — they exist for the lifetime of the program, embedded in the binary. No runtime cost.

Dynamic Typing with Tagged Unions

Lox is dynamically typed, so a function parameter can receive any type:

fun printValue(x) {
  print x;  // x could be number, string, bool, nil, or object
}

We need a tagged union type:

#![allow(unused)]
fn main() {
// src/codegen/types.rs
use melior::ir::Type;
use melior::Context;

/// A Lox value is a tagged union: struct { tag: i8, data: i64 }
pub fn lox_value_type(context: &Context) -> Type {
    Type::parse(context, "!llvm.struct<(i8, i64)>").unwrap()
}

/// Tag values for each Lox type
pub const TAG_NIL: i8 = 0;
pub const TAG_BOOL: i8 = 1;
pub const TAG_NUMBER: i8 = 2;
pub const TAG_STRING: i8 = 3;
pub const TAG_CLOSURE: i8 = 4;    // matches compiled value tags from Part 7
pub const TAG_INSTANCE: i8 = 5;   // matches compiled value tags from Part 7
}

These are not the same as the GC’s ObjType discriminants. The TAG_* constants number the types that appear in compiled values — TAG_NIL and TAG_BOOL are here because they’re Lox value types, even though they’re not heap objects. The ObjType enum in Part 2 (Number = 0, String = 1, etc.) numbers the types for the GC’s trace_object dispatch — it includes Environment (a GC-internal type) and doesn’t include Nil or Bool. Same concepts, different numbering, different purpose. The runtime’s from_raw/to_raw functions (Part 9) translate between them.

Why we’re not using this yet: The codegen above uses all-f64 values to keep the examples focused on MLIR concepts. Tagged unions require tag-checking before every operation, which adds significant boilerplate. A production Lox compiler would use lox_value_type() for every value and insert tag checks before arithmetic. The “numbers only” model is a stepping stone — once you understand the codegen patterns, adding tagged unions is a mechanical extension.

What it looks like in practice: Part 9 (Standard Library and Runtime) shows the tagged-union MLIR for 1 + 2 — every arith.addf becomes “check both tags are TAG_NUMBER → extract the f64 payloads → add → re-pack as (TAG_NUMBER, result).” Same core operation, more wrapping. Part 7 (Classes and Instances) introduces the representation; Part 9 shows the runtime that uses it.
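
To make the “check, extract, add, re-pack” sequence concrete before Part 9, here is a plain-Rust model of it. This is a sketch under stated assumptions, not the generated MLIR: LoxValue, number, as_number, and checked_add are hypothetical names, and it assumes the i64 payload stores the f64 bit pattern. It handles numbers only; Lox’s + also concatenates strings, which is omitted here.

```rust
const TAG_NUMBER: i8 = 2;
const TAG_STRING: i8 = 3;

/// Model of the (i8, i64) tagged union from lox_value_type().
#[derive(Debug, Clone, Copy, PartialEq)]
struct LoxValue {
    tag: i8,
    data: i64,
}

/// Pack an f64 by storing its bit pattern in the payload (an assumption).
fn number(n: f64) -> LoxValue {
    LoxValue { tag: TAG_NUMBER, data: n.to_bits() as i64 }
}

/// Tag check plus payload extraction.
fn as_number(v: LoxValue) -> Option<f64> {
    if v.tag == TAG_NUMBER {
        Some(f64::from_bits(v.data as u64))
    } else {
        None
    }
}

/// Check both tags, extract, add, re-pack.
fn checked_add(lhs: LoxValue, rhs: LoxValue) -> Result<LoxValue, String> {
    match (as_number(lhs), as_number(rhs)) {
        (Some(a), Some(b)) => Ok(number(a + b)),
        _ => Err("operands must be numbers".to_string()),
    }
}

fn main() {
    println!("{:?}", checked_add(number(1.0), number(2.0)).map(as_number));
    let not_a_number = LoxValue { tag: TAG_STRING, data: 0 };
    println!("{:?}", checked_add(number(1.0), not_a_number));
}
```

The single arith.addf of the numbers-only model corresponds to the a + b in the middle; everything around it is the tag bookkeeping the chapter defers.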

Source Locations in MLIR

Every operation in MLIR has a source location. Unlike LLVM where debug info is optional, in MLIR locations are core to the IR.

The Location API

MLIR operations carry source locations for error messages and debug output. Melior provides several ways to create them:

#![allow(unused)]
fn main() {
// src/location.rs
use melior::ir::Location;
use melior::Context;

pub fn demonstrate_locations(context: &Context) {
    // Unknown location — for generated code with no source mapping
    let unknown = Location::unknown(context);
    
    // File location — specific file, line, column
    // Note: the exact Location::new signature varies between Melior versions.
    // In some versions, file locations use Location::file() or a builder.
    // If Location::new(context, filename, line, col) doesn't compile, try:
    //   Location::new(context, filename)        // newer versions: filename only
    //   Location::file(context, filename, line, col)  // some versions
    // Or use a name location as a fallback:
    //   Location::name(context, "test.lox:10:5", unknown)
    let file_loc = Location::new(context, "test.lox", 10, 5);
    
    // Name location — for generated code, use a descriptive name
    let name_loc = Location::name(context, "implicit_return", unknown);
    
    // Callsite location — links a callee's definition to its call site
    // Note: Location::call_site may not exist in all Melior versions.
    // Some versions use Location::callsite (no underscore), fused locations,
    // or you can fall back to using the file location directly.
    // If this doesn't compile, replace with: let callsite_loc = file_loc;
    let callsite_loc = Location::call_site(file_loc, unknown);
}
}

Updated Code Generator with Proper Locations

#![allow(unused)]
fn main() {
pub struct CodeGenerator<'c> {
    context: &'c Context,
    module: Module<'c>,
    filename: String,
    // Note: variables are NOT stored in the struct.
    // See the "Why Separate Context from State?" section for details.
}

impl<'c> CodeGenerator<'c> {
    pub fn new(context: &'c Context, filename: &str) -> Self {
        // Note: Location::new(context, filename, 1, 1) may not compile in all
        // Melior versions. See the "Version-resilient alternative" note above.
        // If it doesn't compile, try Location::new(context, filename) or
        // Location::unknown(context) as a fallback.
        let location = Location::new(context, filename, 1, 1);
        let module = Module::new(location);
        Self { 
            context, 
            module, 
            filename: filename.to_string(),
        }
    }

    /// Convert an AST location to an MLIR location
    /// Note: same version caveat as CodeGenerator::new — if Location::new
    /// doesn't compile with 4 arguments, try Location::new(context, filename)
    /// or fall back to Location::name(context, "file:line:col", unknown).
    fn loc(&self, ast_loc: crate::ast::Location) -> Location<'c> {
        Location::new(self.context, &self.filename, ast_loc.line, ast_loc.column)
    }

    /// Get a location for generated/implicit code
    fn generated_loc(&self, description: &str) -> Location<'c> {
        Location::name(self.context, description, Location::unknown(self.context))
    }
}
}

What the IR Looks Like With Locations

Before (using Location::unknown):

module {
  func.func @add(%arg0: f64, %arg1: f64) -> f64 {
    %0 = arith.addf %arg0, %arg1 : f64
    func.return %0 : f64
  }
}

After (with proper locations, shown with -mlir-print-debuginfo):

module {
  func.func @add(%arg0: f64, %arg1: f64) -> f64 
      loc("test.lox":1:1) 
  {
    %0 = arith.addf %arg0, %arg1 : f64 
        loc("test.lox":2:14)
    func.return %0 : f64 loc("test.lox":2:3)
  } loc("test.lox":1:1)
} loc("test.lox":1:1)
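
If the four-argument file location is unavailable in your Melior version, the fallback mentioned earlier is a name location built from a "file:line:col" string. The sketch below shows only the string formatting, using the standard library; SourcePos, name_fallback, and printed_loc are hypothetical names, and SourcePos merely stands in for whatever your AST’s location type looks like.

```rust
/// Hypothetical stand-in for the AST's source position (1-based).
#[derive(Debug, Clone, Copy)]
struct SourcePos {
    line: usize,
    column: usize,
}

/// "file:line:col", usable as the string for a name-location fallback.
fn name_fallback(filename: &str, pos: SourcePos) -> String {
    format!("{}:{}:{}", filename, pos.line, pos.column)
}

/// The textual form shown in the listing above: loc("file":line:col).
fn printed_loc(filename: &str, pos: SourcePos) -> String {
    format!("loc(\"{}\":{}:{})", filename, pos.line, pos.column)
}

fn main() {
    let pos = SourcePos { line: 2, column: 14 };
    println!("{}", name_fallback("test.lox", pos));
    println!("{}", printed_loc("test.lox", pos));
}
```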

A Complete Example

Here’s a minimal working example that creates an add function using Melior’s API. This follows the same pattern as our code generator: build the block, append operations, then move the block into a region.

Note: This example defines the arith_addf helper function locally because Melior 0.27’s arith module doesn’t provide an addf helper. The same helper is defined in the code generator above — if you’re building incrementally, you already have it. If you’re building this example standalone, the helper is included below.

// examples/simple_add.rs
use melior::{
    Context,
    dialect::{func, DialectRegistry},
    ir::{
        attribute::{StringAttribute, TypeAttribute},
        r#type::FunctionType,
        Location, Module, Region, Block, Type, Value, ValueLike,
        operation::{OperationBuilder, OperationLike},
        block::BlockLike, RegionLike,
    },
    utility::register_all_dialects,
};

/// Create an `arith.addf` operation.
/// We define this locally because Melior 0.27's `arith` module
/// doesn't provide an `addf` helper — only `constant`, `cmpf`,
/// `cmpi`, and `select`. See the code generator above for the
/// full set of arithmetic helpers.
fn arith_addf<'c>(
    lhs: Value<'c, 'c>,
    rhs: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new("arith.addf", location)
        .add_operands(&[lhs, rhs])
        .add_results(&[lhs.r#type()])
        .build()
        .expect("valid arith.addf")
}

fn main() {
    let registry = DialectRegistry::new();
    register_all_dialects(&registry);
    
    let context = Context::new();
    context.append_dialect_registry(&registry);
    context.load_all_available_dialects();
    
    let location = Location::unknown(&context);
    let module = Module::new(location);
    
    // Create function type: (f64, f64) -> f64
    let float_type = Type::float64(&context);
    let function_type = FunctionType::new(&context, &[float_type, float_type], &[float_type]);
    
    // Create the function body block (not yet in a region)
    let block = Block::new(&[
        (float_type, location),
        (float_type, location),
    ]);
    
    // %sum = arith.addf %arg0, %arg1 : f64
    let sum = block.append_operation(arith_addf(
        block.argument(0).unwrap().into(),
        block.argument(1).unwrap().into(),
        location,
    ));
    
    // return %sum : f64
    block.append_operation(func::r#return(
        &[sum.result(0).unwrap().into()],
        location,
    ));
    
    // Move block into region
    let region = Region::new();
    region.append_block(block);
    
    // Create the function
    module.body().append_operation(func::func(
        &context,
        StringAttribute::new(&context, "add"),
        TypeAttribute::new(function_type.into()),
        region,
        &[],
        location,
    ));
    
    // Verify and print
    // Note: verify() requires the OperationLike trait to be in scope.
    // It returns bool in Melior 0.27 — use assert! instead of .expect().
    assert!(module.as_operation().verify(), "module verification failed");
    println!("{}", module.as_operation());
}

Output:

module {
  func.func @add(%arg0: f64, %arg1: f64) -> f64 {
    %0 = arith.addf %arg0, %arg1 : f64
    func.return %0 : f64
  }
}

Lowering to LLVM IR

You’ve built a code generator that produces MLIR in the arith, func, and scf dialects. That IR is correct but it can’t run on hardware — those dialects are abstractions, not machine instructions. To produce an executable, you need to lower the IR through a pipeline of transformation passes, each one replacing higher-level operations with lower-level ones until you reach LLVM IR (which maps directly to machine code).

The Lowering Pipeline

MLIR lowering happens in stages. Each stage converts one dialect to a lower-level dialect. The order matters — you can’t lower scf.if to LLVM until you’ve first converted it to cf.cond_br (control flow in the cf dialect), because scf operations don’t have LLVM equivalents.

Here’s the pipeline for our Lox compiler:

Lox source
    ↓  (parse + codegen)
MLIR: arith + func + scf    ← What our code generator produces
    ↓  (scf-to-cf)
MLIR: arith + func + cf      ← Structured control flow → branches
    ↓  (arith-to-llvm)
MLIR: func + cf + llvm        ← Arithmetic ops → LLVM integer/float ops
    ↓  (func-to-llvm + cf-to-llvm)
MLIR: llvm only               ← Functions → LLVM functions, branches → LLVM br/cond_br
    ↓  (mlir-translate)
LLVM IR                       ← Textual LLVM IR (.ll)
    ↓  (clang / llc)
Machine code                  ← Native executable

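Each arrow between two MLIR stages is a conversion pass; the first arrow is our own code generator, and the last two are external tools (mlir-translate and clang). Each pass converts one dialect’s operations into operations from a lower dialect. The key insight: lowering is compositional. You don’t need a “Lox to LLVM” pass. You need a series of small passes, each handling one dialect.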
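
Compositional lowering can be illustrated with a toy model. The sketch below is plain Rust, not MLIR: operations are reduced to their names, and run_pass, lower, and the two lookup tables are hypothetical. Each “pass” rewrites the operations of one dialect and leaves the rest untouched, so chaining two small passes lowers the whole list.

```rust
/// One toy conversion pass: rewrites ops found in `table`, keeps the others.
fn run_pass(ops: &[String], table: &[(&str, &str)]) -> Vec<String> {
    ops.iter()
        .map(|op| {
            table
                .iter()
                .find(|(from, _)| *from == op.as_str())
                .map(|(_, to)| to.to_string())
                .unwrap_or_else(|| op.clone())
        })
        .collect()
}

/// Only a handful of ops, enough for the `1 + 2` example.
const ARITH_TO_LLVM: &[(&str, &str)] = &[
    ("arith.constant", "llvm.mlir.constant"),
    ("arith.addf", "llvm.fadd"),
];
const FUNC_TO_LLVM: &[(&str, &str)] = &[
    ("func.call", "llvm.call"),
    ("func.return", "llvm.return"),
];

/// Chain the two passes: arith first, then func.
fn lower(ops: &[&str]) -> Vec<String> {
    let start: Vec<String> = ops.iter().map(|s| s.to_string()).collect();
    let after_arith = run_pass(&start, ARITH_TO_LLVM);
    run_pass(&after_arith, FUNC_TO_LLVM)
}

fn main() {
    let program = ["arith.constant", "arith.addf", "func.call", "func.return"];
    for op in lower(&program) {
        println!("{}", op);
    }
}
```

Real conversion passes also rewrite types and operands, which this model ignores; the point is only that no single pass needs to know about every dialect.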

A Concrete Example: `var x = 1 + 2; print x;`

Here’s what this Lox program looks like at each stage of the pipeline. We’re using the numbers-only model — every value is an f64, and print takes a single float argument.

After code generation (arith + func):

module {
  func.func private @lox_print(f64)

  func.func @main() -> f64 {
    %one = arith.constant 1.0 : f64
    %two = arith.constant 2.0 : f64
    %sum = arith.addf %one, %two : f64
    func.call @lox_print(%sum) : (f64) -> ()
    %nil = arith.constant 0.0 : f64
    func.return %nil : f64
  }
}

This is what our code generator produces — arith operations for math, func operations for functions and calls. No scf here because there’s no control flow — straight-line code all the way through. Every value is an f64. The @lox_print declaration tells the verifier that an external function with that name exists — the linker will resolve it to our runtime.

After arith-to-llvm (func still present):

module {
  func.func private @lox_print(f64)

  func.func @main() -> f64 {
    %one = llvm.mlir.constant(1.0 : f64) : f64
    %two = llvm.mlir.constant(2.0 : f64) : f64
    %sum = llvm.fadd %one, %two : f64
    func.call @lox_print(%sum) : (f64) -> ()
    %nil = llvm.mlir.constant(0.0 : f64) : f64
    func.return %nil : f64
  }
}

arith.addf became llvm.fadd. arith.constant became llvm.mlir.constant. The func operations are untouched because this pass only converts arithmetic. This is compositional lowering in practice: each pass handles one dialect and ignores everything else.

After func-to-llvm (llvm only):

module {
  llvm.func @lox_print(f64) attributes {sym_visibility = "private"}

  llvm.func @main() -> f64 {
    %one = llvm.mlir.constant(1.0 : f64) : f64
    %two = llvm.mlir.constant(2.0 : f64) : f64
    %sum = llvm.fadd %one, %two : f64
    llvm.call @lox_print(%sum) : (f64) -> ()
    %nil = llvm.mlir.constant(0.0 : f64) : f64
    llvm.return %nil : f64
  }
}

func.func became llvm.func, func.call became llvm.call, func.return became llvm.return. Every operation now lives in the llvm dialect — one translation step away from actual LLVM IR.

After mlir-translate (LLVM IR):

declare void @lox_print(double)

define double @main() {
entry:
  %0 = fadd double 1.0, 2.0
  call void @lox_print(double %0)
  ret double 0.0
}

This is textual LLVM IR — what llc or clang consumes to produce machine code. The names changed (llvm.fadd → fadd, llvm.mlir.constant → inline constants), but the operations are the same. The @lox_print declaration is an external symbol — the linker will resolve it to our runtime function.

Four stages, one program. Each stage is a mechanical translation. No magic — dialects converting to lower dialects until you hit something the hardware understands.

A Second Example: `if` and the `scf` Dialect

The first example had no control flow, so scf never appeared. Let’s see what happens when it does. Here’s a Lox program with an if statement:

fun max(a, b) {
  if (a < b) {
    print b;
  } else {
    print a;
  }
}

After code generation (arith + func + scf):

module {
  func.func private @lox_print(f64)

  func.func @max(%arg0: f64, %arg1: f64) -> f64 {
    %cond = arith.cmpf olt, %arg0, %arg1 : f64
    scf.if %cond {
      func.call @lox_print(%arg1) : (f64) -> ()
    } else {
      func.call @lox_print(%arg0) : (f64) -> ()
    }
    %nil = arith.constant 0.0 : f64
    func.return %nil : f64
  }
}

Now there’s scf.if — structured control flow with two regions (then and else). The condition comes from arith.cmpf (ordered less-than). Notice that scf.if doesn’t produce a value here — both branches call lox_print and fall through. The implicit 0.0 return comes after the scf.if.

After scf-to-cf (arith + func + cf):

module {
  func.func private @lox_print(f64)

  func.func @max(%arg0: f64, %arg1: f64) -> f64 {
    %cond = arith.cmpf olt, %arg0, %arg1 : f64
    cf.cond_br %cond, ^then, ^else
      ^then:
        func.call @lox_print(%arg1) : (f64) -> ()
        cf.br ^continue
      ^else:
        func.call @lox_print(%arg0) : (f64) -> ()
        cf.br ^continue
      ^continue:
    %nil = arith.constant 0.0 : f64
    func.return %nil : f64
  }
}

The scf.if became three blocks: ^then, ^else, and ^continue. Each branch ends with cf.br ^continue — the structured guarantee that both paths rejoin is now explicit.

Indented blocks are a presentation choice. In these examples, ^then, ^else, and ^continue are indented under cf.cond_br / llvm.cond_br to show their logical relationship. In the IR itself, blocks are siblings within the function body: they all sit at the same level, not nested under the branch. MLIR’s parser ignores whitespace, so the indentation does not change what the text means, but tools such as mlir-opt print each block label outdented from its operations rather than nested under the branch, so real output will look flatter than what you see here.

cf.cond_br takes the condition and two block labels — if the condition is true, jump to ^then; otherwise jump to ^else.

Variable scoping in compile_if. The code generator passes the same variables HashMap to both branches of scf.if. This means a variable declared inside one branch leaks into the other’s namespace — but the SSA value lives in the wrong block, so using it produces invalid MLIR (a dominance violation). In the max example above this doesn’t matter because both branches only call lox_print and don’t declare variables. But if you wrote if (cond) { var x = 1; } print x;, the code generator would put x in the HashMap with an SSA value from the then block, and print x would try to use that value in the outer block — where it isn’t dominant. A production compiler would either track scope depth (each branch gets its own variable namespace, and declarations don’t escape) or use scf.if with results (yield the variable’s value from both branches). Our simplified compiler does neither. Don’t declare variables inside if branches and use them outside. The same caveat applies to compile_while — the “A Third Example” section below shows the production-ready pattern with loop-carried values.
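
The “track scope depth” option can be sketched with a stack of maps. This is a minimal illustration in plain Rust, not the chapter’s code generator: Scopes and x_after_if are hypothetical names, and SSA values are represented by strings where the real generator would store MLIR values.

```rust
use std::collections::HashMap;

/// A stack of scopes mapping Lox names to an SSA value name.
struct Scopes {
    stack: Vec<HashMap<String, String>>,
}

impl Scopes {
    fn new() -> Self {
        Scopes { stack: vec![HashMap::new()] }
    }
    /// Enter a branch or block body.
    fn push(&mut self) {
        self.stack.push(HashMap::new());
    }
    /// Leave it, discarding everything declared inside.
    fn pop(&mut self) {
        self.stack.pop();
    }
    fn declare(&mut self, name: &str, ssa: &str) {
        self.stack
            .last_mut()
            .expect("at least one scope")
            .insert(name.to_string(), ssa.to_string());
    }
    /// Search from the innermost scope outward.
    fn lookup(&self, name: &str) -> Option<&str> {
        self.stack
            .iter()
            .rev()
            .find_map(|scope| scope.get(name).map(|s| s.as_str()))
    }
}

/// Models `if (cond) { var x = 1; } print x;`: what does `x` resolve to afterwards?
fn x_after_if() -> Option<String> {
    let mut scopes = Scopes::new();
    scopes.declare("cond", "%cond");
    scopes.push(); // enter the then branch
    scopes.declare("x", "%x_then");
    scopes.pop(); // leave it: x does not escape
    scopes.lookup("x").map(str::to_string)
}

fn main() {
    println!("x after the if: {:?}", x_after_if());
}
```

With this structure the lookup fails in the code generator, where it can be reported as an undefined variable, instead of producing IR that the verifier rejects later.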

After arith-to-llvm + func-to-llvm + cf-to-llvm (llvm only):

module {
  llvm.func @lox_print(f64) attributes {sym_visibility = "private"}

  llvm.func @max(%arg0: f64, %arg1: f64) -> f64 {
    %cond = llvm.fcmp "olt" %arg0, %arg1 : f64
    llvm.cond_br %cond, ^then, ^else
      ^then:
        llvm.call @lox_print(%arg1) : (f64) -> ()
        llvm.br ^continue
      ^else:
        llvm.call @lox_print(%arg0) : (f64) -> ()
        llvm.br ^continue
      ^continue:
    %nil = llvm.mlir.constant(0.0 : f64) : f64
    llvm.return %nil : f64
  }
}

Every operation is now in the llvm dialect. arith.cmpf olt became llvm.fcmp "olt". cf.cond_br became llvm.cond_br. cf.br became llvm.br. The block structure is identical — only the dialect names changed.

After mlir-translate (LLVM IR):

declare void @lox_print(double)

define double @max(double %0, double %1) {
entry:
  %cond = fcmp olt double %0, %1
  br i1 %cond, label %then, label %else

then:
  call void @lox_print(double %1)
  br label %continue

else:
  call void @lox_print(double %0)
  br label %continue

continue:
  ret double 0.0
}

Notice the condition’s type. arith.cmpf produces an i1 (a single-bit integer) at every stage; MLIR’s custom syntax prints only the operand type (f64), so the i1 result stays implicit in arith.cmpf and llvm.fcmp. LLVM IR makes it explicit: br i1 %cond consumes the comparison result directly. This is how LLVM represents booleans, since comparison results are always i1. (How a boolean is passed across a function boundary is a separate matter, governed by the C calling convention rather than by LLVM’s internal representation.)
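
The olt predicate is worth a second look because of NaN. The sketch below models it in plain Rust; olt and ult are hypothetical helper names mirroring the predicate names, not Melior functions. Rust’s < on f64 already behaves as an ordered comparison, so it returns false whenever either operand is NaN.

```rust
/// `olt`: ordered less-than. False if either operand is NaN.
fn olt(a: f64, b: f64) -> bool {
    a < b
}

/// `ult`: unordered or less-than. True if either operand is NaN.
fn ult(a: f64, b: f64) -> bool {
    a.is_nan() || b.is_nan() || a < b
}

fn main() {
    println!("olt(1, 2)   = {}", olt(1.0, 2.0));
    println!("olt(NaN, 2) = {}", olt(f64::NAN, 2.0));
    println!("ult(NaN, 2) = {}", ult(f64::NAN, 2.0));
}
```

Choosing olt means a comparison involving NaN takes the else branch, which is the behavior you get from < in most languages.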

A Third Example: `while` and Back-Edges

The if example showed branches that jump forward — both paths end up at ^continue. A while loop is different: the body jumps backward to the condition check. This back-edge is what makes loops work, and it’s what scf.while hides behind a region-based API.

var i = 0;
while (i < 5) {
  print i;
  i = i + 1;
}

About this example: The MLIR below shows scf.while with loop-carried values — the counter %i threads through the scf.condition and scf.yield as an operand. This is the correct approach for while loops that modify variables used in the condition. Our code generator above uses a simpler approach (tracking variables through the HashMap) that works for read-only conditions but produces invalid MLIR when the loop body reassigns a variable. The CAUTION note in compile_while explains the dominance issue. The example here shows the production-ready pattern.

After code generation (arith + func + scf):

module {
  func.func private @lox_print(f64)

  func.func @main() -> f64 {
    %zero = arith.constant 0.0 : f64
    %five = arith.constant 5.0 : f64
    %one = arith.constant 1.0 : f64
    %loop_result = scf.while (%i = %zero) : (f64) -> f64 {
      %cond = arith.cmpf olt, %i, %five : f64
      scf.condition(%cond) %i : f64
    } do {
    ^bb0(%i: f64):
      func.call @lox_print(%i) : (f64) -> ()
      %next = arith.addf %i, %one : f64
      scf.yield %next : f64
    }
    func.return %loop_result : f64
  }
}

scf.while has two regions. The before region checks the condition and calls scf.condition — if true, execution enters the after region; if false, the loop exits with the current iterator value. The after region runs the body and yields the updated value back to the before region. The loop variable %i flows between the two regions: initialized to %zero, updated by the after region, and checked again by the before region on each iteration.

After scf-to-cf (arith + func + cf):

module {
  func.func private @lox_print(f64)

  func.func @main() -> f64 {
    %zero = arith.constant 0.0 : f64
    %five = arith.constant 5.0 : f64
    %one = arith.constant 1.0 : f64
    cf.br ^before(%zero : f64)
    ^before(%i: f64):
      %cond = arith.cmpf olt, %i, %five : f64
      cf.cond_br %cond, ^body(%i : f64), ^exit(%i : f64)
    ^body(%i_body: f64):
      func.call @lox_print(%i_body) : (f64) -> ()
      %next = arith.addf %i_body, %one : f64
      cf.br ^before(%next : f64)     // the back-edge
    ^exit(%i_exit: f64):
      func.return %i_exit : f64
  }
}

Now the back-edge is explicit: cf.br ^before(%next : f64) at the bottom of ^body jumps back to ^before. The loop is three blocks with branches; there is no special “while” construct, only control flow. cf.cond_br at the end of ^before decides whether to enter the body or exit. The loop counter threads through every block as a block argument.

Compare this to the if example. Both use cf.cond_br for the branch decision. The difference is what happens next: the if jumps forward to a continue block, while the while jumps backward to the condition. That’s the only structural difference between a conditional and a loop at the cf level.

After arith-to-llvm + func-to-llvm + cf-to-llvm (llvm only):

module {
  llvm.func @lox_print(f64) attributes {sym_visibility = "private"}

  llvm.func @main() -> f64 {
    %zero = llvm.mlir.constant(0.0 : f64) : f64
    %five = llvm.mlir.constant(5.0 : f64) : f64
    %one = llvm.mlir.constant(1.0 : f64) : f64
    llvm.br ^before(%zero : f64)
    ^before(%i: f64):
      %cond = llvm.fcmp "olt" %i, %five : f64
      llvm.cond_br %cond, ^body(%i : f64), ^exit(%i : f64)
    ^body(%i_body: f64):
      llvm.call @lox_print(%i_body) : (f64) -> ()
      %next = llvm.fadd %i_body, %one : f64
      llvm.br ^before(%next : f64)
    ^exit(%i_exit: f64):
      llvm.return %i_exit : f64
  }
}

Same block structure as the cf stage. Only the dialect names changed. The back-edge — llvm.br ^before — is still there. By this point, MLIR is done. The final step is mlir-translate to LLVM IR, which produces the same block structure with LLVM IR syntax.

After mlir-translate (LLVM IR):

declare void @lox_print(double)

define double @main() {
entry:
  br label %before

before:                                             ; ← the back-edge target
  %i = phi double [ 0.0, %entry ], [ %next, %body ]
  %cond = fcmp olt double %i, 5.0
  br i1 %cond, label %body, label %exit

body:
  call void @lox_print(double %i)
  %next = fadd double %i, 1.0
  br label %before                                 ; ← the back-edge

exit:
  ret double %i
}

Something new appears in the LLVM IR that wasn’t in the MLIR: the phi instruction. LLVM IR is in SSA form, so each value is defined exactly once, and basic blocks take no parameters. A value that depends on which predecessor ran therefore needs a merge point, and phi is that merge: “if we came from entry, the value is 0.0; if we came from body, the value is %next.” MLIR expresses the same thing with block arguments (^before(%i: f64)), where each branch passes the value explicitly.

This is one of the clearest differences between MLIR and LLVM IR. MLIR uses block arguments; LLVM uses phi nodes. They express the same thing — a value that depends on which predecessor block we came from — but MLIR’s syntax is simpler. When you read LLVM IR and see a phi, think “that was a block argument in MLIR.”
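
Block arguments are easy to model in ordinary code. The sketch below is a plain-Rust state machine, not generated code: each enum variant stands for one basic block from the while example and its payload is that block’s argument. Block and run_loop are hypothetical names, and printing is replaced by collecting values so the result can be inspected.

```rust
/// Each variant is a basic block; the payload is its block argument.
enum Block {
    Before(f64),
    Body(f64),
    Exit(f64),
}

/// Runs the loop from the example; returns (printed values, final counter).
fn run_loop() -> (Vec<f64>, f64) {
    let mut printed = Vec::new();
    let mut block = Block::Before(0.0); // entry: branch to ^before with 0.0
    loop {
        block = match block {
            // ^before: compare and branch, forwarding the counter
            Block::Before(i) => {
                if i < 5.0 {
                    Block::Body(i)
                } else {
                    Block::Exit(i)
                }
            }
            // ^body: print, increment, take the back-edge
            Block::Body(i) => {
                printed.push(i);
                Block::Before(i + 1.0)
            }
            // ^exit: return the counter
            Block::Exit(i) => return (printed, i),
        };
    }
}

fn main() {
    let (printed, last) = run_loop();
    println!("printed {:?}, returned {}", printed, last);
}
```

Before receives its argument from two places, the entry and the body, which is exactly the pair of incoming edges the phi instruction lists.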

Three examples, same pipeline. The first showed arithmetic. The second added forward branching. The third added back-edges and introduced phi nodes. Every Lox program is a combination of these three patterns — arithmetic, conditionals, and loops — and every one follows the same lowering path.

Tagged union model: The examples above use the numbers-only model. In the tagged union model (Part 7 onward), the same programs would pass (i8, i64) pairs to @lox_print instead of bare f64 values, and 1.0 would be wrapped in a struct with tag=2 (TAG_NUMBER). The lowering pipeline is the same; only the code generation step changes.

Why this ordering? The first example (arithmetic only) didn’t have control flow — straight-line code, so only arith-to-llvm and func-to-llvm did anything. But a program with if or while produces scf operations, and those need their own pass. Here’s the full ordering for programs that have control flow:

1. scf-to-cf first. scf.if and scf.while are structured control flow: they use regions to guarantee well-formedness. But LLVM doesn’t have regions; it has basic blocks and branches. The scf-to-cf pass converts structured control flow into explicit cf.cond_br (conditional branch) and cf.br (unconditional branch) operations.
2. arith-to-llvm next. arith.addf, arith.cmpf, and friends become llvm.fadd, llvm.fcmp, etc. This pass doesn’t touch control flow; it only converts arithmetic and comparison operations.
3. func-to-llvm and cf-to-llvm last. These convert the remaining high-level operations. func.func becomes llvm.func, func.call becomes llvm.call, func.return becomes llvm.return. cf.cond_br and cf.br become llvm.cond_br and llvm.br.

The order of arith-to-llvm relative to func-to-llvm/cf-to-llvm doesn’t matter — they touch different operations. We list arith-to-llvm first because it produces the simpler transformation. Only scf-to-cf before cf-to-llvm is a hard ordering constraint.

The passes don’t interfere with each other because they operate on different dialects. arith-to-llvm ignores scf and func operations; func-to-llvm ignores arith and cf operations. This separation is MLIR’s key advantage over LLVM — each dialect is independently optimizable and lowerable.
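
The one hard constraint can be written down as a check. This is a sketch only: the pass names are the shorthand used in this chapter, and respects_hard_constraint is a hypothetical helper, not part of MLIR or Melior. It treats a pipeline as a list of names and verifies that scf-to-cf runs before cf-to-llvm whenever both are present.

```rust
/// True if `scf-to-cf` appears before `cf-to-llvm` (when both are present).
fn respects_hard_constraint(pipeline: &[&str]) -> bool {
    let scf = pipeline.iter().position(|p| *p == "scf-to-cf");
    let cf = pipeline.iter().position(|p| *p == "cf-to-llvm");
    match (scf, cf) {
        (Some(s), Some(c)) => s < c,
        _ => true, // the constraint only applies when both passes run
    }
}

fn main() {
    let good = ["scf-to-cf", "arith-to-llvm", "func-to-llvm", "cf-to-llvm"];
    let also_good = ["scf-to-cf", "func-to-llvm", "cf-to-llvm", "arith-to-llvm"];
    let bad = ["cf-to-llvm", "scf-to-cf", "arith-to-llvm", "func-to-llvm"];
    println!("{}", respects_hard_constraint(&good));
    println!("{}", respects_hard_constraint(&also_good));
    println!("{}", respects_hard_constraint(&bad));
}
```

Moving arith-to-llvm around does not affect the result, which matches the claim that only the scf-to-cf and cf-to-llvm pair is order-sensitive.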

Implementing the Pipeline

#![allow(unused)]
fn main() {
// src/lib.rs
pub mod ast;
pub mod lexer;
pub mod parser;
pub mod codegen;

use anyhow::Result;
use melior::{
    Context,
    dialect::DialectRegistry,
    pass::PassManager,
    utility::register_all_dialects,
};

pub fn compile_to_llvm(source: &str) -> Result<String> {
    // 1. Tokenize and parse
    let tokens = lexer::tokenize(source)?;
    let program = parser::Parser::new(tokens).parse()?;
    
    // 2. Generate MLIR
    let registry = DialectRegistry::new();
    register_all_dialects(&registry);
    
    let context = Context::new();
    context.append_dialect_registry(&registry);
    context.load_all_available_dialects();
    
    let module = codegen::generate_module(&context, &program);
    
    // 3. Run lowering passes (high-level dialects → llvm dialect)
    //
    //    The pipeline converts dialects from high to low:
    //
    //    scf → cf     (structured control flow → explicit branches)
    //    arith → llvm  (arithmetic ops → LLVM float/integer ops)
    //    func → llvm  (functions → LLVM functions)
    //    cf → llvm    (branches → LLVM branches)
    //
    //    In Melior 0.27, pass registration and invocation looks like:
    //
    //    let pass_manager = PassManager::new(&context);
    //    pass_manager.add_pass(melior::pass::conversion::create_scf_to_control_flow());
    //    pass_manager.add_pass(melior::pass::conversion::create_arith_to_llvm());
    //    pass_manager.add_pass(melior::pass::conversion::create_func_to_llvm());
    //    pass_manager.add_pass(melior::pass::conversion::create_control_flow_to_llvm());
    //    pass_manager.run(&mut module)?;   // needs `let mut module` above
    //
    //    The exact pass function names may differ between Melior versions.
    //    Check the `melior::pass` module for available conversion passes.
    //    The function names follow the pattern `create_<source>_to_<target>`
    //    (e.g., `create_scf_to_control_flow`, `create_arith_to_llvm`). If a function
    //    name doesn't match, search for the dialect name in the `pass` module —
    //    the underlying MLIR pass is the same regardless of what Melior calls it.
    
    Ok(module.as_operation().to_string())
}
}

What About Control Flow?

For a program with an if statement, the pipeline adds one more step:

// Before scf-to-cf:
scf.if %cond {
  // then region
} else {
  // else region
}

// After scf-to-cf:
cf.cond_br %cond, ^then_block, ^else_block
^then_block:
  // then operations
  cf.br ^join
^else_block:
  // else operations
  cf.br ^join
^join:
  // continue

scf.if uses regions (nested blocks inside the operation). cf.cond_br uses basic blocks and explicit branches. The scf-to-cf pass unwraps the regions into flat control flow. Then cf-to-llvm converts cf.cond_br and cf.br into llvm.cond_br and llvm.br.

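The key difference: scf.if guarantees well-formedness, because nothing outside the operation can branch into its regions. cf.cond_br gives no such guarantee: any block in the same function may branch to ^then_block or ^join, so a buggy code generator could enter the then block without ever evaluating the condition. The scf dialect’s structure is a safety net, and the lowering pass removes it only after the IR has been built correctly.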

For simplicity, this example shows an scf.if without results. When scf.if produces a value (like var x = if (cond) { a } else { b }), each region ends with scf.yield to provide the result, and the scf-to-cf pass threads that result through block arguments on cf.br:

// scf.if with a result:
%x = scf.if %cond -> f64 {
  scf.yield %a : f64
} else {
  scf.yield %b : f64
}

// After scf-to-cf, the result is threaded through block arguments:
cf.cond_br %cond, ^then, ^else
^then:
  cf.br ^join(%a : f64)       // pass %a as a block argument to ^join
^else:
  cf.br ^join(%b : f64)       // pass %b as a block argument to ^join
^join(%result: f64):           // %result is %a or %b, depending on the branch
  // use %result here

The ^join block takes a parameter (%result: f64) that receives the value from whichever branch was taken. This is how cf represents the single-assignment result that scf.if computed structurally. Every branch into ^join must provide the same number and types of block arguments — if they don’t match, the verifier catches it.

The C Runtime

Our generated MLIR declares @lox_print as an external function — the verifier accepts it, the lowering passes leave it alone, and the final LLVM IR emits a declare void @lox_print(double). But clang can’t link an executable without a definition for that symbol. We need a runtime.

The runtime is tiny for the numbers-only model — one function that prints a float:

// runtime/print.c
#include <stdio.h>

void lox_print(double value) {
    printf("%g\n", value);
}

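That’s it. %g prints numbers without trailing zeros: 1.0 prints as 1, 3.14 prints as 3.14. It switches between fixed-point (%f style) and scientific (%e style) notation depending on the exponent, and by default it keeps six significant digits, so 3.14159265 prints as 3.14159. That is close enough to Lox’s print behavior for this part.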

The obvious problem: nil, false, and 0 are all 0.0 in the numbers-only model, so lox_print prints 0 for all three. We can’t tell them apart because we threw away the type information when we decided every value is an f64. The tagged union model in Part 7 fixes this — lox_print gets the tag alongside the payload and can dispatch to printf("nil"), printf("false"), or printf("%g", payload) as appropriate. But for now, one function, one format specifier, and we move on.
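To make that dispatch concrete, here is a minimal Rust sketch of what a tag-aware print could look like. The tag values for nil (0) and booleans (1) follow the mapping table in this part; the number tag (2) and the name `format_value` are assumptions made purely for illustration, and the actual runtime in this series is written in C.

```rust
// Tag values: 0 (nil) and 1 (bool) follow the tutorial's mapping table.
// TAG_NUMBER = 2 is an assumption made for this sketch.
const TAG_NIL: u8 = 0;
const TAG_BOOL: u8 = 1;
const TAG_NUMBER: u8 = 2;

/// Render a (tag, payload) pair the way a tag-aware print might.
/// `format_value` is a hypothetical helper, not part of the tutorial's runtime.
fn format_value(tag: u8, payload: u64) -> String {
    match tag {
        TAG_NIL => "nil".to_string(),
        TAG_BOOL => {
            let text = if payload != 0 { "true" } else { "false" };
            text.to_string()
        }
        // Numbers travel as the raw bits of an f64 inside the 64-bit payload.
        TAG_NUMBER => format!("{}", f64::from_bits(payload)),
        other => format!("<unknown tag {other}>"),
    }
}

fn main() {
    // The three values that all print as 0 in the numbers-only model
    // are now distinguishable.
    println!("{}", format_value(TAG_NIL, 0));
    println!("{}", format_value(TAG_BOOL, 0));
    println!("{}", format_value(TAG_NUMBER, 0.0f64.to_bits()));
}
```

The point of the sketch is only that the tag, not the payload bits, decides how a value is printed.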

Using the CLI

With the runtime in place, the full compile pipeline is:

# Compile Lox to MLIR
cargo run -- compile input.lox --emit-mlir -o output.mlir

# Translate LLVM-dialect MLIR to LLVM IR (output.mlir must already be lowered)
mlir-translate output.mlir --mlir-to-llvmir -o output.ll

# Compile to executable (link the runtime)
clang output.ll runtime/print.c -o output -lm

# Run it
./output

That -lm links the C math library. We don’t use it yet, but it’s good practice: Lox’s standard library (Part 9) calls fmod, floor, and other math functions.

If you get undefined reference to lox_print at link time, make sure runtime/print.c is on the compile command and the path is correct relative to where you’re running clang.

Project Structure

lox-mlir/
├── Cargo.toml
├── src/
│   ├── lib.rs              # Library entry point
│   ├── main.rs             # CLI entry point
│   ├── ast.rs              # AST definitions
│   ├── lexer.rs            # Tokenizer (Token, TokenType, LexValue)
│   ├── parser.rs           # Parser (ParseError, Parser)
│   ├── codegen/
│   │   ├── mod.rs
│   │   ├── generator.rs    # MLIR code generator
│   │   ├── types.rs        # Tagged union types
│   │   └── strings.rs      # String constant handling
│   └── runtime/
│       └── print.c         # C runtime — lox_print for the numbers-only model
├── examples/
│   ├── simple_add.rs
│   └── *.lox
└── tests/
    └── integration.rs

Quick Reference: Lox → MLIR Mapping

Lox Construct	Rust Enum	MLIR Operation
`a + b`	`BinaryOp::Add`	`arith.addf`
`a - b`	`BinaryOp::Sub`	`arith.subf`
`a * b`	`BinaryOp::Mul`	`arith.mulf`
`a / b`	`BinaryOp::Div`	`arith.divf`
`a < b`	`BinaryOp::Less`	`arith.cmpf olt`
`a == b`	`BinaryOp::Equal`	`arith.cmpf oeq`
`!x`	`UnaryOp::Not`	`arith.cmpf oeq` → `scf.if` (1.0 / 0.0)
`var x = v`	`VarStmt`	Store in HashMap
`x`	`VariableExpr`	Load from HashMap
`if (c) {...}`	`IfStmt`	`scf.if`
`while (c) {...}`	`WhileStmt`	`scf.while`
`fun f(...) {...}`	`FunctionStmt`	`func.func`
`f(args)`	`CallExpr`	`func.call`
`return v`	`ReturnStmt`	`func.return`

Tagged Union Model (Parts 7–11)

When using the !llvm.struct<(i8, i64)> tagged union, the same Lox constructs produce more MLIR operations — each one checks the tag, extracts the payload, operates, and re-packs the result:

Lox Construct	Numbers-Only MLIR	Tagged Union MLIR
`a + b`	`arith.addf`	tag check → `llvm.extractvalue` → `llvm.bitcast` → `arith.addf` → `llvm.bitcast` → `llvm.insertvalue`
`a < b`	`arith.cmpf olt`	tag check → extract → `arith.cmpf olt` → re-pack
`a and b`	`scf.if` → result	`scf.if` on tag → extract → `scf.if` for short-circuit → re-pack
`var x = v`	Store `f64` in HashMap	Store `(tag, payload)` pair
`nil`	`arith.constant 0.0`	`llvm.mlir.undef` + `llvm.insertvalue` (tag=0, payload undef)
`true` / `false`	`arith.constant 1.0/0.0` (no real boolean support)	`llvm.insertvalue` (tag=1, payload 1/0)
`"hello"`	Compiles to `nil` (0.0)	`llvm.mlir.addressof` + `llvm.insertvalue` (tag=3)
`print x`	`func.call @lox_print(%x)`	`func.call @lox_print(%tag, %payload)` or pass the whole struct

The numbers-only model keeps most rows to a single MLIR operation. The tagged union model expands each row to roughly 3–6 operations. That’s the cost of dynamic typing at the MLIR level, and it’s why we start with the simpler model in Parts 1–6.
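The check, extract, operate, re-pack sequence for `a + b` can be modelled in plain Rust. This is a rough sketch of the idea rather than the code the generator emits: the `Tagged` alias, the helper names, and the number tag value (2) are all assumptions for illustration.

```rust
/// A Lox value in the tagged-union model: (tag, payload).
/// Mirrors the shape of !llvm.struct<(i8, i64)>, with unsigned integers for simplicity.
type Tagged = (u8, u64);

// TAG_NUMBER = 2 is an assumption made for this sketch.
const TAG_NUMBER: u8 = 2;

/// Re-pack step: bitcast the f64 to its raw bits and attach the tag.
fn pack_number(n: f64) -> Tagged {
    (TAG_NUMBER, n.to_bits())
}

/// Hypothetical model of what `a + b` expands to in the tagged-union column.
fn add_tagged(a: Tagged, b: Tagged) -> Result<Tagged, String> {
    // 1. Tag check: both operands must be numbers.
    if a.0 != TAG_NUMBER || b.0 != TAG_NUMBER {
        return Err("operands must be numbers".to_string());
    }
    // 2. Extract the payloads and bitcast them back to f64.
    let x = f64::from_bits(a.1);
    let y = f64::from_bits(b.1);
    // 3. Do the arithmetic, then 4. re-pack the result.
    Ok(pack_number(x + y))
}

fn main() {
    let sum = add_tagged(pack_number(1.5), pack_number(2.0));
    println!("{:?}", sum.map(|v| f64::from_bits(v.1)));
}
```

Each numbered comment corresponds to one or two of the operations listed in the table row, which is where the extra operation count comes from.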

A few notes on the numbers-only column.

true/false as 1.0/0.0 — the numbers-only model has no real boolean support. 1.0 and 0.0 are a C-style convention, not a faithful representation. Lox programs that use booleans will compile, but the semantics are approximate.

"hello" compiles to nil (0.0) — strings can’t be represented as a single f64. The code generator’s compile_literal returns compile_nil() for string values, which means print "hello" prints 0 in the numbers-only model. Strings require the tagged union model (Part 7).

print x with tag+payload — the tagged-union column shows the two-argument form (pass tag and payload separately). You could also pass the whole !llvm.struct<(i8, i64)> and let the callee destructure it — both are valid design choices. The two-argument form is more explicit about what’s happening at the ABI level.

Differences from C++ MLIR

Aspect	C++ MLIR	Melior (Rust)
Dialect definition	TableGen (`.td`)	Rust code directly
Operations	Generated from ODS	Built with `OperationBuilder`
Ownership	Manual / raw pointers	RAII with lifetimes
Pattern rewriting	C++ classes	Closures / Rust traits
Error handling	`LogicalResult`	`Result<T, Error>`

Next Steps

You’ve seen the full pipeline: parse Lox, generate MLIR, lower through dialects, emit LLVM IR. Every value is an f64, and we hand off arith and func operations to MLIR’s built-in lowering passes. That works — but it’s leaky. Every variable sits in a Rust HashMap that disappears after codegen. No garbage collection. No heap-allocated objects. No way for the compiled program to manage memory at runtime.

Part 2 adds a garbage collector. Not a library — we generate MLIR that implements one: shadow stack roots, mark-and-sweep traversal, and custom lox.* operations for allocation and rooting that have no standard MLIR equivalent. This is where the tutorial stops being “emit arith operations and let MLIR handle the rest” and starts being “define your own semantics in MLIR.”

Next: Part 2 — Garbage Collection from Scratch — We can compile Lox to MLIR, but our compiled code has a problem: every string, every closure, every instance leaks memory forever. We need a garbage collector. We’ll build mark-sweep from scratch — the same algorithm Lua and the JVM use, scaled down.

MLIR for Lox: Part 2 — Garbage Collection from Scratch — Because Lox Values Can’t Clean Up After Themselves

var a = "hello";
var b = a;
a = nil;
// Is "hello" still needed? b still references it!

When a was set to "hello", the runtime allocated memory for that string. After a is set to nil, the string is still reachable through b. Free it too early and b points to garbage. Free it too late and your program’s memory grows forever.

In C, you’d call free() yourself and get it wrong. In a language like Lox, the runtime needs to figure out when it’s safe — and that’s what garbage collection does.

The core question GC answers:

When is an object no longer needed, so it’s safe to free?

This part builds the foundation: the mark-sweep algorithm, object headers, allocation, and the global object list. Part 3 adds shadow stacks (for finding roots) and Part 4 wires the whole thing into the MLIR code generator so the compiler emits GC bookkeeping automatically. By the end of Part 4, you’ll have a real mark-sweep collector. The kind you’d find in a production interpreter, scaled down.

A note on the code examples. Parts 2 and 3 use Rust for their code examples because it’s more readable than the equivalent C — pattern matching, Option instead of null, and thread_local! instead of __thread globals all make the logic easier to follow. The actual runtime (Part 9) is C, and the data structures are the same: a linked list of ObjHeaders, a shadow stack of StackFrames. If you’re reading ahead, the C versions of alloc, gc_push_frame, gc_collect, and friends in Part 9 are the real implementations. The Rust here is the conceptual skeleton.

The Intuition

Think of it like a party at your house:

Objects = guests
References = guests holding a “ticket” to stay
GC = you checking who still has a ticket

Guest A enters (allocate)
    ↓
Guest A gives ticket to Guest B (reference)
    ↓
Guest A leaves but Guest B still has ticket
    ↓
You check: "Does anyone have Guest A's ticket?"
    ↓
No? → Guest A can leave (free)

Key insight: An object is “live” if it’s reachable from your program’s variables. If nothing points to it, it’s garbage.

The Two Questions

GC must answer:

What objects are “roots”? (starting points for reachability)
What objects does each object reference? (follow the chain)

What Are Roots?

A “root” is something your program can directly access:

var x = 1;    // 'x' is a root (global variable)
print x;      // We can reach '1' from 'x'

Roots come from three places: global variables, local variables on the stack, and temporaries — the intermediate results that expressions produce while they’re being computed.

What Is Reachability?

var a = "hello";   // "hello" is reachable from root 'a'
var b = a;         // "hello" is reachable from 'b' too
a = nil;           // Still reachable from 'b'
b = nil;           // Now unreachable! Garbage!

Visual representation:

BEFORE:                      AFTER b = nil:

    ┌─────────┐                  ┌─────────┐
    │ root: a │──► "hello"       │ root: a │──► nil
    └─────────┘                  └─────────┘
    ┌─────────┐                  ┌─────────┐
    │ root: b │──► "hello"       │ root: b │──► nil
    └─────────┘                  └─────────┘
                                      │
                                 "hello" ← GARBAGE!
                                 (nothing points to it)

The Algorithm: Mark-Sweep

The simplest GC algorithm is mark-sweep. It has two phases:

Phase 1: Mark

┌─────────────────────────────────────────────┐
│ MARK PHASE                                  │
│                                             │
│   1. Start with all roots                   │
│   2. For each root, mark the object         │
│   3. Recursively mark objects it references │
│   4. Repeat until nothing new to mark       │
└─────────────────────────────────────────────┘

Phase 2: Sweep

┌─────────────────────────────────────────────┐
│ SWEEP PHASE                                 │
│                                             │
│   1. Walk all allocated objects             │
│   2. If marked: clear mark (still live)     │
│   3. If not marked: FREE IT (garbage)       │
└─────────────────────────────────────────────┘

Visual Example

Before collection:

Objects in heap:
  
  [A] ──► [B] ──► [C]
   ↑
  root (global variable)
  
  [D] ──► [E]
   ↑
  nothing! (D was a local that went out of scope)

After mark phase:

  [A*] ──► [B*] ──► [C*]    (* = marked, still live)
   ↑
  root
  
  [D] ──► [E]              (not marked, these are garbage)

After sweep phase:

  [A] ──► [B] ──► [C]       (live, marks cleared for next cycle)
  
  [freed] [freed]           (D and E are gone! memory reclaimed)
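The same example can be run as a toy model in safe Rust. This is only a sketch of the two phases, assuming a heap represented as a vector of optional objects whose references are indices; the real collector in this series works on raw pointers and object headers instead, and the names `Obj`, `mark`, `sweep`, and `example_heap` are made up for illustration.

```rust
/// One object in a toy heap: a mark bit plus outgoing references,
/// stored as indices into the heap vector.
struct Obj {
    name: &'static str,
    marked: bool,
    refs: Vec<usize>,
}

fn obj(name: &'static str, refs: &[usize]) -> Option<Obj> {
    Some(Obj { name, marked: false, refs: refs.to_vec() })
}

/// The heap from the diagram: A -> B -> C and D -> E.
fn example_heap() -> Vec<Option<Obj>> {
    vec![obj("A", &[1]), obj("B", &[2]), obj("C", &[]), obj("D", &[4]), obj("E", &[])]
}

/// Mark phase: mark everything reachable from `roots`.
fn mark(heap: &mut [Option<Obj>], roots: &[usize]) {
    let mut worklist: Vec<usize> = roots.to_vec();
    while let Some(i) = worklist.pop() {
        if let Some(o) = heap[i].as_mut() {
            if !o.marked {
                o.marked = true;
                worklist.extend(o.refs.iter().copied());
            }
        }
    }
}

/// Sweep phase: free unmarked objects, clear the mark on survivors.
/// Returns the names of the objects that were freed.
fn sweep(heap: &mut [Option<Obj>]) -> Vec<&'static str> {
    let mut freed = Vec::new();
    for slot in heap.iter_mut() {
        let survives = slot.as_ref().is_some_and(|o| o.marked);
        if survives {
            slot.as_mut().unwrap().marked = false; // reset for the next cycle
        } else if let Some(dead) = slot.take() {
            freed.push(dead.name);
        }
    }
    freed
}

fn main() {
    let mut heap = example_heap();
    mark(&mut heap, &[0]); // A is the only root
    println!("freed: {:?}", sweep(&mut heap));
}
```

The worklist replaces recursion here, which is a common way to keep the mark phase from overflowing the native stack on long reference chains.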

The Implementation Challenge

Now we know WHAT to do. But HOW does the GC know:

Where are the roots? (what variables are currently in scope)
What does each object reference? (which other objects it points to)

These are the two hard problems. Let’s tackle them one at a time.

Problem 1: Finding Roots

When the GC runs, your program is paused mid-execution. The stack has local variables. Those are roots.

fun example() {
    var a = 1;    // 'a' is on the stack → root
    var b = 2;    // 'b' is on the stack → root
    // GC runs here!
    // It needs to know about 'a' and 'b'
}

The challenge: In compiled code, variables live in CPU registers or stack slots, and nothing in the raw bits tells the GC which ones hold pointers and which hold plain integers.

Solutions: Conservative GC treats everything that looks like a pointer as a pointer — an integer whose value happens to equal a heap address will keep that object alive even though nothing actually references it. Simple, but you retain garbage you don’t need to. Precise GC tells the GC exactly where the pointers are — more work to set up, but no false positives. We’ll use precise GC.

Problem 2: Finding Object References

Each object may reference other objects:

class Pair {
    init(first, second) {
        this.first = first;   // this object references 'first'
        this.second = second; // this object references 'second'
    }
}

var pair = Pair("left", "right");
// The 'pair' object references two string objects

The challenge: The GC needs to know: given an object, what other objects does it point to?

Solution: Store type information with each object so the GC knows how to walk it.

Our First Data Structure

Before writing any GC code, we need a way to represent objects. Every object needs:

A header (metadata for the GC)
The actual data (the object’s fields)

┌────────────────────────────┐
│ Object Header              │
│   - marked: bool           │
│   - type: ObjType          │
│   - size: u32              │
├────────────────────────────┤
│ Object Data                │
│   - field 1                │
│   - field 2                │
│   - ...                    │
└────────────────────────────┘

Why the Header?

When the GC walks all objects, it needs to know whether an object is already marked (to avoid infinite loops), what type it is (to find references to other objects), and how big it is (so the sweep phase knows how much memory to free).

The Type Enum

Different Lox objects have different data:

Lox Type	What It Stores	What It References
Number	`f64`	Nothing
String	`bytes`	Nothing
Closure	`function ptr, env`	Captured variables (its environment)
Instance	`class ptr, fields`	Field values

Closure and Instance don’t appear until Parts 5 and 7. They’re listed here because the ObjType enum needs to account for every kind of object the runtime will handle — even types we haven’t built yet. Think of this table as the full map; we’ll fill in each row as we go.

Writing the Collector

Now we understand the concepts. Let’s implement the absolute minimum:

Step 1: Define the Object Header

#![allow(unused)]
fn main() {
// src/runtime/object.rs

/// Type of a Lox object
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ObjType {
    Number = 0,
    String = 1,
    Closure = 2,
    Instance = 3,
}
}

This enum grows as the tutorial progresses. Part 5 (Closures) adds Environment = 2 for closure capture, which shifts Closure to 3 and Instance to 4. Part 7 (Classes) adds Class = 5 and BoundMethod = 6. Each part shows the cumulative enum with explicit discriminants so you can see which values are in use.

#![allow(unused)]
fn main() {
/// Header prepended to every heap object
/// 
/// Memory layout:
///   [header: ObjHeader][data: depends on ObjType]
#[repr(C)]
pub struct ObjHeader {
    /// Has this object been marked by the GC?
    pub marked: bool,
    
    /// What type of object is this?
    pub obj_type: ObjType,
    
    /// Size of the data portion (not including header)
    pub size: u32,
    
    /// Pointer to next object in the allocation list
    /// (the GC needs to walk all objects during sweep)
    /// null = end of list
    pub next: *mut ObjHeader,
}
}

Step 2: The Global Object List

The GC needs to know about ALL objects to sweep them:

#![allow(unused)]
fn main() {
// src/runtime/gc.rs
use std::cell::RefCell;

/// All allocated objects, linked list style
thread_local! {
    static ALL_OBJECTS: RefCell<*mut ObjHeader> = RefCell::new(std::ptr::null_mut());
    static OBJECT_COUNT: RefCell<usize> = RefCell::new(0);
}
}

Why thread_local!? Because Lox is single-threaded, and this avoids passing a GC context everywhere.

Step 3: Allocation (Without GC Yet)

#![allow(unused)]
fn main() {
/// Allocate a new object
/// 
/// This is like `malloc` but:
///   1. Adds a header for GC metadata
///   2. Tracks the object in a global list
pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 {
    // Calculate total size: header + data
    let total_size = std::mem::size_of::<ObjHeader>() + size;
    
    // Allocate raw memory
    // Note: size is usize but we store it as u32 in the header,
    // which means allocations larger than 4 GB would silently
    // truncate. Fine for Lox objects — nothing in this language
    // needs that much contiguous memory.
    let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap();
    let ptr = unsafe { std::alloc::alloc(layout) };
    
    if ptr.is_null() {
        panic!("Out of memory!");
    }
    
    // Write the header
    let header = ptr as *mut ObjHeader;
    unsafe {
        (*header).marked = false;
        (*header).obj_type = obj_type;
        (*header).size = size as u32;
        
        // Add to global list
        ALL_OBJECTS.with(|list| {
            (*header).next = *list.borrow();
            *list.borrow_mut() = header;
        });
    }
    
    // Return pointer to DATA (after header)
    let data_ptr = unsafe { ptr.add(std::mem::size_of::<ObjHeader>()) };
    
    // Increment count
    OBJECT_COUNT.with(|count| {
        *count.borrow_mut() += 1;
    });
    
    data_ptr
}
}

Memory Layout Deep Dive

Let’s trace through exactly what happens when we allocate:

#![allow(unused)]
fn main() {
// Calling:
let ptr = alloc(16, ObjType::String);

// Step 1: Calculate total size
//   sizeof(ObjHeader) = 16 bytes (bool + u8 + 2 padding + u32 + pointer)
//   data_size = 16 bytes
//   total = 32 bytes

// Step 2: Allocate raw memory
//   ptr = 0x1000 (example address)

// Step 3: Write header at ptr
//   0x1000: marked = false
//   0x1001: obj_type = String (1)
//   0x1004: size = 16
//   0x1008: next = previous head of list (null if first)

// Step 4: Return data pointer
//   data_ptr = 0x1010 (ptr + 16)
}

Visual Memory Layout

With repr(C), the compiler lays out fields in declaration order with natural alignment. bool and obj_type (a repr(u8) enum) are both 1-byte aligned, so obj_type goes right after marked. Then 2 bytes of padding align size: u32 to a 4-byte boundary. next is a raw pointer (8 bytes and 8-byte aligned on a 64-bit target), and offset 8 already satisfies that alignment, so no further padding is needed.

Offset  Field                   Bytes
──────────────────────────────────────────
  +0    marked: false           1 byte
  +1    obj_type: String (1)    1 byte
  +2    (padding)               2 bytes  ← align size to 4
  +4    size: 16                4 bytes
  +8    next: 0x2000            8 bytes  ← next object in allocation list
 +16    ── data starts here ──
        Your 16 bytes of string data

Total header: 16 bytes. Data pointer returned by alloc() points to offset +16.
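You don’t have to take the offsets on faith. The sketch below redeclares the header locally (so it is self-contained) and asks the compiler for the layout using `size_of` and `offset_of!`, the latter being available in reasonably recent stable Rust. The printed numbers assume a 64-bit target.

```rust
use std::mem::{offset_of, size_of};

// Local copies of the tutorial's types so the example stands alone.
#[allow(dead_code)]
#[repr(u8)]
#[derive(Clone, Copy)]
enum ObjType {
    Number = 0,
    String = 1,
    Closure = 2,
    Instance = 3,
}

#[allow(dead_code)]
#[repr(C)]
struct ObjHeader {
    marked: bool,
    obj_type: ObjType,
    size: u32,
    next: *mut ObjHeader,
}

fn main() {
    println!("marked   at +{}", offset_of!(ObjHeader, marked));
    println!("obj_type at +{}", offset_of!(ObjHeader, obj_type));
    println!("size     at +{}", offset_of!(ObjHeader, size));
    println!("next     at +{}", offset_of!(ObjHeader, next));
    println!("header size: {} bytes", size_of::<ObjHeader>());
}
```

If the header ever gains a field, rerunning a check like this is a cheap way to confirm that the “subtract the header size” convention still lines up with the real layout.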

Why isn’t next bigger? The next field is a raw pointer (8 bytes on a 64-bit target) that uses null as the sentinel for “no next object”. If you wanted an Option-based version, the type to reach for is Option<NonNull<ObjHeader>>: Rust’s null pointer optimization guarantees it is the same size as a raw pointer, because None is represented as the null value and no separate discriminant is needed. Note that Option<*mut ObjHeader> does not get this treatment. A raw pointer is allowed to be null, so there is no spare bit pattern for None, and the Option grows to 16 bytes, which would push the header from 16 bytes to 24. Our code uses a raw *mut ObjHeader with null as the sentinel, which is the C-compatible way to get the same compact layout.
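It is easy to check which pointer-like types actually get the niche optimization by asking the compiler for their sizes. In this sketch, `Node` is a hypothetical stand-in for `ObjHeader`; the equality for `Option<NonNull<T>>` is guaranteed by the standard library, while the exact size of `Option<*mut T>` is an implementation detail (it is larger than a raw pointer because a raw pointer has no unused bit pattern).

```rust
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical stand-in for ObjHeader; only its pointer types matter here.
struct Node {
    _payload: u64,
}

fn main() {
    let raw = size_of::<*mut Node>();
    let opt_nonnull = size_of::<Option<NonNull<Node>>>();
    let opt_raw = size_of::<Option<*mut Node>>();

    // Option<NonNull<T>> reuses the null value for None, so it stays pointer-sized.
    println!("*mut Node:             {raw} bytes");
    println!("Option<NonNull<Node>>: {opt_nonnull} bytes");
    // Option<*mut T> needs a separate discriminant, so it is larger.
    println!("Option<*mut Node>:     {opt_raw} bytes");
}
```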

Key insight: When you receive a pointer from alloc(), you can always find the header by subtracting 16 bytes (sizeof header).
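Here is a small, self-contained sketch of that round trip: allocate header plus data, hand out the data pointer, then step back to the header. `Header`, `header_of`, and `roundtrip` are illustrative names, and the struct is a simplified local copy of the tutorial’s header rather than the real one.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::{align_of, size_of};

// Simplified local copy of the object header (same field layout idea).
#[allow(dead_code)]
#[repr(C)]
struct Header {
    marked: bool,
    kind: u8,
    size: u32,
    next: *mut Header,
}

/// Recover the header from a data pointer by stepping back one Header.
/// Safety: `data` must point just past a valid Header in the same allocation.
unsafe fn header_of(data: *mut u8) -> *mut Header {
    unsafe { (data as *mut Header).sub(1) }
}

/// Allocate, compute the data pointer, recover the header, and report
/// (did we land on the original address?, the size stored in the header).
fn roundtrip(data_size: u32) -> (bool, u32) {
    let total = size_of::<Header>() + data_size as usize;
    let layout = Layout::from_size_align(total, align_of::<Header>()).unwrap();
    unsafe {
        let base = alloc(layout);
        assert!(!base.is_null(), "allocation failed");
        (base as *mut Header).write(Header {
            marked: false,
            kind: 1,
            size: data_size,
            next: std::ptr::null_mut(),
        });
        let data = base.add(size_of::<Header>()); // what alloc() would return
        let header = header_of(data);
        let result = (header.cast::<u8>() == base, (*header).size);
        dealloc(base, layout);
        result
    }
}

fn main() {
    println!("{:?}", roundtrip(16));
}
```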

Practice Exercise

Before moving on, let’s verify your understanding:

Exercise 1: Trace the Allocations

Given this code:

#![allow(unused)]
fn main() {
let a = alloc(8, ObjType::Number);   // Allocates a number
let b = alloc(24, ObjType::String);  // Allocates a string

// What does ALL_OBJECTS look like?
// What are the header addresses for a and b?
}

Answer:

ALL_OBJECTS → b's header → a's header → null

a's header:
  - address: a_data_ptr - 16  (header precedes data by sizeof(ObjHeader))
  - size: 8
  - type: Number
  - next: null (it was first)

b's header:
  - address: b_data_ptr - 16
  - size: 24
  - type: String
  - next: a's header

The linked list is built in reverse order. Each new allocation becomes the new head.

Exercise 2: Why the Header?

Why do we need marked, obj_type, and size in the header? What would happen if we removed each one?

Answer:

Without marked: We couldn’t track which objects are live. The sweep phase wouldn’t know what to free. We might free live objects or keep garbage forever.
Without obj_type: We couldn’t walk object references. When marking a closure, we wouldn’t know it has an environment to also mark.
Without size: We couldn’t properly free memory (need to know how much to deallocate). We also couldn’t debug allocation issues.

Exercise 3: What Breaks Without `thread_local!`?

Suppose we changed the global object list to a plain static:

#![allow(unused)]
fn main() {
static mut ALL_OBJECTS: *mut ObjHeader = std::ptr::null_mut();
}

What would go wrong if two threads ran Lox programs simultaneously? (Hint: think about the alloc function — specifically the line that updates next and the line that sets the list head.)

Answer:

Two problems:

Data race on the linked list. Thread A allocates, sets (*header).next = current_head. Thread B allocates concurrently, does the same. Both read the same current_head and both write their header as the new head. One of them loses — its object is no longer reachable from the list, so it’ll never be freed. That’s a memory leak.
Data race on the count. Both threads read OBJECT_COUNT, both increment, both write back. Final count is one less than the actual number of objects.

thread_local! avoids both: each thread gets its own list and its own counter. No sharing, no races, no locks needed.
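The per-thread isolation is easy to observe. The following sketch uses a `Cell<usize>` counter instead of the `RefCell` used in the tutorial (a simplification that works because `usize` is `Copy`); the function names `bump` and `counts_per_thread` are invented for this example.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Each thread gets its own independent copy of this counter.
    static OBJECT_COUNT: Cell<usize> = Cell::new(0);
}

/// Increment the current thread's counter `times` times and return its value.
fn bump(times: usize) -> usize {
    OBJECT_COUNT.with(|count| {
        for _ in 0..times {
            count.set(count.get() + 1);
        }
        count.get()
    })
}

/// Run two worker threads, then read the calling thread's own counter.
fn counts_per_thread() -> (usize, usize, usize) {
    let a = thread::spawn(|| bump(3)).join().unwrap();
    let b = thread::spawn(|| bump(5)).join().unwrap();
    // The calling thread never called bump, so its counter is untouched.
    let here = OBJECT_COUNT.with(|count| count.get());
    (a, b, here)
}

fn main() {
    println!("{:?}", counts_per_thread());
}
```

Neither worker sees the other’s increments, and the calling thread’s counter stays at zero, which is exactly the property the object list relies on.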

Next: Part 3 — Finding Roots — Mark-sweep can free garbage, but it has a prerequisite: knowing which objects are still alive. In compiled code, that means knowing which stack slots hold pointers. We’ll build a shadow stack that tells the GC exactly where the roots are — and wire it into the MLIR code generator so the compiler emits root registration automatically.

MLIR for Lox: Part 3 — Finding GC Roots — The Compiler Knows Where the Pointers Are

When GC runs, it needs to know: what variables are currently holding object references?

fun example() {
    var a = "hello";    // 'a' is on the stack
    var b = 42;         // 'b' is on the stack (but it's a number, not a reference)
    var c = a;          // 'c' is on the stack, points to "hello"
    
    // GC RUNS HERE
    
    // The GC needs to know:
    //   - 'a' is a reference to a string object
    //   - 'b' is NOT a reference (it's a number, not a pointer)
    //   - 'c' is a reference to the same string object
}

Part 2 set up the mark-sweep algorithm and the allocation side of the collector: object headers, alloc, and the global object list. But mark-sweep has a prerequisite: you need to know where to start marking. That’s the root-finding problem, and it’s harder than it looks.

Why This Is Hard

In compiled code, a, b, and c are bytes in memory (stack slots or registers). The GC sees:

Stack memory:
  [slot 0]: 0x7fff1234   <-- Is this a pointer? A number? Who knows!
  [slot 1]: 42           <-- This looks like a number, but could be a pointer!
  [slot 2]: 0x7fff1234   <-- Same as slot 0

The GC can’t tell the difference between a pointer and an integer that happens to look like a valid address.

Two Approaches

Approach	How It Works	Pros	Cons
Conservative	Treat anything that LOOKS like a pointer as a pointer	Simple, no compiler changes	Can’t move objects, may keep garbage alive
Precise	Tell GC exactly where pointers are	Efficient, enables moving GC	Compiler must track pointer locations
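The difference between the two rows shows up as soon as an integer happens to fall inside the heap’s address range. The sketch below models a stack as a slice of machine words; the heap bounds, the pointer map, and both function names are assumptions chosen for illustration.

```rust
/// Conservative scan: any word that falls inside the heap range is
/// treated as a pointer, whether or not it really is one.
fn conservative_roots(stack: &[usize], heap_start: usize, heap_end: usize) -> Vec<usize> {
    stack
        .iter()
        .copied()
        .filter(|word| (heap_start..heap_end).contains(word))
        .collect()
}

/// Precise scan: the compiler supplies a map saying which slots hold pointers.
fn precise_roots(stack: &[usize], is_pointer: &[bool]) -> Vec<usize> {
    stack
        .iter()
        .zip(is_pointer)
        .filter(|(_, is_ptr)| **is_ptr)
        .map(|(word, _)| *word)
        .collect()
}

fn main() {
    // Slot 0 is a real pointer, slot 1 is the number 42, and slot 2 is an
    // integer whose value just happens to look like a heap address.
    let stack = [0x1010usize, 42, 0x1800];
    let pointer_map = [true, false, false];

    println!("conservative: {:x?}", conservative_roots(&stack, 0x1000, 0x2000));
    println!("precise:      {:x?}", precise_roots(&stack, &pointer_map));
}
```

The conservative scan keeps whatever lives at 0x1800 alive for no reason; the precise scan needs the extra pointer map, which is the bookkeeping the compiler has to produce.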

We’ll use precise GC because:

It’s more educational
It’s what most production language runtimes use
It enables better collectors later

The Shadow Stack

The simplest way to do precise GC: maintain a linked list of stack frames.

The Concept

┌─────────────────────────────────────────────────────────────┐
│                        SHADOW STACK                         │
│                                                             │
│   Each function call pushes a "frame" that lists:           │
│     - How many roots it has                                 │
│     - Pointers to each root                                 │
│                                                             │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│   ┌─────────────────────┐                                   │
│   │ Frame for helper()  │ ◄── head (most recent call)       │
│   │   roots: [temp]     │──┐                                │
│   └─────────────────────┘  │                                │
│                            ▼                                │
│   ┌─────────────────────┐                                   │
│   │ Frame for example() │                                   │
│   │   roots: [x, y]     │──┐                                │
│   └─────────────────────┘  │                                │
│                            ▼                                │
│   ┌─────────────────────┐                                   │
│   │ Frame for main()    │                                   │
│   │   roots: [a, c]     │──► NULL                           │
│   └─────────────────────┘                                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

When GC runs:

Walk the linked list of frames
For each frame, mark each root
Done! All reachable objects are marked

Each root is a data pointer — the same pointer alloc() returns (pointing to the object’s data, after the header). The GC can find the header by subtracting sizeof(ObjHeader) from any data pointer. This convention is important: if you accidentally store a header pointer in the shadow stack, mark_object will go back too far and read garbage.
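Before looking at the raw-pointer version, it may help to see the protocol in safe Rust. This sketch swaps the linked list for a `Vec` of frames and represents roots as plain addresses; `ShadowStack` and its methods are illustrative names and only model the push, set, walk, pop sequence.

```rust
/// Safe model of a shadow stack: each frame is a vector of root slots,
/// and `None` plays the role of a null root.
struct ShadowStack {
    frames: Vec<Vec<Option<usize>>>,
}

impl ShadowStack {
    fn new() -> Self {
        Self { frames: Vec::new() }
    }

    /// Called on function entry: reserve `root_count` empty slots.
    fn push_frame(&mut self, root_count: usize) {
        self.frames.push(vec![None; root_count]);
    }

    /// Record a root in the current (most recent) frame.
    fn set_root(&mut self, index: usize, value: usize) {
        let frame = self.frames.last_mut().expect("no frame");
        frame[index] = Some(value);
    }

    /// Called on function exit.
    fn pop_frame(&mut self) {
        self.frames.pop().expect("no frame to pop");
    }

    /// What the GC sees: every non-null root, newest frame first.
    fn roots(&self) -> Vec<usize> {
        self.frames
            .iter()
            .rev()
            .flat_map(|frame| frame.iter().flatten().copied())
            .collect()
    }
}

/// Simulate main() calling helper(), sampling the roots during and after the call.
fn demo() -> (Vec<usize>, Vec<usize>) {
    let mut stack = ShadowStack::new();
    stack.push_frame(2); // main(): two slots, only one filled
    stack.set_root(0, 0x1010);
    stack.push_frame(1); // helper()
    stack.set_root(0, 0x2020);
    let during_call = stack.roots();
    stack.pop_frame(); // helper() returns
    let after_call = stack.roots();
    (during_call, after_call)
}

fn main() {
    println!("{:x?}", demo());
}
```

Unset slots are skipped during the walk, which corresponds to the null check the mark phase performs on each root.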

The Frame Structure

#![allow(unused)]
fn main() {
// src/runtime/shadow_stack.rs

/// A frame in the shadow stack
/// 
/// This is pushed when a function is called,
/// and popped when a function returns.
#[repr(C)]
pub struct StackFrame {
    /// Pointer to the next frame in the list: the caller's frame,
    /// which was the head before this frame was pushed.
    /// null = bottom of stack (the outermost call)
    pub next: *mut StackFrame,
    
    /// Number of roots in this frame
    pub root_count: usize,
    
    /// Array of root pointers
    /// (this is a flexible array member - actual size = root_count)
    pub roots: [*mut u8; 0],
}

impl StackFrame {
    /// Get a pointer to the roots array
    pub fn roots_ptr(&mut self) -> *mut *mut u8 {
        self.roots.as_mut_ptr()
    }
    
    /// Get a root by index
    pub fn get_root(&self, index: usize) -> *mut u8 {
        assert!(index < self.root_count);
        unsafe { *self.roots.as_ptr().add(index) }
    }
    
    /// Set a root by index
    pub fn set_root(&mut self, index: usize, value: *mut u8) {
        assert!(index < self.root_count);
        unsafe {
            *self.roots.as_mut_ptr().add(index) = value;
        }
    }
}
}

Global Stack Head

#![allow(unused)]
fn main() {
// src/runtime/shadow_stack.rs (continued)

use std::cell::RefCell;

/// The head of the shadow stack (most recent frame)
thread_local! {
    pub static SHADOW_STACK_HEAD: RefCell<*mut StackFrame> = RefCell::new(std::ptr::null_mut());
}
}

Push and Pop Operations

#![allow(unused)]
fn main() {
/// Push a new frame onto the shadow stack
/// 
/// This allocates a frame with space for `root_count` roots.
/// Returns a pointer to the roots array so the compiler can fill it in.
/// 
/// # Safety
/// The caller must call `gc_pop_frame` before the function returns.
#[no_mangle]
pub unsafe extern "C" fn gc_push_frame(root_count: usize) -> *mut *mut u8 {
    // Calculate frame size: header + root pointers
    let frame_size = std::mem::size_of::<StackFrame>() 
                   + root_count * std::mem::size_of::<*mut u8>();
    
    // Allocate the frame
    let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap();
    let frame_ptr = std::alloc::alloc(layout) as *mut StackFrame;
    
    if frame_ptr.is_null() {
        panic!("Out of memory allocating stack frame!");
    }
    
    // Initialize the frame
    SHADOW_STACK_HEAD.with(|head| {
        (*frame_ptr).next = *head.borrow();
        (*frame_ptr).root_count = root_count;
        
        // Initialize all roots to null
        for i in 0..root_count {
            (*frame_ptr).set_root(i, std::ptr::null_mut());
        }
        
        // Update head
        *head.borrow_mut() = frame_ptr;
    });
    
    // Return pointer to roots array
    (*frame_ptr).roots_ptr()
}

/// Pop a frame from the shadow stack
/// 
/// # Safety
/// Must be called exactly once for each `gc_push_frame`.
#[no_mangle]
pub unsafe extern "C" fn gc_pop_frame() {
    SHADOW_STACK_HEAD.with(|head| {
        let frame_ptr = *head.borrow();
        if frame_ptr.is_null() {
            panic!("No frame to pop!");
        }
        
        // Calculate frame size for deallocation
        let root_count = (*frame_ptr).root_count;
        let frame_size = std::mem::size_of::<StackFrame>() 
                       + root_count * std::mem::size_of::<*mut u8>();
        
        // Update head to next frame
        *head.borrow_mut() = (*frame_ptr).next;
        
        // Free the frame
        let layout = std::alloc::Layout::from_size_align(frame_size, 8).unwrap();
        std::alloc::dealloc(frame_ptr as *mut u8, layout);
    });
}

/// Set a root in the current frame
#[no_mangle]
pub unsafe extern "C" fn gc_set_root(root_index: usize, value: *mut u8) {
    SHADOW_STACK_HEAD.with(|head| {
        let frame_ptr = *head.borrow();
        if frame_ptr.is_null() {
            panic!("No frame!");
        }
        (*frame_ptr).set_root(root_index, value);
    });
}
}

Putting It Together — The Full GC

Now we have:

Allocation (Part 2)
Shadow stack for roots (this part)

Let’s write the actual gc_collect() function.

Mark Phase

#![allow(unused)]
fn main() {
// src/runtime/gc.rs (continued)

use crate::runtime::object::{ObjHeader, ObjType};
use crate::runtime::shadow_stack::{SHADOW_STACK_HEAD, StackFrame};

/// Mark all reachable objects
fn gc_mark() {
    // Walk the shadow stack
    SHADOW_STACK_HEAD.with(|head| {
        let mut current = *head.borrow();
        
        while !current.is_null() {
            let frame = unsafe { &*current };
            
            // Mark each root in this frame
            for i in 0..frame.root_count {
                let root = frame.get_root(i);
                if !root.is_null() {
                    mark_object(root);
                }
            }
            
            // Move to next frame
            current = frame.next;
        }
    });
}

/// Recursively mark an object and everything it references
///
/// `obj_ptr` is a DATA pointer — the same kind `alloc()` returns
/// (pointing after the header). The shadow stack stores these
/// data pointers, not header pointers. That's why we need to
/// go back to find the header.
fn mark_object(obj_ptr: *mut u8) {
    // Get the header (before the data pointer)
    // sub(1) moves back by sizeof(ObjHeader) bytes — not 1 byte.
    // Rust pointer arithmetic is typed: sub(1) on a *mut ObjHeader
    // moves back by one ObjHeader (16 bytes), the same way that
    // ptr - 1 on a T* in C moves back by sizeof(T). The key insight
    // is that the data pointer and the header are adjacent in memory —
    // the data starts immediately after the header, so stepping back
    // one ObjHeader lands on the header. (See Part 2's layout table.)
    let header = unsafe { (obj_ptr as *mut ObjHeader).sub(1) };
    
    // Already marked? Skip (avoid infinite loops)
    if unsafe { (*header).marked } {
        return;
    }
    
    // Mark it
    unsafe { (*header).marked = true; }
    
    // Now mark any objects this one references
    // (depends on object type)
    mark_references(header);
}

/// Mark objects referenced by this object
/// (Extended in Part 5 to add the Environment arm for closures)
fn mark_references(header: *mut ObjHeader) {
    let obj_type = unsafe { (*header).obj_type };
    // The data starts right after the header — same layout as Part 2's
    // memory diagram. We go forward from the header (whereas mark_object
    // goes backward from the data pointer). add(sizeof(ObjHeader)) lands
    // at offset +16, which is exactly where alloc() returns.
    let data = unsafe { (header as *mut u8).add(std::mem::size_of::<ObjHeader>()) };
    
    match obj_type {
        ObjType::Number | ObjType::String => {
            // These don't reference other objects.
            // You might wonder why Number is here — aren't numbers stack
            // values, not heap objects? In this model, they are. This arm
            // exists for exhaustiveness: ObjType has a Number variant, so
            // the match must handle it. In practice, the GC never traces
            // a Number object because numbers are never heap-allocated.
        }
        
        ObjType::Closure => {
            // A closure references its captured variables.
            // This layout is the simple model: a function pointer followed by a flat
            // capture array. Part 5 (Closures) revises the layout to use a function
            // index + environment pointer — and the GC trace logic changes to walk
            // the environment's slots instead of a capture array. Both approaches
            // do the same thing (walk the closure's outgoing references and mark
            // them); the difference is how those references are stored.
            // Layout (this part): [function_ptr: *mut u8 (8 bytes)][capture_count: usize (8 bytes)][capture_0][capture_1]...
            //                       offset 0                   offset 8                         offset 16
            unsafe {
                let capture_count = *(data.add(8) as *const usize);
                let captures = data.add(16) as *const *mut u8;
                
                for i in 0..capture_count {
                    let captured = *captures.add(i);
                    if !captured.is_null() {
                        mark_object(captured);
                    }
                }
            }
        }
        
        ObjType::Instance => {
            // An instance references its field values
            // Layout: [class_ptr: *mut u8 (8 bytes)][field_count: usize (8 bytes)][field_0_key][field_0_value]...
            //           offset 0                    offset 8                          offset 16
            unsafe {
                let field_count = *(data.add(8) as *const usize);
                let fields = data.add(16) as *const *mut u8;
                
                // Each field is (key_ptr, value_ptr)
                // We only trace field values, not class_ptr or field keys.
                //
                // Why skip class_ptr? The class pointer references a class object
                // that's reachable through global variables (class declarations are
                // rooted in the compiler's global scope). As long as classes are
                // always global declarations — never dynamically allocated and
                // assigned to a local variable — the class object is already live.
                // If you add dynamically-allocated classes, you'd need to trace
                // class_ptr here; otherwise, a class stored in a local could be
                // collected while instances still reference it.
                //
                // Why skip field keys? Field keys are ObjString pointers owned by
                // the class definition. They're reachable through the class (if
                // you traced class_ptr, you'd find them). Since the class is
                // always reachable (see above), the keys are transitively live.
                //
                // Part 7 (Classes and Instances) refines this model — instances
                // gain a more detailed layout with method tables and the class
                // object stores its own GC trace. The GC logic changes to walk the
                // class's method table, which is the right place to trace class_ptr
                // and keys for the general case.
                for i in 0..field_count {
                    let value = *fields.add(i * 2 + 1);
                    if !value.is_null() {
                        mark_object(value);
                    }
                }
            }
        }
    }
}
}
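
The header/data pointer arithmetic used by mark_object and mark_references can be checked in isolation. The sketch below is a minimal, self-contained model, not the tutorial's runtime: the `ObjHeader` field layout, `alloc_raw`, `free_raw`, `header_of`, and `data_of` are all hypothetical stand-ins, and the 16-byte header size assumes a 64-bit target.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::mem::size_of;

/// Hypothetical stand-in for the tutorial's ObjHeader.
/// With #[repr(C)] on a 64-bit target this is 16 bytes:
/// 1 (marked) + 1 (obj_type) + 2 (padding) + 4 (size) + 8 (next).
#[repr(C)]
pub struct ObjHeader {
    pub marked: bool,
    pub obj_type: u8,
    pub size: u32,
    pub next: *mut ObjHeader,
}

fn layout_for(size: usize) -> Layout {
    Layout::from_size_align(size_of::<ObjHeader>() + size, 8).unwrap()
}

/// Go forward from the header to the data (what mark_references does).
pub fn data_of(header: *mut ObjHeader) -> *mut u8 {
    unsafe { (header as *mut u8).add(size_of::<ObjHeader>()) }
}

/// Go backward from the data to the header (what mark_object does).
/// sub(1) steps back one whole ObjHeader, not one byte.
pub fn header_of(data: *mut u8) -> *mut ObjHeader {
    unsafe { (data as *mut ObjHeader).sub(1) }
}

/// Allocate header + `size` data bytes and return the DATA pointer.
pub fn alloc_raw(size: usize) -> *mut u8 {
    unsafe {
        let header = alloc_zeroed(layout_for(size)) as *mut ObjHeader;
        assert!(!header.is_null(), "allocation failed");
        (*header).size = size as u32;
        data_of(header)
    }
}

/// Free an object given its DATA pointer (must come from alloc_raw).
pub fn free_raw(data: *mut u8) {
    let header = header_of(data);
    unsafe {
        let size = (*header).size as usize;
        dealloc(header as *mut u8, layout_for(size));
    }
}
```

Calling `header_of(alloc_raw(5))` yields an address exactly 16 bytes below the data pointer, and `data_of` maps it back, which is the round trip the mark phase relies on.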

Sweep Phase

#![allow(unused)]
fn main() {
/// Free all unmarked objects
fn gc_sweep() {
    let mut freed_count = 0;
    
    ALL_OBJECTS.with(|objects| {
        let mut current = *objects.borrow();
        // Collect surviving objects, then rebuild the linked list.
        // This uses a Vec because unlinking in-place (pointer
        // manipulation while walking) is error-prone: you need a
        // `prev` pointer to splice out the current node, a special
        // case for the list head (no `prev` to update), and careful
        // handling when `prev.next` should skip the freed node.
        // The Vec approach separates the two phases (identify
        // survivors, rebuild list) and avoids the pointer
        // gymnastics entirely.
        //
        // The Vec allocation goes through Rust's system allocator
        // (the same allocator that backs Vec in any Rust program),
        // not our custom `alloc()`. So this won't trigger a
        // re-entrant GC — `maybe_gc` only fires from `alloc()`,
        // not from the system allocator. A production GC would
        // use in-place pointer manipulation to avoid even this
        // system allocation, but clarity wins in a tutorial.
        let mut keep: Vec<*mut ObjHeader> = Vec::new();
        
        while !current.is_null() {
            let header = unsafe { &*current };
            let next = header.next;
            
            if header.marked {
                // Still alive! Clear mark for next cycle
                unsafe { (*current).marked = false; }
                keep.push(current);
            } else {
                // Dead! Free it
                let total_size = std::mem::size_of::<ObjHeader>() + header.size as usize;
                let layout = std::alloc::Layout::from_size_align(total_size, 8).unwrap();
                unsafe {
                    std::alloc::dealloc(current as *mut u8, layout);
                }
                freed_count += 1;
            }
            
            current = next;
        }
        
        // Rebuild linked list from surviving objects
        for (i, &header_ptr) in keep.iter().enumerate() {
            let next = if i + 1 < keep.len() {
                keep[i + 1]
            } else {
                std::ptr::null_mut()
            };
            unsafe { (*header_ptr).next = next; }
        }
        
        *objects.borrow_mut() = keep.first().copied().unwrap_or(std::ptr::null_mut());
    });
    
    OBJECT_COUNT.with(|count| {
        *count.borrow_mut() -= freed_count;
    });
    
    eprintln!("GC: Freed {} objects, {} remaining", freed_count, 
              OBJECT_COUNT.with(|c| *c.borrow()));
}
}
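
For comparison, here is a sketch of the in-place unlinking that the comments in gc_sweep describe as the production approach. It is a simplified model under stated assumptions: `Node`, `build_list`, `mark_ids`, `list_ids`, and `sweep_in_place` are hypothetical names, and nodes are plain `Box` allocations rather than the tutorial's header-plus-data objects. The trick is to track the link that leads to the current node (a pointer to a pointer) instead of a separate `prev`, which removes the special case for the list head.

```rust
use std::ptr;

/// Hypothetical intrusive list node standing in for ObjHeader.
pub struct Node {
    pub id: u32,
    pub marked: bool,
    pub next: *mut Node,
}

/// Build a list whose order matches `ids`; each node is a leaked Box.
pub fn build_list(ids: &[u32]) -> *mut Node {
    let mut head: *mut Node = ptr::null_mut();
    for &id in ids.iter().rev() {
        head = Box::into_raw(Box::new(Node { id, marked: false, next: head }));
    }
    head
}

/// Set the mark bit on every node whose id appears in `live`.
pub fn mark_ids(head: *mut Node, live: &[u32]) {
    let mut current = head;
    while !current.is_null() {
        unsafe {
            if live.contains(&(*current).id) {
                (*current).marked = true;
            }
            current = (*current).next;
        }
    }
}

/// Collect the ids still on the list, in order.
pub fn list_ids(head: *mut Node) -> Vec<u32> {
    let mut ids = Vec::new();
    let mut current = head;
    while !current.is_null() {
        unsafe {
            ids.push((*current).id);
            current = (*current).next;
        }
    }
    ids
}

/// Sweep without a side Vec. `link` always points at the pointer that
/// leads to the current node: first the head, then some node's `next`.
pub fn sweep_in_place(head: &mut *mut Node) -> usize {
    let mut freed = 0;
    let mut link: *mut *mut Node = head;
    unsafe {
        while !(*link).is_null() {
            let current = *link;
            if (*current).marked {
                // Survivor: clear the mark and advance the link.
                (*current).marked = false;
                link = ptr::addr_of_mut!((*current).next);
            } else {
                // Dead: splice it out, then free it. The link stays put.
                *link = (*current).next;
                drop(Box::from_raw(current));
                freed += 1;
            }
        }
    }
    freed
}
```

The Vec-based version in gc_sweep trades one temporary allocation for code that is easier to follow; both leave the list containing only survivors with their marks cleared.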

The Main Entry Point

#![allow(unused)]
fn main() {
/// Run a garbage collection cycle
/// 
/// This is called automatically when the allocation threshold is hit,
/// but can also be called manually.
#[no_mangle]
pub extern "C" fn gc_collect() {
    gc_mark();
    gc_sweep();
}
}

When to Collect

We need to decide: when should gc_collect() run?

Options:

Every allocation - Simple but slow
Every N allocations - Simple, tunable
When memory exceeds threshold - More sophisticated
Never automatically - Manual only (good for debugging)

We’ll use option 2 in its simplest form: alloc() checks the object count and collects once it reaches N:

#![allow(unused)]
fn main() {
// src/runtime/gc.rs

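/// Trigger a collection once this many objects are on the heap.
/// OBJECT_COUNT only drops when the sweep frees something, so if
/// GC_THRESHOLD or more objects survive a cycle, every later alloc()
/// collects again. A production runtime would raise the threshold
/// after each cycle.
const GC_THRESHOLD: usize = 1024;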

/// Check if we should collect
fn maybe_gc() {
    OBJECT_COUNT.with(|count| {
        if *count.borrow() >= GC_THRESHOLD {
            gc_collect();
        }
    });
}

/// Modified alloc to trigger GC
pub fn alloc(size: usize, obj_type: ObjType) -> *mut u8 {
    maybe_gc();  // Check before allocating
    
    // ... rest of alloc code ...
}
}
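
A fixed threshold compared against the live-object count has one awkward case: if at least N objects survive a collection, every following allocation collects again. One common remedy is to grow the threshold after each cycle. The sketch below illustrates that idea only; `GcPolicy`, `INITIAL_THRESHOLD`, and `GROWTH_FACTOR` are hypothetical names and values, not part of the tutorial's runtime.

```rust
/// Hypothetical starting threshold (same value as GC_THRESHOLD).
pub const INITIAL_THRESHOLD: usize = 1024;
/// Hypothetical growth factor applied to the survivor count.
pub const GROWTH_FACTOR: usize = 2;

/// Tracks the live-object count and the count that triggers the next GC.
pub struct GcPolicy {
    pub live: usize,
    pub next_gc: usize,
}

impl GcPolicy {
    pub fn new() -> Self {
        GcPolicy { live: 0, next_gc: INITIAL_THRESHOLD }
    }

    /// Called by the allocator before it creates an object.
    pub fn should_collect(&self) -> bool {
        self.live >= self.next_gc
    }

    /// Called by the allocator after it creates an object.
    pub fn on_alloc(&mut self) {
        self.live += 1;
    }

    /// Called after a sweep that freed `freed` objects. The next
    /// collection waits until the heap has grown well past the set of
    /// survivors, so a large live set does not cause back-to-back GCs.
    pub fn after_collect(&mut self, freed: usize) {
        self.live -= freed;
        self.next_gc = (self.live * GROWTH_FACTOR).max(INITIAL_THRESHOLD);
    }
}
```

With this policy, a cycle that frees nothing out of 1024 objects moves the next trigger to 2048 instead of firing again on the very next allocation.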

Step-by-Step Mark-Sweep Walkthrough

Let’s trace through a complete GC cycle with a concrete example:

Setup: A Running Program

fun main() {
    var a = "hello";    // Object A: String "hello"
    var b = 42;         // Not an object (number is inline)
    var c = a;          // c points to same string as a
    
    // ... some code that creates garbage ...
    var temp = "garbage";  // Object G: String "garbage"
    temp = nil;            // G is now garbage!
    
    // GC RUNS HERE
    
    print a;  // Should still work
    print c;  // Should still work
}

State Before GC

Shadow Stack:                     ← roots store DATA pointers
┌─────────────────────────────────┐    (the pointer alloc() returns)
│ Frame for main()                │
│   root[0]: 0x0110 (a)           │
│   root[1]: null (b)             │
│   root[2]: 0x0110 (c)           │
└─────────────────────────────────┘

Heap Objects:                     ← ALL_OBJECTS links HEADERS
ALL_OBJECTS → 0x0200 (G) → 0x0100 (A) → null
                                   (headers, not data pointers)

Object A:
  header (at 0x0100): { marked: false, type: String, size: 5, next: null }
  data  (at 0x0110): "hello"        ← 0x0100 + 16 bytes = 0x0110

Object G:
  header (at 0x0200): { marked: false, type: String, size: 7, next: 0x0100 }
  data  (at 0x0210): "garbage"      ← 0x0200 + 16 bytes = 0x0210

Mark Phase: Step by Step

Step 1: Walk to frame (main)
        
Step 2: Process root[0] = 0x0110 (a)
        - This is a data pointer. Get header by going back 16 bytes:
          0x0110 - 16 = 0x0100
        - (0x0100).marked = false → mark it!
        - Type is String, no references to mark
        Result: Object A is now marked

Step 3: Process root[1] = null
        - Skip null values
        Result: No action

Step 4: Process root[2] = 0x0110 (c)
        - Get header at 0x0110 - 16 = 0x0100
        - (0x0100).marked = true → already marked, skip
        Result: Already marked, no action

Step 5: No more frames
        Mark phase complete!

State After Mark Phase

Shadow Stack: (unchanged)
┌─────────────────────────────────┐
│ Frame for main()                │
│   root[0]: 0x0110               │
│   root[1]: null                 │
│   root[2]: 0x0110               │
└─────────────────────────────────┘

Heap Objects:
ALL_OBJECTS → 0x0200 (G) → 0x0100 (A) → null

Object A:
  header (at 0x0100): { marked: TRUE, type: String, size: 5, next: null }
  data  (at 0x0110): "hello"

Object G:
  header (at 0x0200): { marked: FALSE, type: String, size: 7, next: 0x0100 }
  data  (at 0x0210): "garbage"

Sweep Phase: Step by Step

Step 1: Walk to object 0x0200 (G — header address)
        - marked = false → FREE IT!
        - Free 23 bytes (16-byte header + 7 data)
        - freed_count = 1

Step 2: Walk to object 0x0100 (A — header address)
        - marked = true → KEEP IT
        - Clear mark for next cycle
        - marked = false
        - Add to keep list

Step 3: Rebuild linked list
        - keep = [0x0100]
        - ALL_OBJECTS = 0x0100
        - 0x0100.next = null

Result: 1 object freed, 1 remaining

State After Sweep Phase

Shadow Stack: (unchanged)

Heap Objects:
ALL_OBJECTS → 0x0100 (A) → null

Object A:
  header (at 0x0100): { marked: false, type: String, size: 5, next: null }
  data  (at 0x0110): "hello"

Object G: FREED! Memory returned to system.
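
The walkthrough can be replayed as a small simulation. This is a model of the bookkeeping only, under simplifying assumptions: addresses are plain integers, the heap is a map keyed by header address, and `Obj`, `Heap`, and `walkthrough_heap` are hypothetical names. Roots hold data addresses, so marking subtracts the 16-byte header size first, exactly as in the steps above.

```rust
use std::collections::BTreeMap;

/// Header size assumed throughout the walkthrough.
pub const HEADER_SIZE: usize = 16;

pub struct Obj {
    pub name: &'static str,
    pub marked: bool,
    /// Data addresses of objects this one references.
    pub refs: Vec<usize>,
}

/// Simulated heap: header address -> object.
pub struct Heap {
    pub objects: BTreeMap<usize, Obj>,
}

impl Heap {
    /// Mark the object behind a data pointer, then its references.
    pub fn mark(&mut self, data_ptr: usize) {
        if data_ptr == 0 {
            return; // null root: skip
        }
        let header = data_ptr - HEADER_SIZE;
        let refs = match self.objects.get_mut(&header) {
            Some(obj) if !obj.marked => {
                obj.marked = true;
                obj.refs.clone()
            }
            _ => return, // already marked (or unknown address)
        };
        for r in refs {
            self.mark(r);
        }
    }

    /// Free unmarked objects, clear marks on survivors, and return
    /// the names of the freed objects.
    pub fn sweep(&mut self) -> Vec<&'static str> {
        let dead: Vec<usize> = self
            .objects
            .iter()
            .filter(|(_, obj)| !obj.marked)
            .map(|(addr, _)| *addr)
            .collect();
        let mut freed = Vec::new();
        for addr in dead {
            if let Some(obj) = self.objects.remove(&addr) {
                freed.push(obj.name);
            }
        }
        for obj in self.objects.values_mut() {
            obj.marked = false;
        }
        freed
    }

    /// One full cycle: mark from every root, then sweep.
    pub fn collect(&mut self, roots: &[usize]) -> Vec<&'static str> {
        for &root in roots {
            self.mark(root);
        }
        self.sweep()
    }
}

/// The heap from the walkthrough: A at 0x0100, G at 0x0200.
pub fn walkthrough_heap() -> Heap {
    let mut objects = BTreeMap::new();
    objects.insert(0x0100, Obj { name: "A", marked: false, refs: vec![] });
    objects.insert(0x0200, Obj { name: "G", marked: false, refs: vec![] });
    Heap { objects }
}
```

Running `collect` with the roots `[0x0110, 0, 0x0110]` frees G and keeps A, matching the final state shown above.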

Common Mistakes and Debugging

Mistake 1: Forgetting to Register a Root

Code:

#![allow(unused)]
fn main() {
fn broken_function() {
    let obj = alloc(16, ObjType::String);
    // Oops! Forgot gc_set_root(0, obj)
    
    gc_collect();  // BAD! obj might be freed!
    
    // Now obj points to freed memory
    use_object(obj);  // CRASH or corruption!
}
}

Fix:

#![allow(unused)]
fn main() {
fn correct_function() {
    gc_push_frame(1);  // One root slot
    let obj = alloc(16, ObjType::String);
    gc_set_root(0, obj);  // Register as root!
    
    gc_collect();  // Safe! obj is protected
    
    // ... use obj ...
    
    gc_pop_frame();  // Clean up
}
}

Mistake 2: Push/Pop Mismatch

Code:

#![allow(unused)]
fn main() {
fn broken() {
    gc_push_frame(2);
    // Oops! Forgot gc_pop_frame()
}  // Frame leaks!

fn next_call() {
    gc_push_frame(1);  // Creates NEW frame
    // But old frame is still in the list!
    // GC will mark stale pointers
}
}

Fix: Use an RAII guard so the pop happens on every path, including early returns and panics:

#![allow(unused)]
fn main() {
struct FrameGuard;

impl FrameGuard {
    fn new(root_count: usize) -> Self {
        // gc_push_frame returns the roots array pointer, but
        // FrameGuard doesn't save it — you'd call gc_set_root
        // instead. The return value matters when the *compiler*
        // fills in roots directly (see "From Runtime to Compiler"
        // at the end of this part).
        unsafe { gc_push_frame(root_count) };
        FrameGuard
    }
}

impl Drop for FrameGuard {
    fn drop(&mut self) {
        unsafe { gc_pop_frame() };
    }
}

fn correct() {
    let _guard = FrameGuard::new(2);
    // gc_pop_frame called automatically when _guard drops
    // Even if we panic or return early!
}
}
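
The claim that the guard pops the frame on early return and on panic can be checked without the real runtime. The sketch below swaps the shadow stack for a thread-local depth counter; `FRAME_DEPTH`, `mock_push_frame`, `mock_pop_frame`, and the demo functions are hypothetical. It assumes the default unwinding panic strategy (with `panic = "abort"` no destructors run).

```rust
use std::cell::Cell;
use std::panic::{catch_unwind, AssertUnwindSafe};

thread_local! {
    /// Stand-in for the shadow stack: just counts open frames.
    static FRAME_DEPTH: Cell<usize> = Cell::new(0);
}

fn mock_push_frame() {
    FRAME_DEPTH.with(|d| d.set(d.get() + 1));
}

fn mock_pop_frame() {
    FRAME_DEPTH.with(|d| d.set(d.get() - 1));
}

pub fn frame_depth() -> usize {
    FRAME_DEPTH.with(|d| d.get())
}

/// Same shape as the tutorial's FrameGuard, wired to the mock stack.
pub struct FrameGuard;

impl FrameGuard {
    pub fn new() -> Self {
        mock_push_frame();
        FrameGuard
    }
}

impl Drop for FrameGuard {
    fn drop(&mut self) {
        mock_pop_frame();
    }
}

/// Returns early when `bail` is set; otherwise reports the depth
/// observed while the guard is still alive.
pub fn early_return(bail: bool) -> usize {
    let _guard = FrameGuard::new();
    if bail {
        return 0;
    }
    frame_depth()
}

/// Panics while a frame is open.
pub fn panics_inside_frame() {
    let _guard = FrameGuard::new();
    panic!("simulated runtime error");
}

/// Catches the panic and reports the depth afterwards.
pub fn depth_after_panic() -> usize {
    let _ = catch_unwind(AssertUnwindSafe(|| panics_inside_frame()));
    frame_depth()
}
```

One detail worth knowing: the binding must be named (`_guard`). Writing `let _ = FrameGuard::new();` drops the guard immediately, which pops the frame before the function body runs.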

What About Circular References?

If object A references B and B references A, won’t mark_object loop forever? No — the marked flag prevents that:

Object A references Object B
Object B references Object A

mark(A):
  A.marked = true
  mark(B)  // A's reference
    B.marked = true
    mark(A)  // B's reference
      A.marked is already true → SKIP
    return
  return

No infinite loop!
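
The marked flag handles cycles, but recursion has a second limit: a very long chain of references can exhaust the native stack. A common alternative is an explicit worklist. The sketch below shows that variant on a simplified heap where objects are vector indices; `Obj` and `mark_from` are hypothetical names, and this is not how the tutorial's mark_object is written.

```rust
/// Simplified object: a mark bit plus indices of referenced objects.
pub struct Obj {
    pub marked: bool,
    pub refs: Vec<usize>,
}

/// Mark everything reachable from `roots` using an explicit worklist
/// instead of recursion. Returns how many objects were newly marked.
pub fn mark_from(heap: &mut [Obj], roots: &[usize]) -> usize {
    let mut worklist: Vec<usize> = roots.to_vec();
    let mut newly_marked = 0;
    while let Some(index) = worklist.pop() {
        if heap[index].marked {
            continue; // already visited: this is what stops cycles
        }
        heap[index].marked = true;
        newly_marked += 1;
        worklist.extend(heap[index].refs.iter().copied());
    }
    newly_marked
}

/// Build a chain 0 -> 1 -> 2 -> ... of the given length.
pub fn chain(length: usize) -> Vec<Obj> {
    (0..length)
        .map(|i| Obj {
            marked: false,
            refs: if i + 1 < length { vec![i + 1] } else { vec![] },
        })
        .collect()
}
```

For the A/B cycle above, `mark_from` visits each object once and stops; for a 100,000-object chain it uses heap memory for the worklist rather than 100,000 nested calls.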

Debugging Tips

Add these diagnostic functions:

#![allow(unused)]
fn main() {
/// Check if the GC state is consistent
pub fn gc_validate() {
    // Check shadow stack
    SHADOW_STACK_HEAD.with(|head| {
        let mut current = *head.borrow();
        let mut frame_count = 0;
        
        while !current.is_null() {
            frame_count += 1;
            if frame_count > 1000 {
                panic!("Shadow stack too deep! Infinite loop?");
            }
            current = unsafe { (*current).next };
        }
    });
    
    // Check object list
    ALL_OBJECTS.with(|objects| {
        let mut current = *objects.borrow();
        let mut obj_count = 0;
        
        while !current.is_null() {
            obj_count += 1;
            if obj_count > 100000 {
                panic!("Object list too long! Infinite loop?");
            }
            current = unsafe { (*current).next };
        }
        
        // Verify count matches
        OBJECT_COUNT.with(|count| {
            assert_eq!(*count.borrow(), obj_count, "Object count mismatch!");
        });
    });
}
}

Practice Exercises

Exercise 1: Trace the Mark Phase

Given this state:

Shadow Stack:                                    ← roots store DATA pointers
  Frame(main): roots = [0x1010, null, 0x2010]     (the pointer alloc() returns)
  Frame(helper): roots = [0x3010]

Heap:                                            ← header addresses listed first;
  0x1000: String "a" (not marked)   data at 0x1010    data is 16 bytes after
  0x2000: String "b" (not marked)   data at 0x2010    (same convention as the
  0x3000: String "c" (not marked)   data at 0x3010     walkthrough above)
  0x4000: String "garbage" (not marked)   data at 0x4010

Which objects get marked?

Answer:

0x1000: Marked (root 0x1010 in main frame → header at 0x1010 − 16 = 0x1000)
0x2000: Marked (root 0x2010 in main frame → header at 0x2010 − 16 = 0x2000)
0x3000: Marked (root 0x3010 in helper frame → header at 0x3010 − 16 = 0x3000)
0x4000: NOT marked (no root points to it) → will be freed

The mark phase walks all frames, subtracts 16 from each non-null data pointer to find the header, then marks it.

Exercise 2: Root Counting for Nested Scopes

How many root slots does this function need?

fun makeGreeter(greeting) {
    var counter = 0;
    
    fun greet(name) {
        counter = counter + 1;
        print greeting + " " + name;
    }
    
    return greet;
}

Count every variable that could hold a heap reference — parameters, locals, and anything captured by the inner function. Remember: in the numbers-only model, we root everything uniformly (see “Why We Root Numbers” in Part 4 for the full reasoning).

Answer:

makeGreeter needs 3 root slots:

greeting (parameter) — could be a string
counter (local) — a number now, but we root everything uniformly (see “Why We Root Numbers” in Part 4), and later when closures capture it, it will need to survive GC
greet (local) — the closure object itself, allocated on the heap

greet needs 1 root slot:

name (parameter) — could be a string

Wait, what about greeting and counter? Those are captured from the outer scope. Captured variables live in the closure’s environment, and the closure object is itself held in a root slot (greet in makeGreeter, then whichever variable receives the returned closure). The GC traces roots → objects → references, so it finds greeting and counter through the environment. A variable only needs its own root slot if it can’t be reached by tracing from another root.

The takeaway: captured variables don’t need separate root slots in the inner function — they’re already reachable through the closure object.
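
The counting rule from this exercise is mechanical enough to sketch in code. The AST below is a deliberately tiny stand-in, not the tutorial's parser output: `Stmt`, `count_roots`, `roots_of`, and `make_greeter` are hypothetical. The rule it encodes is one slot per parameter, one per `var`, one per nested function declaration (the closure value), without descending into the nested function's body, which gets its own frame.

```rust
/// Tiny stand-in AST: just enough to count declarations.
pub enum Stmt {
    Var(String),
    Fun { name: String, params: Vec<String>, body: Vec<Stmt> },
    Block(Vec<Stmt>),
    Other,
}

/// Locals declared directly in this function's body.
fn count_locals(stmts: &[Stmt]) -> usize {
    stmts
        .iter()
        .map(|stmt| match stmt {
            Stmt::Var(_) => 1,
            // The nested function is a local holding a closure object.
            // Its own params and locals belong to its own frame.
            Stmt::Fun { .. } => 1,
            Stmt::Block(inner) => count_locals(inner),
            Stmt::Other => 0,
        })
        .sum()
}

/// Root slots for one function: parameters plus locals.
pub fn count_roots(params: &[String], body: &[Stmt]) -> usize {
    params.len() + count_locals(body)
}

/// Convenience wrapper: root count if `stmt` is a function declaration.
pub fn roots_of(stmt: &Stmt) -> Option<usize> {
    match stmt {
        Stmt::Fun { params, body, .. } => Some(count_roots(params, body)),
        _ => None,
    }
}

/// The makeGreeter function from the exercise, as this AST.
pub fn make_greeter() -> Stmt {
    Stmt::Fun {
        name: "makeGreeter".to_string(),
        params: vec!["greeting".to_string()],
        body: vec![
            Stmt::Var("counter".to_string()),
            Stmt::Fun {
                name: "greet".to_string(),
                params: vec!["name".to_string()],
                body: vec![Stmt::Other, Stmt::Other],
            },
            Stmt::Other, // return greet;
        ],
    }
}
```

Applied to `make_greeter()`, the count is 3 for the outer function and 1 for greet, matching the answer above.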

Exercise 3: Why Not Mark During Sweep?

Why do we separate mark and sweep into two phases? Why not free objects as soon as we find they’re unreachable?

Answer:

We need to mark ALL live objects before freeing ANY.

Consider:

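A root references Object B
Object B references Object A
We're walking the heap in order: A, then B

If we freed A the moment we saw it unmarked:
  - The mark traversal hasn't reached B yet, so A's mark bit
    says nothing about whether A is live
  - A is actually reachable: root → B → A
  - We'd free a live object and leave B holding a dangling pointer

"Unmarked" only means "garbage" once marking is complete.
By marking first, we ensure we've found ALL live objects.
Then sweep can safely free everything else.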

Two-phase approach ensures correctness.

From Runtime to Compiler

The shadow stack API we built in this part — gc_push_frame, gc_set_root, gc_pop_frame — is designed to be called from compiled code. Right now, we’re calling it by hand from Rust. But in a real compiler, the code generator emits the calls automatically: every function gets a push_frame at entry and a pop_frame before every return, and every variable store gets a set_root.

Part 4 (MLIR Integration) shows how the MLIR code generator does exactly that — emitting lox.push_frame, lox.set_root, and lox.pop_frame operations that lower to the runtime calls we defined here. The concept is the same; the difference is who writes the calls.

Next: Part 4 — MLIR Integration — The shadow stack works, but calling gc_push_frame by hand in every function is error-prone and verbose. The compiler should do it for us. We’ll define custom MLIR operations (lox.alloc, lox.set_root) that emit GC bookkeeping during lowering — the GC becomes invisible to the programmer and automatic in the generated code.

MLIR for Lox: Part 4 — Wiring the GC into MLIR — Dialect Operations That Manage Memory for You

You’ve built a garbage collector from scratch. It allocates objects, tracks roots via a shadow stack, and frees unreachable memory. It works — but it’s conceptual code in Rust. The Lox compiler doesn’t know about it yet.

This part bridges the gap. We’ll define a custom MLIR dialect with operations like lox.alloc and lox.set_root, wire those operations into the code generator from Part 1, and lower them to calls into the GC runtime. By the end, the MLIR code generator will produce IR that automatically manages memory through our GC — no manual malloc or free in sight.

The Runtime Module

First, let’s organize our runtime into a proper Rust module:

#![allow(unused)]
fn main() {
// src/runtime/mod.rs

pub mod object;
pub mod shadow_stack;
pub mod gc;

// Re-export the public API
pub use object::{ObjHeader, ObjType};
pub use gc::{alloc, gc_collect};
pub use shadow_stack::{gc_push_frame, gc_pop_frame, gc_set_root};
}

And the Cargo.toml needs to build it as a static library:

# Cargo.toml

[package]
name = "lox-runtime"
version = "0.1.0"
edition = "2021"

[lib]
name = "lox_runtime"
crate-type = ["staticlib", "cdylib"]

Why both? The static library (.a) is what the linker uses when we compile the final binary — clang output.o -llox_runtime. The dynamic library (.so) is needed if you want to load the runtime at runtime with dlopen, which some JIT setups require. We build both so the same compiled runtime works for either linking approach.
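
For the linker to resolve a call from generated code, the runtime has to export the allocator under an unmangled C symbol. The sketch below shows one way such a wrapper could look. It is an assumption-laden illustration: the symbol name `lox_runtime_alloc` is the one the lowering code in this part calls, but the wrapper itself, the `ObjType` discriminant values, `from_tag`, and the stand-in `alloc` body are hypothetical.

```rust
use std::alloc::{alloc_zeroed, Layout};

/// Hypothetical tag values; the real ObjType lives in runtime/object.rs.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ObjType {
    Number = 0,
    String = 1,
    Closure = 2,
    Instance = 3,
}

impl ObjType {
    /// Convert the raw i8/u8 tag the compiler emits back into the enum.
    pub fn from_tag(tag: u8) -> Option<ObjType> {
        match tag {
            0 => Some(ObjType::Number),
            1 => Some(ObjType::String),
            2 => Some(ObjType::Closure),
            3 => Some(ObjType::Instance),
            _ => None,
        }
    }
}

/// Stand-in for the runtime's alloc(): no header, no GC bookkeeping.
pub fn alloc(size: usize, _obj_type: ObjType) -> *mut u8 {
    let layout = Layout::from_size_align(size.max(1), 8).unwrap();
    unsafe { alloc_zeroed(layout) }
}

/// C-ABI entry point for generated code. Only FFI-safe types cross the
/// boundary: the size as a 64-bit integer and the object type as a
/// byte. An unknown tag yields a null pointer instead of undefined
/// behavior from an invalid enum value.
#[no_mangle]
pub extern "C" fn lox_runtime_alloc(size: u64, obj_type: u8) -> *mut u8 {
    match ObjType::from_tag(obj_type) {
        Some(ty) => alloc(size as usize, ty),
        None => std::ptr::null_mut(),
    }
}
```

Taking a raw `u8` and validating it is safer than declaring the parameter as `ObjType` directly, because the caller is compiler-generated code that Rust cannot check.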

The Lox MLIR Dialect

We need MLIR operations that correspond to our GC operations:

Operation	Meaning	LLVM Lowering
`lox.alloc`	Allocate a heap object	Call `lox_runtime::alloc`
`lox.set_root`	Register a value as a GC root	Store in shadow stack slot
`lox.push_frame`	Push a stack frame	Call `gc_push_frame`
`lox.pop_frame`	Pop a stack frame	Call `gc_pop_frame`

Defining Operations in Melior

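C++ MLIR projects usually declare operations in TableGen. We skip that step and build each operation directly in Rust with Melior’s OperationBuilder. Because the lox dialect is never registered with MLIR, the context generally has to allow unregistered dialects (in Melior, `context.set_allow_unregistered_dialects(true)`) for these operations to pass verification: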

#![allow(unused)]
fn main() {
// src/codegen/lox_dialect.rs

use melior::{
    Context, Location,
    dialect::Dialect,
    ir::{Operation, OperationBuilder, OperationLike, Region, RegionLike, Type, Value},
};

/// The Lox dialect namespace
pub const DIALECT_NAME: &str = "lox";

/// Operation names
pub mod ops {
    // These four are used in this part:
    pub const ALLOC: &str = "lox.alloc";
    pub const PUSH_FRAME: &str = "lox.push_frame";
    pub const POP_FRAME: &str = "lox.pop_frame";
    pub const SET_ROOT: &str = "lox.set_root";
    // These two are defined here for completeness.
    // Part 6 (Complete Reference) shows them in action — they
    // load and store values from heap-allocated environments
    // and instance fields, which don't exist yet in the
    // numbers-only model.
    pub const LOAD: &str = "lox.load";
    pub const STORE: &str = "lox.store";
}

/// Create a lox.alloc operation
/// 
/// This allocates a heap object of the given type.
/// Returns a pointer to the object data (after the header).
pub fn create_alloc<'c>(
    context: &'c Context,
    obj_type: u8,      // ObjType enum value
    size: usize,       // Size of object data in bytes
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    // The lox.alloc operation takes:
    //   - type: i8 (the ObjType enum)
    //   - size: i64 (allocation size)
    // And returns:
    //   - ptr: !llvm.ptr (pointer to object data)
    
    OperationBuilder::new(ops::ALLOC, location)
        .add_attributes(&[
            (melior::ir::Identifier::new(context, "obj_type"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i8").unwrap(), obj_type as i64).into()),
            (melior::ir::Identifier::new(context, "size"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i64").unwrap(), size as i64).into()),
        ])
        .add_results(&[Type::parse(context, "!llvm.ptr").unwrap()])
        .build()
        .unwrap()
}

/// Create a lox.push_frame operation
/// 
/// This pushes a new shadow stack frame with the given number of root slots.
/// Returns a pointer to the roots array.
pub fn create_push_frame<'c>(
    context: &'c Context,
    root_count: usize,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new(ops::PUSH_FRAME, location)
        .add_attributes(&[
            (melior::ir::Identifier::new(context, "root_count"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i64").unwrap(), root_count as i64).into()),
        ])
        .add_results(&[Type::parse(context, "!llvm.ptr").unwrap()])
        .build()
        .unwrap()
}

/// Create a lox.pop_frame operation
pub fn create_pop_frame<'c>(
    context: &'c Context,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new(ops::POP_FRAME, location)
        .build()
        .unwrap()
}

/// Create a lox.set_root operation
/// 
/// Sets a root in the current shadow stack frame.
pub fn create_set_root<'c>(
    context: &'c Context,
    root_index: usize,
    value: Value<'c, 'c>,
    location: Location<'c>,
) -> melior::ir::Operation<'c> {
    OperationBuilder::new(ops::SET_ROOT, location)
        .add_attributes(&[
            (melior::ir::Identifier::new(context, "index"),
                melior::ir::attribute::IntegerAttribute::new(
                    Type::parse(context, "i64").unwrap(), root_index as i64).into()),
        ])
        .add_operands(&[value])
        .build()
        .unwrap()
}
}

Lowering to LLVM

Our lox.* operations can’t be executed directly. MLIR doesn’t know what lox.alloc means — it’s a symbol we invented. To produce runnable code, every lox.* operation has to become something LLVM understands: function calls, memory operations, and LLVM dialect instructions.

This is called lowering, and it happens in stages:

Lox dialect          Standard MLIR dialects            LLVM dialect
─────────────    →    ──────────────────────────    →    ────────────
lox.alloc              func.call @lox_runtime_alloc      llvm.call @lox_runtime_alloc
lox.push_frame         func.call @gc_push_frame          llvm.call @gc_push_frame
lox.pop_frame          func.call @gc_pop_frame           llvm.call @gc_pop_frame
lox.set_root           (lowers straight to LLVM)         llvm.getelementptr + llvm.store
arith.addf             arith.addf                        llvm.fadd
scf.if                 cf.cond_br (after scf→cf)         llvm.cond_br (after cf→llvm)

Each arrow is a conversion pass. The Lox dialect lowers first (our custom pass). Then the standard MLIR dialects lower in sequence: scf → cf, cf → llvm, arith → llvm, func → llvm. The order matters — scf.if can’t lower directly to LLVM. It first becomes cf.cond_br (a branch), and then cf.cond_br becomes an LLVM branch.

Why not lower everything in one pass? Because MLIR dialects are designed to compose. The arith.addf inside a scf.if is the same operation regardless of whether it’s inside a conditional. By lowering one dialect at a time, each pass only needs to know about two things: the source dialect and the target dialect. A single monolithic pass would need to understand every dialect’s semantics — and would break the moment anyone adds a new dialect.
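
The composition argument can be made concrete with a toy model. The sketch below is not MLIR and does not use Melior: `Op`, the five pass functions, `PIPELINE`, and `run_pipeline` are hypothetical, and real conversion passes rewrite whole regions rather than single enum values. What it does show is the shape of the design: each pass rewrites the ops of one dialect and passes everything else through untouched, and the pipeline is just those passes in order.

```rust
/// Toy operations from several dialects.
#[derive(Clone, Debug, PartialEq)]
pub enum Op {
    LoxAlloc,
    ScfIf,
    ArithAddF,
    FuncCall(&'static str),
    CfCondBr,
    LlvmCall(&'static str),
    LlvmCondBr,
    LlvmFAdd,
}

/// A pass rewrites one op; anything it does not recognize is kept.
pub type Pass = fn(Op) -> Op;

pub fn lox_to_func(op: Op) -> Op {
    match op {
        Op::LoxAlloc => Op::FuncCall("lox_runtime_alloc"),
        other => other,
    }
}

pub fn scf_to_cf(op: Op) -> Op {
    match op {
        Op::ScfIf => Op::CfCondBr,
        other => other,
    }
}

pub fn cf_to_llvm(op: Op) -> Op {
    match op {
        Op::CfCondBr => Op::LlvmCondBr,
        other => other,
    }
}

pub fn arith_to_llvm(op: Op) -> Op {
    match op {
        Op::ArithAddF => Op::LlvmFAdd,
        other => other,
    }
}

pub fn func_to_llvm(op: Op) -> Op {
    match op {
        Op::FuncCall(name) => Op::LlvmCall(name),
        other => other,
    }
}

/// The order described in the text: Lox first, then scf → cf,
/// cf → llvm, arith → llvm, func → llvm.
pub const PIPELINE: [Pass; 5] =
    [lox_to_func, scf_to_cf, cf_to_llvm, arith_to_llvm, func_to_llvm];

/// Apply each pass, in order, to every op.
pub fn run_pipeline(ops: Vec<Op>, passes: &[Pass]) -> Vec<Op> {
    passes
        .iter()
        .fold(ops, |ops, pass| ops.into_iter().map(|op| pass(op)).collect())
}
```

Swapping two passes shows why order matters: running `cf_to_llvm` before `scf_to_cf` leaves a `CfCondBr` behind, because the branch did not exist yet when its lowering ran.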

Now let’s see the code that does it:

#![allow(unused)]
fn main() {
// src/codegen/lowering.rs

use melior::{
    Context, Location, PassManager,
    ir::{Block, BlockLike, Module, Operation, OperationLike, Region, RegionLike, Type, Value},
    dialect::{arith, func, llvm},
    pass::Pass,
};

// ========================================================================
// Constant helpers
// ========================================================================
// These create `arith.constant` operations for i8 and i64 values. Each
// helper appends the constant to the block first and only then extracts
// its result (append-then-extract). The order matters: an operation that
// is never placed in a block is destroyed when the owned `Operation`
// goes out of scope, and the borrow checker will not let a result
// outlive the operation that produced it. Once the block owns the
// constant, the returned Value stays valid for the `func::call`
// operations that the lowering functions append afterwards.

/// Create an `arith.constant` with an i8 value and append it to `block`.
fn create_const_i8<'c, 'a>(
    context: &'c Context,
    block: &'a Block<'c>,
    value: i8,
) -> Value<'c, 'a> {
    let location = Location::unknown(context);
    let op = block.append_operation(arith::constant(
        context,
        melior::ir::attribute::IntegerAttribute::new(
            Type::parse(context, "i8").unwrap(),
            value as i64,
        ).into(),
        location,
    ));
    op.result(0).unwrap().into()
}

/// Create an `arith.constant` with an i64 value and append it to `block`.
fn create_const_i64<'c, 'a>(
    context: &'c Context,
    block: &'a Block<'c>,
    value: i64,
) -> Value<'c, 'a> {
    let location = Location::unknown(context);
    let op = block.append_operation(arith::constant(
        context,
        melior::ir::attribute::IntegerAttribute::new(
            Type::parse(context, "i64").unwrap(),
            value,
        ).into(),
        location,
    ));
    op.result(0).unwrap().into()
}

/// Lower lox.alloc to a runtime call
fn lower_alloc<'c>(op: &Operation<'c>, block: &Block<'c>, context: &'c Context) {
    let location = op.location();
    
    // Get attributes
    // In melior 0.27, use IntegerAttribute::try_from to convert the
    // generic Attribute, then call .value() to get the i64.
    let obj_type = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("obj_type").unwrap()
    ).unwrap().value();
    let size = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("size").unwrap()
    ).unwrap().value();
    
    // Create constants for arguments (appended to the block before the call)
    let obj_type_val = create_const_i8(context, block, obj_type as i8);
    let size_val = create_const_i64(context, block, size);
    
    // Call lox_runtime_alloc(size, obj_type): size first, obj_type second.
    // This matches the Rust runtime's parameter order: alloc(size: usize, obj_type: ObjType)
    let call = block.append_operation(func::call(
        context,
        melior::ir::attribute::FlatSymbolRefAttribute::new(context, "lox_runtime_alloc"),
        &[size_val, obj_type_val],
        &[Type::parse(context, "!llvm.ptr").unwrap()],
        location,
    ));
    
    // The call's result takes the place of the original lox.alloc result.
    let new_result = call.result(0).unwrap();
    
    // (In real code, we'd track and replace all uses of the lox.alloc
    // result with `new_result`, then erase the original operation.)
}

/// Lower lox.push_frame to a runtime call
fn lower_push_frame<'c>(op: &Operation<'c>, block: &Block<'c>, context: &'c Context) {
    let location = op.location();
    
    let root_count = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("root_count").unwrap()
    ).unwrap().value();
    
    let count_val = create_const_i64(context, block, root_count);
    
    // Call gc_push_frame(root_count)
    let call = block.append_operation(func::call(
        context,
        melior::ir::attribute::FlatSymbolRefAttribute::new(context, "gc_push_frame"),
        &[count_val],
        &[Type::parse(context, "!llvm.ptr").unwrap()],
        location,
    ));
    
    // The call's result takes the place of the original lox.push_frame
    // result (same use-replacement caveat as lower_alloc).
    let new_result = call.result(0).unwrap();
}

/// Lower lox.pop_frame to a runtime call
fn lower_pop_frame<'c>(op: &Operation<'c>, block: &Block<'c>, context: &'c Context) {
    let location = op.location();
    
    // Call gc_pop_frame()
    block.append_operation(func::call(
        context,
        melior::ir::attribute::FlatSymbolRefAttribute::new(context, "gc_pop_frame"),
        &[],
        &[],
        location,
    ));
}

/// Lower lox.set_root to a store instruction
fn lower_set_root<'c>(
    op: &Operation<'c>,
    block: &Block<'c>,
    context: &'c Context,
    frame_ptr: Value<'c, 'c>,
) {
    let location = op.location();
    
    let root_index = melior::ir::attribute::IntegerAttribute::try_from(
        op.attribute("index").unwrap()
    ).unwrap().value();
    let value = op.operand(0).unwrap();
    
    // Compute the address: frame_ptr[root_index]
    //
    // The MLIR-level lox.set_root can take either f64 or !llvm.ptr values
    // (our representation stores both numbers and heap pointers as f64).
    // At the LLVM level, the shadow stack is an array of pointers
    // (!llvm.ptr), so the element type for GEP is !llvm.ptr: each slot is
    // one pointer wide. In opaque-pointer mode (MLIR 17+), GEP carries the
    // element type as a parameter, so no bitcast is needed for the GEP
    // itself.
    //
    // The store below writes `value` as-is. If your Melior/MLIR version
    // rejects an f64 stored into a pointer slot, or you want every slot
    // to hold a value typed as a pointer, convert explicitly before the
    // store: llvm.bitcast f64 → i64 (reinterpret the bits as an integer),
    // then llvm.inttoptr i64 → !llvm.ptr. LLVM's bitcast cannot convert
    // f64 → ptr directly. On a 64-bit target both are 8 bytes wide; on a
    // 32-bit target an f64 does not fit in a pointer-sized slot, so the
    // representation would have to change.
    let index_const = create_const_i64(context, block, root_index);
    let root_ptr = block.append_operation(llvm::getelementptr(
        context,
        Type::parse(context, "!llvm.ptr").unwrap(), // pointer type
        Type::parse(context, "!llvm.ptr").unwrap(), // element type: shadow stack holds pointers
        frame_ptr,
        &[index_const],
        location,
    ));
    let root_ptr_val = root_ptr.result(0).unwrap().into();
    
    // Store the value at the computed address
    block.append_operation(llvm::store(value, root_ptr_val, location));
}
}
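The two-step conversion described in the comments above (bitcast f64 → i64, then inttoptr i64 → pointer) has a direct analogue in plain Rust. The sketch below is only an illustration of the bit-level idea, not part of the compiler: the helper names f64_to_slot and slot_to_f64 are hypothetical, and it assumes a 64-bit target where a pointer is as wide as an f64. The pointer is never dereferenced.

```rust
/// Reinterpret an f64's bits as a pointer-sized slot value.
/// Mirrors the lowering: bitcast f64 -> i64, then inttoptr i64 -> ptr.
fn f64_to_slot(value: f64) -> *mut u8 {
    let bits: u64 = value.to_bits(); // "bitcast": same bits, integer type
    bits as usize as *mut u8         // "inttoptr": same bits, pointer type
}

/// The inverse direction: ptrtoint, then bitcast back to f64.
fn slot_to_f64(slot: *mut u8) -> f64 {
    f64::from_bits(slot as usize as u64)
}

fn main() {
    // Assumption: 64-bit target, so a slot can hold every bit of an f64.
    assert_eq!(std::mem::size_of::<*mut u8>(), std::mem::size_of::<f64>());

    let slot = f64_to_slot(1.5);
    println!("slot bits = {:#x}, value = {}", slot as usize, slot_to_f64(slot));
}
```

Nothing is converted numerically here: 1.5 goes in and 1.5 comes out because only the type changes, which is exactly what the shadow-stack store relies on.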

Code Generation for Functions

Now we modify our function code generator to use the shadow stack:

#![allow(unused)]
fn main() {
// src/codegen/generator.rs (modified)
use std::collections::HashMap;

impl<'c> CodeGenerator<'c> {
    
    /// Compile a function with GC support
    fn compile_function(&self, func: &FunctionStmt, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = self.loc(func.location);
        
        // === STEP 1: Count roots ===
        // Roots = parameters + local variables
        let root_count = self.count_roots(func);
        
        // === STEP 2: Create function type ===
        let float_type = Type::float64(self.context);
        let param_types: Vec<Type> = func.params.iter().map(|_| float_type).collect();
        let function_type = FunctionType::new(self.context, &param_types, &[float_type]);
        
        // === STEP 3: Create function body ===
        let block = Block::new(
            &param_types.iter().map(|&t| (t, location)).collect::<Vec<_>>()
        );
        
        // === STEP 4: Push shadow stack frame ===
        let push_frame = block.append_operation(
            create_push_frame(self.context, root_count, location)
        );
        
        // The result is a pointer to the roots array
        let roots_ptr = push_frame.result(0).unwrap().into();
        
        // === STEP 5: Store parameters as roots ===
        for (i, param_name) in func.params.iter().enumerate() {
            let arg = block.argument(i).unwrap();
            
            // Store the parameter value in roots[i]
            // (In a real implementation, we'd handle this based on type)
            self.set_root(&block, i, arg.into(), location);
            
            // Also track in our local variables map
            // (Note: variables are passed as a parameter, not stored in self.
            //  See Part 1's "Why Separate Context from State?" section.)
            variables.insert(param_name.clone(), arg.into());
        }
        
        // === STEP 6: Compile function body ===
        // Note: we pass the block as a parameter (not stored in self),
        // following the block-ownership pattern from Part 1.
        // Note: In a complete compiler, each var declaration in the body
        // would consume the next root slot (starting from params.len()).
        // For now, the root_count passed to push_frame accounts for
        // parameters + all local variables, so the frame is large enough.
        
        for stmt in &func.body {
            self.compile_statement(stmt, &block, variables);
        }
        
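        // === STEP 7: Pop shadow stack frame ===
        // The pop must come before the function's terminator: nothing may
        // follow func.return in a block. Explicit `return` statements in
        // the body need the same treatment (pop first, then return).
        let pop_frame = create_pop_frame(self.context, location);
        block.append_operation(pop_frame);
        
        // === STEP 8: Add implicit return if needed ===
        // ...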
        
        // === STEP 9: Create the function ===
        let region = Region::new();
        region.append_block(block);  // block is consumed here
        self.module.body().append_operation(func::func(
            self.context,
            StringAttribute::new(self.context, &func.name),
            TypeAttribute::new(function_type.into()),
            region,
            &[],
            location,
        ));
    }
    
    /// Count the total number of roots needed for a function
    fn count_roots(&self, func: &FunctionStmt) -> usize {
        // Parameters are roots
        let mut count = func.params.len();
        
        // Local variables are roots
        for stmt in &func.body {
            count += self.count_roots_in_stmt(stmt);
        }
        
        count
    }
    
    /// Count roots introduced by a statement
    fn count_roots_in_stmt(&self, stmt: &Stmt) -> usize {
        match stmt {
            Stmt::Var(_) => 1,  // Each var declaration is a root
            Stmt::Block(b) => b.statements.iter()
                .map(|s| self.count_roots_in_stmt(s))
                .sum(),
            Stmt::If(i) => {
                i.then_branch.iter().map(|s| self.count_roots_in_stmt(s)).sum::<usize>() +
                i.else_branch.iter().map(|s| self.count_roots_in_stmt(s)).sum::<usize>()
            }
            Stmt::While(w) => w.body.iter().map(|s| self.count_roots_in_stmt(s)).sum(),
            // Stmt::Function is handled as 0 here because closures
            // aren't in the numbers-only model yet. When Part 5 adds
            // closures, a Function declaration creates a closure value
            // that needs a root slot — the count becomes 1 for each
            // nested function.
            _ => 0,
        }
    }
    
    /// Set a root in the shadow stack
    fn set_root(&self, block: &Block<'c>, index: usize, value: Value<'c, 'c>, location: Location<'c>) {
        let set_root_op = create_set_root(self.context, index, value, location);
        block.append_operation(set_root_op);
    }
}
}
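To see the root-counting rule in isolation, here is a small self-contained sketch. It assumes a trimmed-down, hypothetical Stmt enum (the tutorial's real AST has more variants and carries expressions), and it uses free functions instead of CodeGenerator methods so it runs on its own.

```rust
// Hypothetical, trimmed-down AST for illustration only.
enum Stmt {
    Var(String),
    Block(Vec<Stmt>),
    If { then_branch: Vec<Stmt>, else_branch: Vec<Stmt> },
    While(Vec<Stmt>),
    Print,
}

/// Roots introduced by one statement: one per `var`, summed through nesting.
fn count_roots_in_stmt(stmt: &Stmt) -> usize {
    match stmt {
        Stmt::Var(_) => 1,
        Stmt::Block(stmts) | Stmt::While(stmts) => {
            stmts.iter().map(count_roots_in_stmt).sum::<usize>()
        }
        Stmt::If { then_branch, else_branch } => then_branch
            .iter()
            .chain(else_branch.iter())
            .map(count_roots_in_stmt)
            .sum::<usize>(),
        Stmt::Print => 0,
    }
}

/// Total roots for a function: its parameters plus every local declaration.
fn count_roots(param_count: usize, body: &[Stmt]) -> usize {
    param_count + body.iter().map(count_roots_in_stmt).sum::<usize>()
}

fn main() {
    // fun f(a, b) { var x; if (..) { var y; } else { var z; } while (..) { print ..; } }
    let body = vec![
        Stmt::Var("x".to_string()),
        Stmt::If {
            then_branch: vec![Stmt::Var("y".to_string())],
            else_branch: vec![Stmt::Var("z".to_string())],
        },
        Stmt::While(vec![Stmt::Print]),
    ];
    println!("root_count = {}", count_roots(2, &body));
}
```

Note that both branches of the if are counted even though only one runs. That over-reserves a slot, which is harmless: the frame only needs to be large enough.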

The Lowering Pass

We need a pass that converts lox.* operations to LLVM calls. This pass has two parts: a pass manager that runs conversions in the right order, and a custom Lox-to-LLVM pass that handles our dialect-specific operations.

There’s a design tension in Melior 0.27 that shows up here. The individual lowering functions we wrote earlier — lower_alloc, lower_push_frame, etc. — need two things: the Block to append new operations to, and the Context to create types and attributes. The lowering functions take both as parameters. But Operation::walk() gives us each operation in isolation. In Melior 0.27, there’s no op.parent_block() or op.context() — you can’t navigate from an operation back to its block or context inside the walk closure.

The fix is to walk blocks, not operations. The pass closure already receives the Module, so the Context comes from module.context(); iterating each block’s operations directly means every lowering function can be handed the Block it should append to. This is the pattern that compiles and runs:

#![allow(unused)]
fn main() {
// src/codegen/lowering_pass.rs

use melior::{
    Context, Module, PassManager,
    pass::Pass,
};

// Standard Melior conversion passes — these come from melior::pass::conversion, not our local module.
use melior::pass::conversion::{
    create_scf_to_control_flow,
    create_control_flow_to_llvm,
    create_arith_to_llvm,
    create_func_to_llvm,
};

/// Create the lowering pass manager
pub fn create_lowering_pass_manager(context: &Context) -> PassManager {
    let pm = PassManager::new(context);
    
    // Lower Lox dialect to LLVM (our custom pass, defined below)
    pm.add_pass(pass::convert_lox_to_llvm());
    
    // Lower standard dialects to LLVM (Melior built-in passes)
    pm.add_pass(create_scf_to_control_flow());
    pm.add_pass(create_control_flow_to_llvm());
    pm.add_pass(create_arith_to_llvm());
    pm.add_pass(create_func_to_llvm());
    
    pm
}

/// Our custom Lox-to-LLVM pass
mod pass {
    use super::*;
    
    pub fn convert_lox_to_llvm() -> Pass {
        Pass::from_info("lox-to-llvm", |module: &Module| {
            let context = module.context();
            
            // Walk each block in the module. Walking blocks (instead of
            // operations) gives us access to the Block and Context that the
            // lowering functions need. Each block is mutable — we can append
            // replacement operations directly.
            for region in module.as_operation().regions() {
                for mut block in region.blocks() {
                    let mut current_frame_ptr: Option<Value> = None;
                    let mut ops_to_process: Vec<Operation> = Vec::new();
                    
                    // Collect Lox operations first, then process them.
                    // We can't modify the block while iterating it, so we
                    // gather the operations first and handle them after.
                    for op in block.operations() {
                        let name = op.name().to_string();
                        if name.starts_with("lox.") {
                            ops_to_process.push(op.clone());
                        }
                    }
                    
                    for op in ops_to_process {
                        match op.name().to_string().as_str() {
                            "lox.alloc" => {
                                lower_alloc(&op, &mut block, &context);
                            }
                            "lox.push_frame" => {
                                lower_push_frame(&op, &mut block, &context);
                                // ⚠️ Simplification: op.result(0) returns the
                                // result of the *original* lox.push_frame operation,
                                // which lower_push_frame replaces with a func.call
                                // to @gc_push_frame. In a production compiler,
                                // current_frame_ptr should point to the *new*
                                // call's result — but since our gc_push_frame
                                // returns the frame pointer in the same way,
                                // the values are equivalent here.
                                current_frame_ptr = Some(op.result(0).unwrap().into());
                            }
                            "lox.pop_frame" => {
                                lower_pop_frame(&op, &mut block, &context);
                            }
                            "lox.set_root" => {
                                let frame_ptr = current_frame_ptr
                                    .expect("set_root without a preceding push_frame");
                                lower_set_root(&op, &mut block, &context, frame_ptr);
                            }
                            _ => {}
                        }
                    }
                }
            }
        })
    }
}
}
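The collect-first, rewrite-second shape of the pass is easier to see without Melior in the way. The sketch below is a toy model under loud assumptions: an "operation" is just its name as a String, a "block" is a Vec of them, and the replacement strings are placeholders for the real func.call and llvm operations. Unlike the listing above, it rewrites each operation in place so the order of the block is preserved.

```rust
/// Toy lowering over a block modelled as a list of operation names.
fn lower_block(block: &mut Vec<String>) -> Result<(), String> {
    // Pass 1: record where the Lox ops are. The immutable borrow of the
    // block ends with this statement, so pass 2 is free to mutate it.
    let targets: Vec<usize> = block
        .iter()
        .enumerate()
        .filter(|(_, op)| op.starts_with("lox."))
        .map(|(index, _)| index)
        .collect();

    // Pass 2: rewrite each recorded op, tracking whether a frame exists.
    let mut have_frame = false;
    for index in targets {
        let replacement = match block[index].as_str() {
            "lox.push_frame" => {
                have_frame = true;
                "call @gc_push_frame"
            }
            "lox.set_root" => {
                if !have_frame {
                    return Err("set_root without a preceding push_frame".to_string());
                }
                "getelementptr + store"
            }
            "lox.pop_frame" => "call @gc_pop_frame",
            other => return Err(format!("unhandled Lox op: {other}")),
        };
        block[index] = replacement.to_string();
    }
    Ok(())
}

fn main() {
    let mut block: Vec<String> = ["lox.push_frame", "lox.set_root", "arith.addf", "lox.pop_frame", "func.return"]
        .iter()
        .map(|name| name.to_string())
        .collect();
    match lower_block(&mut block) {
        Ok(()) => println!("{:#?}", block),
        Err(message) => eprintln!("lowering failed: {message}"),
    }
}
```

Returning a Result instead of calling expect is a design choice for the sketch: a malformed block becomes a reportable error instead of a panic inside the pass.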

Linking Everything Together

Now we need to link the MLIR-generated code with our Rust runtime:

Step 1: Compile the Runtime

cd lox-mlir
cargo build --release

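This produces target/release/liblox_runtime.a (static lib) and liblox_runtime.so (dynamic lib; .dylib on macOS). Cargo only emits both when the runtime crate declares the staticlib and cdylib crate types.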

Step 2: Generate MLIR

#![allow(unused)]
fn main() {
// Compile Lox source to MLIR
let mlir = compile_to_mlir(source)?;
println!("{}", mlir);
}

Output:

module {
  func.func @example() -> f64 {
    %0 = lox.push_frame root_count = 3 : !llvm.ptr
    
    // Allocate a string object
    %1 = lox.alloc obj_type = 1, size = 5 : !llvm.ptr
    
    // Store as root 0
    lox.set_root index = 0, %1 : !llvm.ptr
    
    // ... function body ...
    
    lox.pop_frame
    func.return %result : f64
  }
}

Step 3: Lower to LLVM IR

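# Translate the LLVM-dialect module to LLVM IR.
# Run the lowering passes first: mlir-translate cannot translate lox.* operations.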
mlir-translate output.mlir --mlir-to-llvmir -o output.ll

Step 4: Compile to Object File

# Compile LLVM IR to object file
llc output.ll -filetype=obj -o output.o

# Link with runtime
clang output.o -L./target/release -llox_runtime -o output

Step 5: Run

./output
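If you would rather drive Steps 3 and 4 from Rust than from a shell, a small wrapper around std::process::Command is enough. This is a sketch, not part of the tutorial's code: the function names pipeline_commands and run_pipeline are hypothetical, the command lines are copied from the steps above, and it assumes mlir-translate, llc, and clang are on your PATH.

```rust
use std::process::Command;

/// Build the command lines for Steps 3 and 4 for a given file stem.
fn pipeline_commands(stem: &str, runtime_dir: &str) -> Vec<Vec<String>> {
    let s = |text: &str| text.to_string();
    vec![
        // Step 3: LLVM-dialect MLIR -> LLVM IR
        vec![s("mlir-translate"), format!("{stem}.mlir"), s("--mlir-to-llvmir"), s("-o"), format!("{stem}.ll")],
        // Step 4a: LLVM IR -> object file
        vec![s("llc"), format!("{stem}.ll"), s("-filetype=obj"), s("-o"), format!("{stem}.o")],
        // Step 4b: link against the runtime
        vec![s("clang"), format!("{stem}.o"), format!("-L{runtime_dir}"), s("-llox_runtime"), s("-o"), s(stem)],
    ]
}

/// Run each command in order, stopping at the first failure.
fn run_pipeline(commands: &[Vec<String>]) -> std::io::Result<()> {
    for argv in commands {
        let status = Command::new(&argv[0]).args(&argv[1..]).status()?;
        if !status.success() {
            return Err(std::io::Error::new(
                std::io::ErrorKind::Other,
                format!("{} failed with {status}", argv[0]),
            ));
        }
    }
    Ok(())
}

fn main() {
    let commands = pipeline_commands("output", "./target/release");
    for argv in &commands {
        println!("{}", argv.join(" "));
    }
    // Pass --run to actually execute the tools instead of just printing them.
    if std::env::args().any(|arg| arg == "--run") {
        if let Err(error) = run_pipeline(&commands) {
            eprintln!("pipeline failed: {error}");
            std::process::exit(1);
        }
    }
}
```

Keeping command construction separate from execution makes the pipeline easy to inspect and test without the LLVM tools installed.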

Why We Root Numbers

You’ll notice something odd in the MLIR examples below: we register f64 values as GC roots, even though numbers aren’t heap objects. Why?

In the numbers-only model we’re using right now, every value is an f64 — no heap objects exist, and the GC has nothing to trace. So registering numbers as roots doesn’t do anything useful yet. But we do it anyway for two reasons:

The code generator doesn’t know which variables will hold heap objects. Right now, every variable is a number. But once later parts add strings, closures (Part 5), and instances (Part 7), some variables will hold f64-sized heap pointers instead of numbers. The lox.set_root operation treats both identically: it stores the raw bits in the shadow stack. The GC will need to distinguish them later, but the code generator doesn’t need to.

It keeps the code generator simple now and correct later. If we skip rooting for number variables, we’d need to go back and add it for every variable type that might hold a pointer. That’s error-prone: miss one and the GC frees a live object. By rooting every variable uniformly, we get correctness by default. The cost is that the GC does an unnecessary type check on number roots at collection time. For a tutorial compiler, that tradeoff is fine.

When Part 7 introduces the tagged union model, the GC walks the shadow stack and checks each root: is this a plain number (skip it) or a heap pointer (trace it)? The shadow stack doesn’t know in advance — it stores raw f64 bits — so the GC distinguishes them at runtime using the tagged union’s type tag. (Part 9 discusses an alternative that separates pointers from numbers at the type level, so the GC can skip roots that are provably not pointers.)
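As a rough picture of that runtime check, here is a minimal sketch. Everything in it is an assumption made for illustration: the Tag enum, the Root struct, and roots_to_trace are hypothetical names, and the real tagged-union layout is the one Part 7 defines.

```rust
/// Hypothetical type tag for a root slot.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Tag {
    Number,
    HeapPtr,
}

/// Hypothetical root slot: a tag plus 64 bits of raw payload.
#[derive(Clone, Copy, Debug)]
struct Root {
    tag: Tag,
    bits: u64,
}

/// Walk one shadow-stack frame and return the payloads the collector
/// would trace. Number roots are checked and skipped.
fn roots_to_trace(frame: &[Root]) -> Vec<u64> {
    frame
        .iter()
        .filter(|root| root.tag == Tag::HeapPtr)
        .map(|root| root.bits)
        .collect()
}

fn main() {
    let frame = [
        Root { tag: Tag::Number, bits: 1.5f64.to_bits() },
        Root { tag: Tag::HeapPtr, bits: 0x1000 }, // made-up address
        Root { tag: Tag::Number, bits: 2.0f64.to_bits() },
    ];
    println!("pointers to trace: {:?}", roots_to_trace(&frame));
}
```

The code generator's side stays the same in either case: it roots every variable, and only the collector looks at the tag.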

With that out of the way, let’s see the full pipeline in action.

Complete MLIR Example

Let’s trace through the complete compilation of a simple Lox function:

Input: Lox Source

fun add(a, b) {
    return a + b;
}

print add(1, 2);

Stage 1: MLIR (Lox Dialect)

module {
  // The 'add' function
  func.func @add(%arg0: f64, %arg1: f64) -> f64 {
    // Push shadow stack frame with 2 roots (for parameters)
    %frame = lox.push_frame root_count = 2 : !llvm.ptr
    
    // Register parameters as roots
    lox.set_root index = 0, %arg0 : f64
    lox.set_root index = 1, %arg1 : f64
    
    // Every Lox value is f64 — numbers and heap pointers alike.
    // The GC checks at runtime whether each root is a pointer or a plain
    // number. (See "Why We Root Numbers" above for the full explanation.)
    
    // The addition
    %sum = arith.addf %arg0, %arg1 : f64
    
    // Pop frame before return
    lox.pop_frame
    
    // Return
    func.return %sum : f64
  }
  
  // The main entry point
  func.func @main() -> i32 {
    %frame = lox.push_frame root_count = 0 : !llvm.ptr
    
    // Call add(1, 2)
    %one = arith.constant 1.0 : f64
    %two = arith.constant 2.0 : f64
    %result = func.call @add(%one, %two) : (f64, f64) -> f64
    
    // Print the result
    // (simplified - would call a print runtime function)
    
    lox.pop_frame
    %zero = arith.constant 0 : i32
    func.return %zero : i32
  }
}

Stage 2: After Lowering (LLVM Dialect)

module {
  llvm.func @add(%arg0: f64, %arg1: f64) -> f64 {
    // Constants for root count and slot indices
    %c2_i64 = llvm.mlir.constant(2 : i64) : i64
    %c0_i64 = llvm.mlir.constant(0 : i64) : i64
    %c1_i64 = llvm.mlir.constant(1 : i64) : i64

    // gc_push_frame(2)
    %frame = llvm.call @gc_push_frame(%c2_i64) : (i64) -> !llvm.ptr
    
    // gc_set_root(0, arg0) - converted to bitcast + inttoptr + store
    // The frame is an opaque !llvm.ptr. We use getelementptr to compute
    // the address of each root slot. In opaque-pointer mode (MLIR 17+),
    // GEP carries the element type as a parameter — no bitcast needed
    // for the GEP itself. But the shadow stack stores pointers, and arg0
    // is an f64. LLVM's bitcast cannot convert f64 → ptr directly (it never
    // converts between pointer and non-pointer types). Instead, we do a
    // two-step conversion: bitcast f64 → i64 (reinterpret the bits as an
    // integer, which is valid since both are 64-bit first-class types),
    // then inttoptr i64 → !llvm.ptr (convert the integer to a pointer).
    // The bits are the same through both steps; the type changes.
    // (When the root value is already !llvm.ptr — a heap object, not a
    // number — no conversion is needed.)
    %root0_ptr = llvm.getelementptr %frame[%c0_i64] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.ptr
    %arg0_as_i64 = llvm.bitcast %arg0 : f64 to i64
    %arg0_as_ptr = llvm.inttoptr %arg0_as_i64 : i64 to !llvm.ptr
    llvm.store %arg0_as_ptr, %root0_ptr : !llvm.ptr, !llvm.ptr
    
    // gc_set_root(1, arg1)
    %root1_ptr = llvm.getelementptr %frame[%c1_i64] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.ptr
    %arg1_as_i64 = llvm.bitcast %arg1 : f64 to i64
    %arg1_as_ptr = llvm.inttoptr %arg1_as_i64 : i64 to !llvm.ptr
    llvm.store %arg1_as_ptr, %root1_ptr : !llvm.ptr, !llvm.ptr
    
    // Addition
    %sum = llvm.fadd %arg0, %arg1 : f64
    
    // gc_pop_frame()
    llvm.call @gc_pop_frame() : () -> ()
    
    llvm.return %sum : f64
  }
  
  llvm.func @main() -> i32 {
    // ... similar lowering ...
    llvm.return %c0_i32 : i32
  }
  
  // External declarations for runtime
  llvm.func @gc_push_frame(i64) -> !llvm.ptr
  llvm.func @gc_pop_frame()
  llvm.func @lox_runtime_alloc(i64, i8) -> !llvm.ptr
}

Stage 3: LLVM IR

define double @add(double %0, double %1) {
entry:
  ; Push shadow stack frame
  %frame = call ptr @gc_push_frame(i64 2)
  
  ; Store parameters as roots
  ; With opaque pointers (LLVM 15+), the store instruction carries the
  ; value type itself, so writing a double through a ptr is legal: the
  ; 8-byte double fills the 8-byte slot. A direct translation of Stage 2
  ; would still contain each bitcast and inttoptr, followed by a store
  ; of the resulting ptr. This listing shows the simplified equivalent;
  ; the bytes written to each slot are identical.
  %root0_ptr = getelementptr ptr, ptr %frame, i64 0
  store double %0, ptr %root0_ptr
  
  %root1_ptr = getelementptr ptr, ptr %frame, i64 1
  store double %1, ptr %root1_ptr
  
  ; Addition
  %sum = fadd double %0, %1
  
  ; Pop frame
  call void @gc_pop_frame()
  
  ret double %sum
}

define i32 @main() {
entry:
  %result = call double @add(double 1.0, double 2.0)
  ; ... print result ...
  ret i32 0
}

; External runtime functions
declare ptr @gc_push_frame(i64)
declare void @gc_pop_frame()
declare ptr @lox_runtime_alloc(i64, i8)

Stage 4: Assembly (x86-64, simplified)

add:
    push   rbp
    mov    rbp, rsp
    
    ; gc_push_frame(2)
    mov    rdi, 2
    call   gc_push_frame
    ; gc_push_frame returns the frame pointer in rax
    ; (the call already placed it there — no move needed)
    
    ; Store roots (simplified)
    movsd  [rax], xmm0       ; store %0
    movsd  [rax + 8], xmm1   ; store %1
    
    ; Addition
    addsd  xmm0, xmm1        ; %sum = %0 + %1
    
    ; gc_pop_frame()
    call   gc_pop_frame
    ; NOTE: This listing is not ABI-correct as written. On x86-64 System V
    ; every xmm register is caller-saved, so a call may clobber xmm0 and
    ; xmm1: gc_push_frame can destroy the arguments before they are stored,
    ; and gc_pop_frame can destroy the sum that addsd left in xmm0. Real
    ; generated code spills such values to the stack around each call, or
    ; reorders gc_pop_frame before the addition. The reorder is safe here
    ; because addsd on f64 values can't trigger GC: no allocation occurs,
    ; so the roots in the popped frame are already past their last use. A
    ; production compiler's register allocator handles this automatically.
    
    ; Return
    pop    rbp
    ret

main:
    ; ...
    movsd  xmm0, 1.0
    movsd  xmm1, 2.0
    call   add
    ; ...

Handling Different Object Types

The lox.alloc operation takes an obj_type attribute that must match one of the runtime’s ObjType discriminants. The examples below use the numbering Part 5 settles on: String = 1, Environment = 2, Closure = 3. Let’s see how we generate code for different object types:

String Allocation

var s = "hello";

Generated MLIR:

// Allocate string object (ObjType::String = 1, size = 5)
%str = lox.alloc obj_type = 1, size = 5 : !llvm.ptr

// Initialize string data
// (would fill in length, hash, and characters)

// Store as root
lox.set_root index = 0, %str : !llvm.ptr

Closure Allocation

fun makeCounter() {
    var count = 0;
    fun counter() {
        count = count + 1;
        return count;
    }
    return counter;
}

Generated MLIR (simplified):

func.func @makeCounter() -> !llvm.ptr {
    // 1. Allocate environment for captured 'count'
    %env = lox.alloc obj_type = 2, size = 24 : !llvm.ptr
    // Environment layout: [enclosing, slot_count, slots...]
    // (8 + 8 bytes of fields plus one 8-byte slot on a 64-bit target)
    
    // 2. Initialize count = 0 in environment
    // (store at offset)
    
    // 3. Allocate closure
    %closure = lox.alloc obj_type = 3, size = 16 : !llvm.ptr
    // Closure layout: [function_index, environment_ptr]
    
    // 4. Link closure to environment
    // (store function index and env pointer)
    
    func.return %closure : !llvm.ptr
}

Practice Exercises

Exercise 1: Generate MLIR for a Function

Write the MLIR (Lox dialect) for this Lox function:

fun multiply(x, y) {
    var result = x * y;
    return result;
}

Answer:

func.func @multiply(%arg0: f64, %arg1: f64) -> f64 {
    // 3 roots: x, y, result
    %frame = lox.push_frame root_count = 3 : !llvm.ptr
    
    // Register parameters
    lox.set_root index = 0, %arg0 : f64
    lox.set_root index = 1, %arg1 : f64
    
    // (See "Why We Root Numbers" above for the full explanation.)
    
    // Compute x * y
    %product = arith.mulf %arg0, %arg1 : f64
    
    // Register 'result' as a root (it's a plain number in the
    // numbers-only model — no heap allocation needed, but we still
    // root it so the GC knows about it)
    lox.set_root index = 2, %product : f64
    
    lox.pop_frame
    func.return %product : f64
}

Note: In the numbers-only model, local variables that hold numbers don’t need heap allocation — we root the f64 value directly in the shadow stack. Object-typed variables (strings, closures) would need lox.alloc. (See “Why We Root Numbers” above for why this works.)

Exercise 2: Trace the Compilation Pipeline

Given this Lox code:

var x = 1;
var y = 2;
print x + y;

What does each stage produce?

Answer:

MLIR (Lox Dialect):

func.func @main() -> i32 {
    %frame = lox.push_frame root_count = 2 : !llvm.ptr
    
    // var x = 1
    %x_val = arith.constant 1.0 : f64
    lox.set_root index = 0, %x_val : f64
    
    // var y = 2
    %y_val = arith.constant 2.0 : f64
    lox.set_root index = 1, %y_val : f64
    
    // (See "Why We Root Numbers" above for the full explanation.)
    
    // x + y
    %sum = arith.addf %x_val, %y_val : f64
    
    // print (simplified)
    // call print_runtime(%sum)
    
    lox.pop_frame
    %c0_i32 = arith.constant 0 : i32
    func.return %c0_i32 : i32
}

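After Lowering: lox.push_frame becomes a call to @gc_push_frame, each lox.set_root becomes a getelementptr plus a store into the frame, lox.pop_frame becomes a call to @gc_pop_frame, and arith.addf becomes llvm.fadd.

LLVM IR: The same operations as LLVM instructions (call, getelementptr, store, fadd), with declare lines for the runtime functions.

Assembly: Native code for the target, for example x86-64 or ARM.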

Exercise 3: Why Separate Dialects?

Why do we have a separate lox dialect instead of generating LLVM IR directly?

Answer:

Abstraction Level: Lox dialect captures Lox semantics (allocation, GC roots) at a high level. We can optimize at this level before lowering.
Target Independence: MLIR can target WebAssembly, GPUs, or other backends. We don’t lock ourselves into LLVM.
Debugging: We can inspect the IR at each stage (Lox dialect → LLVM dialect → LLVM IR).
Custom Optimizations: We can write passes that understand Lox semantics (e.g., eliminate unnecessary allocations).
Incremental Lowering: We can do some optimizations in the Lox dialect, then lower to LLVM for target-specific work.

Next: Part 5 — Closures — Our GC can allocate and collect objects. Our code generator can emit MLIR for arithmetic and control flow. But there’s a Lox feature we’ve been ignoring: functions that capture variables from their enclosing scope. Closures need heap-allocated environments and an indirect call pattern — and they change everything about how the GC finds roots.

MLIR for Lox: Part 5 — Closures — When a Variable Outlives Its Stack Frame

You’ve built a garbage collector that tracks roots on the stack and frees everything else. It works — until a function returns another function that references a local variable. That variable should be dead (the stack frame is gone), but it isn’t (the returned closure still uses it). This is the closure problem, and it’s the hardest part of garbage collection.

The Closure Problem

Consider this Lox code:

fun makeCounter() {
    var count = 0;
    
    fun counter() {
        count = count + 1;  // 'count' is from makeCounter's scope!
        return count;
    }
    
    return counter;
}

var c = makeCounter();  // makeCounter returns, but 'count' must live on!
print c();  // 1
print c();  // 2
print c();  // 3

The problem:

makeCounter() returns
Its stack frame is destroyed
But count must still exist because counter captures it!
Where does count live?

Stack vs Heap

┌─────────────────────────────────────────────────────────────┐
│ WRONG: count on the stack                                   │
│                                                             │
│   makeCounter() called                                      │
│   ┌─────────────────────┐                                   │
│   │ count = 0           │  ← on the stack                   │
│   │ return counter      │                                   │
│   └─────────────────────┘                                   │
│          ↓                                                  │
│   makeCounter() returns                                     │
│   ┌─────────────────────┐                                   │
│   │ (freed!)            │  ← count is gone!                 │
│   └─────────────────────┘                                   │
│          ↓                                                  │
│   c() is called                                             │
│   counter tries to access count... CRASH!                   │
│                                                             │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ RIGHT: count on the heap (in a closure environment)         │
│                                                             │
│   makeCounter() called                                      │
│   ┌─────────────────────┐      ┌─────────────────────┐      │
│   │ env = alloc()       │──────│ count = 0           │      │
│   │ return counter      │      │ (on the heap!)      │      │
│   └─────────────────────┘      └─────────────────────┘      │
│          ↓                           ↑                      │
│   makeCounter() returns              │                      │
│   (stack frame freed)                │                      │
│          ↓                           │                      │
│   c() is called ─────────────────────┘                      │
│   counter accesses count via env pointer                    │
│   count is still alive!                                     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Closure Environments

A closure environment is a heap-allocated structure that holds captured variables.

Structure

Closure Environment:
┌────────────────────────────────────┐
│ Header (ObjHeader)                 │
│   marked: bool                     │
│   obj_type: ObjType::Environment   │
│   size: ...                        │
├────────────────────────────────────┤
│ Data                               │
│   enclosing: *mut Env (or null)    │  ← for nested closures
│   slot_count: usize                │  ← number of slots
│   slot[0]: value                   │
│   slot[1]: value                   │
│   ...                              │
└────────────────────────────────────┘

Closure Object

Closure Object:
┌────────────────────────────────────┐
│ Header (ObjHeader)                 │
│   marked: bool                     │
│   obj_type: ObjType::Closure       │
│   size: ...                        │
├────────────────────────────────────┤
│ Data                               │
│   function_index: usize            │  ← index into the function table
│   environment: *mut Env            │  ← captured variables
└────────────────────────────────────┘
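Before the unsafe runtime version, it can help to see the two structures working together in safe Rust. The sketch below models the makeCounter example with reference counting standing in for the garbage collector. All of it is illustrative: Env, Closure, FUNCTIONS, and the helper functions are hypothetical names, slots hold plain f64 values, and Rc/RefCell replace the raw pointers and ObjHeader of the real runtime.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Heap-allocated environment: captured variables live here, not on the stack.
struct Env {
    #[allow(dead_code)]
    enclosing: Option<Rc<Env>>, // for nested closures; unused in this example
    slots: RefCell<Vec<f64>>,
}

/// Closure = which code to run + which environment to run it against.
struct Closure {
    function_index: usize,
    env: Rc<Env>,
}

/// Body of `counter`: count = count + 1; return count;
fn counter_body(env: &Env) -> f64 {
    let mut slots = env.slots.borrow_mut();
    slots[0] += 1.0;
    slots[0]
}

/// Function table, indexed by `function_index`.
const FUNCTIONS: [fn(&Env) -> f64; 1] = [counter_body];

/// Body of `makeCounter`: allocate the environment, then wrap it in a closure.
fn make_counter() -> Closure {
    let env = Rc::new(Env {
        enclosing: None,
        slots: RefCell::new(vec![0.0]), // var count = 0;
    });
    Closure { function_index: 0, env }
}

/// Indirect call: look up the code, pass it the captured environment.
fn call(closure: &Closure) -> f64 {
    FUNCTIONS[closure.function_index](&*closure.env)
}

fn main() {
    let c = make_counter(); // make_counter's stack frame is gone after this line
    println!("{}", call(&c));
    println!("{}", call(&c));
    println!("{}", call(&c));
}
```

The count survives because it was never in make_counter's stack frame to begin with. Each call to make_counter allocates a fresh environment, so two counters do not share state.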

Implementing Environments

Let’s add environment support to our runtime:

#![allow(unused)]
fn main() {
// src/runtime/object.rs (extended)

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ObjType {
    Number = 0,
    String = 1,
    Environment = 2,   // NEW: closure environment
    Closure = 3,       // shifted from 2 → 3 (was a forward declaration in Part 2)
    Instance = 4,      // shifted from 3 → 4
}
}

Note the renumbering. Part 2 defined the enum with Closure = 2 and Instance = 3 as forward declarations (the Closure variant existed but wasn’t used until this part). Now we’re inserting Environment = 2 before Closure, which shifts Closure to 3 and Instance to 4. This is why we use explicit discriminants — if we relied on implicit numbering, inserting a variant in the middle would silently break the GC’s trace_object dispatch. The explicit values make the contract between the enum and the heap header’s obj_type byte visible.
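One way to make that contract checkable is to convert the header byte back into the enum explicitly. The sketch below repeats the enum so it compiles on its own; the TryFrom implementation is an addition for illustration and is not something the tutorial's runtime is known to contain.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ObjType {
    Number = 0,
    String = 1,
    Environment = 2,
    Closure = 3,
    Instance = 4,
}

/// Hypothetical helper: decode the obj_type byte stored in a heap header.
impl TryFrom<u8> for ObjType {
    type Error = u8;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0 => Ok(ObjType::Number),
            1 => Ok(ObjType::String),
            2 => Ok(ObjType::Environment),
            3 => Ok(ObjType::Closure),
            4 => Ok(ObjType::Instance),
            unknown => Err(unknown), // corrupt header or stale numbering
        }
    }
}

fn main() {
    // The value the code generator writes for `lox.alloc obj_type = 3`...
    let header_byte = ObjType::Closure as u8;
    // ...must decode to the same variant on the runtime side.
    println!("{} -> {:?}", header_byte, ObjType::try_from(header_byte));
}
```

A generated module that still used the old numbering would now allocate an Environment where it meant a Closure, so regenerate any saved MLIR after changing the enum.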

#![allow(unused)]
fn main() {
/// An environment (holds captured variables)
#[repr(C)]
pub struct Environment {
    /// Pointer to enclosing environment (for nested closures)
    /// null = no enclosing environment (top-level)
    pub enclosing: *mut Environment,
    
    /// Number of variable slots
    pub slot_count: usize,
    
    /// Variable slots (flexible array member)
    /// The `[0]` is Rust's approximation of a C flexible array member —
    /// `alloc_environment` allocates extra space for the slots beyond the
    /// struct size, and `get`/`set` access them via pointer arithmetic
    /// on `as_ptr().add(index)`.
    pub slots: [*mut u8; 0],
}

impl Environment {
    /// Get a slot value
    pub fn get(&self, index: usize) -> *mut u8 {
        assert!(index < self.slot_count);
        unsafe { *self.slots.as_ptr().add(index) }
    }
    
    /// Set a slot value
    pub fn set(&mut self, index: usize, value: *mut u8) {
        assert!(index < self.slot_count);
        unsafe { *self.slots.as_mut_ptr().add(index) = value; }
    }
}

/// A closure (function + environment)
#[repr(C)]
pub struct Closure {
    /// Index of the function's code in our function table
    pub function_index: usize,
    
    /// Pointer to the captured environment
    pub environment: *mut Environment,
}
}

Why do the slots store *mut u8 but the MLIR code loads f64? The Rust runtime stores *mut u8 pointers in each slot: pointers to GC-tracked heap objects. In the numbers-only model, those heap objects are Number values whose data area contains an f64. The MLIR lox_runtime_env_get returns a !llvm.ptr pointing to the Number’s data area, and llvm.load reads the f64 from there. The MLIR lox_runtime_env_set_number takes an f64, boxes it into a heap-allocated Number (the runtime handles this internally), and stores the pointer in the slot. We use env_set_number instead of the generic env_set (which takes void*) because passing an f64 where the callee expects a void* breaks the C calling convention: on x86-64 System V, an f64 is passed in an XMM register while a void* is passed in a general-purpose register, so the callee would read garbage. In the tagged-union model (Part 7+), each slot would hold an (i8, i64) pair and the generic env_set with void* would be used instead.
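The box-then-load round trip can be sketched without the GC. In the toy below, Box stands in for the collector's allocator and the slots are a Vec; the names Env, env_set_number, and env_get echo the runtime functions mentioned above but are simplified, hypothetical versions, not their real signatures.

```rust
/// Hypothetical stand-in for an environment: each slot holds a pointer to a
/// boxed f64, or null. The real runtime allocates through the GC instead.
struct Env {
    slots: Vec<*mut f64>,
}

impl Env {
    fn new(slot_count: usize) -> Self {
        Env { slots: vec![std::ptr::null_mut(); slot_count] }
    }
}

/// Box the number on the heap and store the pointer in the slot.
fn env_set_number(env: &mut Env, index: usize, value: f64) {
    let fresh = Box::into_raw(Box::new(value));
    let old = std::mem::replace(&mut env.slots[index], fresh);
    if !old.is_null() {
        // No GC in this sketch, so free the previous box by hand.
        unsafe { drop(Box::from_raw(old)) };
    }
}

/// Return the pointer stored in the slot; the caller loads the f64 through it.
fn env_get(env: &Env, index: usize) -> *mut f64 {
    env.slots[index]
}

impl Drop for Env {
    fn drop(&mut self) {
        for &slot in &self.slots {
            if !slot.is_null() {
                unsafe { drop(Box::from_raw(slot)) };
            }
        }
    }
}

fn main() {
    let mut env = Env::new(1);
    env_set_number(&mut env, 0, 41.0);
    // This dereference plays the role of llvm.load in the generated code.
    let loaded = unsafe { *env_get(&env, 0) };
    env_set_number(&mut env, 0, loaded + 1.0);
    println!("{}", unsafe { *env_get(&env, 0) });
}
```

The slot itself never holds the number, only a pointer to it, which is why the generated code needs a load after every env_get.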

Allocating an Environment

#![allow(unused)]
fn main() {
// src/runtime/gc.rs (extended)

/// Allocate a new environment with the given slot count
pub fn alloc_environment(slot_count: usize, enclosing: *mut Environment) -> *mut Environment {
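    // Calculate the data size: environment struct + slots.
    // The object header is not counted here; `alloc` returns a pointer
    // to the data area that follows it.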
    let env_data_size = std::mem::size_of::<Environment>() 
                      + slot_count * std::mem::size_of::<*mut u8>();
    
    let data_ptr = alloc(env_data_size, ObjType::Environment);
    let env = data_ptr as *mut Environment;
    
    unsafe {
        (*env).enclosing = enclosing;
        (*env).slot_count = slot_count;
        
        // Initialize all slots to null
        for i in 0..slot_count {
            (*env).set(i, std::ptr::null_mut());
        }
    }
    
    env
}

/// Allocate a new closure
pub fn alloc_closure(function_index: usize, environment: *mut Environment) -> *mut Closure {
    let data_ptr = alloc(std::mem::size_of::<Closure>(), ObjType::Closure);
    let closure = data_ptr as *mut Closure;
    
    unsafe {
        (*closure).function_index = function_index;
        (*closure).environment = environment;
    }
    
    closure
}
}
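To see the size calculation and the zero-length `slots` tail in isolation, here is a minimal sketch that uses `std::alloc` directly instead of the GC's `alloc`. The `Env` struct is a simplified stand-in without a GC header, and `env_layout`, `slot_ptr`, and `demo_env` are hypothetical helper names.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::mem::{align_of, size_of};
use std::ptr;

/// Simplified stand-in for the tutorial's Environment (no GC header).
#[repr(C)]
struct Env {
    enclosing: *mut Env,
    slot_count: usize,
    slots: [*mut u8; 0], // zero-length tail; the real slots follow in memory
}

/// Struct size plus one pointer per slot, as in alloc_environment.
fn env_layout(slot_count: usize) -> Layout {
    let size = size_of::<Env>() + slot_count * size_of::<*mut u8>();
    Layout::from_size_align(size, align_of::<Env>()).expect("valid layout")
}

/// Address of slot `index`, derived from the whole-allocation pointer.
unsafe fn slot_ptr(env: *mut Env, index: usize) -> *mut *mut u8 {
    unsafe {
        assert!(index < (*env).slot_count);
        (ptr::addr_of_mut!((*env).slots) as *mut *mut u8).add(index)
    }
}

/// Allocate a 3-slot environment, write slot 2, and read slots back.
/// Returns (slot 0 is null, byte read through slot 2).
fn demo_env() -> (bool, u8) {
    let layout = env_layout(3);
    let mut payload = 7u8;
    unsafe {
        // Zeroed memory means every slot starts out as a null pointer.
        let env = alloc_zeroed(layout) as *mut Env;
        assert!(!env.is_null(), "allocation failed");
        (*env).enclosing = ptr::null_mut();
        (*env).slot_count = 3;
        *slot_ptr(env, 2) = &mut payload;
        let result = ((*slot_ptr(env, 0)).is_null(), **slot_ptr(env, 2));
        dealloc(env as *mut u8, layout);
        result
    }
}
```

The important property is that the allocation is larger than `size_of::<Env>()`, so indexing past the zero-length array stays inside memory the environment owns.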

Marking Environments

When we mark a closure, we must also mark its environment, and when we mark an environment, we must mark its enclosing environment and every slot. This is the same mark_references function from Part 3, extended with the Environment and Closure arms:

#![allow(unused)]
fn main() {
// src/runtime/gc.rs (extended)

fn mark_references(header: *mut ObjHeader) {
    let obj_type = unsafe { (*header).obj_type };
    let data = unsafe { (header as *mut u8).add(std::mem::size_of::<ObjHeader>()) };
    
    match obj_type {
        ObjType::Number | ObjType::String => {
            // No references
        }
        
        ObjType::Environment => {
            let env = data as *mut Environment;
            unsafe {
                // Mark enclosing environment
                if !(*env).enclosing.is_null() {
                    mark_object((*env).enclosing as *mut u8);
                }
                
                // Mark all slots
                for i in 0..(*env).slot_count {
                    let slot = (*env).get(i);
                    if !slot.is_null() {
                        mark_object(slot);
                    }
                }
            }
        }
        
        ObjType::Closure => {
            let closure = data as *mut Closure;
            unsafe {
                // Mark the environment
                let env = (*closure).environment;
                if !env.is_null() {
                    mark_object(env as *mut u8);
                }
            }
        }
        
        ObjType::Instance => {
            // (same as before)
        }
    }
}
}
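The effect of these two arms is that everything reachable from a closure survives a collection. A toy model makes this easy to check. It makes two substitutions, both assumptions of this sketch: objects are addressed by index into a `Vec` rather than by raw pointer, and an explicit worklist replaces the calls to `mark_object`.

```rust
/// Toy heap objects addressed by index instead of raw pointers.
#[allow(dead_code)]
enum Obj {
    Number(f64),
    Environment { enclosing: Option<usize>, slots: Vec<Option<usize>> },
    Closure { environment: Option<usize> },
}

/// Mark `root` and everything reachable from it, using an explicit worklist.
fn mark_from(heap: &[Obj], root: usize) -> Vec<bool> {
    let mut marked = vec![false; heap.len()];
    let mut worklist = vec![root];
    while let Some(id) = worklist.pop() {
        if marked[id] {
            continue; // already visited
        }
        marked[id] = true;
        match &heap[id] {
            Obj::Number(_) => {} // no references
            Obj::Environment { enclosing, slots } => {
                worklist.extend(enclosing.iter().copied());
                worklist.extend(slots.iter().flatten().copied());
            }
            Obj::Closure { environment } => {
                worklist.extend(environment.iter().copied());
            }
        }
    }
    marked
}

/// The makeCounter heap plus one unreachable Number.
fn demo_heap() -> Vec<Obj> {
    vec![
        Obj::Number(0.0),                                           // 0: count
        Obj::Environment { enclosing: None, slots: vec![Some(0)] }, // 1: env
        Obj::Closure { environment: Some(1) },                      // 2: closure
        Obj::Number(99.0),                                          // 3: garbage
    ]
}
```

Marking from the closure at index 2 reaches the environment and the boxed count, while the unrelated Number at index 3 stays unmarked and would be swept.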

Code Generation for Closures

Now let’s generate code for closures in our compiler.

The Challenge

When compiling a closure, we need to:

Identify captured variables - which variables from outer scopes are used?
Allocate an environment - create heap storage for captured variables
Store captured values - copy values into the environment
Access via environment - when the closure reads/writes captured vars, go through the environment

Step 1: Variable Analysis

#![allow(unused)]
fn main() {
// src/analysis/captures.rs

use crate::ast::*;

/// A variable that is captured by a closure
#[derive(Debug, Clone)]
pub struct CapturedVar {
    pub name: String,
    pub depth: usize,      // How many enclosing environments to follow (0 = immediately enclosing)
    pub slot_index: usize, // Index in the environment
}

/// Analyze a function to find captured variables
pub fn find_captures(func: &FunctionStmt) -> Vec<CapturedVar> {
    let mut analyzer = CaptureAnalyzer::new();
    analyzer.analyze_function(func);
    analyzer.captures
}

struct CaptureAnalyzer {
    scopes: Vec<Vec<String>>,  // Stack of local variables in each scope
    captures: Vec<CapturedVar>,
    current_slot: usize,
}

impl CaptureAnalyzer {
    fn new() -> Self {
        Self {
            scopes: vec![vec![]],  // Start with one scope for parameters
            captures: Vec::new(),
            current_slot: 0,
        }
    }
    
    fn analyze_function(&mut self, func: &FunctionStmt) {
        // Parameters are in scope 0
        for param in &func.params {
            self.scopes[0].push(param.clone());
        }
        
        // Analyze body
        for stmt in &func.body {
            self.analyze_stmt(stmt);
        }
    }
    
    fn analyze_stmt(&mut self, stmt: &Stmt) {
        match stmt {
            Stmt::Var(v) => {
                self.scopes.last_mut().unwrap().push(v.name.clone());
                self.analyze_expr(&v.init);
            }
            Stmt::Block(b) => {
                self.scopes.push(vec![]);
                for s in &b.statements {
                    self.analyze_stmt(s);
                }
                self.scopes.pop();
            }
            Stmt::Print(p) => self.analyze_expr(&p.expression),
            Stmt::Return(r) => {
                if let Some(value) = &r.value {
                    self.analyze_expr(value);
                }
            }
            Stmt::Function(f) => {
                // The function name is local to this scope
                self.scopes.last_mut().unwrap().push(f.name.clone());
                // But we don't recurse into the body here —
                // the compiler processes each function separately
                // (see "Compilation Order: Inside Out" below).
            }
            Stmt::Expression(e) => self.analyze_expr(&e.expression),
            Stmt::If(i) => {
                self.analyze_expr(&i.condition);
                self.analyze_stmt(&i.then_branch);
                if let Some(else_branch) = &i.else_branch {
                    self.analyze_stmt(else_branch);
                }
            }
            Stmt::While(w) => {
                self.analyze_expr(&w.condition);
                self.analyze_stmt(&w.body);
            }
        }
    }
    
    fn analyze_expr(&mut self, expr: &Expr) {
        match expr {
            Expr::Variable(v) => {
                // Is this variable captured? (not in any local scope)
                if !self.is_local(&v.name) {
                    // It's a capture!
                    if !self.captures.iter().any(|c| c.name == v.name) {
                        self.captures.push(CapturedVar {
                            name: v.name.clone(),
                            // depth = how many enclosing environments to follow
                            // at runtime. A variable captured from the immediately
                            // enclosing function is at depth 0 (first env up).
                            // For nested closures (capturing from two+ levels up),
                            // extending the scope analysis to track how many
                            // function boundaries are between the reference and
                            // its definition would compute the actual depth.
                            // Our compiler handles single-level
                            // captures — captured variables are always in the
                            // immediately enclosing environment.
                            depth: 0,
                            slot_index: self.current_slot,
                        });
                        self.current_slot += 1;
                    }
                }
            }
            // Recursive cases — the capture analysis doesn't change
            // behavior for these, it traverses into sub-expressions
            // to find more variable references.
            Expr::Binary(b) => {
                self.analyze_expr(&b.left);
                self.analyze_expr(&b.right);
            }
            Expr::Unary(u) => self.analyze_expr(&u.right),
            Expr::Call(c) => {
                self.analyze_expr(&c.callee);
                for arg in &c.arguments {
                    self.analyze_expr(arg);
                }
            }
            Expr::Assign(a) => {
                // Assignment to a variable might be a capture too
                if !self.is_local(&a.name) {
                    if !self.captures.iter().any(|c| c.name == a.name) {
                        self.captures.push(CapturedVar {
                            name: a.name.clone(),
                            depth: 0,
                            slot_index: self.current_slot,
                        });
                        self.current_slot += 1;
                    }
                }
                self.analyze_expr(&a.value);
            }
            Expr::Literal(_) => {} // No sub-expressions
            Expr::Logical(l) => {
                self.analyze_expr(&l.left);
                self.analyze_expr(&l.right);
            }
            Expr::Grouping(g) => self.analyze_expr(&g.expression),
        }
    }
    
    fn is_local(&self, name: &str) -> bool {
        // Check ALL scopes — a variable from an enclosing block in the
        // same function is still local, not a capture.
        //
        // Variables from enclosing *functions* aren't in self.scopes at all
        // (we only track the current function's scopes), so they naturally
        // fail this check and are classified as captures. That's correct:
        // a variable that isn't local to this function must be captured.
        // Compare by &str to avoid allocating a String for every lookup.
        self.scopes
            .iter()
            .any(|scope| scope.iter().any(|local| local == name))
    }
    
    // _depth_of is shown for reference but not used in our simplified
    // capture analysis (hence the leading underscore). It computes the
    // scope depth within the current function. For captured variables
    // (not in self.scopes), it would panic. A full compiler would track
    // the number of enclosing function boundaries between the reference
    // and the definition to compute the environment chain depth.
    fn _depth_of(&self, name: &str) -> usize {
        for (depth, scope) in self.scopes.iter().rev().enumerate() {
            if scope.iter().any(|local| local == name) {
                return depth;
            }
        }
        panic!("Variable not found: {}", name);
    }
}
}
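To try the core idea without the full AST, here is a stripped-down, self-contained version of the analysis. It is a sketch under simplifying assumptions: the `Expr`/`Stmt` enums are a tiny hypothetical AST, there are no `var` declarations or block scopes, and the position in the returned `Vec` plays the role of `slot_index`.

```rust
use std::collections::HashSet;

/// A deliberately tiny AST, just enough for the two examples.
#[allow(dead_code)]
enum Expr {
    Var(String),
    Num(f64),
    Add(Box<Expr>, Box<Expr>),
    Assign(String, Box<Expr>),
}

enum Stmt {
    Expr(Expr),
    Return(Expr),
}

/// Names referenced in `body` that are not parameters, in first-use order.
fn captured_names(params: &[&str], body: &[Stmt]) -> Vec<String> {
    let locals: HashSet<&str> = params.iter().copied().collect();
    let mut captures = Vec::new();
    for stmt in body {
        match stmt {
            Stmt::Expr(e) | Stmt::Return(e) => visit(e, &locals, &mut captures),
        }
    }
    captures
}

fn visit(expr: &Expr, locals: &HashSet<&str>, captures: &mut Vec<String>) {
    match expr {
        Expr::Num(_) => {}
        Expr::Var(name) => record(name, locals, captures),
        Expr::Add(left, right) => {
            visit(left, locals, captures);
            visit(right, locals, captures);
        }
        Expr::Assign(name, value) => {
            // The assignment target can be a capture too.
            record(name, locals, captures);
            visit(value, locals, captures);
        }
    }
}

/// Record `name` as a capture if it is not local and not already recorded.
fn record(name: &str, locals: &HashSet<&str>, captures: &mut Vec<String>) {
    if !locals.contains(name) && !captures.iter().any(|c| c == name) {
        captures.push(name.to_string());
    }
}

fn var(name: &str) -> Box<Expr> {
    Box::new(Expr::Var(name.to_string()))
}

/// Body of `counter` from makeCounter: `count = count + 1; return count;`
fn counter_body() -> Vec<Stmt> {
    vec![
        Stmt::Expr(Expr::Assign(
            "count".to_string(),
            Box::new(Expr::Add(var("count"), Box::new(Expr::Num(1.0)))),
        )),
        Stmt::Return(Expr::Var("count".to_string())),
    ]
}
```

For `counter`, which has no parameters, the only name that is not local is `count`, so it becomes the single capture in slot 0.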

Step 2: Environment Allocation

#![allow(unused)]
fn main() {
// Continuing in src/codegen.rs — same imports as Part 4
use std::collections::HashMap;
// (BlockLike, RegionLike, OperationLike must be in scope)
// src/codegen/generator.rs (extended)

impl<'c> CodeGenerator<'c> {
    
    fn compile_function(&self, func: &FunctionStmt, variables: &mut HashMap<String, Value<'c, 'c>>, current_env: &mut Option<Value<'c, 'c>>) {
        // current_env is &mut because we assign to it below (*current_env = Some(env)),
        // not because alloc_environment needs mutability — it takes &Option<Value>.
        // ... setup ...
        
        // Find captured variables
        let captures = find_captures(func);
        
        // If we have captures, we need an environment
        if !captures.is_empty() {
            let env = self.alloc_environment(block, captures.len(), current_env);
            
            // Store captured values into the environment
            for capture in &captures {
                // get_capture_value looks up the variable in the enclosing
                // function's variables map, possibly traversing the scope
                // chain (capture.depth levels up) to find it.
                let value = self.get_capture_value(&capture.name, capture.depth);
                self.store_to_environment(block, env, capture.slot_index, value);
            }
            
            // The environment is now available for inner functions
            *current_env = Some(env);
        }
        
        // ... compile body ...
    }
    
    fn alloc_environment(&self, block: &Block<'c>, slot_count: usize, current_env: &Option<Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = Location::unknown(self.context);
        
        // Call lox_runtime_alloc_environment(slot_count, enclosing)
        let slot_count_val = self.const_i64(slot_count as i64);
        let enclosing_val = current_env
            .unwrap_or_else(|| self.const_null());
        
        let call = block.append_operation(func::call(
            self.context,
            melior::ir::attribute::FlatSymbolRefAttribute::new(
                self.context, 
                "lox_runtime_alloc_environment"
            ),
            &[slot_count_val, enclosing_val],
            &[Type::parse(self.context, "!llvm.ptr").unwrap()],
            location,
        ));
        
        call.result(0).unwrap().into()
    }
    
    fn store_to_environment(&self, block: &Block<'c>, env: Value<'c, 'c>, index: usize, value: Value<'c, 'c>) {
        let location = Location::unknown(self.context);
        
        // The C runtime handles boxing the f64 into a heap-allocated
        // Number and storing the pointer in the environment slot.
        block.append_operation(func::call(
            self.context,
            melior::ir::attribute::FlatSymbolRefAttribute::new(
                self.context,
                "lox_runtime_env_set_number"
            ),
            &[env, self.const_i64(index as i64), value],
            &[],
            location,
        ));
    }
}
}

Why lox_runtime_env_set_number instead of lox_runtime_env_set?

The generic lox_runtime_env_set takes a void* parameter. Our values are f64. You might think: “cast the f64 to a pointer and pass it.” On x86-64, that breaks. The C calling convention passes void* arguments in general-purpose registers (rdi, rsi, rdx…) but passes double arguments in XMM registers (xmm0, xmm1…). If the callee expects a pointer in rdx but you passed a double in xmm0, the callee reads the wrong register and gets garbage.

The type-specific lox_runtime_env_set_number takes double directly, so the compiler puts the value in the right register. The C runtime then boxes the double into a heap-allocated Number object and stores that pointer in the environment slot. This is the kind of calling-convention trap that only bites you when you cross language boundaries — MLIR’s func.call doesn’t save you from the C ABI’s register rules.
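A Rust-side sketch of the two entry points shows how the type-specific one can delegate to the generic one. These are hypothetical signatures, not the tutorial's runtime: they take a raw slot array instead of an environment pointer, and `Box` stands in for the GC allocator.

```rust
/// Generic setter: the value travels as a pointer (an integer-class
/// argument on x86-64 System V).
extern "C" fn env_set(slots: *mut *mut u8, index: i64, value: *mut u8) {
    // SAFETY: the caller guarantees `slots` has more than `index` entries.
    unsafe { *slots.add(index as usize) = value }
}

/// Number-specific setter: the value travels as a double (an SSE-class
/// argument on x86-64 System V), then gets boxed before being stored.
extern "C" fn env_set_number(slots: *mut *mut u8, index: i64, value: f64) {
    // Box stands in for the GC allocator; this sketch leaks the box.
    let boxed = Box::into_raw(Box::new(value)) as *mut u8;
    env_set(slots, index, boxed);
}

/// Store 2.5 in slot 1 of a two-slot array and read it back.
fn demo_set_number() -> f64 {
    let mut slots: [*mut u8; 2] = [std::ptr::null_mut(); 2];
    env_set_number(slots.as_mut_ptr(), 1, 2.5);
    // SAFETY: slot 1 was just filled with a pointer to a boxed f64.
    unsafe { *(slots[1] as *const f64) }
}
```

Because each function declares the parameter type it really receives, the compiler on both sides of the call agrees on which register carries the value.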

#![allow(unused)]
fn main() {
impl<'c> CodeGenerator<'c> {
    /// Get a pointer to a captured variable in the closure environment.
    /// The environment stores pointers to heap objects; `env_get` returns
    /// a pointer to the object's data area (past ObjHeader), so you can
    /// `llvm.load` the f64 directly.
    fn env_get_ptr(&self, block: &Block<'c>, env: Value<'c, 'c>, index: usize) -> Value<'c, 'c> {
        let location = Location::unknown(self.context);
        
        let call = block.append_operation(func::call(
            self.context,
            melior::ir::attribute::FlatSymbolRefAttribute::new(
                self.context,
                "lox_runtime_env_get"
            ),
            &[env, self.const_i64(index as i64)],
            &[Type::parse(self.context, "!llvm.ptr").unwrap()],
            location,
        ));
        
        call.result(0).unwrap().into()
    }
    
    /// Load a captured variable's value from the closure environment.
    /// Two steps: get the pointer (env_get_ptr), then load the f64.
    fn load_from_environment(&self, block: &Block<'c>, env: Value<'c, 'c>, index: usize) -> Value<'c, 'c> {
        let ptr = self.env_get_ptr(block, env, index);
        let location = Location::unknown(self.context);
        
        let loaded = block.append_operation(llvm::load(
            ptr,
            Type::float64(self.context),
            location,
        ));
        
        loaded.result(0).unwrap().into()
    }
}
}

How Does the Closure Receive Its Environment?

The code above shows the enclosing function creating an environment and storing captured values into it. But the closure itself is compiled as a separate function — and it needs an %env parameter to access those captured variables.

Here’s the missing link. When the compiler compiles a closure’s body, it does two things differently from a regular function:

The function signature includes an %env parameter. You’ll see this in the MLIR for @counter below — the first argument is always the environment pointer.
current_env is initialized from that parameter, not from alloc_environment. The allocation happened in the enclosing function, and the closure receives the pointer through the calling convention.

When the compiler encounters a reference to a captured variable like count, it calls load_from_environment with the %env parameter instead of looking the name up in the variables HashMap, which tracks the function’s stack-allocated locals. Local variables (the function’s own parameters and locals) still go through variables; only captured variables go through the environment.

This split is what find_captures determines — it classifies each variable reference as local or captured, and the compiler routes accordingly.
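The routing decision can be sketched as a small lookup. The `VarAccess` enum and `resolve` function are hypothetical names used only for illustration; the real compiler works on MLIR values rather than returning an enum.

```rust
/// Where a variable reference resolves to during code generation.
#[derive(Debug, PartialEq)]
enum VarAccess {
    /// Found among the function's own parameters and locals.
    Local,
    /// Read through the %env parameter at this slot.
    Captured { slot: usize },
    /// Neither local nor captured (an error case in this sketch).
    Unknown,
}

/// Locals win; otherwise fall back to the capture list.
fn resolve(name: &str, locals: &[&str], captures: &[(&str, usize)]) -> VarAccess {
    if locals.contains(&name) {
        return VarAccess::Local;
    }
    match captures.iter().find(|(captured, _)| *captured == name) {
        Some(&(_, slot)) => VarAccess::Captured { slot },
        None => VarAccess::Unknown,
    }
}
```

For a closure like `adder(y)` that captures `x`, a reference to `y` resolves to `Local` and a reference to `x` resolves to `Captured { slot: 0 }`.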

The Complete Picture

Here’s how everything connects for our makeCounter example:

Lox Source

fun makeCounter() {
    var count = 0;
    
    fun counter() {
        count = count + 1;
        return count;
    }
    
    return counter;
}

How Does the Runtime Call a Closure?

Before we look at the generated MLIR, there’s a question you should be asking: when the Lox program does c(), how does the runtime know which function to call and what to pass it?

The closure object has two fields: function_index and environment. The calling convention works like this:

When the Lox program does: c()

1. Load c from the stack (it's a pointer to a Closure object)
2. Read the Closure's function_index (e.g., 1)
3. Read the Closure's environment pointer (e.g., 0x2000)
4. Look up function_index 1 in a function table → @counter
5. Call @counter(environment)

The function table is an array the compiler builds at module creation time. Each function that can be called through a closure gets an index in it. When alloc_closure(1, %env) is called, the 1 refers to @counter’s position in that table.

In the generated MLIR, this looks like:

// In the caller (e.g., main or another function)
%closure = <load closure from variable>

// Read function_index (offset 0) and environment (offset 8) from the closure object
%func_idx = llvm.load %closure : !llvm.ptr -> i64
%env_slot = llvm.getelementptr %closure[8] : (!llvm.ptr) -> !llvm.ptr, i8
%env_ptr  = llvm.load %env_slot : !llvm.ptr -> !llvm.ptr

// Indirect call through the function table
%func_table = llvm.mlir.addressof @lox_function_table : !llvm.ptr
%func_ptr = llvm.getelementptr %func_table[%func_idx] : (!llvm.ptr, i64) -> !llvm.ptr, !llvm.ptr
%callee = llvm.load %func_ptr : !llvm.ptr -> !llvm.ptr
%result = llvm.call %callee(%env_ptr) : !llvm.ptr, (!llvm.ptr) -> f64

This is an indirect call. The target function isn’t known at compile time — it depends on which closure the variable holds at runtime. That’s the cost of first-class functions. The benefit is that c() works regardless of which closure c points to, whether it’s a counter, an adder, or anything else.
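The same five steps can be modeled in plain Rust. This is a sketch under simplifying assumptions: slots hold `f64` directly instead of boxed Numbers, `Box` replaces the GC allocator, and `FUNCTION_TABLE`, `make_counter`, and `call_closure` are hypothetical names.

```rust
/// Simplified environment: slots hold f64 directly instead of boxed Numbers.
struct Env {
    slots: Vec<f64>,
}

/// Closure object: an index into the function table plus its environment.
struct Closure {
    function_index: usize,
    environment: *mut Env,
}

/// Every closure body takes the environment pointer as its first argument.
type LoxFn = fn(*mut Env) -> f64;

/// Body of `counter`: count = count + 1; return count;
fn counter(env: *mut Env) -> f64 {
    // SAFETY: callers pass a pointer produced by `make_counter`.
    let env = unsafe { &mut *env };
    env.slots[0] += 1.0;
    env.slots[0]
}

/// Placeholder so that `counter` lands at index 1, as in the text.
fn unused(_env: *mut Env) -> f64 {
    0.0
}

static FUNCTION_TABLE: [LoxFn; 2] = [unused, counter];

/// makeCounter: allocate the environment, then wrap it in a closure.
fn make_counter() -> Closure {
    // Box::into_raw stands in for the GC allocator; this sketch leaks.
    let environment = Box::into_raw(Box::new(Env { slots: vec![0.0] }));
    Closure { function_index: 1, environment }
}

/// Read the index, read the environment, look up the table, call.
fn call_closure(closure: &Closure) -> f64 {
    let callee = FUNCTION_TABLE[closure.function_index];
    callee(closure.environment)
}
```

Calling the same closure twice returns 1.0 and then 2.0, because both calls go through the same environment pointer.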

For closures that take their own parameters (like @apply in the “Multiple Captured Variables” section below, which takes (%env: !llvm.ptr, %x: f64, %y: f64)), the environment pointer is followed by the function’s own arguments in the indirect call: llvm.call %callee(%env_ptr, %x, %y). The calling convention is: environment first, then the closure’s declared parameters.

Simplification: A production compiler would use MLIR’s call operation with a symbol reference when the target is known at compile time (direct call) and fall back to indirect calls only for closures stored in variables. Our compiler uses indirect calls for all closure invocations for simplicity.

Now you know how closures are called. With that in place, the generated MLIR will make sense — when you see @counter(%env: !llvm.ptr), the %env comes from the calling convention above.

Generated MLIR (Simplified)

module {
  // makeCounter creates an environment for 'count'
  func.func @makeCounter() -> !llvm.ptr {
    // Push frame with 1 root: the environment must survive across
    // the alloc_closure call — if GC triggers there, %env needs to
    // be on the shadow stack or it could be collected.
    %frame = lox.push_frame root_count = 1 : !llvm.ptr
    
    // Allocate environment with 1 slot
    %env = func.call @lox_runtime_alloc_environment(1, null) : (i64, !llvm.ptr) -> !llvm.ptr
    
    // Root the environment before any further allocation
    lox.set_root index = 0, %env : !llvm.ptr
    
    // Initialize count = 0 (env_set_number boxes the f64 into a heap Number)
    %zero = arith.constant 0.0 : f64
    func.call @lox_runtime_env_set_number(%env, 0, %zero) : (!llvm.ptr, i64, f64) -> ()
    
    // Create closure for counter
    // (counter_index = 1, env = %env)
    // Safe: %env is rooted, so GC during alloc_closure can find it
    %closure = func.call @lox_runtime_alloc_closure(1, %env) : (i64, !llvm.ptr) -> !llvm.ptr
    
    lox.pop_frame
    func.return %closure : !llvm.ptr
  }
  
  // counter accesses 'count' via environment
  func.func @counter(%env: !llvm.ptr) -> f64 {
    %frame = lox.push_frame root_count = 1 : !llvm.ptr
    lox.set_root index = 0, %env : !llvm.ptr
    
    // Load count from environment (returns a pointer to the heap object)
    %count_ptr = func.call @lox_runtime_env_get(%env, 0) : (!llvm.ptr, i64) -> !llvm.ptr
    %count = llvm.load %count_ptr : !llvm.ptr -> f64
    
    // count = count + 1
    %one = arith.constant 1.0 : f64
    %new_count = arith.addf %count, %one : f64
    
    // Store back to environment (env_set_number boxes the new f64)
    func.call @lox_runtime_env_set_number(%env, 0, %new_count) : (!llvm.ptr, i64, f64) -> ()
    
    lox.pop_frame
    func.return %new_count : f64
  }
}

Memory Layout After `var c = makeCounter();`

Stack:
  ┌─────────────────┐
  │ c = 0x1000      │──┐
  └─────────────────┘  │
                       ▼
Heap:                  
  ┌─────────────────────────┐  0x1000: Closure
  │ header: { Closure }     │
  │ function_index: 1       │
  │ environment: 0x2000 ────│──┐
  └─────────────────────────┘  │
                               ▼
  ┌─────────────────────────┐  0x2000: Environment
  │ header: { Environment } │
  │ enclosing: null         │
  │ slot_count: 1           │
  │ slot[0]: ptr ───────────│──┐
  └─────────────────────────┘  │
                               │
  Number (count = 0.0) ◄───────┘

Nested Closures Deep Dive

Let’s trace through a more complex example with nested closures:

The Code

fun makeAdder(x) {
    fun adder(y) {
        return x + y;  // Captures 'x' from makeAdder
    }
    return adder;
}

var add5 = makeAdder(5);
var add10 = makeAdder(10);

print add5(3);   // 8
print add10(3);  // 13

Step-by-Step Execution

1. makeAdder(5) is called:

Stack:
  ┌─────────────────────────────┐
  │ Frame: makeAdder            │
  │   x = 5 (parameter)         │
  └─────────────────────────────┘
        │
        ▼ Creates environment
  ┌─────────────────────────────┐
  │ Environment 0x1000          │
  │   slot[0]: 5 (x)            │
  └─────────────────────────────┘
        │
        ▼ Creates closure
  ┌─────────────────────────────┐
  │ Closure 0x2000 (adder)      │
  │   function_index: 1         │
  │   environment: 0x1000 ──────│──► env with x=5
  └─────────────────────────────┘

Returns closure 0x2000, assigned to add5
makeAdder stack frame is destroyed, but environment lives on!

2. makeAdder(10) is called:

Creates NEW environment 0x3000 with x=10
Creates NEW closure 0x4000 pointing to environment 0x3000

add5  → closure 0x2000 → env 0x1000 (x=5)
add10 → closure 0x4000 → env 0x3000 (x=10)

3. add5(3) is called:

Stack:
  ┌──────────────────────────────┐
  │ Frame: adder                 │
  │   y = 3 (parameter)          │
  │   env = 0x1000 (from closure)│
  └──────────────────────────────┘

Load x from env[0] = 5
Compute 5 + 3 = 8
Return 8

4. add10(3) is called:

Same process, but env = 0x3000
Load x from env[0] = 10
Compute 10 + 3 = 13
Return 13

Key Insight: Each Call Creates a Separate Environment

makeAdder(5):
  → Creates env {x: 5}
  → Creates closure pointing to that env

makeAdder(10):
  → Creates NEW env {x: 10}
  → Creates NEW closure pointing to NEW env

The two closures share no state!
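A small Rust model of the trace shows the same independence. It is only a sketch: the `Env` and `Closure` structs are simplified stand-ins, slots hold `f64` directly, and `make_adder`/`call_adder` are hypothetical names.

```rust
/// Simplified environment: one f64 per slot.
struct Env {
    slots: Vec<f64>,
}

/// Each closure owns a pointer to the environment created for it.
struct Closure {
    environment: Box<Env>,
}

/// makeAdder(x): every call allocates a fresh environment holding x.
fn make_adder(x: f64) -> Closure {
    Closure {
        environment: Box::new(Env { slots: vec![x] }),
    }
}

/// adder(y): load x from env[0], then add y.
fn call_adder(closure: &Closure, y: f64) -> f64 {
    closure.environment.slots[0] + y
}
```

Two calls to `make_adder` produce two heap environments, so `add5` and `add10` never see each other's `x`.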

Multiple Captured Variables

What if we capture multiple variables?

fun makeOffsetter(offsetX, offsetY) {
    fun apply(x, y) {
        // Captures both offsetX and offsetY
        print offsetX + x;
        print offsetY + y;
    }
    return apply;
}

var shift = makeOffsetter(10, 20);
shift(1, 2);   // prints 11, then 22

Environment Layout

Environment:
  slot[0]: offsetX
  slot[1]: offsetY

When apply is called:
  1. Load offsetX from env[0]
  2. Load offsetY from env[1]
  3. Compute offsetX + x and print
  4. Compute offsetY + y and print

Generated Code

func.func @apply(%env: !llvm.ptr, %x: f64, %y: f64) {
    // Load captured variables (env_get returns pointers, then we load the f64 values)
    %offsetX_ptr = func.call @lox_runtime_env_get(%env, 0) : (!llvm.ptr, i64) -> !llvm.ptr
    %offsetX = llvm.load %offsetX_ptr : !llvm.ptr -> f64
    %offsetY_ptr = func.call @lox_runtime_env_get(%env, 1) : (!llvm.ptr, i64) -> !llvm.ptr
    %offsetY = llvm.load %offsetY_ptr : !llvm.ptr -> f64
    
    // Compute
    %sum_x = arith.addf %offsetX, %x : f64
    %sum_y = arith.addf %offsetY, %y : f64
    
    // Print each result
    func.call @lox_print(%sum_x) : (f64) -> ()
    func.call @lox_print(%sum_y) : (f64) -> ()
    func.return
}

Nested Environments

Beyond our simplified model. The find_captures analysis hardcodes depth: 0 because our compiler handles single-level captures — captured variables are always in the immediately enclosing environment. The nested example below shows what the runtime can do: environment chains, depth traversal, lox_runtime_env_get_enclosing are all real runtime machinery. But producing the correct depth values would require extending the compile-time scope analysis to track how many function boundaries are between the variable reference and its definition — a local depth computation during the existing scope walk, not a whole-program pass. The MLIR shown here is what a full compiler would generate; our simplified compiler would emit depth: 0 for all captures and wouldn’t handle the inner case correctly.

There are two gaps between the code shown and the nested example below.

First, CaptureAnalyzer doesn’t compute actual depths: it always emits depth: 0. A full implementation would replace the hardcoded depth: 0 with a scope walk that counts function boundaries between the reference and the definition.

Second, the Stmt::Function arm pushes the function name into scope but doesn’t recurse into the body, so captures from inner functions never flow upward to their enclosing functions. The “Compilation Order: Inside Out” section below explains the concept (inner functions determine what outer functions must capture). Here’s a sketch of the mechanism a full compiler would use:

// `Scope` stands for whatever type tracks one function's local names.
fn propagate_captures(
    inner_captures: &[CapturedVar],
    outer_scope: &Scope,
) -> Vec<CapturedVar> {
    // For each variable that the inner function captured
    // from beyond its immediately enclosing function,
    // the enclosing function must also capture it
    // (one level closer to the definition).
    inner_captures
        .iter()
        .filter(|c| c.depth > 0 || !outer_scope.is_local(&c.name))
        .map(|c| CapturedVar {
            name: c.name.clone(),
            depth: c.depth.saturating_sub(1),
            slot_index: c.slot_index,
        })
        .collect()
}

The key insight: if inner captures a at depth 1 (it’s in outer, one level beyond middle), then middle must capture a at depth 0 (it’s in outer, the immediately enclosing function). Each function in the chain captures at one level less depth than the function below it. This is the upward propagation that our simplified find_captures doesn’t implement, but that a full compiler needs.
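The propagation step can be exercised on the outer/middle/inner example with a self-contained version. The `Scope` struct here is a hypothetical minimal definition (just a list of local names), and the slot indices are illustrative rather than what a real allocator would assign.

```rust
#[derive(Debug, Clone, PartialEq)]
struct CapturedVar {
    name: String,
    depth: usize,
    slot_index: usize,
}

/// Hypothetical scope type: just the names local to one function.
struct Scope {
    locals: Vec<String>,
}

impl Scope {
    fn is_local(&self, name: &str) -> bool {
        self.locals.iter().any(|local| local == name)
    }
}

/// Captures the enclosing function must forward, one level shallower.
fn propagate_captures(inner_captures: &[CapturedVar], outer_scope: &Scope) -> Vec<CapturedVar> {
    inner_captures
        .iter()
        .filter(|c| c.depth > 0 || !outer_scope.is_local(&c.name))
        .map(|c| CapturedVar {
            name: c.name.clone(),
            depth: c.depth.saturating_sub(1),
            slot_index: c.slot_index,
        })
        .collect()
}

/// What `inner` captures: b from middle (depth 0), a from outer (depth 1).
fn inner_captures() -> Vec<CapturedVar> {
    vec![
        CapturedVar { name: "b".to_string(), depth: 0, slot_index: 0 },
        CapturedVar { name: "a".to_string(), depth: 1, slot_index: 0 },
    ]
}
```

With `middle`'s scope containing only `b`, the result is a single entry: `a` at depth 0. `b` is dropped because `middle` already owns it.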

To restate the limitation: depth: 0 is correct for single-level captures, where the variable always lives in the immediately enclosing environment, but it produces wrong code when a variable is two or more function scopes away. In the example below, inner needs a at depth 1, not depth 0, and our find_captures can’t compute that. The example is here to show how the runtime would work if the analysis could produce the right depth values.

⚠️ The MLIR below is what a full compiler generates. The code shown earlier in this chapter does not produce this output for nested closures.

What if a closure captures a variable from two scopes up?

fun outer() {
    var a = 1;
    
    fun middle() {
        var b = 2;
        
        fun inner() {
            return a + b;  // a from outer, b from middle
        }
        
        return inner;
    }
    
    return middle;
}

Environment Chain

outer() creates:
  env_outer { a: 1 }

middle() creates:
  env_middle { b: 2, enclosing: env_outer }
                ↑
                Points to outer's environment

inner() closure:
  environment: env_middle

When inner() runs:
  1. Look up 'a' (depth 1, slot 0): follow enclosing once → env_outer
  2. Look up 'b' (depth 0, slot 0): read directly from env_middle

Variable Lookup Algorithm

#![allow(unused)]
fn main() {
/// Look up a variable by traversing the environment chain
fn lookup_variable(env: *mut Environment, depth: usize, slot: usize) -> *mut u8 {
    let mut current = env;
    
    // Walk up 'depth' levels through the enclosing chain.
    // depth=0 means the variable is in the current environment,
    // depth=1 means one level up, etc.
    for _ in 0..depth {
        current = unsafe { (*current).enclosing };
        assert!(!current.is_null(), "enclosing environment is null at depth > 0");
    }
    
    // Now access the slot
    unsafe { (*current).get(slot) }
}
}
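Here is a usage sketch for the same walk on the outer/middle/inner chain. It assumes a simplified `Env` whose slots hold `f64` directly and lives on the Rust stack rather than the GC heap; `lookup` and `demo_chain` are hypothetical names.

```rust
/// Simplified environment: a parent pointer and f64 slots.
struct Env {
    enclosing: *mut Env,
    slots: Vec<f64>,
}

/// Walk `depth` parent links, then read `slot`.
fn lookup(env: *mut Env, depth: usize, slot: usize) -> f64 {
    let mut current = env;
    for _ in 0..depth {
        // SAFETY: `current` is non-null and points at a live Env.
        current = unsafe { (*current).enclosing };
        assert!(!current.is_null(), "enclosing environment is null");
    }
    // SAFETY: same invariant as above; take an explicit shared reference.
    let target: &Env = unsafe { &*current };
    target.slots[slot]
}

/// Build env_outer { a: 1 } and env_middle { b: 2, enclosing: env_outer },
/// then read `a` (depth 1) and `b` (depth 0) the way inner() would.
fn demo_chain() -> (f64, f64) {
    let mut env_outer = Env { enclosing: std::ptr::null_mut(), slots: vec![1.0] };
    let mut env_middle = Env { enclosing: &mut env_outer, slots: vec![2.0] };
    let env: *mut Env = &mut env_middle;
    (lookup(env, 1, 0), lookup(env, 0, 0))
}
```

The depth and slot are fixed at compile time, so the runtime never searches by name; it follows exactly `depth` links and indexes once.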

Compilation Order: Inside Out

There’s a subtle but important constraint on how we compile closures: inner functions must be compiled before outer functions.

When outer defines middle which defines inner, the compilation order must be:

Compile inner → discover it captures a (from outer) and b (from middle)
Compile middle → now we know inner captures b from middle’s scope, so middle must allocate an environment with b in it
Compile outer → now we know middle captures a from outer’s scope, so outer must allocate an environment with a in it

If we compiled top-down, outer wouldn’t know it needs an environment for a until middle is compiled — and middle wouldn’t know about b until inner is compiled. The capture information flows upward: inner functions determine what outer functions must capture.

This is why the find_captures function analyzes a single function’s body — it finds variables that aren’t local to that function. The compiler then uses this information when compiling the enclosing function to allocate the right environment slots.
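The inside-out order is a post-order walk over the tree of nested function declarations. A minimal sketch, where `FuncNode` and `compile_order` are hypothetical names and "compiling" is reduced to recording the function's name:

```rust
/// A function and the functions declared directly inside it.
struct FuncNode {
    name: &'static str,
    nested: Vec<FuncNode>,
}

/// Post-order walk: every nested function is visited before its parent.
fn compile_order(func: &FuncNode, order: &mut Vec<&'static str>) {
    for inner in &func.nested {
        compile_order(inner, order);
    }
    order.push(func.name);
}

/// outer defines middle, which defines inner.
fn outer_middle_inner() -> FuncNode {
    FuncNode {
        name: "outer",
        nested: vec![FuncNode {
            name: "middle",
            nested: vec![FuncNode { name: "inner", nested: vec![] }],
        }],
    }
}
```

Because a parent is pushed only after all of its children, capture information from `inner` is available by the time `middle` is processed, and likewise for `outer`.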

Generated MLIR

func.func @inner(%env: !llvm.ptr) -> f64 {
    // Load 'b' from current environment (depth=0, slot=0)
    %b_ptr = func.call @lox_runtime_env_get(%env, 0) : (!llvm.ptr, i64) -> !llvm.ptr
    %b = llvm.load %b_ptr : !llvm.ptr -> f64
    
    // Load 'a' from enclosing environment (depth=1, slot=0)
    // lox_runtime_env_get_enclosing follows the parent pointer in
    // the Environment struct — it must be declared alongside
    // lox_runtime_env_get and lox_runtime_env_set in the runtime.
    // Signature: lox_runtime_env_get_enclosing(env: !llvm.ptr) -> !llvm.ptr
    %env_outer = func.call @lox_runtime_env_get_enclosing(%env) : (!llvm.ptr) -> !llvm.ptr
    %a_ptr = func.call @lox_runtime_env_get(%env_outer, 0) : (!llvm.ptr, i64) -> !llvm.ptr
    %a = llvm.load %a_ptr : !llvm.ptr -> f64
    
    // Compute
    %sum = arith.addf %a, %b : f64
    
    func.return %sum : f64
}

Practice Exercises

Exercise 1: Trace Memory Layout

For this code:

fun factory(value) {
    fun getter() {
        return value;
    }
    fun setter(new_value) {
        value = new_value;
    }
    // In a real Lox program, you'd return both closures
    // via a class instance or global variables. For this
    // exercise, imagine both are available after calling
    // factory() — they share the same environment.
    return getter;
}

var get = factory(10);
print get();  // 10
// If setter were also available:
// setter(20);
// print get();  // 20

Draw the memory layout after calling factory(10) and assigning the result to get.

Answer:

Stack:
  ┌─────────────────────────────┐
  │ get = 0x1000 (closure)      │
  └─────────────────────────────┘

Heap:
  0x1000: Closure (getter)
    function_index: @getter
    environment: 0x3000 ──────────┐
                                  │
  (If setter were also returned:) │
  0x2000: Closure (setter)        │
    function_index: @setter       │
    environment: 0x3000 ──────────│── Same environment!
                                  │
  0x3000: Environment ◄───────────┘
    slot[0]: 10 (value)

Key insight: Both closures share the SAME environment. That’s why setter(20) would affect what getter() returns!

Exercise 2: Variable Analysis

For this function, what variables are captured and where do they go?

fun outer(x, y) {
    var a = x + y;
    
    fun inner(b) {
        return a + b + x;  // Captures a and x
    }
    
    return inner;
}

Answer:

Captured variables:

a (local in outer) → slot 0
x (parameter in outer) → slot 1

Environment for inner:

env_inner:
  slot[0]: a
  slot[1]: x

Note: y is NOT captured (not used by inner), so it’s not in the environment.

Initialization order matters: the environment must be allocated early so it can be rooted on the shadow stack before any later allocation (like alloc_closure) triggers GC. If GC ran during alloc_closure and the environment wasn’t rooted, it could be collected — taking a and x with it. The sequence is: allocate env → root env → compute a = x + y → store a in env[0] → store x in env[1] → allocate closure. The stores into the environment can happen in any order after allocation, but the environment must be on the shadow stack before alloc_closure.
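The ordering rule can be demonstrated with a deliberately pessimistic toy heap. Everything here (`ToyHeap`, `build_closure`) is a hypothetical model, not the tutorial's runtime: every allocation runs a collection first, and only directly rooted objects survive, with no tracing.

```rust
/// Toy heap: one slot per object, `None` once the object has been collected.
struct ToyHeap {
    objects: Vec<Option<&'static str>>,
    roots: Vec<usize>,
}

impl ToyHeap {
    fn new() -> Self {
        ToyHeap { objects: Vec::new(), roots: Vec::new() }
    }

    /// Worst case for the compiler: every allocation runs a collection first.
    fn alloc(&mut self, label: &'static str) -> usize {
        self.collect();
        self.objects.push(Some(label));
        self.objects.len() - 1
    }

    fn root(&mut self, handle: usize) {
        self.roots.push(handle);
    }

    /// Simplification: only directly rooted objects survive (no tracing).
    fn collect(&mut self) {
        for (handle, slot) in self.objects.iter_mut().enumerate() {
            if !self.roots.contains(&handle) {
                *slot = None;
            }
        }
    }

    fn is_live(&self, handle: usize) -> bool {
        self.objects[handle].is_some()
    }
}

/// Returns whether the environment survived the closure allocation.
fn build_closure(root_env_first: bool) -> bool {
    let mut heap = ToyHeap::new();
    let env = heap.alloc("environment");
    if root_env_first {
        heap.root(env);
    }
    let _closure = heap.alloc("closure"); // may trigger a collection
    heap.is_live(env)
}
```

Calling `build_closure(true)` keeps the environment alive across the second allocation, while `build_closure(false)` loses it, which is the failure the sequence above is designed to prevent.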

Exercise 3: Why Not Copy Values?

Why can’t we copy captured values into the closure directly? Why do we need an environment?

fun example() {
    var count = 0;
    
    fun increment() {
        count = count + 1;  // MODIFIES count!
        return count;
    }
    
    return increment;
}

Answer:

If we copied count into the closure, every closure would get its own private copy. increment() would update only that copy, so the count variable in example, and any other closure that captured it, would never see the change.

By using an environment, all closures share the same environment. Modifications are visible to every closure that references it, so state is properly shared.

The environment is essentially a shared “box” that holds the variable.
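The difference is easy to see in Rust, where both strategies can be written directly. This is an illustrative sketch: `counters_by_copy` and `counters_by_environment` are hypothetical names, and `Rc<Cell<f64>>` plays the role of the shared environment box.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Copy semantics: each closure gets its own private `count`.
fn counters_by_copy() -> (impl FnMut() -> f64, impl FnMut() -> f64) {
    let count = 0.0;
    let mut first = count;
    let mut second = count;
    (
        move || { first += 1.0; first },
        move || { second += 1.0; second },
    )
}

/// Environment semantics: both closures point at one shared box.
fn counters_by_environment() -> (impl FnMut() -> f64, impl FnMut() -> f64) {
    let env = Rc::new(Cell::new(0.0));
    let a = Rc::clone(&env);
    let b = Rc::clone(&env);
    (
        move || { a.set(a.get() + 1.0); a.get() },
        move || { b.set(b.get() + 1.0); b.get() },
    )
}
```

With copies, calling each closure once yields 1 from both. With the shared box, the second closure sees the first one's update and yields 2.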

Next: Part 6 — Complete Reference — Closures are the hardest single feature. Before adding more, let’s see the complete numbers-only compiler in one place: every module, every pass, every runtime function. This is the working system that Parts 1–5 built, assembled and running end to end.

MLIR for Lox: Part 6 — The Complete Project — Every Dialect Operation in One Place

Five parts. A garbage collector. Root tracking. MLIR integration. Closures. Now you need to see how they fit together — not as individual chapters, but as a single running system. This part is the assembled view: file structure, object layout, API reference, compilation pipeline, debugging guide, and performance notes.

Read it straight through for the big picture. Jump to a section when you need a detail.

Complete File Structure

lox-mlir/
├── ast/           # AST type definitions
├── lexer/         # Tokenization
├── parser/        # Parsing
├── analysis/      # Closure capture analysis
├── codegen/       # MLIR code generation, Lox dialect, lowering
├── runtime/       # Object header, garbage collector, shadow stack
├── examples/      # Example Lox programs (simple, closures, GC stress)
└── tests/         # Integration tests

Complete Object Layout

Every object in Lox has this layout:

┌─────────────────────────────────────────────────────┐
│ ObjHeader (16 bytes)                                │
├─────────────────────────────────────────────────────┤
│ Field         │ Offset │ Size │ Description        │
├───────────────┼────────┼──────┼────────────────────┤
│ marked        │ 0      │ 1    │ GC mark flag       │
│ obj_type      │ 1      │ 1    │ ObjType enum       │
│ padding       │ 2      │ 2    │ Alignment padding  │
│ size          │ 4      │ 4    │ Data size in bytes │
│ next          │ 8      │ 8    │ Ptr to next obj    │
└─────────────────────────────────────────────────────┘

Total header size: 16 bytes
Data follows immediately after header (offset +16).
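As a cross-check of the table, here is one way the header could be declared in Rust. Treat it as a sketch under two assumptions: a 64-bit target (so the next pointer is 8 bytes), and field types chosen to match the sizes in the table. The helper names `header_offsets` and `data_ptr` are hypothetical.

```rust
use std::mem::size_of;

/// Mirrors the table above; `#[repr(C)]` keeps fields in declaration order.
#[repr(C)]
pub struct ObjHeader {
    pub marked: u8,
    pub obj_type: u8,
    pub padding: u16,
    pub size: u32,
    pub next: *mut ObjHeader,
}

/// Byte offsets of (marked, obj_type, size, next), measured on a live value.
pub fn header_offsets() -> [usize; 4] {
    let header = ObjHeader {
        marked: 0,
        obj_type: 0,
        padding: 0,
        size: 0,
        next: std::ptr::null_mut(),
    };
    let base = &header as *const ObjHeader as usize;
    [
        &header.marked as *const u8 as usize - base,
        &header.obj_type as *const u8 as usize - base,
        &header.size as *const u32 as usize - base,
        &header.next as *const *mut ObjHeader as usize - base,
    ]
}

/// Data starts right after the header (header address + 16 on a 64-bit target).
pub fn data_ptr(header: *mut ObjHeader) -> *mut u8 {
    (header as *mut u8).wrapping_add(size_of::<ObjHeader>())
}
```

On a 64-bit target, `size_of::<ObjHeader>()` is 16 and the measured offsets are 0, 1, 4, and 8, matching the table.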

Type-Specific Layouts

Number (ObjType::Number)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ value: f64 (8 bytes)         │
└──────────────────────────────┘
Total: 24 bytes

String (ObjType::String)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ length: usize (8 bytes)      │
│ hash: u64 (8 bytes)          │
│ chars: [u8; length]          │
└──────────────────────────────┘
Total: 32 + length bytes

Environment (ObjType::Environment)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ enclosing: *mut Env (8)      │
│ slot_count: usize (8)        │
│ slots: [*mut u8; slot_count] │
└──────────────────────────────┘
Total: 32 + (8 × slot_count) bytes

Closure (ObjType::Closure)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ function_index: usize (8)    │
│ environment: *mut Env (8)    │
└──────────────────────────────┘
Total: 32 bytes

Instance (ObjType::Instance)

┌──────────────────────────────┐
│ Header (16 bytes)            │
├──────────────────────────────┤
│ class: *mut Class (8)        │
│ field_count: usize (8)       │
│ fields: [(*str, *val); n]    │
└──────────────────────────────┘
Total: 32 + (16 × field_count) bytes
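The totals in the layouts above follow one rule: header size plus the fixed fields plus the variable-length tail. A small sketch, using a hypothetical `ObjLayout` enum that is not part of the tutorial's runtime:

```rust
const HEADER_SIZE: usize = 16;

/// Hypothetical description of each object kind's variable part.
enum ObjLayout {
    Number,
    String { length: usize },
    Environment { slot_count: usize },
    Closure,
    Instance { field_count: usize },
}

/// Total allocation size in bytes, header included.
fn total_size(layout: &ObjLayout) -> usize {
    let data = match layout {
        ObjLayout::Number => 8,                                       // value: f64
        ObjLayout::String { length } => 8 + 8 + length,               // length + hash + chars
        ObjLayout::Environment { slot_count } => 8 + 8 + 8 * slot_count, // enclosing + count + slots
        ObjLayout::Closure => 8 + 8,                                  // function_index + environment
        ObjLayout::Instance { field_count } => 8 + 8 + 16 * field_count, // class + count + pairs
    };
    HEADER_SIZE + data
}
```

For example, an environment with two slots comes to 48 bytes and an instance with three fields to 80 bytes.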

Complete GC API Reference

Runtime Functions (C ABI)

The C ABI convention here puts obj_type before size. If you worked through Part 5, you’ll notice this is the opposite order: Part 5 calls the Rust runtime directly with (size, obj_type), matching the Rust function signature. The extern "C" wrapper registered with the JIT bridges the two. The MLIR call site passes (obj_type, size), the C wrapper receives them in that order, and it then calls the Rust alloc(size, obj_type) with the arguments swapped back. Both parts are internally consistent: Part 5 matches the Rust signature, and this part matches the C ABI convention.

// Allocation
void* lox_runtime_alloc(uint8_t obj_type, size_t size);
void* lox_runtime_alloc_environment(size_t slot_count, void* enclosing);
void* lox_runtime_alloc_closure(size_t function_index, void* environment);

// Environment access
void lox_runtime_env_set(void* env, size_t index, void* value);
void lox_runtime_env_set_number(void* env, size_t index, double value);
void* lox_runtime_env_get(void* env, size_t index);

// Shadow stack
void** gc_push_frame(size_t root_count);
void gc_pop_frame(void);
void gc_set_root(void* frame, size_t index, void* value);

// Collection
void gc_collect(void);
size_t gc_object_count(void);
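The argument swap between the C ABI entry point and the Rust allocator can be sketched as follows. The body of `alloc` is a toy stand-in that only writes the header fields and does not link the object into the GC's list; the real runtime function is assumed to have the same (size, obj_type) parameter order. A real wrapper would also be exported under an unmangled name, which is omitted here.

```rust
use std::alloc::{alloc_zeroed, Layout};

const HEADER_SIZE: usize = 16;

/// Stand-in for the Rust-side allocator: size first, then type (assumed order).
fn alloc(size: usize, obj_type: u8) -> *mut u8 {
    let layout = Layout::from_size_align(HEADER_SIZE + size, 8).expect("valid layout");
    unsafe {
        let base = alloc_zeroed(layout);
        assert!(!base.is_null(), "out of memory");
        *base.add(1) = obj_type;                      // obj_type at offset 1
        (base.add(4) as *mut u32).write(size as u32); // size at offset 4
        base.add(HEADER_SIZE)                         // pointer past the header
    }
}

/// C ABI entry point: type first, then size, swapped back for the Rust call.
pub extern "C" fn lox_runtime_alloc(obj_type: u8, size: usize) -> *mut u8 {
    alloc(size, obj_type)
}
```

Keeping the swap inside one wrapper means generated code only ever sees the C ordering, while Rust callers keep the Rust ordering.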

MLIR Operations

lox.alloc :: (obj_type: i8, size: i64) -> !llvm.ptr
  Allocates a heap object with the given type and data size (not including header).

lox.push_frame :: (root_count: i32) -> !llvm.ptr
  Pushes a shadow stack frame with the given number of root slots.
  Returns pointer to roots array.

lox.pop_frame :: () -> ()
  Pops the current shadow stack frame.

lox.set_root :: (index: i32, value: !llvm.ptr) -> ()
  Sets a root in the current frame.

lox.store :: (obj: !llvm.ptr, value: f64) -> ()
  Stores a number into a heap object's data field.

lox.load :: (obj: !llvm.ptr) -> f64
  Loads a number from a heap object's data field.

lox.print :: (value: f64) -> ()
  Prints a value via the runtime.

lox.alloc_environment :: (slot_count: i64, enclosing: !llvm.ptr) -> !llvm.ptr
  Allocates an environment with the given number of variable slots.
  `enclosing` is a null pointer for top-level environments, or a pointer to
  the parent environment for nested closures.

lox.alloc_closure :: (function_index: i64, environment: !llvm.ptr) -> !llvm.ptr
  Allocates a closure struct: a function index (position in the compiler's
  function table) and a pointer to the closure's environment.

lox.env_set :: (env: !llvm.ptr, index: i64, value: !llvm.ptr) -> ()
  Sets a slot in an environment. For the numbers-only model, use lox.env_set_number instead
  (passing an f64 through a void* parameter violates the C calling convention — the callee
  would read the wrong register).

lox.env_set_number :: (env: !llvm.ptr, index: i64, value: f64) -> ()
  Sets a slot in an environment, boxing the f64 into a heap-allocated Number.
  Numbers-only model only. The generic env_set takes a void* for the tagged-union model.

lox.env_get :: (env: !llvm.ptr, index: i64) -> !llvm.ptr
  Gets a slot from an environment.

The Compilation Pipeline

┌─────────────────────────────────────────────────────────────┐
│ 1. LEXING                                                   │
│    "var x = 1 + 2;" → [VAR, IDENT, EQUAL, NUMBER, ...]      │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 2. PARSING                                                  │
│    Tokens → AST                                             │
│    VarStmt { name: "x", init: BinaryExpr { ... } }          │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 3. ANALYSIS                                                 │
│    Find captured variables, compute root counts             │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 4. CODE GENERATION                                          │
│    AST → MLIR (Lox dialect)                                 │
│    func.func @main() {                                      │
│      %frame = lox.push_frame root_count = 1                 │
│      %x = lox.alloc obj_type = 0, size = 8                  │
│      lox.set_root index = 0, %x                             │
│      ...                                                    │
│    }                                                        │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 5. LOWERING                                                 │
│    MLIR (Lox dialect) → MLIR (LLVM dialect)                 │
│    lox.alloc → func.call @lox_runtime_alloc                 │
│    lox.push_frame → func.call @gc_push_frame                │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 6. LLVM IR GENERATION                                       │
│    MLIR → LLVM IR                                           │
│    declare ptr @lox_runtime_alloc(...)                      │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 7. NATIVE CODE GENERATION                                   │
│    LLVM IR → Machine code                                   │
│    Link with liblox_runtime.a                               │
└─────────────────────────────────────────────────────────────┘
                              ↓
┌─────────────────────────────────────────────────────────────┐
│ 8. EXECUTION                                                │
│    ./program                                                │
│    Runtime GC manages memory                                │
└─────────────────────────────────────────────────────────────┘
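Stage 1 of the pipeline is small enough to sketch in full for the subset of Lox used in the examples below. This is a minimal illustration, not the tutorial's lexer module: the `Token` enum and `lex` function are hypothetical, and the sketch panics on any character it does not know.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Var,
    Print,
    Ident(String),
    Number(f64),
    Equal,
    Plus,
    Semicolon,
}

/// Turn source text into tokens; panics on characters outside this tiny subset.
fn lex(source: &str) -> Vec<Token> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            i += 1;
        } else if c == '=' {
            tokens.push(Token::Equal);
            i += 1;
        } else if c == '+' {
            tokens.push(Token::Plus);
            i += 1;
        } else if c == ';' {
            tokens.push(Token::Semicolon);
            i += 1;
        } else if c.is_ascii_digit() {
            let start = i;
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            tokens.push(Token::Number(text.parse().expect("malformed number")));
        } else if c.is_alphabetic() || c == '_' {
            let start = i;
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            let word: String = chars[start..i].iter().collect();
            let token = if word == "var" {
                Token::Var
            } else if word == "print" {
                Token::Print
            } else {
                Token::Ident(word)
            };
            tokens.push(token);
        } else {
            panic!("unexpected character {c:?}");
        }
    }
    tokens
}
```

Lexing `var x = 1 + 2;` yields Var, Ident("x"), Equal, Number(1.0), Plus, Number(2.0), Semicolon, which is the token stream the parser stage consumes.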

IR at Each Stage: A Concrete Example

The pipeline diagram above shows the flow, but it doesn’t show what the IR actually looks like at each stage. Let’s trace a concrete Lox program through the entire compilation pipeline.

The program:

var x = 1 + 2;
print x;

Trivial, but it exercises every stage. Here’s what happens:

Stage 1–2: Lexing and Parsing

The parser produces an AST like:

Program
  VarDecl "x" = BinaryExpr(Num(1), Plus, Num(2))
  PrintStmt Name("x")

Stage 3: Analysis

No closures or captured variables in this program. The analysis pass computes root counts: x is a local variable, so it needs one GC root.

Stage 4: Code Generation (Lox Dialect MLIR)

The code generator walks the AST and emits MLIR using our custom Lox dialect operations. This example uses the GC-aware representation (heap-allocated objects, shadow stack roots) from Parts 3–6 — not the simpler numbers-only model from Part 1. The difference shows up in the function signature: func.func @main() returns nothing here, while Part 1’s func.func @main() -> f64 returns the result of the last expression. Both are valid; the GC-aware version is what a real Lox compiler would produce.

module {
  func.func @main() {
    %frame = lox.push_frame root_count = 1 : i32
    %one = arith.constant 1.0 : f64
    %two = arith.constant 2.0 : f64
    %sum = arith.addf %one, %two : f64
    %x = lox.alloc obj_type = 0, size = 8 : i64
    lox.set_root index = 0, %x : i32
    lox.store %x, %sum : f64
    %loaded = lox.load %x : f64
    lox.print %loaded : f64
    lox.pop_frame
    func.return
  }
}

Notice the mix: arith.addf is a standard MLIR arithmetic operation, but lox.push_frame, lox.alloc, lox.set_root, and lox.print are our custom dialect. At this stage, the IR is close to the source — the Lox-level operations are still visible.

Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)

The lowering pass replaces each lox.* operation with standard MLIR and runtime calls:

module {
  func.func @main() {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
    %one = arith.constant 1.0 : f64
    %two = arith.constant 2.0 : f64
    %sum = arith.addf %one, %two : f64
    %obj_type = arith.constant 0 : i8    // ObjType::Number
    %size = arith.constant 8 : i64      // data size only (f64 = 8 bytes; the runtime adds the header)
    %x = func.call @lox_runtime_alloc(%obj_type, %size) : (i8, i64) -> !llvm.ptr
    %index = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %index, %x) : (!llvm.ptr, i32, !llvm.ptr) -> ()
    // Store the number into the object's data field
    // %x is already the data pointer — alloc() returns ptr past the header
    llvm.store %sum, %x : f64, !llvm.ptr
    %loaded = llvm.load %x : !llvm.ptr -> f64
    func.call @lox_print(%loaded) : (f64) -> ()
    func.call @gc_pop_frame() : () -> ()
    func.return
  }
}

The Lox dialect is gone. Every lox.* operation has been replaced by a func.call to a runtime function or a standard LLVM operation. Any scf.if and scf.while operations (from control flow in more complex programs) are left untouched by this pass; a later scf-to-cf conversion lowers them to cf (control flow) dialect operations.

This is a simplification — the actual lowered IR has more detail around pointer arithmetic and type casting. But the pattern is: custom dialect operations become runtime calls and standard MLIR operations.
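The op-to-target mapping used throughout this part can be written down as a lookup table. This sketch only names the targets that appear in this document; the `Lowered` enum and `lower_lox_op` function are hypothetical and say nothing about how the real lowering pass builds operands.

```rust
#[derive(Debug, PartialEq)]
enum Lowered {
    /// Replaced by a func.call to this runtime symbol.
    RuntimeCall(&'static str),
    /// Replaced by this LLVM dialect operation.
    LlvmOp(&'static str),
}

/// Map a Lox dialect op name to what the lowering pass replaces it with.
fn lower_lox_op(name: &str) -> Option<Lowered> {
    use Lowered::{LlvmOp, RuntimeCall};
    Some(match name {
        "lox.alloc" => RuntimeCall("lox_runtime_alloc"),
        "lox.alloc_environment" => RuntimeCall("lox_runtime_alloc_environment"),
        "lox.alloc_closure" => RuntimeCall("lox_runtime_alloc_closure"),
        "lox.env_set" => RuntimeCall("lox_runtime_env_set"),
        "lox.env_set_number" => RuntimeCall("lox_runtime_env_set_number"),
        "lox.env_get" => RuntimeCall("lox_runtime_env_get"),
        "lox.push_frame" => RuntimeCall("gc_push_frame"),
        "lox.pop_frame" => RuntimeCall("gc_pop_frame"),
        "lox.set_root" => RuntimeCall("gc_set_root"),
        "lox.print" => RuntimeCall("lox_print"),
        "lox.store" => LlvmOp("llvm.store"),
        "lox.load" => LlvmOp("llvm.load"),
        _ => return None, // not a Lox op: left for other passes
    })
}
```

Operations from other dialects, such as arith.addf or scf.if, fall through to `None` because this pass does not touch them.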

Note: The llvm.store and llvm.load syntax in Stage 5 examples is simplified. With opaque pointers (!llvm.ptr), storing an f64 into a !llvm.ptr slot requires a two-step conversion (bitcast f64 → i64, then inttoptr i64 → !llvm.ptr), as shown in Part 4. We simplify the syntax here for readability — the real MLIR LLVM dialect requires the two-step conversion. The key point is the data flow: values are stored into and loaded from the right memory locations. Stage 6 (LLVM IR) uses the valid native syntax.

Stage 6: LLVM IR

After the MLIR-to-LLVM conversion pass, the IR becomes LLVM IR:

define void @main() {
entry:
  %frame = call ptr @gc_push_frame(i32 1)
  %x = call ptr @lox_runtime_alloc(i8 0, i64 8)
  call void @gc_set_root(ptr %frame, i32 0, ptr %x)
  store double 3.0, ptr %x
  %loaded = load double, ptr %x
  call void @lox_print(double %loaded)
  call void @gc_pop_frame()
  ret void
}

Notice that 1.0 + 2.0 has been folded into the constant 3.0 — this is a free optimization from MLIR’s arith dialect constant folding.
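Constant folding itself is a simple bottom-up rewrite. The sketch below shows the idea on a hypothetical three-variant expression type; it is not MLIR's folder, only an illustration of the same transformation.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Num(f64),
    Name(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Fold additions whose operands are both constants, innermost first.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Add(lhs, rhs) => match (fold(*lhs), fold(*rhs)) {
            (Expr::Num(a), Expr::Num(b)) => Expr::Num(a + b),
            (l, r) => Expr::Add(Box::new(l), Box::new(r)),
        },
        other => other,
    }
}
```

Folding `1 + 2` gives the constant 3, while `x + (1 + 2)` becomes `x + 3` because the outer addition still depends on a runtime value.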

Stage 7–8: Native Code and Execution

LLVM compiles the IR to native machine code, links it with the runtime (liblox_runtime.a), and produces an executable. When you run it, the program prints the value of x, which is 3.

Why This Matters

Looking at the IR at each stage reveals something important: the Lox dialect is the only stage that’s specific to our language. Everything after Stage 5 is generic — standard MLIR operations, standard LLVM IR, standard machine code. The Lox dialect is the narrow waist of the hourglass.

This is the whole point of MLIR. You define a dialect that captures what makes your language unique (Lox’s GC-aware operations, in our case). Then you lower to standard dialects and let the framework handle the rest. You write the interesting part; MLIR handles the boring part.

IR at Each Stage: A Program with Control Flow

The var x = 1 + 2; print x; example only shows straight-line code. Let’s trace a program with an if statement to see how control flow appears in the IR.

var x = 10;
if (x > 5) {
  print x;
} else {
  print 0;
}

This introduces a comparison, a conditional branch, and two alternative paths. Here’s what the IR looks like at the key stages:

Stage 4: Code Generation (Lox Dialect MLIR)

module {
  func.func @main() {
    %frame = lox.push_frame root_count = 1 : i32
    %ten = arith.constant 10.0 : f64
    %x = lox.alloc obj_type = 0, size = 8 : i64
    lox.set_root index = 0, %x : i32
    lox.store %x, %ten : f64

    %loaded = lox.load %x : f64
    %five = arith.constant 5.0 : f64
    %cond = arith.cmpf ogt, %loaded, %five : f64

    scf.if %cond {
      // then-block
      %val = lox.load %x : f64
      lox.print %val : f64
    } else {
      // else-block
      %zero = arith.constant 0.0 : f64
      lox.print %zero : f64
    }

    lox.pop_frame
    func.return
  }
}

Two new things appear: arith.cmpf (a floating-point comparison) and scf.if (structured control flow). The scf dialect is MLIR’s standard way to represent conditionals — it has a then-region and an else-region, each in their own { } block. Unlike LLVM’s basic-block branches, scf.if is structured: the then and else regions can’t be jumped into from outside. This makes the IR easier to analyze and transform.

Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)

module {
  func.func @main() {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
    %ten = arith.constant 10.0 : f64
    %obj_type = arith.constant 0 : i8
    %size = arith.constant 8 : i64
    %x = func.call @lox_runtime_alloc(%obj_type, %size) : (i8, i64) -> !llvm.ptr
    %idx0 = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %idx0, %x) : (!llvm.ptr, i32, !llvm.ptr) -> ()
    llvm.store %ten, %x : f64, !llvm.ptr
    %loaded = llvm.load %x : !llvm.ptr -> f64
    %five = arith.constant 5.0 : f64
    %cond = arith.cmpf ogt, %loaded, %five : f64

    scf.if %cond {
      %val = llvm.load %x : !llvm.ptr -> f64
      func.call @lox_print(%val) : (f64) -> ()
    } else {
      %zero = arith.constant 0.0 : f64
      func.call @lox_print(%zero) : (f64) -> ()
    }

    func.call @gc_pop_frame() : () -> ()
    func.return
  }
}

The scf.if is still here — it doesn’t get lowered until the scf-to-cf conversion pass. The lox.* operations have been replaced by runtime calls and LLVM memory operations, but the structured control flow remains. This is by design: keeping scf.if as long as possible lets MLIR’s transformation passes reason about the control flow structure.

Later, the scf-to-cf pass will lower scf.if to cf.cond_br (a conditional branch between basic blocks), and the cf-to-llvm pass will lower that to the LLVM dialect’s llvm.br and llvm.cond_br operations, which become plain br instructions in LLVM IR. But that happens after all the high-level transformations are done.
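What the structured-to-unstructured step does can be sketched without MLIR at all. The types below (`Node`, `Block`, `Terminator`) are hypothetical: a structured if/else is split into a then-block, an else-block, and a merge block, and the block that contained the if ends in a conditional branch.

```rust
/// Structured input: plain operations plus an if/else with nested bodies.
enum Node {
    Op(&'static str),
    If { cond: &'static str, then_body: Vec<Node>, else_body: Vec<Node> },
}

#[derive(Debug, PartialEq)]
enum Terminator {
    Return,
    Br(usize),
    CondBr { cond: &'static str, then_block: usize, else_block: usize },
}

#[derive(Debug, PartialEq)]
struct Block {
    ops: Vec<&'static str>,
    term: Terminator,
}

fn new_block(blocks: &mut Vec<Block>) -> usize {
    blocks.push(Block { ops: Vec::new(), term: Terminator::Return });
    blocks.len() - 1
}

/// Flatten structured nodes into basic blocks; block 0 is the entry.
fn lower(nodes: &[Node]) -> Vec<Block> {
    let mut blocks = Vec::new();
    let mut current = new_block(&mut blocks);
    lower_into(nodes, &mut blocks, &mut current);
    blocks
}

fn lower_into(nodes: &[Node], blocks: &mut Vec<Block>, current: &mut usize) {
    for node in nodes {
        match node {
            Node::Op(text) => blocks[*current].ops.push(*text),
            Node::If { cond, then_body, else_body } => {
                let then_block = new_block(blocks);
                let else_block = new_block(blocks);
                let merge = new_block(blocks);
                // The current block now ends in a branch; the merge block
                // inherits whatever terminator the current block had.
                let branch = Terminator::CondBr { cond: *cond, then_block, else_block };
                let old_term = std::mem::replace(&mut blocks[*current].term, branch);
                blocks[merge].term = old_term;
                for (entry, body) in [(then_block, then_body), (else_block, else_body)] {
                    let mut cursor = entry;
                    lower_into(body, blocks, &mut cursor);
                    blocks[cursor].term = Terminator::Br(merge);
                }
                *current = merge;
            }
        }
    }
}
```

For the if/else program above this produces four blocks: the entry block ending in a conditional branch, one block per arm, and a merge block that holds the code after the if.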

What’s different from straight-line code?

The scf.if operation creates regions — separate blocks of IR for the then and else paths. Each region can define its own SSA values. The arith.cmpf produces an i1 (boolean) value that drives the branch.

The Lox dialect doesn’t have its own if operation — we use MLIR’s standard scf.if directly. This is a design choice: some languages define their own conditional operation in their dialect, then lower it to scf.if. We skip that step because scf.if already says everything we need.

GC roots are shared across both branches. The %frame and %x values are available in both the then-region and else-region — MLIR regions can see values from their enclosing scope. This means we don’t need to push/pop roots for each branch; the pre-if roots cover both paths.

IR at Each Stage: A Program with Closures

Functions and closures are where the GC-aware model really shows its value. Let’s trace a program that defines a function, captures a variable, and calls the result:

fun makeAdder(n) {
  fun add(x) {
    return x + n;
  }
  return add;
}

var add5 = makeAdder(5);
print add5(3);

This program creates a closure (add) that captures n from its enclosing scope. The closure outlives makeAdder’s stack frame — that’s the whole reason we need heap-allocated closure environments and GC roots.

Stage 4: Code Generation (Lox Dialect MLIR)

module {
  func.func @main() {
    // One root: the closure returned by makeAdder. makeAdder itself is a
    // top-level function — we call it directly, so it doesn't need a closure
    // object. Only the returned closure (which captures n) needs to be rooted.
    %frame = lox.push_frame root_count = 1 : i32

    // Call makeAdder(5) — direct call, no closure allocation needed
    %five = arith.constant 5.0 : f64
    %add5_closure = func.call @makeAdder(%five) : (f64) -> !llvm.ptr
    lox.set_root index = 0, %add5_closure : !llvm.ptr

    // Call add5(3) — extract the environment from the closure
    // (matches the calling convention from Part 5: closure functions
    // receive the environment pointer, not the closure pointer)
    %three = arith.constant 3.0 : f64
    %env_ptr = llvm.load %add5_closure[8] : !llvm.ptr -> !llvm.ptr  // shorthand (see note below)
    %result = func.call @add(%three, %env_ptr) : (f64, !llvm.ptr) -> f64
    lox.print %result : f64

    lox.pop_frame
    func.return
  }

  func.func @makeAdder(%n: f64) -> !llvm.ptr {
    %frame = lox.push_frame root_count = 1 : i32

    // Allocate environment with 1 slot for the captured variable 'n'
    %null_env = llvm.mlir.zero : !llvm.ptr  // null pointer: no enclosing environment
    %env = lox.alloc_environment slot_count = 1, enclosing = %null_env : !llvm.ptr
    lox.set_root index = 0, %env : !llvm.ptr

    // Store n into environment slot 0 (env_set_number boxes the f64 into a heap Number)
    lox.env_set_number %env, index = 0, %n : !llvm.ptr, i64, f64

    // Allocate closure for 'add', pointing to the environment
    // The compiler assigns function indices in compilation order.
    // @main and @makeAdder are called directly (no closure table entry
    // needed). @add is the first closure — index 1. (Index 0 would be
    // used if the first closure-creating function in compilation order
    // needed a table entry.)
    %c1 = arith.constant 1 : i64  // function_index for @add
    %add_closure = lox.alloc_closure function_index = %c1, environment = %env : !llvm.ptr

    lox.pop_frame
    func.return %add_closure : !llvm.ptr
  }

  func.func @add(%x: f64, %env: !llvm.ptr) -> f64 {
    // Load captured variable 'n' from the environment (slot 0)
    // The environment pointer is received directly — the caller extracts
    // it from the closure before calling us (see the calling convention
    // in Part 5's "How Does the Runtime Call a Closure?" section).
    %n_ptr = lox.env_get %env, index = 0 : (!llvm.ptr, i64) -> !llvm.ptr
    %n = llvm.load %n_ptr : !llvm.ptr -> f64
    %sum = arith.addf %x, %n : f64
    func.return %sum : f64
  }
}

%env_ptr = llvm.load %add5_closure[8] is a shorthand. The [8] isn’t valid MLIR syntax — it stands for llvm.getelementptr (to compute the address of the environment field, 8 bytes after the start of the closure’s data area) followed by llvm.load (to read the pointer at that address). Stage 5 shows the real two-step version. The shorthand here is for readability — the Lox dialect is our notation, and this operation maps to the two-step sequence during lowering.

lox.alloc_environment and lox.alloc_closure are separate heap objects. The environment holds the captured variables; the closure holds a function index and a pointer to the environment. This matches the runtime API from Part 5: alloc_environment creates a slot array, alloc_closure creates a closure struct.

lox.env_set_number and lox.env_get handle captured variable access through the environment API. env_set_number stores an f64 into a slot (the runtime boxes it into a heap Number internally). env_get returns a pointer to the slot’s contents. We use env_set_number instead of the generic env_set (which takes void*) because passing an f64 through a void* parameter violates the C calling convention — on x86-64, f64 is passed in an XMM register while void* goes in a general-purpose register.

The environment pointer is passed as a parameter. @add takes %env: !llvm.ptr as its second argument. The caller extracts the environment pointer from the closure (at data offset +8, after the function_index field) before calling @add. This matches the calling convention established in Part 5: when a closure is called, the runtime reads the environment field from the closure object and passes it to the function. The closure function receives the environment pointer directly — it doesn’t need the function_index because it already knows which function it is.

Why pass the environment pointer instead of the whole closure? The closure function doesn’t need its own function_index — it already knows which function it is. The only thing it needs from the closure is the environment pointer, so that’s all the calling convention passes. A production compiler might pass the whole closure pointer and have the callee extract the environment, but the environment-only convention is simpler and matches how indirect calls work (the runtime reads the environment from the closure before the call).
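The environment-only calling convention can be modelled in safe Rust. This is a sketch under assumptions: `Environment`, `Closure`, `LoxFn`, `function_table`, and `call_closure` are hypothetical names, slots hold plain f64 values, and the function table is a simple vector with @add at index 1 as in the example.

```rust
/// Hypothetical stand-in for the heap environment (slots hold plain numbers here).
struct Environment {
    slots: Vec<f64>,
}

/// Closure object: which function to run, and where its captured variables live.
struct Closure<'a> {
    function_index: usize,
    environment: &'a Environment,
}

/// Every closure body takes its argument plus the environment, never the closure.
type LoxFn = fn(f64, &Environment) -> f64;

/// Body of `add`: x + n, with n read from environment slot 0.
fn add(x: f64, env: &Environment) -> f64 {
    x + env.slots[0]
}

/// Hypothetical function table; index 1 holds @add as in the example.
fn function_table() -> Vec<Option<LoxFn>> {
    vec![None, Some(add as LoxFn)]
}

/// The caller reads both fields from the closure, then passes only the environment.
fn call_closure(closure: &Closure<'_>, arg: f64) -> f64 {
    let table = function_table();
    let target = table[closure.function_index].expect("no function at this index");
    target(arg, closure.environment)
}
```

Building a closure with function_index 1 over an environment whose slot 0 holds 5, then calling it with 3, evaluates x + n as 8.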

Stage 5: Lowering (Lox Dialect → Standard + LLVM Dialects)

module {
  func.func @main() {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr

    // Call makeAdder directly — no closure allocation needed for top-level functions
    %five = arith.constant 5.0 : f64
    %add5_closure = func.call @makeAdder(%five) : (f64) -> !llvm.ptr
    %idx0 = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %idx0, %add5_closure) : (!llvm.ptr, i32, !llvm.ptr) -> ()

    %three = arith.constant 3.0 : f64
    // Extract the environment from the closure before calling @add
    // (matches the calling convention from Part 5)
    %env_ptr_addr = llvm.getelementptr %add5_closure[8] : (!llvm.ptr) -> !llvm.ptr, i8
    %add5_env = llvm.load %env_ptr_addr : !llvm.ptr -> !llvm.ptr
    %result = func.call @add(%three, %add5_env) : (f64, !llvm.ptr) -> f64
    func.call @lox_print(%result) : (f64) -> ()

    func.call @gc_pop_frame() : () -> ()
    func.return
  }

  func.func @makeAdder(%n: f64) -> !llvm.ptr {
    %root_count = arith.constant 1 : i32
    %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr

    // Allocate environment with 1 slot for captured variable 'n'
    %slot_count = arith.constant 1 : i64
    %null_env = llvm.mlir.zero : !llvm.ptr  // null pointer: no enclosing environment
    %env = func.call @lox_runtime_alloc_environment(%slot_count, %null_env) : (i64, !llvm.ptr) -> !llvm.ptr
    %idx0 = arith.constant 0 : i32
    func.call @gc_set_root(%frame, %idx0, %env) : (!llvm.ptr, i32, !llvm.ptr) -> ()

    // Store n into environment slot 0 (env_set_number boxes f64 into heap Number)
    %slot_idx = arith.constant 0 : i64
    func.call @lox_runtime_env_set_number(%env, %slot_idx, %n) : (!llvm.ptr, i64, f64) -> ()

    // Allocate closure for 'add', pointing to the environment
    %func_idx = arith.constant 1 : i64  // function_index for @add
    %add_closure = func.call @lox_runtime_alloc_closure(%func_idx, %env) : (i64, !llvm.ptr) -> !llvm.ptr

    func.call @gc_pop_frame() : () -> ()
    func.return %add_closure : !llvm.ptr
  }

  func.func @add(%x: f64, %env: !llvm.ptr) -> f64 {
    // The environment pointer is passed directly (see calling convention in Part 5)
    // Load captured n from environment slot 0
    %slot_idx = arith.constant 0 : i64
    %n_ptr = func.call @lox_runtime_env_get(%env, %slot_idx) : (!llvm.ptr, i64) -> !llvm.ptr
    %n = llvm.load %n_ptr : !llvm.ptr -> f64
    %sum = arith.addf %x, %n : f64
    func.return %sum : f64
  }
}

The lox.* operations are gone, replaced by func.call to runtime functions and LLVM memory operations. Notice that closure allocation separates the environment (alloc_environment) from the closure struct (alloc_closure), matching the runtime API from Part 5. Captured variables go through env_set_number / env_get, not raw llvm.store into the closure’s data area.

Stage 6: LLVM IR

define ptr @makeAdder(double %n) {
entry:
  %frame = call ptr @gc_push_frame(i32 1)
  ; Allocate environment with 1 slot for captured variable 'n'
  %env = call ptr @lox_runtime_alloc_environment(i64 1, ptr null)  ; 1 slot, no enclosing env
  call void @gc_set_root(ptr %frame, i32 0, ptr %env)
  ; Store n into environment slot 0 (boxes f64 into heap Number)
  call void @lox_runtime_env_set_number(ptr %env, i64 0, double %n)
  ; Allocate closure pointing to the environment
  %closure = call ptr @lox_runtime_alloc_closure(i64 1, ptr %env)  ; function_index=1 for @add
  call void @gc_pop_frame()
  ret ptr %closure
}

define double @add(double %x, ptr %env) {
entry:
  ; The environment pointer is passed directly (see calling convention in Part 5)
  ; Load captured n from environment slot 0
  %n_ptr = call ptr @lox_runtime_env_get(ptr %env, i64 0)
  %n = load double, ptr %n_ptr
  %sum = fadd double %x, %n
  ret double %sum
}

Notice how the closure becomes a simple pointer in LLVM IR. The %env argument to @add is a ptr — LLVM doesn’t know or care that it points to a GC-managed closure environment. The GC semantics are entirely in the runtime (gc_push_frame, gc_set_root, gc_pop_frame). LLVM sees a pointer, does pointer arithmetic, and moves on.

The LLVM IR keeps the environment separate from the closure. makeAdder allocates an environment, stores the captured variable into it, and then allocates a closure that points to the environment. In @add, the captured variable is read through env_get, not by loading directly from the closure’s data area. This matches the runtime API and avoids out-of-bounds accesses: the closure struct only has two fields (function_index at offset +0 and environment at offset +8), and captured variables live in the separate environment object.

What’s different from the control flow example?

The closure example has three functions instead of one — each gets its own func.func, its own GC frame, and its own root set. makeAdder allocates an environment (rooting it before any further allocations), stores the captured variable into it, allocates a closure pointing to that environment, and returns the closure. The caller (main) roots the returned closure so the GC doesn’t collect it after makeAdder’s frame is popped, then extracts the environment pointer from the closure before calling @add — this matches the calling convention from Part 5, where the runtime reads the environment field from the closure object and passes it directly. The @add function receives the environment pointer as its second argument and uses env_get to access captured variables. Each function pushes its own GC frame and pops it before returning — roots are scoped to the function that registered them. makeAdder’s roots are popped when it returns; main must root the returned closure in its own frame.
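The frame scoping described above can be traced with a toy shadow stack. This is an illustrative model only: `ShadowStack` and `trace_make_adder` are hypothetical, and objects are represented by integer handles instead of pointers.

```rust
/// Toy shadow stack: one vector of root slots per active function.
struct ShadowStack {
    frames: Vec<Vec<Option<usize>>>,
}

impl ShadowStack {
    fn new() -> Self {
        ShadowStack { frames: Vec::new() }
    }

    fn push_frame(&mut self, root_count: usize) {
        self.frames.push(vec![None; root_count]);
    }

    /// Set a root slot in the innermost (current) frame.
    fn set_root(&mut self, index: usize, handle: usize) {
        let frame = self.frames.last_mut().expect("no active frame");
        frame[index] = Some(handle);
    }

    fn pop_frame(&mut self) {
        self.frames.pop().expect("unbalanced pop_frame");
    }

    fn is_rooted(&self, handle: usize) -> bool {
        self.frames.iter().flatten().any(|slot| *slot == Some(handle))
    }
}

/// Mirrors the makeAdder call. Returns whether the closure is rooted
/// (right after makeAdder pops its frame, after main roots the result).
fn trace_make_adder() -> (bool, bool) {
    let env = 7; // stand-in handle for the environment
    let closure = 42; // stand-in handle for the returned closure
    let mut stack = ShadowStack::new();
    stack.push_frame(1); // main's frame
    stack.push_frame(1); // makeAdder's frame
    stack.set_root(0, env); // makeAdder roots its environment
    stack.pop_frame(); // makeAdder returns the closure
    let before = stack.is_rooted(closure);
    stack.set_root(0, closure); // main roots the result in its own frame
    let after = stack.is_rooted(closure);
    (before, after)
}
```

The trace returns (false, true): between makeAdder's pop and main's set_root the closure is not rooted anywhere, which is why main must root it before any further allocation.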

Debugging Your GC

Common Bugs

1. Use After Free

Symptom: Random crashes, corrupted data
Cause: Object freed while still referenced
Fix: Check that every live reference is registered as a root and that the mark phase visits every reachable object

2. Memory Leaks

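Symptom: Memory grows without bound
Cause: Dead objects are never freed, either because stale roots keep them reachable or because sweep skips them
Fix: Check that every gc_push_frame has a matching gc_pop_frame, and that the sweep phase unlinks and frees every unmarked object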

3. Premature Collection

Symptom: Object disappears mid-function
Cause: Root not registered in shadow stack
Fix: Ensure gc_set_root is called for all local references
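Premature collection is easy to reproduce in a toy model. The sketch below is not the tutorial's runtime: `ToyHeap` and its methods are hypothetical, objects are slots in a `Vec`, and handles are indices so that the example stays in safe Rust. It shows an allocation that was never passed to `set_root` disappearing at the next collection.

```rust
/// Toy heap: objects are slots in a Vec, handles are indices.
/// All names here are illustrative, not the runtime's API.
#[derive(Default)]
pub struct ToyHeap {
    objects: Vec<Option<ToyObj>>,
    frames: Vec<Vec<usize>>, // shadow stack: one root list per frame
}

struct ToyObj {
    marked: bool,
    refs: Vec<usize>, // outgoing references to other handles
}

impl ToyHeap {
    pub fn alloc(&mut self, refs: Vec<usize>) -> usize {
        self.objects.push(Some(ToyObj { marked: false, refs }));
        self.objects.len() - 1
    }

    pub fn push_frame(&mut self) {
        self.frames.push(Vec::new());
    }

    pub fn pop_frame(&mut self) {
        self.frames.pop();
    }

    pub fn set_root(&mut self, handle: usize) {
        self.frames.last_mut().expect("no frame pushed").push(handle);
    }

    pub fn is_live(&self, handle: usize) -> bool {
        self.objects[handle].is_some()
    }

    /// Mark from the roots, then free every unmarked object.
    /// Returns the number of objects freed.
    pub fn collect(&mut self) -> usize {
        let mut worklist: Vec<usize> = self.frames.iter().flatten().copied().collect();
        while let Some(handle) = worklist.pop() {
            if let Some(obj) = self.objects[handle].as_mut() {
                if !obj.marked {
                    obj.marked = true;
                    worklist.extend(obj.refs.iter().copied());
                }
            }
        }
        let mut freed = 0;
        for slot in self.objects.iter_mut() {
            match slot.as_ref().map(|obj| obj.marked) {
                // Survivor: clear the mark bit for the next cycle.
                Some(true) => {
                    if let Some(obj) = slot.as_mut() {
                        obj.marked = false;
                    }
                }
                // Unmarked: free it.
                Some(false) => {
                    *slot = None;
                    freed += 1;
                }
                None => {}
            }
        }
        freed
    }
}

fn main() {
    let mut heap = ToyHeap::default();
    heap.push_frame();
    let rooted = heap.alloc(vec![]);
    heap.set_root(rooted);
    let forgotten = heap.alloc(vec![]); // bug: never passed to set_root
    heap.collect();
    println!(
        "rooted live: {}, forgotten live: {}",
        heap.is_live(rooted),
        heap.is_live(forgotten)
    );
    heap.pop_frame();
}
```

The same model reproduces the leak case in reverse: if `pop_frame` is never called, the roots in that frame keep their objects alive across every later collection.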

Debugging Tools

// Debug utilities implemented in C, matching the runtime
// (liblox_runtime.a). These walk the same global data structures the GC
// uses: the all-objects linked list and the shadow stack. The runtime's
// ObjHeader and ShadowFrame definitions are assumed to be in scope.
#include <stdio.h>

/* Print all objects in the heap (for debugging) */
void gc_debug_print_heap(void) {
    ObjHeader *current = all_objects;  // global linked list head
    printf("=== HEAP CONTENTS ===\n");
    while (current != NULL) {
        printf("  %p: type=%d, marked=%d, size=%zu\n",
               (void*)current,
               current->obj_type,
               current->marked,
               current->size);
        current = current->next;
    }
}

/* Print the shadow stack (for debugging) */
void gc_debug_print_stack(void) {
    ShadowFrame *current = shadow_stack_head;  // global stack head
    int frame_num = 0;
    printf("=== SHADOW STACK ===\n");
    while (current != NULL) {
        printf("  Frame %d (%zu roots):\n", frame_num, current->root_count);
        for (size_t i = 0; i < current->root_count; i++) {
            printf("    [%zu] = %p\n", i, (void*)current->roots[i]);
        }
        current = current->next;
        frame_num++;
    }
}

Performance Considerations

Allocation Overhead

Operation	Time	Notes
`alloc()`	O(1)	Plus possible GC trigger
`gc_push_frame()`	O(1)	Allocate and link
`gc_pop_frame()`	O(1)	Unlink and free
`gc_collect()`	O(live objects + all heap objects)	Mark visits live objects; sweep walks the whole all-objects list

Tuning Parameters

#![allow(unused)]
fn main() {
// Adjust these based on your workload

/// Trigger GC after this many allocations
const GC_THRESHOLD: usize = 1024;

/// Maximum objects before forced collection
const GC_MAX_OBJECTS: usize = 10000;

/// Enable debug output
const GC_DEBUG: bool = false;
}
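How these constants might drive the collector is sketched below. This is an assumption about the trigger logic, not the runtime's actual code: `GcPolicy` and `on_alloc` are hypothetical names, and the two limits are passed in as parameters so the behaviour is easy to test.

```rust
/// Hypothetical allocation policy: decides when a collection should run.
pub struct GcPolicy {
    threshold: usize,       // plays the role of GC_THRESHOLD
    max_objects: usize,     // plays the role of GC_MAX_OBJECTS
    allocs_since_gc: usize, // allocations since the last collection
}

impl GcPolicy {
    pub fn new(threshold: usize, max_objects: usize) -> Self {
        Self { threshold, max_objects, allocs_since_gc: 0 }
    }

    /// Record one allocation. Returns true if a collection should run now:
    /// either enough allocations have happened since the last collection,
    /// or the heap has reached the hard object limit.
    pub fn on_alloc(&mut self, live_objects: usize) -> bool {
        self.allocs_since_gc += 1;
        if self.allocs_since_gc >= self.threshold || live_objects >= self.max_objects {
            self.allocs_since_gc = 0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut policy = GcPolicy::new(1024, 10_000);
    let mut collections = 0;
    for live in 1..=3000 {
        if policy.on_alloc(live) {
            collections += 1;
        }
    }
    println!("collections triggered: {collections}");
}
```

A lower threshold trades throughput for a smaller peak heap; a higher one does the opposite. Measuring with a representative program is the only reliable way to pick the values.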

Optimization Opportunities

Generational GC: Most objects die young. Only scan recent allocations.
Incremental GC: Don’t pause for full collection. Do it in small chunks.
Parallel GC: Use multiple threads for mark/sweep.
Escape Analysis: If an object doesn’t escape a function, put it on the stack.

What’s Next?

You now have a complete garbage collector. The immediate path forward is getting it running — compile the runtime, write a simple test program, and verify the GC handles the edge cases. Then add more object types: arrays, maps, and classes (which the next part covers in detail).

Beyond that, there are four directions for the collector itself. Generational collection splits the heap into a young generation (frequent, small collections) and an old generation (rare, large collections) — most objects die young, so collecting only the young generation is fast. Incremental collection breaks mark/sweep into small chunks to reduce pause times. Compacting collection moves objects to eliminate fragmentation — this requires precise GC, which we already have. Concurrent collection runs the GC in parallel with program execution, which is complex but powerful.

For further reading, Crafting Interpreters has a garbage collection chapter that covers the ground we skipped, the GC Handbook is the definitive reference, and MLIR’s documentation covers advanced dialect features. If you want relocating GC, LLVM’s statepoints documentation is the place to start.

Appendix A: Quick Reference Card

┌──────────────────────────────────────────────────────────────┐
│                    GC QUICK REFERENCE                        │
├──────────────────────────────────────────────────────────────┤
│                                                              │
│  MARK-SWEEP ALGORITHM                                        │
│  ─────────────────────                                       │
│  1. Mark all roots (shadow stack)                            │
│  2. Recursively mark referenced objects                      │
│  3. Sweep: free all unmarked objects                         │
│                                                              │
│  ROOTS                                                       │
│  ─────                                                       │
│  • Global variables                                          │
│  • Local variables (in shadow stack frames)                  │
│  • Temporaries (also in shadow stack)                        │
│                                                              │
│  OBJECT LIFECYCLE                                            │
│  ─────────────────                                           │
│  1. alloc() → Object on heap                                 │
│  2. gc_set_root() → Register in shadow stack                 │
│  3. Use object                                               │
│  4. gc_pop_frame() → Remove from shadow stack                │
│  5. gc_collect() → Free if unreachable                       │
│                                                              │
│  CLOSURES                                                    │
│  ────────                                                    │
│  • Captured vars → Environment (heap)                        │
│  • Closure = function pointer + environment pointer          │
│  • Mark closure → Mark environment → Mark captured vars      │
│                                                              │
│  MLIR OPS (conceptual)                                       │
│  ────────────────────                                        │
│  lox.alloc         → Allocate object                         │
│  lox.push_frame    → Push shadow stack frame                 │
│  lox.pop_frame     → Pop shadow stack frame                  │
│  lox.set_root      → Register root                           │
│                                                              │
│  These are the Lox dialect operations (defined in            │
│  lox_dialect.rs). Each lowers to a func.call to a runtime    │
│  function (e.g., lox.alloc → @lox_runtime_alloc) during      │
│  the lowering pass.                                          │
└──────────────────────────────────────────────────────────────┘

Appendix B: Error Messages

Error	Cause	Fix
`No frame to pop!`	`gc_pop_frame` called without matching `gc_push_frame`	Check function prologue/epilogue pairing
`Out of memory!`	Allocation failed	Check for memory leaks, tune GC threshold
`Variable not found: x`	Capture analysis failed	Check scope tracking in analyzer
`Object not marked during sweep`	Internal GC assertion (not user-facing)	Check mark phase — all reachable objects should be marked before sweep

Appendix C: Study Roadmap

Use this checklist to track your progress through the GC tutorial:

Phase 1: Understanding (Read-Only)

Part 2: Understand the GC problem, mark-sweep algorithm, and object allocation
Part 3: Understand shadow stacks, root tracking, and the full mark/sweep cycle
Part 4: Understand the MLIR dialect, lowering, and function code generation
Part 5: Understand closure capture, environments, and closure code generation

Phase 2: Implementation (Hands-On)

Create the runtime module structure
Implement ObjHeader and ObjType
Implement alloc() function
Implement shadow stack (StackFrame, gc_push_frame, gc_pop_frame)
Implement mark phase (gc_mark, mark_object, mark_references)
Implement sweep phase (gc_sweep)
Implement gc_collect() entry point
Test allocation without GC
Test mark phase with simple objects
Test full GC cycle
Add debug utilities (gc_debug_print_heap, etc.)
Implement environment allocation
Implement closure allocation
Test closure GC

Phase 3: MLIR Integration

Set up Melior dependency
Define Lox dialect operations
Implement lowering pass
Generate MLIR for simple functions
Lower to LLVM IR
Link with runtime
Test end-to-end execution

Phase 4: Advanced Features

Add string type
Add instance type
Implement nested closure environments
Add generational GC (optional)
Add incremental collection (optional)

Appendix D: Glossary

Term	Definition
Heap	The pool of memory where objects are allocated
Stack	Memory for function calls and local variables
Root	A pointer that the GC treats as always live (globals, locals)
Reachable	An object that can be found by following pointers from roots
Mark	To flag an object as “still live” during GC
Sweep	To free all objects that weren’t marked
Shadow Stack	A separate stack that tracks GC roots for each function call
Environment	A heap-allocated structure holding captured variables for closures
Closure	A function together with its captured environment
Dialect	A set of operations in MLIR for a specific domain
Lowering	Converting high-level IR to lower-level IR

Appendix E: Further Reading

Books

“Crafting Interpreters” by Bob Nystrom - Chapter on garbage collection
“The Garbage Collection Handbook” by Jones et al. - Comprehensive GC reference
“Engineering a Compiler” by Cooper & Torczon - GC in the context of compilers

Papers

“Uniprocessor Garbage Collection Techniques” - Wilson (1992) - Classic survey
“A Unified Theory of Garbage Collection” - Bacon et al. (2004) - Theoretical foundation

Code

Lua 5.4 source - Simple, readable GC implementation
MicroPython - Mark-sweep GC for embedded systems
mruby - Lightweight Ruby with simple GC

MLIR/LLVM

MLIR Documentation - https://mlir.llvm.org/docs/
Melior Repository - Rust bindings for MLIR
LLVM GC Documentation - https://llvm.org/docs/GarbageCollection.html

Conclusion

The GC’s algorithm is three steps: track what’s reachable, mark everything reachable, free everything else. Everything else — generational collection, incremental collection, compacting collectors — is optimization on top of those three steps.

But the GC is only one piece. This chapter showed the whole system: source text becomes an AST, the AST becomes Lox-dialect MLIR, the Lox dialect lowers to runtime calls and standard MLIR, standard MLIR lowers to LLVM IR, and LLVM IR becomes machine code. The Lox dialect is the narrow waist — the only part that knows about Lox’s semantics. Everything below it is generic infrastructure. That’s the payoff for the layered approach: you write the interesting part, and MLIR handles the boring part.

File Summary

File	Content
Part 2	GC concepts, object headers, allocation
Part 3	Shadow stack, mark phase, sweep phase
Part 4	MLIR dialect, lowering to LLVM
Part 5	Closures and environments

You now have the full picture. Time to add the types that make Lox more than a calculator.

Next: Part 7 — Classes and Instances — Every value in our compiler is an f64. That works for numbers, but Lox has classes, instances, and strings — and they can’t be represented as a single float. It’s time to graduate to tagged unions: an (i8, i64) struct where the tag says what the value is and the payload holds the bits. Every arith.addf becomes tag-check-extract-compute-repack — but only when the tag says it’s a number.

MLIR for Lox: Part 7 — Classes Without a Type System — Compiling Dynamic Dispatch in MLIR

Classes are the last major feature in Crafting Interpreters. They combine everything we’ve built — heap allocation, GC roots, closures — and add a new layer: method dispatch, inheritance, and this binding. Every concept from Parts 2–6 shows up again here, but harder: objects reference other objects (GC marking chains), methods capture this (closure environments), and inheritance walks a linked list of class objects (pointer chasing across the heap).

This part assumes you’ve read Parts 2–6. We’ll extend the GC runtime from Parts 2–3 and the MLIR codegen from Part 4.

What We’re Building

By the end, this Lox program should work:

class Doughnut {
  cook(flavor) {
    print "Frying " + flavor + " doughnut";
  }
}

class FilledDoughnut < Doughnut {
  cook(flavor) {
    super.cook(flavor);
    print "Injecting " + flavor + " filling";
  }
}

var d = FilledDoughnut();
d.cook("custard");

Output:

Frying custard doughnut
Injecting custard filling

The Object Model

Lox has three class-related object types:

Object	What It Holds	Analogy
`ObjClass`	Name, methods, superclass	A blueprint
`ObjInstance`	Class pointer, field table	A house built from the blueprint
`ObjBoundMethod`	Receiver + closure	A method “bound” to an instance

These are all heap-allocated, GC-managed objects. They extend the ObjHeader we built in Part 2.

Updated Object Types

#![allow(unused)]
fn main() {
// src/runtime/object.rs

use std::cell::UnsafeCell;
use std::collections::HashMap;
use crate::runtime::value::LoxValue;
use crate::runtime::heap::{ObjHeader, ObjType};

/// A Lox class object
pub struct ObjClass {
    pub header: ObjHeader,
    pub name: String,
    /// Methods defined directly on this class (not inherited)
    pub methods: HashMap<String, LoxValue>,
    /// Superclass, if any (null for root classes)
    pub superclass: *mut ObjClass,
}

impl ObjClass {
    /// Look up a method, walking the inheritance chain
    pub fn find_method(&self, name: &str) -> Option<LoxValue> {
        if let Some(value) = self.methods.get(name) {
            return Some(value.clone());
        }
        // Walk up the inheritance chain
        if !self.superclass.is_null() {
            // SAFETY: GC guarantees the object is alive when we have a reference
            unsafe { (*self.superclass).find_method(name) }
        } else {
            None
        }
    }
}
}

Updated LoxValue Enum

This part adds three new variants to the LoxValue enum from Part 2. Here’s the cumulative version:

What are GcClass, GcInstance, GcBoundMethod? The Gc<T> wrapper tells the garbage collector that these values live on the heap and need to be traced. Raw pointers are invisible to the GC — if the collector moves or frees the underlying object, you get a dangling pointer. Gc<T> wraps a raw pointer and implements Trace so the collector can walk the object graph.

These are type aliases:

#![allow(unused)]
fn main() {
type GcClass = Gc<ObjClass>;
type GcInstance = Gc<ObjInstance>;
type GcBoundMethod = Gc<ObjBoundMethod>;
}

The aliases exist because writing Gc<ObjClass> everywhere gets old fast. You saw the same pattern in Parts 2 (GcString) and 5 (GcClosure).

#![allow(unused)]
fn main() {
// src/runtime/value.rs

#[repr(C)]
#[derive(Debug, Clone)]
pub enum LoxValue {
    Nil,
    Bool(bool),
    Number(f64),
    String(GcString),       // Part 2: heap-allocated string (GC object)
    Closure(GcClosure),     // Part 5: closure with captured environment
    Instance(GcInstance),   // Part 7: class instance
    Class(GcClass),         // Part 7: class object
    BoundMethod(GcBoundMethod), // Part 7: method bound to a receiver
}

impl LoxValue {
    /// Create a bound method value
    ///
    /// `method` is a LoxValue (specifically LoxValue::Closure) rather than a raw
    /// pointer — the GC needs to trace through it. The underlying closure is
    /// wrapped in LoxValue so the collector can walk the object graph.
    pub fn bound_method(receiver: *mut ObjInstance, method: LoxValue) -> Self {
        LoxValue::BoundMethod(GcBoundMethod::new(receiver, method))
    }
}
}

Where does the allocation happen? ObjInstance::get_property calls bound_method when it finds a method on the class and needs to bind it to the receiver. GcBoundMethod::new() allocates the ObjBoundMethod on the GC heap and returns the Gc wrapper. If you’re building the runtime incrementally, you can use Heap::bind_method instead (shown below) — that version takes &mut Heap and calls self.allocate() directly, which is more explicit about the allocation. The two paths produce the same result; the difference is whether the allocation happens through the Gc::new constructor or through an explicit Heap::allocate call.

#![allow(unused)]
fn main() {
use std::cell::UnsafeCell;
use std::collections::HashMap;

/// A method bound to a specific receiver instance
///
/// When you write `instance.method()`, the runtime needs to combine the method's
/// closure with the instance that receives the call (the `this` value). This struct
/// holds both pieces — a pointer to the receiver and the closure that implements the
/// method body. The GC can trace through both: the receiver pointer is a heap object,
/// and the method's `LoxValue` tag tells the collector whether it contains a heap reference.
pub struct ObjBoundMethod {
    pub header: ObjHeader,
    /// The instance that receives the method call (the `this` value)
    pub receiver: *mut ObjInstance,
    /// The underlying closure
    pub method: LoxValue,
}

/// A Lox instance object
pub struct ObjInstance {
    pub header: ObjHeader,
    /// The class this instance belongs to
    pub class: *mut ObjClass,
    /// Instance fields (set at runtime, not on the class)
    ///
    /// Why `UnsafeCell`? `get_property` takes `&self` (shared reference),
    /// but `set_property` needs to mutate the fields. In normal Rust,
    /// `&self` means "no mutation." `UnsafeCell` is Rust's opt-out mechanism
    /// for interior mutability — it's the only way to soundly mutate through
    /// a shared reference. The actual `&mut` extraction still requires an
    /// `unsafe` block (as shown in `get_property` and `set_property` below),
    /// but `UnsafeCell` makes it *legal* — without it, any such mutation
    /// would be undefined behavior, even inside `unsafe`.
    ///
    /// The GC code holds `&ObjInstance` while tracing (read-only) and needs
    /// `&mut HashMap` when setting fields. `UnsafeCell` makes this possible
    /// without restructuring the API into `RefCell<HashMap>` (which would
    /// add runtime borrow-check overhead on every field access).
    pub fields: UnsafeCell<HashMap<String, LoxValue>>,
}

impl ObjInstance {
    pub fn new(class: *mut ObjClass) -> Self {
        Self {
            header: ObjHeader::new(ObjType::Instance),
            class,
            fields: UnsafeCell::new(HashMap::new()),
        }
    }

    /// Get a field value, or look up a method on the class
    pub fn get_property(&self, name: &str) -> Option<LoxValue> {
        // Fields shadow methods
        let fields = unsafe { &*self.fields.get() };
        if let Some(value) = fields.get(name) {
            return Some(value.clone());
        }

        // Look up method on the class
        unsafe {
            (*self.class).find_method(name).map(|method| {
                // Bind the method to this instance
                //
                // The *const → *mut cast looks wrong, but it's a standard
                // pattern in GC-managed code: `self` is `&ObjInstance`, so
                // we can only derive a *const pointer. ObjBoundMethod needs
                // *mut to match the receiver field's type. This is safe
                // because the GC manages the lifetime — the pointer is only
                // dereferenced while the object is alive and the GC isn't
                // collecting.
                //
                // **Soundness caveat:** This only works if `self` refers to
                // a GC-managed heap object (e.g., `self` comes from
                // dereferencing a `Gc<ObjInstance>`). If someone constructs
                // an `ObjInstance` on the stack and calls `get_property`,
                // the resulting pointer will dangle after the stack frame
                // returns. In a complete implementation, `get_property`
                // would receive `&GcInstance` and call `GcInstance::as_ptr()`
                // instead of casting `self` — that guarantees the pointer
                // points to a GC-managed heap object.
                LoxValue::bound_method(self as *const ObjInstance as *mut ObjInstance, method)
            })
        }
    }

    /// Set a field value
    pub fn set_property(&self, name: String, value: LoxValue) {
        let fields = unsafe { &mut *self.fields.get() };
        fields.insert(name, value);
    }
}
}
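The `UnsafeCell` pattern used for `fields` can be studied in isolation. The sketch below is a simplified stand-in, not the tutorial's `ObjInstance`: `FieldTable` is a hypothetical type that stores plain `f64` values, and it assumes single-threaded use, as the runtime does.

```rust
use std::cell::UnsafeCell;
use std::collections::HashMap;

/// Simplified stand-in for ObjInstance: only the field table, with f64 values.
pub struct FieldTable {
    fields: UnsafeCell<HashMap<String, f64>>,
}

impl FieldTable {
    pub fn new() -> Self {
        Self { fields: UnsafeCell::new(HashMap::new()) }
    }

    /// Read a field through a shared reference.
    pub fn get(&self, name: &str) -> Option<f64> {
        // SAFETY: single-threaded, and no `&mut` to the map is alive
        // during this call because `set` never lets its borrow escape.
        let fields = unsafe { &*self.fields.get() };
        fields.get(name).copied()
    }

    /// Write a field through a shared reference.
    pub fn set(&self, name: &str, value: f64) {
        // SAFETY: single-threaded, and the `&mut` created here is dropped
        // before this function returns, so it never overlaps a borrow
        // created by `get`.
        let fields = unsafe { &mut *self.fields.get() };
        fields.insert(name.to_string(), value);
    }
}

fn main() {
    let table = FieldTable::new();
    table.set("radius", 2.5); // note: `table` is not declared `mut`
    println!("radius = {:?}", table.get("radius"));
    println!("missing = {:?}", table.get("missing"));
}
```

Two properties make this sound: neither method returns a reference into the map (values are copied out), and `UnsafeCell` makes the type `!Sync`, so the compiler rejects sharing it across threads. Returning `&f64` from `get` would break the first property, because a later `set` could reallocate the map underneath it.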

Updated ObjType Enum

Each part adds new object types. Here’s the cumulative enum after Part 7. Every variant from Parts 2 and 5 is still here, with Instance, Class, and BoundMethod appended:

#![allow(unused)]
fn main() {
// src/runtime/object.rs (continued)

#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObjType {
    Number = 0,       // Introduced in Part 2 (GC)
    String = 1,       // Introduced in Part 2 (GC)
    Environment = 2,  // Introduced in Part 5 (closures)
    Closure = 3,      // Introduced in Part 5 (closures)
    Instance = 4,     // Part 7
    Class = 5,        // Part 7
    BoundMethod = 6,  // Part 7
}
}

Why explicit discriminants? The discriminant values are the obj_type byte stored in every heap object’s header. If you change the numbering, the GC’s trace_object dispatch (which reads header.obj_type and matches on the numeric value) will trace the wrong object type. Explicit discriminants make this contract visible.

What’s Number doing here? Number is a stack value, not a heap object — the GC never traces it. So why is it in ObjType? Because the GC runtime is simpler if every variant gets a match arm, even if some arms are no-ops. The GC’s match arm for Number does nothing (there are no outgoing references to trace).

Don’t confuse ObjType discriminants with compiled value tags. These are two different numbering schemes for two different purposes:

ObjType (GC dispatch)          Compiled value tags (codegen)
─────────────────────          ──────────────────────────────
0 = Number  (stack value)      0 = Nil     (stack value)
1 = String  (heap object)      1 = Bool    (stack value)
2 = Environment (GC-internal)  2 = Number  (stack value)
3 = Closure (heap object)      3 = String  (heap pointer)
4 = Instance (heap object)     4 = Closure (heap pointer)
5 = Class   (heap object)      5 = Instance(heap pointer)
6 = BoundMethod (heap object)  6 = Class   (heap pointer)
                               7 = BoundMethod (heap pointer)

Two differences: the compiled tags include Nil and Bool (they’re Lox values the codegen must represent, even though they’re not heap objects), and they exclude Environment (it’s a GC-internal type that never appears as a Lox value). trace_object dispatches on the ObjType byte in the object header, while generated code and trace_value dispatch on value tags. Convert between the two explicitly and never use one numbering where the other is expected.
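One way to keep the two numbering schemes apart is to make the conversion an explicit function. The sketch below copies the `ObjType` enum and the tag values from the table above; `TAG_STRING` and `TAG_CLOSURE` are names used elsewhere in this part, while the other constant names and both helper functions are assumptions made for illustration.

```rust
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ObjType {
    Number = 0,
    String = 1,
    Environment = 2,
    Closure = 3,
    Instance = 4,
    Class = 5,
    BoundMethod = 6,
}

// Compiled value tags, numbered as in the right-hand column of the table.
pub const TAG_NIL: u8 = 0;
pub const TAG_BOOL: u8 = 1;
pub const TAG_NUMBER: u8 = 2;
pub const TAG_STRING: u8 = 3;
pub const TAG_CLOSURE: u8 = 4;
pub const TAG_INSTANCE: u8 = 5;
pub const TAG_CLASS: u8 = 6;
pub const TAG_BOUND_METHOD: u8 = 7;

/// Map a GC object type to the value tag the codegen uses for it.
/// Returns None for Environment, which never appears as a Lox value.
pub fn value_tag_for(obj_type: ObjType) -> Option<u8> {
    match obj_type {
        ObjType::Number => Some(TAG_NUMBER),
        ObjType::String => Some(TAG_STRING),
        ObjType::Environment => None,
        ObjType::Closure => Some(TAG_CLOSURE),
        ObjType::Instance => Some(TAG_INSTANCE),
        ObjType::Class => Some(TAG_CLASS),
        ObjType::BoundMethod => Some(TAG_BOUND_METHOD),
    }
}

/// True when a value with this tag carries a heap pointer in its payload.
pub fn tag_is_heap_pointer(tag: u8) -> bool {
    (TAG_STRING..=TAG_BOUND_METHOD).contains(&tag)
}

fn main() {
    // The same kind of object has a different number in each scheme.
    println!(
        "String: ObjType = {}, value tag = {:?}",
        ObjType::String as u8,
        value_tag_for(ObjType::String)
    );
    println!("Nil traced? {}", tag_is_heap_pointer(TAG_NIL));
    println!("Bool traced? {}", tag_is_heap_pointer(TAG_BOOL));
}
```

Because the `match` is exhaustive, adding a new `ObjType` variant later forces a decision about its value tag at compile time.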

GC Tracing for Classes

Every new object type needs to report its outgoing references. Miss one and you get use-after-free.

Classes force a change to the GC API. Parts 2–6 used standalone mark_object and mark_references functions that operated on raw *mut u8 pointers. That works when every reference is a raw heap pointer. Classes break that assumption: ObjInstance has a raw *mut ObjClass pointer for its class reference, but ObjBoundMethod stores its method as a LoxValue tagged union. You can’t call the same trace function on both — a LoxValue might be Nil or Number (not a heap pointer at all), while a raw pointer is always a heap object. The fix: two trace methods. trace_object handles raw heap pointers. trace_value unwraps LoxValue tags and forwards heap variants to trace_object. Same tracing logic, but the type distinction is now explicit.

#![allow(unused)]
fn main() {
// src/runtime/gc.rs

impl GC {
    /// Trace a tagged LoxValue — dispatches based on the value's tag.
    ///
    /// Heap objects (String, Closure, Instance, Class, BoundMethod) get
    /// forwarded to `trace_object`. Stack values (Nil, Bool, Number) have
    /// no heap references — nothing to trace.
    ///
    /// This method exists because several object types store `LoxValue`
    /// fields (method tables, instance fields, bound methods). We can't
    /// call `trace_object` on a `LoxValue` directly because a `LoxValue`
    /// might be `Nil` or `Number` — not a heap pointer at all.
    pub fn trace_value(&mut self, value: LoxValue) {
        match value {
            LoxValue::Nil | LoxValue::Bool(_) | LoxValue::Number(_) => {
                // Stack values — no heap references to trace
            }
            LoxValue::String(s) => {
                self.trace_object(s.as_ptr() as *mut ObjHeader);
            }
            LoxValue::Closure(c) => {
                self.trace_object(c.as_ptr() as *mut ObjHeader);
            }
            LoxValue::Instance(i) => {
                self.trace_object(i.as_ptr() as *mut ObjHeader);
            }
            LoxValue::Class(c) => {
                self.trace_object(c.as_ptr() as *mut ObjHeader);
            }
            LoxValue::BoundMethod(b) => {
                self.trace_object(b.as_ptr() as *mut ObjHeader);
            }
        }
    }

    /// Trace all reachable objects from `obj`
    pub fn trace_object(&mut self, obj: *mut ObjHeader) {
        let header = unsafe { &mut *obj };
        if header.is_marked {
            return;
        }
        header.is_marked = true;

        match header.obj_type {
            ObjType::Number => {
                // Number is a stack value, not a heap object.
                // It's in the enum for completeness but the GC
                // never actually traces it — there are no outgoing references.
            }
            ObjType::String => {
                // Strings have no outgoing references
            }
            ObjType::Environment => {
                // Environment uses the prepended-header model from Parts 2–4:
                // the ObjHeader sits before the data, so we skip past it.
                // (ObjClass, ObjInstance, and ObjBoundMethod embed the header as
                // their first field, so the cast works directly. Environment and
                // Closure don't — they're allocated with the header prepended.)
                let data_ptr = unsafe { (obj as *const u8).add(std::mem::size_of::<ObjHeader>()) };
                let env = unsafe { &*(data_ptr as *const Environment) };
                // Trace the enclosing environment — if the outer
                // environment is only reachable through this pointer,
                // the GC must walk this edge or it could collect it.
                if !env.enclosing.is_null() {
                    self.trace_object(env.enclosing as *mut ObjHeader);
                }
                // Trace each variable slot.
                for i in 0..env.slot_count {
                    let slot_ptr = unsafe { *env.slots.as_ptr().add(i) };
                    if !slot_ptr.is_null() {
                        self.trace_object(slot_ptr as *mut ObjHeader);
                    }
                }
                // NOTE: The code above uses the *mut u8 slot model from Part 5,
                // where each slot is a raw heap pointer. If you've switched to
                // the tagged-union (i8, i64) model (described in the "What the
                // Generated MLIR Looks Like" section below), replace the null
                // check with a tag check: trace the slot only when the tag byte
                // indicates a heap object (TAG_STRING, TAG_CLOSURE, etc.).
                // Nil, Bool, and Number tags have no heap references.
            }
            ObjType::Closure => {
                // Same prepended-header model as Environment above
                let data_ptr = unsafe { (obj as *const u8).add(std::mem::size_of::<ObjHeader>()) };
                let closure = unsafe { &*(data_ptr as *const Closure) };
                // Trace the captured environment. Part 5's Closure struct has
                // two fields: `function_index` and `environment` (a single
                // pointer to the captured Environment). Forward it to
                // `trace_object` so the Environment itself gets marked, not
                // just its contents; the Environment arm above then walks
                // the enclosing chain and every slot. As with
                // `env.enclosing`, this assumes the stored pointer addresses
                // the object's ObjHeader. If your runtime stores data
                // pointers instead, subtract size_of::<ObjHeader>() first.
                if !closure.environment.is_null() {
                    self.trace_object(closure.environment as *mut ObjHeader);
                }
                // A full implementation adds an `upvalues` array to the closure
                // — one entry per captured variable — so that the GC can trace
                // each captured value independently. The `environment` pointer
                // from Part 5 is equivalent to a single upvalue (an environment
                // is a flat array of captured values), but real closures often
                // capture from multiple enclosing scopes, which requires
                // separate upvalue objects. The trace logic is the same either
                // way: walk the captured values and mark them. With upvalues:
                //
                //     for &upvalue_ptr in &closure.upvalues {
                //         self.trace_value(unsafe { (*upvalue_ptr).closed.clone() });
                //     }
            }
            ObjType::Class => {
                let class = unsafe { &*(obj as *const ObjClass) };
                // Trace method values
                for value in class.methods.values() {
                    self.trace_value(value.clone());
                }
                // Trace superclass
                if !class.superclass.is_null() {
                    self.trace_object(class.superclass as *mut ObjHeader);
                }
            }
            ObjType::Instance => {
                let instance = unsafe { &*(obj as *const ObjInstance) };
                // Trace the class reference
                self.trace_object(instance.class as *mut ObjHeader);
                // Trace all field values
                let fields = unsafe { &*instance.fields.get() };
                for value in fields.values() {
                    self.trace_value(value.clone());
                }
            }
            ObjType::BoundMethod => {
                let bound = unsafe { &*(obj as *const ObjBoundMethod) };
                // Trace the receiver instance
                self.trace_object(bound.receiver as *mut ObjHeader);
                // Trace the method closure
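                // `bound` is a shared reference and LoxValue is not Copy,
                // so clone the value instead of moving it out.
                self.trace_value(bound.method.clone());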
            }
        }
    }
}
}

Key insight: ObjClass traces its methods and its superclass. ObjInstance traces its class and all field values. ObjBoundMethod traces both the receiver and the closure. Every edge in the object graph must be walked.
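The edges listed above can be exercised on a small object graph. The sketch below swaps in a different technique from the recursive `trace_object`: it uses an explicit worklist, which avoids deep native recursion on long reference chains. The `Value`, `Obj`, and `reachable_from` names are hypothetical, and handles are arena indices rather than raw pointers.

```rust
use std::collections::HashSet;

/// Simplified value: either an immediate or a handle into the arena.
#[derive(Clone, Copy)]
pub enum Value {
    Nil,
    Number(f64),
    Obj(usize),
}

/// Simplified heap objects; handles are arena indices.
pub enum Obj {
    Closure,
    Class { methods: Vec<Value>, superclass: Option<usize> },
    Instance { class: usize, fields: Vec<Value> },
    BoundMethod { receiver: usize, method: Value },
}

/// Queue the handle behind a value, if it has one (the `trace_value` role).
fn push_value(value: Value, worklist: &mut Vec<usize>) {
    if let Value::Obj(handle) = value {
        worklist.push(handle);
    }
}

/// Return every handle reachable from `root` (the `trace_object` role).
pub fn reachable_from(arena: &[Obj], root: usize) -> HashSet<usize> {
    let mut marked = HashSet::new();
    let mut worklist = vec![root];
    while let Some(handle) = worklist.pop() {
        if !marked.insert(handle) {
            continue; // already marked: this check is what terminates cycles
        }
        match &arena[handle] {
            Obj::Closure => {}
            Obj::Class { methods, superclass } => {
                for &method in methods {
                    push_value(method, &mut worklist);
                }
                worklist.extend(*superclass);
            }
            Obj::Instance { class, fields } => {
                worklist.push(*class);
                for &field in fields {
                    push_value(field, &mut worklist);
                }
            }
            Obj::BoundMethod { receiver, method } => {
                worklist.push(*receiver);
                push_value(*method, &mut worklist);
            }
        }
    }
    marked
}

/// Handles: 0 = cook closure, 1 = Doughnut, 2 = FilledDoughnut,
/// 3 = instance, 4 = bound method stored in a field of 3, 5 = garbage.
pub fn sample_arena() -> Vec<Obj> {
    vec![
        Obj::Closure,
        Obj::Class { methods: vec![Value::Obj(0)], superclass: None },
        Obj::Class { methods: vec![], superclass: Some(1) },
        Obj::Instance { class: 2, fields: vec![Value::Nil, Value::Number(1.0), Value::Obj(4)] },
        Obj::BoundMethod { receiver: 3, method: Value::Obj(0) },
        Obj::Closure,
    ]
}

fn main() {
    let arena = sample_arena();
    let mut live: Vec<usize> = reachable_from(&arena, 3).into_iter().collect();
    live.sort();
    println!("reachable from the instance: {live:?}");
}
```

The instance and its bound method reference each other, so the example also shows why the mark check must come before the edges are followed.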

`this` Binding

When a method is called, this refers to the receiver instance. We implement this the same way closures capture upvalues (Part 5) — the method’s closure has an implicit upvalue that points to this.

Implementation note: The compile_method code below compiles methods as regular functions for clarity. A complete implementation would add two implicit upvalue slots to each method’s closure: upvalue[0] = this (the receiver, bound at call time) and upvalue[1] = super (the superclass). The compile_this and compile_super generators assume these slots exist in the variables map — wiring them into the compiler is a straightforward extension of the closure capture logic from Part 5, but we leave the actual upvalue insertion as an exercise to keep the codegen example focused.

How It Works

When we create a class, each method closure gets an extra upvalue slot for this. When a method is bound to an instance (via ObjBoundMethod), we fill that slot with the instance pointer.

#![allow(unused)]
fn main() {
// src/runtime/bind.rs

use crate::runtime::object::{ObjBoundMethod, ObjInstance, ObjHeader, ObjType};
use crate::runtime::value::LoxValue;
use crate::runtime::heap::Heap;

impl Heap {
    /// Bind a method closure to a receiver instance
    pub fn bind_method(
        &mut self,
        receiver: *mut ObjInstance,
        method: LoxValue,
    ) -> *mut ObjBoundMethod {
        let bound = ObjBoundMethod {
            header: ObjHeader::new(ObjType::BoundMethod),
            receiver,
            method,
        };
        self.allocate(bound)
    }
}
}

The Method Call Protocol

When the compiled program executes a method call like d.cook("custard"), it goes through four steps:

1. Evaluate d → get the ObjInstance pointer
2. Look up "cook" on the instance → get an ObjBoundMethod
3. Call the bound method’s closure with the provided arguments
4. Inside the closure, `this` resolves to the bound receiver

No vtable needed. Method dispatch is a hash map lookup that walks the superclass chain.
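
The protocol above can be sketched with a few simplified types. This is an illustration under stated assumptions, not the tutorial's runtime: `Value`, `Instance`, `class_methods`, and `call` are hypothetical stand-ins, the receiver is an integer id instead of a GC pointer, a method is just its mangled name, and the superclass walk is left out so the binding step stays visible.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified value type: a number or a bound method.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(f64),
    /// A method paired with the instance it was looked up on.
    BoundMethod { receiver: usize, method: &'static str },
}

struct Instance {
    id: usize, // stands in for the instance pointer
    fields: HashMap<&'static str, Value>,
    // Assumption: the class's method table is flattened onto the instance.
    class_methods: HashMap<&'static str, &'static str>,
}

impl Instance {
    /// Step 2: fields shadow methods; a method comes back bound to `self`.
    fn get_property(&self, name: &str) -> Option<Value> {
        if let Some(value) = self.fields.get(name) {
            return Some(value.clone());
        }
        let method = *self.class_methods.get(name)?;
        Some(Value::BoundMethod { receiver: self.id, method })
    }
}

/// Steps 3 and 4: invoke the callee; a bound method sees its receiver as `this`.
fn call(callee: &Value, arg: &str) -> Result<String, String> {
    match callee {
        Value::BoundMethod { receiver, method } => {
            Ok(format!("{method}({arg}) with this = #{receiver}"))
        }
        Value::Number(_) => Err("Can only call functions and classes.".to_string()),
    }
}
```

Looking up `cook` on an instance with id 7 yields a `BoundMethod` carrying that id, and `call` then reports the same id as `this`. Storing a field named `cook` on the instance makes the lookup return the field instead.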

Inheritance: Linked Lists All the Way Down

Lox’s inheritance is single-inheritance only. That means the class hierarchy is a linked list:

FilledDoughnut → Doughnut → nil

When we look up a method, we walk the chain. The find_method method we defined earlier walks the superclass linked list, checking each class’s method table until it finds a match or reaches the end.
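
As a reminder of the shape of that walk, here is a minimal sketch. It is not the find_method from earlier in this part: the `Class` struct below is a simplified stand-in that uses `Option<Rc<Class>>` instead of raw GC pointers and a plain string instead of a LoxValue, and `doughnut_chain` is a hypothetical helper that builds the two-class chain shown above.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Simplified stand-in for ObjClass: no GC header, string method payloads.
struct Class {
    name: String,
    methods: HashMap<String, String>,
    superclass: Option<Rc<Class>>, // None plays the role of the null pointer
}

impl Class {
    /// Walk the superclass chain and return the first method called `name`,
    /// together with the name of the class that defines it.
    fn find_method(&self, name: &str) -> Option<(&str, &str)> {
        let mut current = Some(self);
        while let Some(class) = current {
            if let Some(body) = class.methods.get(name) {
                return Some((class.name.as_str(), body.as_str()));
            }
            current = class.superclass.as_deref();
        }
        None
    }
}

/// Build the chain from the text: FilledDoughnut -> Doughnut -> nil.
fn doughnut_chain() -> Class {
    let mut doughnut = Class {
        name: "Doughnut".into(),
        methods: HashMap::new(),
        superclass: None,
    };
    doughnut.methods.insert("cook".into(), "fry".into());
    doughnut.methods.insert("finish".into(), "glaze".into());

    let mut filled = Class {
        name: "FilledDoughnut".into(),
        methods: HashMap::new(),
        superclass: Some(Rc::new(doughnut)),
    };
    filled.methods.insert("cook".into(), "fry and inject".into());
    filled
}
```

A lookup of `cook` on the FilledDoughnut class stops at the subclass's own override, while `finish` is only found one link further up the chain.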

`super` Calls

A super.cook(flavor) expression needs two things:

1. The superclass of the enclosing class (not the receiver’s class)
2. The method name

We resolve super at compile time, not runtime. During codegen, when we’re inside a class method, we know which class we’re in and therefore what the superclass is. We store this as a hidden upvalue on the closure — the same mechanism as this.

#![allow(unused)]
fn main() {
// During compilation, inside a class method:
// The method closure gets two implicit upvalues:
//   [0] = this   (the receiver instance)
//   [1] = super  (the enclosing class's superclass)
}

This means finding the superclass is free at runtime: the pointer is already captured in the closure. The method lookup on that superclass still happens at runtime, inside lox_super_lookup.
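
Why the enclosing class rather than the receiver's class? A small sketch makes the difference concrete. The layout here is an assumption made for illustration: classes live in a slice and refer to each other by index, and `super_from_enclosing` / `super_from_receiver` are hypothetical names for the correct and the tempting-but-wrong starting points.

```rust
use std::collections::HashMap;

/// Classes stored in a table and linked by index (hypothetical layout,
/// chosen to keep the sketch free of raw pointers).
struct Class {
    methods: HashMap<&'static str, &'static str>,
    superclass: Option<usize>,
}

/// Find `name` starting at class `start` and walking up the chain.
fn find_method(classes: &[Class], start: Option<usize>, name: &str) -> Option<&'static str> {
    let mut current = start;
    while let Some(index) = current {
        if let Some(body) = classes[index].methods.get(name) {
            return Some(*body);
        }
        current = classes[index].superclass;
    }
    None
}

/// A (index 0) <- B (index 1) <- C (index 2).
/// A and B both define `method`; B also defines `test`, whose body
/// contains the expression `super.method()`.
fn build_classes() -> Vec<Class> {
    vec![
        Class { methods: HashMap::from([("method", "A.method")]), superclass: None },
        Class {
            methods: HashMap::from([("method", "B.method"), ("test", "B.test")]),
            superclass: Some(0),
        },
        Class { methods: HashMap::new(), superclass: Some(1) },
    ]
}

/// Correct: start from the superclass of the class that contains the
/// `super` expression, which is known at compile time.
fn super_from_enclosing(classes: &[Class], enclosing: usize, name: &str) -> Option<&'static str> {
    find_method(classes, classes[enclosing].superclass, name)
}

/// Wrong: start from the superclass of the receiver's runtime class.
fn super_from_receiver(classes: &[Class], receiver_class: usize, name: &str) -> Option<&'static str> {
    find_method(classes, classes[receiver_class].superclass, name)
}
```

When `test` runs on an instance of C, the enclosing class is still B, so the correct lookup starts at A. Starting from the receiver's class would begin at B and pick up B's own override.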

MLIR Code Generation for Classes

Now the interesting part: generating MLIR for class declarations, instance creation, property access, and method calls.

New AST Nodes

#![allow(unused)]
fn main() {
// src/ast.rs (additions)

#[derive(Debug, Clone)]
pub enum Expr {
    // ... existing variants ...
    Get(GetExpr),
    Set(SetExpr),
    This(ThisExpr),
    Super(SuperExpr),
}

#[derive(Debug, Clone)]
pub struct GetExpr {
    pub location: Location,
    pub object: Box<Expr>,
    pub name: String,
}

#[derive(Debug, Clone)]
pub struct SetExpr {
    pub location: Location,
    pub object: Box<Expr>,
    pub name: String,
    pub value: Box<Expr>,
}

#[derive(Debug, Clone)]
pub struct ThisExpr {
    pub location: Location,
}

#[derive(Debug, Clone)]
pub struct SuperExpr {
    pub location: Location,
    pub method: String,
}

#[derive(Debug, Clone)]
pub enum Stmt {
    // ... existing variants ...
    Class(ClassStmt),
}

#[derive(Debug, Clone)]
pub struct ClassStmt {
    pub location: Location,
    pub name: String,
    pub superclass: Option<String>,
    pub methods: Vec<FunctionStmt>,
}
}

Runtime Calls as External Functions

Class operations involve heap allocation, name lookups, and GC bookkeeping, all of which are awkward to express directly in MLIR. We emit calls to runtime functions instead:

#![allow(unused)]
fn main() {
// src/codegen/classes.rs

use melior::{
    Context, Location,
    dialect::func,
    ir::{
        attribute::{FlatSymbolRefAttribute, StringAttribute, TypeAttribute},
        r#type::FunctionType,
        Module, Region, Type, Value, Block, BlockLike,
    },
};
use crate::codegen::types::lox_value_type;
// Module is needed by declare_runtime_functions and declare_external.
// FlatSymbolRefAttribute, Value, and Block are used by compile_class, compile_method,
// and compile_get/compile_set below. They're included here because these functions
// live in the same module.

/// Declare runtime functions needed for class operations
pub fn declare_runtime_functions(context: &Context, module: &mut Module) {
    let location = Location::unknown(context);
    let lox_val = lox_value_type(context);

    // lox_create_class(name_ptr: !llvm.ptr, superclass: lox_val) -> lox_val
    let create_class_type = FunctionType::new(
        context,
        &[Type::parse(context, "!llvm.ptr").unwrap(), lox_val],
        &[lox_val],
    );
    declare_external(module, context, "lox_create_class", create_class_type, location);

    // lox_instance_from_class(class: lox_val) -> lox_val
    let instance_type = FunctionType::new(context, &[lox_val], &[lox_val]);
    declare_external(module, context, "lox_instance_from_class", instance_type, location);

    // lox_get_property(instance: lox_val, name_ptr: !llvm.ptr) -> lox_val
    let get_prop_type = FunctionType::new(
        context,
        &[lox_val, Type::parse(context, "!llvm.ptr").unwrap()],
        &[lox_val],
    );
    declare_external(module, context, "lox_get_property", get_prop_type, location);

    // lox_set_property(instance: lox_val, name_ptr: !llvm.ptr, value: lox_val) -> lox_val
    let set_prop_type = FunctionType::new(
        context,
        &[lox_val, Type::parse(context, "!llvm.ptr").unwrap(), lox_val],
        &[lox_val],
    );
    declare_external(module, context, "lox_set_property", set_prop_type, location);

    // lox_bind_method(receiver: lox_val, method: lox_val) -> lox_val
    let bind_method_type = FunctionType::new(context, &[lox_val, lox_val], &[lox_val]);
    declare_external(module, context, "lox_bind_method", bind_method_type, location);

    // lox_set_method(class: lox_val, name_ptr: !llvm.ptr, method: lox_val) -> lox_val
    // Attaches a compiled method to a class object's method table
    let set_method_type = FunctionType::new(
        context,
        &[lox_val, Type::parse(context, "!llvm.ptr").unwrap(), lox_val],
        &[lox_val],
    );
    declare_external(module, context, "lox_set_method", set_method_type, location);

    // lox_super_lookup(superclass: lox_val, name_ptr: !llvm.ptr, this: lox_val) -> lox_val
    // Walks the class hierarchy starting from the superclass, finds the method,
    // and binds `this` as the receiver — used for `super` method calls
    let super_lookup_type = FunctionType::new(
        context,
        &[lox_val, Type::parse(context, "!llvm.ptr").unwrap(), lox_val],
        &[lox_val],
    );
    declare_external(module, context, "lox_super_lookup", super_lookup_type, location);

    // lox_call(callee: lox_val, arg: lox_val) -> lox_val
    // Invokes a Lox closure — loads the function pointer and environment from
    // the closure object, then performs an indirect call through the function table
    let call_type = FunctionType::new(context, &[lox_val, lox_val], &[lox_val]);
    declare_external(module, context, "lox_call", call_type, location);
}

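fn declare_external(
    module: &mut Module,
    context: &Context,
    name: &str,
    fn_type: FunctionType,
    location: Location,
) {
    // A func.func with an empty region is a declaration. The MLIR verifier
    // rejects declarations with public visibility, so mark the symbol private.
    module.body().append_operation(func::func(
        context,
        StringAttribute::new(context, name),
        TypeAttribute::new(fn_type.into()),
        Region::new(),
        &[(
            melior::ir::Identifier::new(context, "sym_visibility"),
            StringAttribute::new(context, "private").into(),
        )],
        location,
    ));
}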
}

Compiling Class Declarations

#![allow(unused)]
fn main() {
// src/codegen/generator.rs (additions)
use std::collections::HashMap;

impl<'c> CodeGenerator<'c> {
    fn compile_class(&self, class: &ClassStmt, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = self.loc(class.location);

        // 1. Resolve superclass (if any)
        let superclass_val = if let Some(super_name) = &class.superclass {
            // Look up the superclass variable — it should be a class object
            self.compile_variable(&VariableExpr {
                location: class.location,
                name: super_name.clone(),
            }, variables)
        } else {
            self.compile_nil(block)
        };

        // 2. Create a global string constant for the class name
        let name_global = self.create_string_constant(&class.name);

        // 3. Call runtime: lox_create_class(name, superclass)
        let lox_val = lox_value_type(self.context);
        let create_class_op = func::call(
            self.context,
            FlatSymbolRefAttribute::new(self.context, "lox_create_class"),
            &[name_global, superclass_val],
            &[lox_val],
            location,
        );

        let class_val = block.append_operation(create_class_op)
            .result(0).unwrap().into();

        // 4. Store each method on the class
        for method in &class.methods {
            self.compile_method(block, &class.name, class_val, method, variables);
        }

        // 5. Store the class object as a variable
        variables.insert(class.name.clone(), class_val);
    }

    fn compile_method(&self, block: &Block<'c>, class_name: &str, class_val: Value<'c, 'c>, method: &FunctionStmt, variables: &mut HashMap<String, Value<'c, 'c>>) {
        let location = self.loc(method.location);

        // Mangle the method name: Doughnut.cook → Doughnut_cook
        // Two classes can both have a `cook` method, so the MLIR
        // function name must be unique within the module.
        let mangled = format!("{}_{}", class_name, method.name);

        // Compile the method body. We compile it as a regular function
        // here — a complete implementation would add two implicit upvalues
        // (this, super) before compiling the body. See "What We're Simplifying"
        // for the full explanation of what's omitted and why.
        //
        // We pass the mangled name so `compile_function` creates `@Doughnut_cook`
        // instead of `@cook`. The variables map stores the value under the
        // mangled key, so we look it up the same way below.
        let mut method_with_name = method.clone();
        method_with_name.name = mangled.clone();
        self.compile_function(&method_with_name, variables);

        // Retrieve the compiled method's value from the variables map.
        // `compile_function` creates the `func.func` operation in the module
        // and stores a tagged-union LoxValue (TAG_CLOSURE with the function
        // pointer as payload) in the variables map under the function's name —
        // which is now the mangled name. This tagged value is what
        // `lox_set_method` receives: a `(i8, i64)` pair where tag=4 means
        // closure and the i64 payload points to the compiled function.
        let method_val = variables.get(&mangled).copied()
            .expect("compile_method: method not found in variables after compile_function");
        
        // The property name on the class uses the *original* method name,
        // not the mangled one. `d.cook("custard")` looks up "cook",
        // not "Doughnut_cook".
        let name_global = self.create_string_constant(&method.name);
        
        let attach_op = func::call(
            self.context,
            FlatSymbolRefAttribute::new(self.context, "lox_set_method"),
            &[class_val, name_global, method_val],
            &[lox_value_type(self.context)],
            location,
        );

        block.append_operation(attach_op);
    }
}
}

Compiling Property Access and Assignment

#![allow(unused)]
fn main() {
use std::collections::HashMap;

impl<'c> CodeGenerator<'c> {
    fn compile_get(&self, block: &Block<'c>, get: &GetExpr, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = self.loc(get.location);
        let object = self.compile_expression(&get.object, block, variables);

        // Create a global string constant for the property name
        let name_global = self.create_string_constant(&get.name);

        // Call runtime: lox_get_property(instance, "name")
        let op = func::call(
            self.context,
            FlatSymbolRefAttribute::new(self.context, "lox_get_property"),
            &[object, name_global],
            &[lox_value_type(self.context)],
            location,
        );

        block.append_operation(op).result(0).unwrap().into()
    }

    fn compile_set(&self, block: &Block<'c>, set: &SetExpr, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = self.loc(set.location);
        let object = self.compile_expression(&set.object, block, variables);
        let value = self.compile_expression(&set.value, block, variables);

        let name_global = self.create_string_constant(&set.name);

        // Call runtime: lox_set_property(instance, "name", value)
        let op = func::call(
            self.context,
            FlatSymbolRefAttribute::new(self.context, "lox_set_property"),
            &[object, name_global, value],
            &[lox_value_type(self.context)],
            location,
        );

        // set expressions return the assigned value (like assignment)
        block.append_operation(op).result(0).unwrap().into()
    }
}
}

Compiling `this` and `super`

#![allow(unused)]
fn main() {
use std::collections::HashMap;

impl<'c> CodeGenerator<'c> {
    fn compile_this(&self, this: &ThisExpr, variables: &HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        // `this` is a variable lookup — it's stored as an upvalue
        // by the method binding mechanism.
        //
        // NOTE: This only works if `compile_method` has added "this"
        // to the variables map as an implicit upvalue (see the
        // "What We're Simplifying" section). The current `compile_method`
        // omits that wiring — calling `compile_this` will panic at
        // compile time with "Undefined variable: this" unless you
        // add the upvalue insertion described there.
        self.compile_variable(&VariableExpr {
            location: this.location,
            name: "this".to_string(),
        }, variables)
    }

    fn compile_super(&self, block: &Block<'c>, super_expr: &SuperExpr, variables: &HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
        let location = self.loc(super_expr.location);

        // `super.method` resolves to:
        // 1. Get the superclass from the implicit upvalue
        // 2. Look up the method on the superclass
        // 3. Bind it to `this`
        //
        // NOTE: Same caveat as `compile_this` — "super" must be in
        // the variables map as an implicit upvalue. Without it,
        // `compile_variable` panics with "Undefined variable: super".
        // The current `compile_method` doesn't add it. See
        // "What We're Simplifying."

        let super_class = self.compile_variable(&VariableExpr {
            location: super_expr.location,
            name: "super".to_string(),
        }, variables);

        let method_name = self.create_string_constant(&super_expr.method);
        let this_val = self.compile_this(&ThisExpr { location: super_expr.location }, variables);

        // Call runtime: lox_super_lookup(superclass, "method", this)
        let op = func::call(
            self.context,
            FlatSymbolRefAttribute::new(self.context, "lox_super_lookup"),
            &[super_class, method_name, this_val],
            &[lox_value_type(self.context)],
            location,
        );

        block.append_operation(op).result(0).unwrap().into()
    }
}
}

What the Generated MLIR Looks Like

A note before we look at the IR: this is the first part whose generated code uses the tagged union representation. The code generator in Parts 1–6 compiled every value as a bare f64 (the “numbers only” model), even though the runtime from Parts 2–5 already worked with tagged values. Classes break the numbers-only model: you can’t represent a class instance, a bound method, or a string as a floating-point number. The tagged union represents every value as a struct with two fields: a tag byte (i8) that says what kind of value this is, and a payload word (i64) that holds the data. Here’s the mapping:

Tag  Value Type   Payload
───────────────────────────────
 0   Nil          (unused)
 1   Bool         0 or 1
 2   Number       f64 bits (stored as i64)
 3   String       pointer to heap string
 4   Closure      pointer to closure object
 5   Instance     pointer to instance object
 6   Class        pointer to class object
 7   BoundMethod  pointer to bound method

In MLIR, this is !llvm.struct<(i8, i64)>. Every function that produces a Lox value now returns this struct instead of f64. Every operation that was a single arith.addf becomes: check both tags → extract the f64 payloads → add → re-pack as (TAG_NUMBER, result). The core logic doesn’t change, but every operation now carries the bookkeeping of tag checking and repacking. Part 1 introduced this concept (see the “Dynamic Typing with Tagged Unions” subsection) and explained why the numbers-only model defers it; this is where we start using it.
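
To make the check, extract, add, re-pack sequence concrete, here is a minimal Rust model of it. This sketches the idea only, not the code the compiler emits: `Tagged`, `number`, `nil`, and `add` are illustrative names, the tag and payload use unsigned Rust integers where MLIR has signless i8 and i64, and only the numeric case of `+` is handled.

```rust
// Tag values taken from the table above.
const TAG_NIL: u8 = 0;
const TAG_NUMBER: u8 = 2;

/// Mirrors !llvm.struct<(i8, i64)>: a tag byte plus a 64-bit payload.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tagged {
    tag: u8,
    payload: u64,
}

/// Pack an f64 by storing its raw bits in the payload.
fn number(n: f64) -> Tagged {
    Tagged { tag: TAG_NUMBER, payload: n.to_bits() }
}

/// Nil has no meaningful payload; zero is used here for determinism.
fn nil() -> Tagged {
    Tagged { tag: TAG_NIL, payload: 0 }
}

/// What a single arith.addf turns into once values are tagged.
fn add(a: Tagged, b: Tagged) -> Result<Tagged, String> {
    // 1. Check both tags.
    if a.tag != TAG_NUMBER || b.tag != TAG_NUMBER {
        return Err("Operands must be numbers.".to_string());
    }
    // 2. Extract the f64 payloads and add them.
    let sum = f64::from_bits(a.payload) + f64::from_bits(b.payload);
    // 3. Re-pack as (TAG_NUMBER, result).
    Ok(number(sum))
}
```

Adding `number(1.5)` and `number(2.25)` yields `number(3.75)`, while adding a number to `nil()` takes the error path instead of reinterpreting the nil payload as a float.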

Given our doughnut example from the top:

module {
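  // Runtime function declarations (a declaration without a body must be private)
  func.func private @lox_create_class(!llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>
  func.func private @lox_instance_from_class(!llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>
  func.func private @lox_make_string(!llvm.ptr) -> !llvm.struct<(i8, i64)>
  func.func private @lox_get_property(!llvm.struct<(i8, i64)>, !llvm.ptr) -> !llvm.struct<(i8, i64)>
  func.func private @lox_set_property(!llvm.struct<(i8, i64)>, !llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>
  func.func private @lox_bind_method(!llvm.struct<(i8, i64)>, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>
  func.func private @lox_set_method(!llvm.struct<(i8, i64)>, !llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>
  func.func private @lox_super_lookup(!llvm.struct<(i8, i64)>, !llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>
  func.func private @lox_call(!llvm.struct<(i8, i64)>, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>

  // Global string constants. MLIR does not append a trailing zero, and the
  // C runtime uses strcmp and strdup, so each string is null-terminated explicitly.
  llvm.mlir.global constant @str_0("Doughnut\00")
  llvm.mlir.global constant @str_1("cook\00")
  llvm.mlir.global constant @str_2("flavor\00")
  llvm.mlir.global constant @str_3("FilledDoughnut\00")
  llvm.mlir.global constant @str_4("custard\00")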

  // Doughnut.cook method — note the mangled name `@Doughnut_cook`.
  // MLIR function names must be unique within the module. Two classes
  // can both have a `cook` method, so we prefix the class name.
  // The `compile_method` code mangles the name before calling
  // `compile_function`, and looks it up in the variables map the same way.
  func.func @Doughnut_cook(%arg0: !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)> {
    // print "Frying " + flavor + " doughnut"
    // ... (string concatenation via runtime calls)
    // %nil_tagged constructed as in @main: tag=0, payload left undef
    func.return %nil_tagged : !llvm.struct<(i8, i64)>
  }

  // FilledDoughnut.cook method — same mangling pattern
  func.func @FilledDoughnut_cook(%arg0: !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)> {
    // super.cook(flavor)
    // print "Injecting " + flavor + " filling"
    // ... (runtime calls)
    // %nil_tagged constructed as in @main: tag=0, payload left undef
    func.return %nil_tagged : !llvm.struct<(i8, i64)>
  }

  // Top-level code
  func.func @main() -> !llvm.struct<(i8, i64)> {
    %nil_tag = arith.constant 0 : i8  // nil's tag is 0 in the tagged union
    %nil_val = llvm.undef : !llvm.struct<(i8, i64)>
    %nil_tagged = llvm.insertvalue %nil_tag, %nil_val[0] : !llvm.struct<(i8, i64)>  // payload left undef — nil has no meaningful payload

    // A symbol cannot be passed directly as a call operand, so take the
    // address of each name constant first.
    %doughnut_name = llvm.mlir.addressof @str_0 : !llvm.ptr
    %cook_name = llvm.mlir.addressof @str_1 : !llvm.ptr
    %filled_name = llvm.mlir.addressof @str_3 : !llvm.ptr

    // Create Doughnut class
    %doughnut = func.call @lox_create_class(%doughnut_name, %nil_tagged) : (!llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>

    // Attach Doughnut.cook method to the class
    // compile_method compiles the method as @Doughnut_cook, then calls
    // lox_set_method to store the closure in the class's method table.
    // Without this, lox_get_property would find an empty method table.
    %doughnut_cook_val = ...  // tagged union: tag=4 (TAG_CLOSURE), payload = pointer to @Doughnut_cook
    func.call @lox_set_method(%doughnut, %cook_name, %doughnut_cook_val) : (!llvm.struct<(i8, i64)>, !llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>

    // Create FilledDoughnut class (inherits from Doughnut)
    %filled = func.call @lox_create_class(%filled_name, %doughnut) : (!llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>

    // Attach FilledDoughnut.cook method to the subclass
    %filled_cook_val = ...  // tagged union: tag=4 (TAG_CLOSURE), payload = pointer to @FilledDoughnut_cook
    func.call @lox_set_method(%filled, %cook_name, %filled_cook_val) : (!llvm.struct<(i8, i64)>, !llvm.ptr, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>

    // var d = FilledDoughnut()
    %d = func.call @lox_instance_from_class(%filled) : (!llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>

    // d.cook("custard")
    %method = func.call @lox_get_property(%d, %cook_name) : (!llvm.struct<(i8, i64)>, !llvm.ptr) -> !llvm.struct<(i8, i64)>
    %custard_ptr = llvm.mlir.addressof @str_4 : !llvm.ptr
    // Wrap the string in a LoxValue: lox_make_string creates a tagged union
    // with tag=3 (TAG_STRING) and the string pointer as the payload.
    %custard = func.call @lox_make_string(%custard_ptr) : (!llvm.ptr) -> !llvm.struct<(i8, i64)>
    func.call @lox_call(%method, %custard) : (!llvm.struct<(i8, i64)>, !llvm.struct<(i8, i64)>) -> !llvm.struct<(i8, i64)>

    func.return %nil_tagged : !llvm.struct<(i8, i64)>
  }
}

The IR is verbose, but that’s the point — it’s an intermediate representation, not hand-written code. Each operation has clear semantics and the lowering passes can optimize it.

The C Runtime

The runtime functions are simple C — they operate on the same tagged union and heap we built in Parts 2–5. The C runtime uses arrays (method_count + methods[]) instead of Rust’s HashMap — C doesn’t have a hash map in the standard library, and the linear scan is fast enough for the small method tables you’d find in a Lox program.

Key types used below: FieldEntry is a simple key-value pair, typedef struct { const char* key; LoxValue value; } FieldEntry;. The gc_reallocate(ptr, old_size, new_size) function resizes a heap allocation and updates the GC’s internal bookkeeping (allocated bytes count, trigger threshold for the next collection). Note that the callers below grow their arrays by exactly one entry per insert, so unlike Rust’s Vec::push there is no amortized growth. Over-allocating (for example, doubling the capacity) would cut down on reallocations at the cost of tracking a separate capacity field. The implementation lives in the companion repo’s runtime/gc.c. We don’t show it inline here because it’s a straightforward wrapper around realloc plus bookkeeping, and the interesting part is how callers use it (like the GC-safety pattern in lox_set_property below).

API note: The MLIR-declared lox_bind_method takes two LoxValue arguments (matching the tagged union type). The Rust implementation extracts the raw *mut ObjInstance from the receiver LoxValue before calling Heap::bind_method. The C runtime works directly with LoxValue arguments and uses AS_INSTANCE() to unwrap them.

// src/runtime/class_runtime.c

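#include "runtime.h"
#include "gc.h"
#include <stdio.h>   // fprintf
#include <stdlib.h>  // exit
#include <string.h>  // strcmp, strdup

// lox_get_property calls lox_bind_method before its definition below.
LoxValue lox_bind_method(LoxValue receiver, LoxValue method);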

// Create a new class object
LoxValue lox_create_class(const char* name, LoxValue superclass) {
    ObjClass* klass = gc_allocate(sizeof(ObjClass));
    klass->header.type = OBJ_CLASS;
    klass->header.is_marked = false;
    klass->name = strdup(name);
    klass->methods = NULL;       // empty method table (linear array, not a hash map)
    klass->method_count = 0;
    klass->superclass = IS_NIL(superclass) ? NULL : AS_CLASS(superclass);
    return MAKE_OBJ(klass);
}

// Create an instance of a class
LoxValue lox_instance_from_class(LoxValue class_val) {
    ObjClass* klass = AS_CLASS(class_val);
    ObjInstance* instance = gc_allocate(sizeof(ObjInstance));
    instance->header.type = OBJ_INSTANCE;
    instance->header.is_marked = false;
    instance->klass = klass;
    instance->fields = NULL;     // empty field array
    instance->field_count = 0;
    return MAKE_OBJ(instance);
}

// Get a property on an instance
LoxValue lox_get_property(LoxValue instance_val, const char* name) {
    ObjInstance* instance = AS_INSTANCE(instance_val);
    
    // Check fields first (fields shadow methods)
    for (int i = 0; i < instance->field_count; i++) {
        if (strcmp(instance->fields[i].key, name) == 0) {
            return instance->fields[i].value;
        }
    }
    
    // Look up method on the class
    ObjClass* klass = instance->klass;
    while (klass != NULL) {
        for (int i = 0; i < klass->method_count; i++) {
            if (strcmp(klass->methods[i].key, name) == 0) {
                // Bind the method to this instance
                return lox_bind_method(instance_val, klass->methods[i].value);
            }
        }
        klass = klass->superclass;
    }
    
    // Runtime error: undefined property
    fprintf(stderr, "Undefined property '%s'.\n", name);
    exit(1);
}

// Set a property on an instance
LoxValue lox_set_property(LoxValue instance_val, const char* name, LoxValue value) {
    ObjInstance* instance = AS_INSTANCE(instance_val);
    
    // Check if field already exists
    for (int i = 0; i < instance->field_count; i++) {
        if (strcmp(instance->fields[i].key, name) == 0) {
            instance->fields[i].value = value;
            return value;
        }
    }
    
    // Add new field.
    // IMPORTANT: increment field_count AFTER writing the new entry.
    // If gc_reallocate triggers a collection, the GC traces
    // fields[0..field_count]. Incrementing first would expose an
    // uninitialized FieldEntry (a dangling key pointer and a garbage
    // LoxValue) to the GC.
    // NOTE: This is safe because our GC is mark-sweep (non-moving).
    // A moving collector would invalidate the old pointer during
    // reallocation, so this pattern would need pinning or a different
    // allocation strategy.
    instance->fields = gc_reallocate(
        instance->fields,
        instance->field_count * sizeof(FieldEntry),
        (instance->field_count + 1) * sizeof(FieldEntry)
    );
    instance->fields[instance->field_count].key = strdup(name);
    instance->fields[instance->field_count].value = value;
    instance->field_count++;
    return value;
}

// Bind a method to a receiver
LoxValue lox_bind_method(LoxValue receiver, LoxValue method) {
    ObjBoundMethod* bound = gc_allocate(sizeof(ObjBoundMethod));
    bound->header.type = OBJ_BOUND_METHOD;
    bound->header.is_marked = false;
    bound->receiver = AS_INSTANCE(receiver);
    bound->method = method;
    return MAKE_OBJ(bound);
}

// Set a method on a class's method table
LoxValue lox_set_method(LoxValue class_val, const char* name, LoxValue method) {
    ObjClass* klass = AS_CLASS(class_val);
    // Same GC-safety pattern as lox_set_property: increment method_count
    // AFTER writing the new entry, so the GC only traces initialized slots.
    klass->methods = gc_reallocate(
        klass->methods,
        klass->method_count * sizeof(FieldEntry),
        (klass->method_count + 1) * sizeof(FieldEntry)
    );
    klass->methods[klass->method_count].key = strdup(name);
    klass->methods[klass->method_count].value = method;
    klass->method_count++;
    return method;
}

// Look up a method starting from the superclass, then bind it to `this`
LoxValue lox_super_lookup(LoxValue superclass_val, const char* name, LoxValue this_val) {
    ObjClass* klass = AS_CLASS(superclass_val);
    // Walk the class hierarchy starting from the superclass
    while (klass != NULL) {
        for (int i = 0; i < klass->method_count; i++) {
            if (strcmp(klass->methods[i].key, name) == 0) {
                // Bind the found method to `this` as the receiver
                return lox_bind_method(this_val, klass->methods[i].value);
            }
        }
        klass = klass->superclass;
    }
    fprintf(stderr, "Undefined property '%s' on superclass.\n", name);
    exit(1);
}

// Call a Lox closure — loads the function pointer and environment from the
// closure object, then performs an indirect call.
//
// The actual implementation depends on your closure calling convention.
// At minimum, it:
//   1. Extracts the function pointer from the closure object
//   2. Passes the environment pointer as an implicit first argument
//   3. Calls through the function pointer with the remaining arguments
//
// A complete implementation is in the companion repo's runtime/closure.c.
LoxValue lox_call(LoxValue callee, LoxValue arg) {
    // See the companion repository for the full implementation.
    // The closure calling convention is covered in Part 5 (Closures).
    fprintf(stderr, "lox_call: not implemented in this excerpt\n");
    exit(1);
}

Why show lox_call as a stub? The closure calling convention (how we pass the environment, how the function pointer is stored in the closure object) is already covered in Part 5. Duplicating it here would add 30+ lines of pointer arithmetic without teaching anything new — the class-specific parts (lookup, binding) are in lox_super_lookup and lox_bind_method. If you’re building along, the companion repo has the full lox_call implementation.

What We’re Simplifying

This part makes several simplifications that a production compiler would handle differently:

Method compilation doesn’t wire this and super upvalues. The compile_method code compiles each method as a regular function. A complete implementation would add two implicit upvalue slots — upvalue[0] = this and upvalue[1] = super — before compiling the method body, and insert them into the compiler’s variable map so that compile_this and compile_super can find them by name. The implementation note in the this Binding section above describes this wiring; adding it to the codegen example would double the code without teaching a new concept — it’s the same closure capture logic from Part 5 applied to two more variables.

No vtable or inline caches. Every method dispatch does a linear scan through the class hierarchy. For small programs this is fine. For programs with deep inheritance chains or hot call sites, you’d want a vtable (to turn the scan into an index lookup) or inline caches (to remember which method a call site resolved to last time). The “Why No VTable?” section below explains the tradeoff.

Two runtime calls per method invocation. lox_get_property returns a bound method, then lox_call invokes it. Each function does one job — find the method, call the closure — which makes them easier to understand and debug. The tradeoff is a temporary ObjBoundMethod allocation per method call. A production compiler would combine these into a single lox_invoke call to avoid the allocation. See the Design Decisions section for the full tradeoff analysis.

C runtime uses linear arrays instead of hash maps. The Rust ObjClass stores methods in a HashMap<String, LoxValue>, but the C runtime uses a methods[] array with linear lookup. C doesn’t have a hash map in the standard library, and the linear scan is fast enough for the small method tables in a Lox program. A real implementation would use a hash map (or a sorted array with binary search) once method count exceeds ~10–20 entries.
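
The sorted-array alternative mentioned above is small enough to sketch. This is a hypothetical `MethodTable`, written in Rust for consistency with the rest of the tutorial rather than in the runtime's C, with a bare `u64` standing in for the method's LoxValue.

```rust
/// Method table kept sorted by name so lookups can binary-search.
/// `u64` is a placeholder for the method's LoxValue payload.
struct MethodTable {
    entries: Vec<(String, u64)>,
}

impl MethodTable {
    fn new() -> Self {
        MethodTable { entries: Vec::new() }
    }

    /// Insert a method, or overwrite it if the name already exists.
    fn set(&mut self, name: &str, method: u64) {
        match self.entries.binary_search_by(|(key, _)| key.as_str().cmp(name)) {
            Ok(index) => self.entries[index].1 = method,
            Err(index) => self.entries.insert(index, (name.to_string(), method)),
        }
    }

    /// Look up a method in O(log n) comparisons.
    fn get(&self, name: &str) -> Option<u64> {
        self.entries
            .binary_search_by(|(key, _)| key.as_str().cmp(name))
            .ok()
            .map(|index| self.entries[index].1)
    }
}
```

Insertion becomes O(n) because later entries shift to keep the order, which is a reasonable trade when methods are attached once per class declaration and looked up on every call.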

Design Decisions and Trade-offs

Why Runtime Calls Instead of Pure MLIR?

You could emit pure MLIR for class operations — llvm.alloca for field storage, llvm.insertvalue/llvm.extractvalue for field access, etc. But:

Approach	Pros	Cons
Runtime calls	Simple, correct, GC-aware	Can’t optimize across boundary
Pure MLIR	Optimizable, no FFI overhead	Must teach MLIR about GC roots, field layout, dispatch

For a tutorial, runtime calls are the right call. A production compiler would progressively move more into MLIR as it proves correctness. Start simple, optimize later.

Why No VTable?

VTables work when the set of methods, and each method’s slot index, is known at compile time. Lox resolves properties by name at runtime: classes are first-class values, and any instance can gain a field at any time that shadows a method of the same name. A name lookup per dispatch is the honest representation. If profiling shows it’s a bottleneck, you add inline caches later.
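
For a sense of what an inline cache would involve, here is a minimal monomorphic sketch. It is an assumption-laden illustration, not part of the tutorial's runtime: `Class`, `InlineCache`, and the `hits`/`misses` counters are hypothetical, classes are identified by an integer id, and the cache ignores both the superclass walk and fields that shadow methods, which a real cache would have to guard against.

```rust
use std::collections::HashMap;

struct Class {
    id: usize,
    methods: HashMap<&'static str, &'static str>,
}

/// One cache per call site. The method name at a call site never changes,
/// so the cache only needs to remember the last class it saw.
#[derive(Default)]
struct InlineCache {
    entry: Option<(usize, &'static str)>, // (class id, resolved method)
    hits: u32,
    misses: u32,
}

impl InlineCache {
    fn lookup(&mut self, class: &Class, name: &str) -> Option<&'static str> {
        // Fast path: same class as last time, reuse the cached method.
        if let Some((cached_class, method)) = self.entry {
            if cached_class == class.id {
                self.hits += 1;
                return Some(method);
            }
        }
        // Slow path: do the full lookup and remember the result.
        self.misses += 1;
        let method = *class.methods.get(name)?;
        self.entry = Some((class.id, method));
        Some(method)
    }
}
```

A call site that keeps seeing instances of one class pays for the name lookup once and for an integer comparison afterwards. A site that alternates between classes misses every time, which is why production systems extend the idea to small polymorphic caches.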

Why Not Combine Lookup and Call?

Two separate calls — lox_get_property then lox_call — mean two separate jobs: one function finds the method, another calls it. That’s easier to understand, easier to debug, and means lox_get_property works the same whether you’re calling the result or storing it. The cost is a temporary ObjBoundMethod allocation per method call. A production compiler would merge these into a single lox_invoke(instance, method_name, args) that skips the allocation, but the savings only matter in hot loops — for a tutorial, two functions that each do one thing is the right call.
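
The allocation difference can be sketched as follows. Everything here is a simplified assumption: `Runtime`, `Method`, and `invoke` are illustrative names (the text's lox_invoke is not implemented in this part), methods are plain Rust function pointers stored directly on the instance, and a counter stands in for GC allocations.

```rust
use std::collections::HashMap;

/// A method is a plain function that receives the receiver and one argument.
type Method = fn(&Instance, &str) -> String;

struct Instance {
    name: &'static str,
    // Assumption: methods are stored on the instance to skip the class walk.
    methods: HashMap<&'static str, Method>,
}

struct BoundMethod<'a> {
    receiver: &'a Instance,
    method: Method,
}

struct Runtime {
    bound_allocations: usize, // counts temporary BoundMethod objects
}

impl Runtime {
    /// Two-step path, part 1: look up and allocate a bound method.
    fn get_property<'a>(&mut self, inst: &'a Instance, name: &str) -> Option<Box<BoundMethod<'a>>> {
        let method = *inst.methods.get(name)?;
        self.bound_allocations += 1;
        Some(Box::new(BoundMethod { receiver: inst, method }))
    }

    /// Two-step path, part 2: call through the bound method.
    fn call(&self, bound: &BoundMethod, arg: &str) -> String {
        (bound.method)(bound.receiver, arg)
    }

    /// Combined path: look up and call without allocating.
    fn invoke(&self, inst: &Instance, name: &str, arg: &str) -> Option<String> {
        let method = *inst.methods.get(name)?;
        Some(method(inst, arg))
    }
}

/// Example method body: `this` is the receiver.
fn cook(this: &Instance, flavor: &str) -> String {
    format!("{} cooks {}", this.name, flavor)
}
```

Both paths produce the same result for a direct call. Only the two-step path can hand back a value that the program stores and calls later, which is why lox_get_property has to exist even if a combined call is added.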

Full Update to the Expression Compiler

The new expression types (Get, Set, This, and Super) join the main compile_expression dispatch:

#![allow(unused)]
fn main() {
// src/codegen/generator.rs (updated match arm)
use std::collections::HashMap;

fn compile_expression(&self, expr: &Expr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
    match expr {
        Expr::Binary(b) => self.compile_binary(b, block, variables),
        Expr::Unary(u) => self.compile_unary(u, block, variables),
        Expr::Literal(l) => self.compile_literal(l, block, variables),
        Expr::Grouping(g) => self.compile_expression(&g.expr, block, variables),
        Expr::Variable(v) => self.compile_variable(v, variables),
        Expr::Assign(a) => self.compile_assign(a, block, variables),
        Expr::Call(c) => self.compile_call(c, block, variables),
        Expr::Logical(l) => self.compile_logical(l, block, variables),
        Expr::Get(g) => self.compile_get(g, block, variables),
        Expr::Set(s) => self.compile_set(s, block, variables),
        Expr::This(t) => self.compile_this(t, variables),
        Expr::Super(s) => self.compile_super(s, block, variables),
    }
}
}

Testing

Unit Tests for Method Lookup

These tests create objects on the stack, not the GC heap, and link them with raw pointers. Once the test function returns, those pointers dangle. This is fine for unit tests, because all access happens within the function scope, but don’t copy this pattern into production code. A complete implementation would use Heap::allocate to get GcClass and GcInstance values back.

#![allow(unused)]
fn main() {
use std::collections::HashMap;

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn method_lookup_walks_inheritance() {
        let mut base_class = ObjClass {
            header: ObjHeader::new(ObjType::Class),
            name: "Base".to_string(),
            methods: HashMap::new(),
            superclass: std::ptr::null_mut(),
        };
        base_class.methods.insert("greet".to_string(), LoxValue::Nil);

        let mut derived_class = ObjClass {
            header: ObjHeader::new(ObjType::Class),
            name: "Derived".to_string(),
            methods: HashMap::new(),
            superclass: &base_class as *const ObjClass as *mut ObjClass,
        };

        // Derived doesn't have "greet", but Base does
        assert!(derived_class.find_method("greet").is_some());
        // Derived doesn't have "missing" and neither does Base
        assert!(derived_class.find_method("missing").is_none());
    }

    #[test]
    fn fields_shadow_methods() {
        let mut class = ObjClass {
            header: ObjHeader::new(ObjType::Class),
            name: "Test".to_string(),
            methods: HashMap::new(),
            superclass: std::ptr::null_mut(),
        };
        class.methods.insert("x".to_string(), LoxValue::Number(42.0));

        let mut instance = ObjInstance::new(&class as *const ObjClass as *mut ObjClass);
        // Set field "x" to a different value
        instance.set_property("x".to_string(), LoxValue::Number(99.0));

        // Field should shadow the method
        let result = instance.get_property("x").unwrap();
        assert_eq!(result, LoxValue::Number(99.0));
    }
}
}

How Each Concept Maps to Code

Concept	How We Implemented It
Class declaration	Runtime call `lox_create_class`
Instance creation	Runtime call `lox_instance_from_class`
Property access	Runtime call `lox_get_property` (fields before methods)
Property assignment	Runtime call `lox_set_property`
Method binding	`ObjBoundMethod` wraps receiver + closure
`this`	Implicit upvalue, filled when method is bound
`super`	Implicit upvalue holding the superclass, resolved at compile time
Inheritance	Linked list of superclass pointers, walked during method lookup
GC tracing	Walk methods, superclass, fields, receiver, and bound method

Classes tie together every system we’ve built: the GC heap, closures, upvalues, and MLIR code generation. There’s no new fundamental mechanism — only new combinations of what already exists. That’s how you know the architecture is right.

Next: Part 8 — Why We Did It This Way — Why numbers-only first? Why parameter-passing for blocks instead of a struct field? Why scf.if for logical operators? This chapter answers the questions that came up during review — and the answers tell you as much about MLIR as the code does.

MLIR for Lox: Part 8 — Why We Did It This Way

If you’ve made it this far, you probably have questions. Why f64 for everything? Why pass blocks as parameters? Why scf.if instead of bitwise operators? Good — these are the questions any careful reader would ask. This chapter answers them.

Think of this as a checkpoint before we push further into the runtime and linking. The choices we made in Parts 1–7 were deliberate, and understanding why will make the code in Parts 9–11 easier to follow.

The questions:

Why start with a “numbers only” subset when Lox is dynamically typed?
Why pass blocks as parameters instead of storing them in a struct field?
Why scf.if for logical operators instead of arith.andi/arith.ori?
Why not show the lexer?
Why a global RUNTIME instead of passing it through?
How does the transition from f64 to tagged unions work?
What does this tutorial not cover?

Why Start with a “Numbers Only” Subset?

The most consistent feedback: “Lox is dynamically typed, but the codegen treats everything as f64. That’s wrong.”

It is — for a production compiler. But a tutorial that introduces tagged unions, type-tag checking, and the from_raw/to_raw boundary before the reader understands how MLIR blocks, regions, and lowering work would be 200 lines of boilerplate before the first interesting operation. The reader would quit.

The approach this tutorial takes:

Parts 1–6: All values are f64. true is 1.0, false is 0.0, nil is 0.0. Arithmetic is arith.addf. Comparison is arith.cmpf. This lets the reader focus on MLIR’s structure — blocks, regions, control flow, lowering — without drowning in tag-checking boilerplate.
Part 7 (Classes and Instances): Introduces the tagged union representation (!llvm.struct<(i8, i64)>) because classes need LoxValue::Instance, LoxValue::String, and other types that can’t be represented as f64. Every arithmetic operation becomes “check tag → extract payload → operate → re-tag result.” This is the production path.
Parts 9–11: The runtime (Part 9) and error reporting (Part 10) use the tagged union representation. By this point, the reader understands why: they’ve seen the simpler model and can appreciate what the tags buy them. Part 11 (Cross-Module Linking) goes back to the numbers-only model for its IR examples, because the linking concepts are the same regardless of the value representation.

Honest simplification: A real Lox compiler would use tagged unions from the start. The “numbers only” model is a pedagogical choice, not an engineering one. If you’re building a production compiler, skip straight to the tagged representation.

Why Parameter-Passing for Blocks?

The original code used current_block: Option<Block<'c>> — a struct field holding the current MLIR block. Review caught a critical issue: this doesn’t work with Melior’s ownership model.

In Melior, a Block is moved into a Region via region.append_block(block). After that move, the block is consumed — you can’t hold a reference to it in a struct field. The original code tried to do both:

#![allow(unused)]
fn main() {
self.current_block = Some(block);  // store the block
region.append_block(block);        // move the block — DOUBLE USE
}

This won’t compile. Rust’s ownership rules catch it at compile time.

The fix: each compile method takes &Block<'c> as a parameter. The block is created, operations are appended, and it’s moved into a region — all within a single method. No struct field, no double-move.

#![allow(unused)]
fn main() {
use std::collections::HashMap;

fn compile_binary(&self, binary: &BinaryExpr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> Value<'c, 'c> {
    let lhs = self.compile_expression(&binary.left, block, variables);
    let rhs = self.compile_expression(&binary.right, block, variables);
    // ...
}
}

This pattern is more verbose (every method takes block as a parameter) but it’s correct. And it teaches the reader something important: the ownership model that Melior layers over MLIR isn’t a nuisance. It prevents you from accidentally holding on to a block after it has been moved into a region.
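The ownership shape can be reproduced without Melior at all. The sketch here uses toy `Block` and `Region` types that only mimic the move-on-append behavior described in this section; they are illustrative assumptions, not Melior’s API, and the emitted strings are placeholders rather than real operations.

```rust
use std::cell::RefCell;

// Toy stand-ins that mimic the ownership shape of a block and a region.
struct Block {
    ops: RefCell<Vec<String>>,
}

struct Region {
    blocks: Vec<Block>,
}

impl Block {
    fn new() -> Self {
        Block { ops: RefCell::new(Vec::new()) }
    }

    /// Appends through a shared reference, like the `&Block` parameter in the text.
    fn append_operation(&self, op: &str) {
        self.ops.borrow_mut().push(op.to_string());
    }
}

impl Region {
    fn new() -> Self {
        Region { blocks: Vec::new() }
    }

    /// Takes the block by value: after this call the caller no longer owns it.
    fn append_block(&mut self, block: Block) {
        self.blocks.push(block);
    }
}

/// Compile helpers borrow the block; they never store it.
fn compile_constant(block: &Block, value: f64) {
    block.append_operation(&format!("arith.constant {value:?} : f64"));
}

/// Create the block, fill it, and move it into a region, all in one function.
fn compile_function_body() -> Region {
    let block = Block::new();
    compile_constant(&block, 1.0);
    compile_constant(&block, 2.0);
    block.append_operation("arith.addf");

    let mut region = Region::new();
    region.append_block(block); // ownership moves here
    // block.append_operation("late op"); // would not compile: use of moved value
    region
}
```

Uncommenting the last `append_operation` line reproduces the compile error from the original `current_block` design in miniature.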

Why `scf.if` for Logical Operators?

The original code used arith.andi and arith.ori for and/or. Review caught this: bitwise AND/OR don’t short-circuit.

In Lox, false and crash() should never call crash(). But arith.andi evaluates both operands unconditionally. The fix is scf.if:

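// a and b → if truthy(a) { b } else { a }
// Lox's `and` returns the left operand when it's falsy,
// and the right operand when it's truthy.
// scf.if needs an i1 condition, so compare %a against 0.0 first
// (in the numbers-only model, 0.0 is the only falsy value).
%zero = arith.constant 0.0 : f64
%a_truthy = arith.cmpf une, %a, %zero : f64
%result = scf.if %a_truthy -> (f64) {
    // The operations that compute %b are emitted here, inside the region,
    // so they run only when %a is truthy.
    scf.yield %b : f64
} else {
    scf.yield %a : f64
}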

This is more IR, but it’s correct. Short-circuit evaluation requires control flow, not bitwise operations. We use scf.if rather than generating cf.cond_br directly for the same reason the rest of the codegen uses scf — it’s structurally simpler (regions instead of explicit blocks and branches) and the --convert-scf-to-cf pass handles the lowering. Generating cf directly would mean managing block arguments and branch targets in every compile method, which is exactly the complexity scf hides. In the numbers-only model, when a is falsy, a == 0.0, so yielding a in the else-branch is equivalent to yielding the constant 0.0 — but yielding the left operand directly is more semantically correct and avoids a redundant constant.

Note: In the tagged union model (Part 7+), the same shape holds. The else-branch yields the left operand with its tag preserved (nil or false, whichever a was), and the then-branch yields the right operand with its tag preserved.
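The semantics that the scf.if encodes can be checked in plain Rust. This is a sketch of the numbers-only rules only; `is_truthy`, `lox_and`, and `lox_or` are names invented for the illustration, and the right operand is passed as a closure to stand in for operations emitted inside a region.

```rust
/// Numbers-only truthiness: 0.0 is falsy, everything else is truthy.
fn is_truthy(v: f64) -> bool {
    v != 0.0
}

/// `a and b`: the right operand is a closure, so it runs only when needed.
/// Returns the left operand when it is falsy, otherwise the right operand.
fn lox_and(a: f64, b: impl FnOnce() -> f64) -> f64 {
    if is_truthy(a) { b() } else { a }
}

/// `a or b`: returns the left operand when it is truthy, otherwise the right operand.
fn lox_or(a: f64, b: impl FnOnce() -> f64) -> f64 {
    if is_truthy(a) { a } else { b() }
}
```

Passing a closure with a side effect as the right operand is an easy way to confirm that `lox_and(0.0, ...)` never runs it, which is exactly the property arith.andi cannot provide.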

Why Not Show the Lexer?

The parser references Token, TokenType, and a tokenize() function. None of these are shown in full.

Because the lexer isn’t interesting in the context of MLIR. It’s a standard scanner — skip whitespace, match keywords, read numbers and strings. Crafting Interpreters Chapter 4 covers it in detail, and the implementation here follows that chapter almost exactly. Showing 200 lines of match arms would add length without adding understanding.

The tutorial provides a LexValue enum and a tokenize() stub so the code compiles, then points the reader at Crafting Interpreters for the full implementation.

Why a Global `RUNTIME` Instead of Passing It Through?

The JIT execution engine registers function pointers (like lox_print) at build time. These are raw extern "C" functions — they can’t carry a &mut self parameter or a reference to a Runtime struct. So the runtime lives in a global:

#![allow(unused)]
fn main() {
use std::sync::{LazyLock, Mutex};

static RUNTIME: LazyLock<Mutex<Runtime>> = LazyLock::new(|| {
    Mutex::new(Runtime::new())
});
}

This is a pragmatic choice, not an elegant one. Thread-local storage would work, but adds complexity for no benefit in a single-threaded compiler. A global with a Mutex is simple, prevents data races if the compiler ever becomes multi-threaded, and is honest about the constraint.

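A production compiler would likely use a more structured approach, such as passing an explicit runtime-context pointer as a hidden first argument to every runtime call (Wasmtime threads its vmctx pointer through compiled code this way), but for a tutorial, the global is the right trade-off.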
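For comparison, here is roughly what the thread-local alternative mentioned above would look like. This is a sketch under simplifying assumptions: the `Runtime` here is a hypothetical one-field struct, and `lox_print_number` and `print_count` are names made up for the example.

```rust
use std::cell::RefCell;

// Hypothetical minimal runtime: it only counts how many values were printed.
struct Runtime {
    prints: u64,
}

thread_local! {
    // One Runtime per thread; no Mutex needed because no other thread can see it.
    static RUNTIME: RefCell<Runtime> = RefCell::new(Runtime { prints: 0 });
}

/// C-ABI entry point: no `self` and no context argument, so the state must be
/// reachable some other way. Here that is the thread-local above.
extern "C" fn lox_print_number(value: f64) {
    RUNTIME.with(|rt| rt.borrow_mut().prints += 1);
    println!("{value}");
}

/// Read back the counter from the current thread's Runtime.
fn print_count() -> u64 {
    RUNTIME.with(|rt| rt.borrow().prints)
}
```

The catch is visible in the comments: JIT-compiled code running on a different thread would see a different, freshly initialized `Runtime`. The global Mutex avoids that surprise, which is part of why it is the simpler choice here.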

The Tagged Union Graduation

Part 7 (Classes and Instances) switches from f64-only values to the tagged union representation. This is deliberate — classes need tagged unions because LoxValue::Instance, LoxValue::String, and other types can’t be represented as f64. Part 9 (Standard Library and Runtime) continues with tagged unions for the runtime functions (lox_print, lox_clock) that must handle all Lox types.

The transition looks like this:

Before (Parts 1–6, numbers-only):

%one = arith.constant 1.0 : f64
%two = arith.constant 2.0 : f64
%sum = arith.addf %one, %two : f64

After (Part 7+, tagged union):

%undef = llvm.mlir.undef : !llvm.struct<(i8, i64)>  // uninitialized struct to fill in
%one_tag = arith.constant 0 : i8     // TAG_NUMBER
%one_val = arith.constant 1.0 : f64
%one_bits = llvm.bitcast %one_val : f64 to i64  // f64 → i64 for the struct
%one_tmp = llvm.insertvalue %one_bits, %undef[1] : !llvm.struct<(i8, i64)>
%one = llvm.insertvalue %one_tag, %one_tmp[0] : !llvm.struct<(i8, i64)>

%two_tag = arith.constant 0 : i8
%two_val = arith.constant 2.0 : f64
%two_bits = llvm.bitcast %two_val : f64 to i64
%two_tmp = llvm.insertvalue %two_bits, %undef[1] : !llvm.struct<(i8, i64)>
%two = llvm.insertvalue %two_tag, %two_tmp[0] : !llvm.struct<(i8, i64)>

// Add: check tags match, extract payloads, bitcast back to f64, add, re-tag
%lhs_tag = llvm.extractvalue %one[0] : !llvm.struct<(i8, i64)>
%rhs_tag = llvm.extractvalue %two[0] : !llvm.struct<(i8, i64)>
%tags_match = arith.cmpi eq, %lhs_tag, %rhs_tag : i8
// ... (tag check + payload extraction + bitcast i64→f64 + arith.addf + bitcast f64→i64 + re-tag)

The “numbers only” model is 3 lines. The tagged model is 10+ lines with error handling. The reader needs to understand the 3-line version before the 10-line version makes sense.
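The check, extract, operate, re-tag sequence is easier to see in Rust than in IR. The sketch below mirrors it on a plain struct. The `Raw` type, the `number` and `add` helpers, and the value of `TAG_NIL` are assumptions made for this example; only `TAG_NUMBER = 0` is taken from the IR comment above.

```rust
// Tag values are assumptions for this sketch, except TAG_NUMBER = 0.
const TAG_NUMBER: u8 = 0;
const TAG_NIL: u8 = 1;

/// A raw tagged value: the Rust-side view of !llvm.struct<(i8, i64)>.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Raw {
    tag: u8,
    payload: i64,
}

/// Tag a number: bitcast the f64 to i64 and attach TAG_NUMBER.
fn number(n: f64) -> Raw {
    Raw { tag: TAG_NUMBER, payload: n.to_bits() as i64 }
}

/// Tagged addition: check tags, extract payloads, add, re-tag.
fn add(lhs: Raw, rhs: Raw) -> Result<Raw, &'static str> {
    // 1. Check that both operands are numbers.
    if lhs.tag != TAG_NUMBER || rhs.tag != TAG_NUMBER {
        return Err("operands must be numbers");
    }
    // 2. Extract the payloads and bitcast i64 back to f64.
    let a = f64::from_bits(lhs.payload as u64);
    let b = f64::from_bits(rhs.payload as u64);
    // 3. Operate, then 4. re-tag the result.
    Ok(number(a + b))
}
```

Every step in `add` corresponds to a group of operations the code generator has to emit for each arithmetic expression, which is where the extra IR lines come from.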

What This Tutorial Doesn’t Cover

No tutorial covers everything. This one doesn’t write custom optimization passes, a significant topic that deserves its own guide. Debug info emission isn’t covered: MLIR locations are shown, but DWARF generation isn’t. The GC’s integration with the JIT is hand-waved: the collector itself is implemented, but a production compiler needs precise stack roots, which means generating stack maps alongside the code, and we skip that. And string interning in the compiler is skipped: the AST uses String directly, while a production compiler would intern strings for faster comparison and lower memory usage.

Each of these is worth a tutorial of its own. This one focuses on getting the reader from “what is MLIR?” to “I can compile a non-trivial Lox program” without losing them along the way. Cross-module linking is not on this list: it is covered in Part 11.

Next: Part 9 — Standard Library and Runtime — Our compiler generates MLIR, but it can’t do anything useful yet — there’s no print, no clock, no way to see the output. We’ll build the C runtime that bridges MLIR’s compiled code to the outside world, and see why extern "C" is the one boundary where Rust’s safety guarantees give way to trust.

MLIR for Lox: Part 9 — print and clock — The Built-In Functions Lox Can’t Run Without

Your Lox compiler can parse code, generate MLIR, lower it through dialects, and JIT-compile the result. It has garbage collection, closures, and classes. But when someone writes print "hello"; or var start = clock();, nothing happens. The compiler generates MLIR that calls lox_print and lox_clock, but those functions don’t exist yet. We never built the runtime.

What Is a Runtime, Exactly?

When our compiler lowers Lox to MLIR, it generates calls to functions that don’t exist in MLIR itself. Things like:

lox_print(value) — print a Lox value
lox_clock() — return the current time
lox_alloc(size) — allocate on the GC heap

These are runtime functions. They’re implemented in Rust (or C), compiled into the host binary, and linked at JIT time. The generated MLIR code calls them; the runtime defines them.

This split is fundamental to how compiled languages work. The compiler generates the what; the runtime provides the how. Lox’s runtime is small — only a handful of functions — but the pattern scales to any size.

A Note on Representation: Tagged Unions in the Runtime

Parts 1 through 6 work with f64 values — a numbers-only subset of Lox. Part 7 introduced the tagged union representation because classes need LoxValue::Instance, LoxValue::String, and other types that can’t fit in an f64. The runtime functions in this part work with the full tagged union — lox_print must handle every LoxValue variant, not only numbers.

If you’re coming from the numbers-only model, the change is mechanical: every operation gets wrapped in tag checks, payload extraction, and re-packing. The concept hasn’t changed; the representation is richer. Where the numbers-only model does arith.addf %lhs, %rhs : f64, the tagged union model checks tags, extracts payloads, does the arithmetic, and re-packs the result as (TAG_NUMBER, payload). You’ve seen this pattern in Parts 7 and 8 — the runtime uses the same (i8, i64) pairs, viewed from the Rust side of the boundary.

If you want to build the numbers-only runtime first (a good exercise), have lox_print handle only TAG_NUMBER and return early for everything else. Then add the other tags one at a time.

Reconstructing Lox Values: `from_raw` and `to_raw`

Before we define the runtime functions, we need the bridge between MLIR’s raw (tag, payload) pairs and Rust’s typed LoxValue enum.

LoxValue::from_raw(tag, payload) takes a raw (tag, payload) pair and reconstructs a LoxValue. Its companion to_raw() does the inverse. These are the bridge between the MLIR world (where values are (i8, i64) pairs) and the Rust world (where we have typed enums):

#![allow(unused)]
fn main() {
impl LoxValue {
    /// Reconstruct a LoxValue from its raw (tag, payload) representation.
    ///
    /// The tag identifies the type. The payload is type-specific:
    /// - Nil: payload is 0 (unused)
    /// - Bool: payload is 0 (false) or 1 (true)
    /// - Number: payload is the f64 bitcast to i64
    /// - String: payload is a pointer to the GC-managed string object
    /// - Instance: payload is a pointer to the GC-managed instance object
    pub fn from_raw(tag: u8, payload: i64) -> Self {
        match tag {
            TAG_NIL => LoxValue::Nil,
            TAG_BOOL => LoxValue::Bool(payload != 0),
            TAG_NUMBER => {
                // Bitcast i64 back to f64 (same bits, reinterpreted)
                LoxValue::Number(f64::from_bits(payload as u64))
            }
            TAG_STRING => {
                // payload is a raw pointer to a GcString on the GC heap.
                // Safety: the caller must ensure the value is still rooted
                // (reachable from a GC root). A value that's on the stack
                // or in a runtime function's local variables is rooted by
                // the GC's stack-scanning — the collector won't free it
                // while the pointer is in use.
                //
                // `(*ptr).clone()` copies the GcString pointer wrapper,
                // not the string data on the GC heap. Both the original
                // and the clone point to the same GC object. A deep copy
                // of the string data (e.g., `String::from(&**ptr)`)
                // would produce a plain Rust `String` that the collector
                // knows nothing about and that cannot be stored in
                // `LoxValue::String` without a fresh GC allocation.
                // The mark-sweep collector will find the object as long
                // as any GcString (or other root) still references it.
                let ptr = payload as *const GcString;
                LoxValue::String(unsafe { (*ptr).clone() }) // copy the pointer wrapper, not the data
            }
            TAG_INSTANCE => {
                // Same principle: wrap the existing GC pointer, don't deep-copy.
                let ptr = payload as *const GcInstance;
                LoxValue::Instance(unsafe { (*ptr).clone() }) // wrap existing GC pointer
            }
            // ⚠️ TAG_CLOSURE, TAG_CLASS, and TAG_BOUND hit this catchall
            // and silently become Nil. If you add a runtime function that
            // receives one of these types (e.g., lox_call for closures),
            // add an explicit arm that reconstructs the Gc pointer,
            // following the same pattern as TAG_STRING above.
            _ => LoxValue::Nil, // unknown tag → nil (defensive)
        }
    }

    /// Convert a LoxValue to its raw (tag, payload) representation.
    /// Used when passing values from the runtime back to MLIR.
    pub fn to_raw(&self) -> (u8, i64) {
        match self {
            LoxValue::Nil => (TAG_NIL, 0),
            LoxValue::Bool(b) => (TAG_BOOL, if *b { 1 } else { 0 }),
            LoxValue::Number(n) => {
                // Bitcast f64 to i64 (same bits, reinterpreted)
                (TAG_NUMBER, n.to_bits() as i64)
            }
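            // Note: `s as *const GcString` is the address of the wrapper stored
            // inside `self`, so the payload is only valid while `self` is alive.
            LoxValue::String(s) => (TAG_STRING, s as *const GcString as i64),
            LoxValue::Instance(i) => (TAG_INSTANCE, i as *const GcInstance as i64),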
            // ⚠️ Same catchall as from_raw — LoxValue::Closure, Class, and
            // Bound silently become (TAG_NIL, 0), which destroys the value.
            // Add explicit arms when you need to pass these types to MLIR.
            _ => (TAG_NIL, 0),
        }
    }
}
}

Why not store LoxValue directly in MLIR? Because MLIR is a low-level IR — it doesn’t know about Rust enums, String, or GcString. It works with primitive types: integers, floats, and pointers. The tagged union representation (i8 tag + i64 payload) is the ABI between generated code and the runtime. from_raw/to_raw translate across that boundary.

Watch out for the catchalls. Both from_raw and to_raw have _ => Nil fallback arms. For from_raw, that means TAG_CLOSURE, TAG_CLASS, and TAG_BOUND silently become LoxValue::Nil — no error, no warning. For to_raw, passing a LoxValue::Closure produces (TAG_NIL, 0), which destroys the value. The catchall exists so the tutorial doesn’t need to show every arm up front, but if you’re building a runtime function that handles closures, classes, or bound methods, add explicit match arms before you wonder why you’re getting nil everywhere.

Why wrap the existing GC pointer instead of cloning the data? A GcString is a thin wrapper around a pointer to the GC heap; it’s not the string data itself. GcString::clone() copies the pointer wrapper, so both the original and the clone point to the same GC object. The mark-sweep collector (from Part 2, with root tracking from Part 3) will find the object as long as any root still references it. A deep copy of the string data, like String::from(&**ptr), would allocate a new Rust String outside the GC heap on every call. That copy is invisible to the collector, has no shared identity with the original object, and can’t be stored in LoxValue::String without a fresh GC allocation. The key distinction: (*ptr).clone() on a GcString copies the thin pointer wrapper (cheap, still GC-tracked); String::from(&**ptr) copies the actual string bytes into untracked memory (wasteful, and no longer a Lox string).
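One way to take the sting out of the catchall arms is to make the conversion fallible. The following sketch shows that variant for the primitive types only. The `Prim` enum, the `try_from_raw` name, and the tag numbering are assumptions for illustration; the tutorial’s `LoxValue` and its tag constants may differ.

```rust
// Tag numbering is an assumption for this sketch.
const TAG_NUMBER: u8 = 0;
const TAG_NIL: u8 = 1;
const TAG_BOOL: u8 = 2;

/// Primitive-only stand-in for LoxValue (no GC pointers involved).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Prim {
    Nil,
    Bool(bool),
    Number(f64),
}

/// Like from_raw, but an unknown tag is reported instead of becoming Nil.
fn try_from_raw(tag: u8, payload: i64) -> Option<Prim> {
    match tag {
        TAG_NIL => Some(Prim::Nil),
        TAG_BOOL => Some(Prim::Bool(payload != 0)),
        TAG_NUMBER => Some(Prim::Number(f64::from_bits(payload as u64))),
        _ => None,
    }
}

/// Inverse direction: every variant has an explicit arm, so nothing is lost.
fn to_raw(value: Prim) -> (u8, i64) {
    match value {
        Prim::Nil => (TAG_NIL, 0),
        Prim::Bool(b) => (TAG_BOOL, b as i64),
        Prim::Number(n) => (TAG_NUMBER, n.to_bits() as i64),
    }
}
```

With `Option` as the return type, a closure tag that reaches a function not prepared for it shows up as `None` at the call site instead of a silent nil. Because `to_raw` has no wildcard arm, adding a variant to the enum becomes a compile error until the new arm is written.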

The Runtime Interface

Now we can define the runtime functions. These take raw types (u8, i64, f64), not LoxValue directly — JIT-compiled code works with the raw representation. The runtime uses from_raw to reconstruct LoxValue when it needs to.

We’ll put these in a runtime module:

#![allow(unused)]
fn main() {
// src/runtime/mod.rs

use crate::gc::Gc;
use crate::value::LoxValue;

/// Functions that the JIT-compiled code can call.
/// These are registered as symbols when creating the execution engine.
pub struct Runtime {
    gc: Gc,
}

impl Runtime {
    pub fn new() -> Self {
        Runtime { gc: Gc::new() }
    }

    /// Print a Lox value to stdout.
    /// Called from MLIR as `lox_print(i8, i64) -> void`.
    pub fn lox_print(&self, tag: u8, payload: i64) {
        let value = LoxValue::from_raw(tag, payload);
        match &value {
            LoxValue::Nil => println!("nil"),
            LoxValue::Bool(b) => println!("{}", b),
            LoxValue::Number(n) => println!("{}", n),
            LoxValue::String(s) => {
                // Safety: print doesn't allocate on the GC heap, so the
                // collector can't run while we're using the string pointer.
                // The GC only triggers when the allocator determines a
                // collection is needed (see Part 4). The
                // caller (lox_print_wrapper) holds the RUNTIME mutex, but
                // that's not what makes this safe — it's the absence of
                // any allocation during this call.
                println!("{}", s.as_str());
            }
            LoxValue::Instance(_) => {
                // No safety concern here: we're printing a placeholder,
                // not dereferencing the instance's fields. If a future
                // version prints field values, it would need the same
                // GC safety analysis as the String arm above.
                println!("<instance>")
            }
            _ => println!("<object>"),
        }
    }

    /// Return the current clock time in seconds.
    /// Called from MLIR as `lox_clock() -> f64`.
    pub fn lox_clock(&self) -> f64 {
        use std::time::SystemTime;
        let duration = SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .expect("time went backwards");
        duration.as_secs_f64()
    }

    /// Allocate a new GC-managed object.
    /// Called from MLIR as `lox_alloc(i32) -> i64` (returns raw pointer).
    ///
    /// `self.gc.allocate()` returns a raw pointer (`*mut u8`) to the
    /// newly allocated heap region. We cast it to `i64` so it can flow
    /// through MLIR as a plain integer (MLIR doesn't have pointer types
    /// at the Lox dialect level).
    ///
    /// **Platform note:** `ptr as i64` is lossless on 64-bit targets,
    /// where pointers are exactly as wide as `i64`. On 32-bit targets the
    /// cast merely widens; only a target with pointers wider than 64 bits
    /// would truncate. For this tutorial, we target x86-64 / AArch64,
    /// where `size_of::<*mut u8>() == size_of::<i64>()`.
    pub fn lox_alloc(&mut self, size: i32) -> i64 {
        let ptr = self.gc.allocate(size as usize);
        ptr as i64
    }

    /// Trigger a garbage collection cycle.
    /// Called from MLIR as `lox_gc_collect() -> void`.
    pub fn lox_gc_collect(&mut self) {
        self.gc.collect();
    }
}
}

Registering Native Functions with the JIT

Now the question: how does the JIT-compiled MLIR code find these functions? When we create the MLIR execution engine, we need to register symbols — map string names to function pointers.

Melior’s ExecutionEngine wraps LLVM’s ORC JIT. The way to expose symbols depends on your Melior version, but the general approach is:

Compile your runtime wrappers into the host binary
Create the execution engine with shared library paths (for external runtimes), or register symbols directly

Here’s the pattern:

#![allow(unused)]
fn main() {
// src/jit.rs

use melior::{
    execution_engine::ExecutionEngine,
    Context,
};
use std::sync::{LazyLock, Mutex};
use crate::runtime::Runtime;

/// Global runtime instance, shared between the JIT and the host.
///
/// The JIT's symbol resolver works with raw function pointers — there's no way
/// to pass a `&mut Runtime` through a function pointer. Using a global
/// `LazyLock<Mutex<Runtime>>` is the simplest approach. In a production
/// compiler you might use thread-local storage or a more sophisticated
/// registration mechanism, but for a tutorial this keeps things clear.
static RUNTIME: LazyLock<Mutex<Runtime>> = LazyLock::new(|| {
    Mutex::new(Runtime::new())
});

/// Create an execution engine.
///
/// Note: the exact `ExecutionEngine` constructor signature varies between
/// Melior versions. In Melior 0.27, it takes five parameters:
///   ExecutionEngine::new(&module, optimization_level, &shared_lib_paths, enable_object_dump, enable_pic)
/// The last two are `bool` flags — `enable_object_dump` and `enable_pic`.
/// Some versions also take `&module` differently.
///
/// Symbol registration also varies, so check the `ExecutionEngine` API docs
/// for your version. The pattern shown below is conceptual; the method is
/// expected to be `register_symbol`, but confirm the name and signature.
pub fn create_engine(module: &melior::ir::Module) -> ExecutionEngine {
    let engine = ExecutionEngine::new(module, 2, &[], false, false);
    // ^^^ Melior 0.27 takes 5 params: module, opt_level, shared_libs, enable_object_dump, enable_pic

    // Register runtime functions as JIT symbols.
    // The MLIR code calls these by name (e.g., `func.call @lox_print(...)`),
    // and the JIT resolves them through the symbol table.
    //
    // In Melior 0.27, symbol registration is expected to look like:
    //   unsafe { engine.register_symbol("lox_print", lox_print_wrapper as *mut ()); }
    //
    // Check the ExecutionEngine docs for the exact method name and signature.

    engine
}
}
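What “registering a symbol” amounts to can be shown with a toy table that maps names to type-erased function addresses. This is a conceptual sketch only: `SymbolTable`, `register`, `lookup`, and `call_f64_symbol` are hypothetical names and not part of Melior or LLVM, and the wrapper returns a fixed value so the example stays deterministic.

```rust
use std::collections::HashMap;

/// Stand-in for a runtime wrapper; a fixed value keeps the sketch deterministic.
extern "C" fn lox_clock_wrapper() -> f64 {
    42.0
}

/// Toy symbol table: name -> type-erased function address.
struct SymbolTable {
    symbols: HashMap<String, *const ()>,
}

impl SymbolTable {
    fn new() -> Self {
        SymbolTable { symbols: HashMap::new() }
    }

    /// Map a name used by generated code to a host function address.
    fn register(&mut self, name: &str, ptr: *const ()) {
        self.symbols.insert(name.to_string(), ptr);
    }

    fn lookup(&self, name: &str) -> Option<*const ()> {
        self.symbols.get(name).copied()
    }
}

/// Resolve a symbol by name and call it as `extern "C" fn() -> f64`.
fn call_f64_symbol(table: &SymbolTable, name: &str) -> Option<f64> {
    let ptr = table.lookup(name)?;
    // SAFETY: in this sketch every registered symbol has this exact signature.
    let f: extern "C" fn() -> f64 = unsafe { std::mem::transmute(ptr) };
    Some(f())
}
```

Registering `lox_clock_wrapper` under the name `"lox_clock"` makes a lookup of `"lox_clock"` succeed, while a lookup of `"lox_clock_wrapper"` fails. The name the generated code uses and the name of the Rust function are independent; the table is what ties them together. The type erasure is also why a signature mismatch between the MLIR declaration and the Rust wrapper is undefined behavior rather than a compile error.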

The Wrapper Problem

There’s a catch. The JIT calls functions using the C calling convention, and our Runtime methods take &self or &mut self. We need wrapper functions that bridge from the C ABI to our Rust runtime.

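The #[no_mangle] attribute keeps the Rust compiler from mangling the function name, and extern "C" selects the C calling convention. Note that the exported symbols are named lox_print_wrapper, lox_clock_wrapper, and so on, while the generated MLIR calls @lox_print and @lox_clock. The registration step is what connects the two: it maps the MLIR-side name to the wrapper’s address.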

#![allow(unused)]
fn main() {
// src/jit.rs (continued)

/// C-compatible wrapper for lox_print.
///
/// MLIR signature: `(i8, i64) -> void`
/// We receive the tagged value as two arguments.
#[no_mangle]
pub extern "C" fn lox_print_wrapper(tag: u8, payload: i64) {
    let rt = RUNTIME.lock().unwrap();
    rt.lox_print(tag, payload);
}

/// C-compatible wrapper for lox_clock.
///
/// MLIR signature: `() -> f64`
#[no_mangle]
pub extern "C" fn lox_clock_wrapper() -> f64 {
    let rt = RUNTIME.lock().unwrap();
    rt.lox_clock()
}

/// C-compatible wrapper for lox_alloc.
///
/// MLIR signature: `(i32) -> i64`
#[no_mangle]
pub extern "C" fn lox_alloc_wrapper(size: i32) -> i64 {
    let mut rt = RUNTIME.lock().unwrap();
    rt.lox_alloc(size)
}
}

Why a global? For the reason given in Part 8: the JIT’s symbol resolver works with raw function pointers, so there’s no way to pass a &mut Runtime through one. Each wrapper locks the global, calls the matching Runtime method, and releases the lock when it returns.

Generating Calls to Runtime Functions

Now that the runtime functions exist and are registered, we need to generate MLIR that calls them. The key principle:

Function declarations go at the module level. Function calls go inside blocks.

A func.func declaration (external function with no body) is a module-level operation. It should be appended to the module’s body block, not inside a function’s body block. The func.call operation is what goes inside a function’s body.

To avoid declaring the same external function multiple times (which would be an error), we declare all runtime functions once during initialization, then only emit func.call operations during codegen. In a complete compiler, declare_runtime_functions would include every runtime function from every part — lox_print and lox_clock from this part, gc_push_frame and gc_pop_frame from Part 3, lox_alloc from Part 4, and the class-related functions (lox_create_class, lox_instance_from_class, lox_get_property, lox_set_property, lox_bind_method, lox_set_method, lox_super_lookup, lox_call) from Part 7. The example below shows only lox_print and lox_clock; add the rest following the same pattern.
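The declare-once rule can also be enforced mechanically rather than by convention. Here is a small sketch of that guard using a toy module that stores declarations as text. `ModuleDecls` and its `declare` method are hypothetical and stand in for whatever bookkeeping your `CodeGenerator` keeps; nothing here touches Melior.

```rust
use std::collections::HashSet;

/// Toy module: collects textual declarations and remembers which symbols exist.
struct ModuleDecls {
    declared: HashSet<String>,
    ops: Vec<String>,
}

impl ModuleDecls {
    fn new() -> Self {
        ModuleDecls { declared: HashSet::new(), ops: Vec::new() }
    }

    /// Declares `name` once; returns false if it was already declared.
    fn declare(&mut self, name: &str, signature: &str) -> bool {
        if !self.declared.insert(name.to_string()) {
            return false; // already present: emit nothing
        }
        self.ops.push(format!("func.func private @{name}{signature}"));
        true
    }
}

/// Safe to call more than once: repeated calls add no new declarations.
fn declare_runtime_functions(module: &mut ModuleDecls) {
    module.declare("lox_print", "(i8, i64)");
    module.declare("lox_clock", "() -> f64");
}
```

With a guard like this, calling the initialization twice by accident leaves the module unchanged instead of producing a duplicate-symbol error at verification time.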

Declaring Runtime Functions at Module Level

#![allow(unused)]
fn main() {
use melior::dialect::func;
use melior::ir::attribute::{StringAttribute, TypeAttribute};
use melior::ir::r#type::{FunctionType, Type};
use melior::ir::{Identifier, Location, Region};

// In CodeGenerator — call this once during initialization
fn declare_runtime_functions(&self) {
    let location = Location::unknown(self.context);
    let i8_type: Type = Type::parse(self.context, "i8").unwrap();
    let i64_type: Type = Type::parse(self.context, "i64").unwrap();
    let f64_type = Type::float64(self.context); // used for clock's return type below

    // Declare lox_print: (i8, i64) -> ()
    let print_type = FunctionType::new(self.context, &[i8_type, i64_type], &[]);
    self.module.body().append_operation(func::func(
        self.context,
        StringAttribute::new(self.context, "lox_print"),
        TypeAttribute::new(print_type.into()),
        Region::new(),  // empty region = declaration, not definition
        &[
            (Identifier::new(self.context, "sym_visibility"),
             StringAttribute::new(self.context, "private").into()),
        ],
        location,
    ));

    // Declare lox_clock: () -> f64
    let clock_type = FunctionType::new(self.context, &[], &[f64_type]);
    self.module.body().append_operation(func::func(
        self.context,
        StringAttribute::new(self.context, "lox_clock"),
        TypeAttribute::new(clock_type.into()),
        Region::new(),
        &[
            (Identifier::new(self.context, "sym_visibility"),
             StringAttribute::new(self.context, "private").into()),
        ],
        location,
    ));
}
}
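Adding "the rest following the same pattern" gets repetitive quickly, so one option is to drive the declarations from a table. The sketch below is a standalone illustration, not the tutorial's code generator: `RuntimeFn`, `RUNTIME_FUNCTIONS`, `declaration`, and `duplicate_names` are hypothetical names, and it renders the declarations as MLIR text instead of calling Melior so it can run on its own. Only the two signatures shown in this part are listed.

```rust
use std::collections::HashSet;

/// One runtime function signature, written with MLIR type names.
pub struct RuntimeFn {
    pub name: &'static str,
    pub args: &'static [&'static str],
    pub results: &'static [&'static str],
}

/// Hypothetical table; extend it with the functions from the other parts.
pub const RUNTIME_FUNCTIONS: &[RuntimeFn] = &[
    RuntimeFn { name: "lox_print", args: &["i8", "i64"], results: &[] },
    RuntimeFn { name: "lox_clock", args: &[], results: &["f64"] },
];

/// Renders the textual form of a private, body-less func.func declaration.
pub fn declaration(f: &RuntimeFn) -> String {
    let mut text = format!("func.func private @{}({})", f.name, f.args.join(", "));
    match f.results {
        [] => {}
        [single] => text.push_str(&format!(" -> {}", single)),
        many => text.push_str(&format!(" -> ({})", many.join(", "))),
    }
    text
}

/// Returns every name that appears more than once in the table, so a
/// duplicate declaration is caught before it reaches the module.
pub fn duplicate_names(table: &[RuntimeFn]) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    table.iter().map(|f| f.name).filter(|n| !seen.insert(*n)).collect()
}
```

In the real code generator, the loop body would call func::func with the same name and types instead of building a string; the duplicate check stays the same.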

The Print Statement

When the parser sees print expr;, we:

Compile the expression to get a (tag, payload) value
Call lox_print(tag, payload) — this is a func.call, which goes in the block

That first step uses compile_expression_tagged — the tagged-union version of compile_expression. In the numbers-only model (Parts 1–6), compile_expression returns a single Value<'c, 'c> (the f64). In the tagged union model, every expression produces a (tag, payload) pair, so the code generator needs a version that returns both. Every function that produces a LoxValue now returns (Value<'c, 'c>, Value<'c, 'c>) — the tag and the payload.

The transformation is mechanical but not invisible. Here’s how compile_number_literal changes:

#![allow(unused)]
fn main() {
use melior::dialect::arith;
use melior::ir::attribute::{FloatAttribute, IntegerAttribute};
use melior::ir::operation::OperationBuilder;
use melior::ir::r#type::Type;
use melior::ir::{Block, Location, Value};

// Numbers-only (Part 1): returns a single f64 Value
fn compile_number_literal(&self, value: f64, block: &Block<'c>) -> Value<'c, 'c> {
    let location = Location::unknown(self.context);
    let op = arith::constant(self.context, FloatAttribute::new(
        Type::float64(self.context), value
    ).into(), location);
    block.append_operation(op).result(0).unwrap().into()
}

// Tagged union (Part 9): returns (tag, payload) pair
fn compile_number_literal_tagged(&self, value: f64, block: &Block<'c>) -> (Value<'c, 'c>, Value<'c, 'c>) {
    let location = Location::unknown(self.context);

    // tag = TAG_NUMBER
    let tag = block.append_operation(arith::constant(
        self.context,
        IntegerAttribute::new(Type::parse(self.context, "i8").unwrap(), TAG_NUMBER as i64).into(),
        location,
    ));

    // payload = f64 bitcast to i64
    let f64_val = block.append_operation(arith::constant(
        self.context,
        FloatAttribute::new(Type::float64(self.context), value).into(),
        location,
    ));
    // llvm.bitcast reinterprets the f64 bits as i64, preserving the bit
    // pattern. arith.bitcast performs the same reinterpretation and would be
    // lowered to llvm.bitcast by `convert-arith-to-llvm`; emitting the LLVM
    // dialect op directly skips that step. See the "Why llvm.bitcast?" note
    // after compile_clock for the full explanation.
    let payload = block.append_operation(OperationBuilder::new("llvm.bitcast", location)
        // `result(0)` yields an OperationResult; `.into()` converts it to a Value.
        .add_operands(&[f64_val.result(0).unwrap().into()])
        .add_results(&[Type::parse(self.context, "i64").unwrap().into()])
        .build().expect("valid bitcast"));

    (tag.result(0).unwrap().into(), payload.result(0).unwrap().into())
}
}

Same computation at the core (arith.constant to produce the value), but now wrapped in a tag constant and a bitcast. Every expression type gets the same treatment: compute the value, then wrap it in the appropriate tag. Binary operations like lox.add are more involved (they need tag checks on the inputs before extracting payloads), but the pattern is the same.

With compile_expression_tagged in place, compile_print only has to forward the (tag, payload) pair to the runtime:

#![allow(unused)]
fn main() {
use melior::dialect::func;
use melior::ir::attribute::FlatSymbolRefAttribute;
use melior::ir::r#type::Type;
use melior::ir::{Block, Location, Value};
use std::collections::HashMap;

// In CodeGenerator::compile_print
fn compile_print(&self, print: &PrintStmt, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) {
    let location = Location::unknown(self.context);
    let (tag, payload) = self.compile_expression_tagged(&print.value, block);

    // Call the already-declared lox_print function.
    // Only the call goes in the block — the declaration is at module level.
    let call_op = func::call(
        self.context,
        FlatSymbolRefAttribute::new(self.context, "lox_print"),
        &[tag, payload],
        &[],
        location,
    );
    block.append_operation(call_op);
}
}

The Clock Function

clock() is simpler — it takes no arguments and returns f64:

#![allow(unused)]
fn main() {
use melior::dialect::{arith, func};
use melior::ir::attribute::{FlatSymbolRefAttribute, IntegerAttribute};
use melior::ir::operation::OperationBuilder;
use melior::ir::r#type::Type;
use melior::ir::{Block, Location, Value};

fn compile_clock(&self, block: &Block<'c>) -> (Value<'c, 'c>, Value<'c, 'c>) {
    let location = Location::unknown(self.context);

    // Call the already-declared lox_clock function.
    let call_op = func::call(
        self.context,
        FlatSymbolRefAttribute::new(self.context, "lox_clock"),
        &[],
        &[Type::float64(self.context)],
        location,
    );
    let result = block.append_operation(call_op);
    let f64_result: Value = result.result(0).unwrap().into();

    // Box the f64 into a tagged value: tag = TAG_NUMBER, payload = bits
    let tag = block.append_operation(arith::constant(
        self.context,
        IntegerAttribute::new(
            Type::parse(self.context, "i8").unwrap(),
            TAG_NUMBER as i64,
        ).into(),
        location,
    ));

    // Reinterpret the f64 bits as i64 for the payload with `llvm.bitcast`,
    // which converts between types of the same width without changing bits.
    // `arith.bitcast` performs the same float↔integer reinterpretation; we
    // emit the LLVM dialect op directly to skip a lowering step.
    //
    // Alternative approaches if `llvm.bitcast` is unavailable:
    //   - Store the f64 to memory and load it as i64 (always works)
    //   - Use `builtin.unrealized_conversion_cast` as a placeholder
    //
    // We use `OperationBuilder` with the raw operation name for clarity.
    // Melior may provide a typed `llvm::bitcast()` helper in your
    // version; check the docs if you prefer the idiomatic API.
    let payload = block.append_operation(OperationBuilder::new("llvm.bitcast", location)
        .add_operands(&[f64_result])
        .add_results(&[Type::parse(self.context, "i64").unwrap().into()])
        .build()
        .expect("valid bitcast"));

    (tag.result(0).unwrap().into(), payload.result(0).unwrap().into())
}
}

The key insight: clock() returns an f64, but our Lox values are (tag, payload) pairs. So we tag the result with TAG_NUMBER and reinterpret the f64 bits as i64 for the payload.

Why llvm.bitcast instead of arith.bitcast? Both operations perform the same bit-for-bit reinterpretation between types of equal width, including f64 ↔ i64. The difference is dialect level. We build the op with OperationBuilder and a raw operation name, and the tagged-union code already relies on LLVM dialect operations such as llvm.insertvalue, so emitting llvm.bitcast directly skips one lowering step. Using arith.bitcast would work too: the convert-arith-to-llvm pass lowers it to llvm.bitcast. If you prefer the higher-level approach, arith.bitcast %val : f64 to i64 produces the same result after lowering.
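On the Rust side of the boundary, the same reinterpretation is available without any MLIR. The following is a minimal sketch, assuming TAG_NUMBER is 2 (the value in the C header later in this part); `box_number` and `unbox_number` are hypothetical helper names. It also shows why a numeric `as` cast is not a substitute: that converts the value, while a bitcast keeps the bits.

```rust
/// Tag for numbers; 2 matches the C header shown later in this part.
pub const TAG_NUMBER: u8 = 2;

/// Encodes a number as a (tag, payload) pair by reinterpreting its bits,
/// which is what the bitcast in the generated IR does.
pub fn box_number(value: f64) -> (u8, i64) {
    // u64 -> i64 with `as` keeps all 64 bits; only the interpretation changes.
    (TAG_NUMBER, value.to_bits() as i64)
}

/// Decodes a number payload; the inverse of `box_number`.
pub fn unbox_number(payload: i64) -> f64 {
    f64::from_bits(payload as u64)
}
```

For example, `box_number(1.5)` yields the payload 0x3FF8_0000_0000_0000, whereas `1.5_f64 as i64` yields 1.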

GC Safety in Runtime Functions

There’s a subtlety with any runtime function that receives a GC-managed pointer. The string data lives on the GC heap — if a collection happens while the runtime function is using that pointer, the memory could be moved or freed.

For print, this cannot happen: println! doesn’t allocate on the GC heap, so no collection starts while the pointer is in use. It is a real concern for more complex runtime functions, though. The general rule:

Any runtime function that receives a GC-managed pointer must ensure the GC cannot collect it while the pointer is in use.

Our lox_print is safe without explicit rooting because print doesn’t allocate on the GC heap, so the collector can’t run during the call. The GC only triggers when the allocator determines a collection is needed (see Part 4). The global RUNTIME mutex in the wrapper function does prevent concurrent GC — lox_gc_collect needs the same lock — but the real guarantee is the absence of allocation, not the lock. A runtime function that does allocate (like lox_concat) would need to root any GC pointers before the allocation point, even with the mutex held, because the allocation itself could trigger collection.

For instances, the current code prints <instance> as a placeholder. A real implementation would need to root the instance with push_root/pop_root (the same pattern from Part 3) before accessing its fields. Since push_root/pop_root require &mut self, and our lox_print takes &self, we’d need to either:

Change lox_print to take &mut self (simple but overkill — print doesn’t allocate)
Use the global RUNTIME mutex’s interior mutability to root temporarily
Accept that print can’t trigger GC (it doesn’t allocate), so rooting isn’t necessary for print specifically — the GC only triggers during allocation (Part 4)

We take the third approach — print doesn’t allocate, so GC can’t run during print. A production runtime would root aggressively and use &mut self.

Honest simplification: We’re trading perfect safety for simplicity here. If you add a runtime function that does allocate (like a concat function that creates new strings), you must root any GC pointers before the allocation point.
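To make the rooting rule concrete, here is a toy model. It is a sketch under loud assumptions: `ToyHeap` is not the collector from Parts 2 to 4, handles are plain slot indices, and "collection" frees everything that is not on the root stack. The names `concat_unrooted` and `concat_rooted` are hypothetical. What it does share with the real runtime is the property that matters here: allocation is the only place a collection can start.

```rust
/// Toy heap for illustration only. Slots are never reused.
pub struct ToyHeap {
    slots: Vec<Option<String>>,
    roots: Vec<usize>,
    threshold: usize,
}

impl ToyHeap {
    /// `threshold` is the live-object count at which `alloc` collects first.
    pub fn new(threshold: usize) -> Self {
        ToyHeap { slots: Vec::new(), roots: Vec::new(), threshold }
    }

    pub fn push_root(&mut self, handle: usize) { self.roots.push(handle); }
    pub fn pop_root(&mut self) { self.roots.pop(); }

    /// Returns the string behind a handle, or None if it was freed.
    pub fn get(&self, handle: usize) -> Option<&str> {
        self.slots.get(handle)?.as_deref()
    }

    fn live(&self) -> usize { self.slots.iter().flatten().count() }

    /// Frees every object that is not on the root stack.
    pub fn collect(&mut self) {
        for (handle, slot) in self.slots.iter_mut().enumerate() {
            if !self.roots.contains(&handle) { *slot = None; }
        }
    }

    /// Allocation is the only place a collection can start.
    pub fn alloc(&mut self, text: String) -> usize {
        if self.live() >= self.threshold { self.collect(); }
        self.slots.push(Some(text));
        self.slots.len() - 1
    }
}

/// Concatenates without rooting: the allocation may free both operands.
pub fn concat_unrooted(heap: &mut ToyHeap, a: usize, b: usize) -> usize {
    let joined = format!("{}{}", heap.get(a).unwrap_or(""), heap.get(b).unwrap_or(""));
    heap.alloc(joined)
}

/// Roots both operands across the allocation, then releases them.
pub fn concat_rooted(heap: &mut ToyHeap, a: usize, b: usize) -> usize {
    heap.push_root(a);
    heap.push_root(b);
    let result = concat_unrooted(heap, a, b);
    heap.pop_root();
    heap.pop_root();
    result
}
```

With a threshold of 2 and two live strings, the allocation inside the concat triggers a collection. The rooted version leaves both operands readable afterwards; the unrooted version frees them, which in the real runtime would leave the caller holding dangling pointers.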

Extending the Standard Library

print and clock are the built-ins from Crafting Interpreters, but a real Lox implementation might want more. The pattern is always the same:

Declare the function in the MLIR module as an external func.func — once, at module level
Register the symbol with the JIT execution engine
Write a #[no_mangle] extern "C" wrapper that bridges to the Rust runtime
Generate a call from the code generator when the built-in is used — only func.call, never func.func

Here’s a sketch for adding len(string):

#![allow(unused)]
fn main() {
// Runtime function
pub fn lox_len(&self, tag: u8, payload: i64) -> f64 {
    if tag != TAG_STRING {
        // This sketch logs the problem and returns 0.0 as a sentinel. That
        // works for the tutorial's test cases but hides type mismatches from
        // the caller. Better options: set a last_error field on the Runtime,
        // return a TAG_ERROR tagged value, or return a result type that the
        // caller must check. Part 10 shows how to report type mismatches as
        // runtime errors with source locations.
        eprintln!("len() expects a string argument");
        return 0.0;
    }
    // payload is a pointer to the GC string object.
    // We need to dereference it through the GcString type, not as a raw C string.
    let string_obj = unsafe { &*(payload as *const GcString) };
    string_obj.len() as f64
}

// C-compatible wrapper
#[no_mangle]
pub extern "C" fn lox_len_wrapper(tag: u8, payload: i64) -> f64 {
    let rt = RUNTIME.lock().unwrap();
    rt.lox_len(tag, payload)
}
}

And in the code generator, you’d add the declaration in declare_runtime_functions and the call in a new compile_len method — following the same pattern as compile_print and compile_clock.
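The comments in the len sketch mention returning a result type instead of a silent sentinel. One way that could look is sketched below. Everything here is an assumption for illustration: `GcString` is a stand-in with an invented layout, `RuntimeError` and `lox_len_checked` are hypothetical names, and TAG_STRING is taken to be 3 as in the C header later in this part.

```rust
pub const TAG_STRING: u8 = 3;

/// Stand-in for the tutorial's GC string object; the real layout may differ.
pub struct GcString {
    data: String,
}

impl GcString {
    pub fn new(text: &str) -> Self { GcString { data: text.to_string() } }
    pub fn len(&self) -> usize { self.data.len() }
}

#[derive(Debug, PartialEq)]
pub enum RuntimeError {
    TypeMismatch { expected: &'static str, found_tag: u8 },
}

/// Returns the string length, or an error the caller has to handle.
/// The caller must guarantee that `payload` points to a live GcString
/// whenever `tag == TAG_STRING`; the pointer is only read in that case.
pub fn lox_len_checked(tag: u8, payload: i64) -> Result<f64, RuntimeError> {
    if tag != TAG_STRING {
        return Err(RuntimeError::TypeMismatch { expected: "string", found_tag: tag });
    }
    let string_obj = unsafe { &*(payload as *const GcString) };
    Ok(string_obj.len() as f64)
}
```

A Result cannot cross the extern "C" boundary directly, so the wrapper would still have to translate the Err case into something the generated code can check, such as an error flag or an error-tagged value.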

Connecting to the Class System

In Part 7, we built classes and instances. Instances are GC-allocated objects with a methods table and fields dictionary. When a runtime function receives an instance (tag = INSTANCE_TAG, payload = pointer to GcInstance), it can access the instance’s data:

#![allow(unused)]
fn main() {
pub fn lox_instance_get_field(&self, instance_ptr: i64, field_name_ptr: i64) -> (u8, i64) {
    let instance = unsafe { &*(instance_ptr as *const GcInstance) };
    let field_name = unsafe { &*(field_name_ptr as *const GcString) };

    match instance.fields.get(field_name.as_str()) {
        Some(value) => {
            // Return the full (tag, payload) pair — the caller needs
            // the tag to know what type the field is. Returning only
            // the payload (value.to_raw().1) would lose type information:
            // a string field, an instance field, and a nil field would
            // all produce the same i64, and the caller couldn't tell
            // them apart.
            value.to_raw()
        }
        None => (TAG_NIL, 0), // nil: tag + payload
    }
}
}

Notice the pattern: lox_print uses from_raw to reconstruct a LoxValue because it needs to branch on the tag (different print behavior for each type). lox_instance_get_field works with raw pointers directly because it already knows the types — the tag check happened in the MLIR code before the call. Runtime functions that need to branch on the LoxValue type use from_raw; functions that already know the types work with raw pointers directly.
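For the first half of that pattern, a reduced from_raw/to_raw pair might look like the sketch below. It is a simplified stand-in, not the tutorial's LoxValue: heap-allocated types are collapsed into one opaque `Object` variant, `describe` is a hypothetical print-style function, and the tag numbers are taken from the C header later in this part.

```rust
pub const TAG_NIL: u8 = 0;
pub const TAG_BOOL: u8 = 1;
pub const TAG_NUMBER: u8 = 2;
pub const TAG_STRING: u8 = 3;

/// Simplified stand-in for LoxValue; every heap type is an opaque pointer.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum LoxValue {
    Nil,
    Bool(bool),
    Number(f64),
    Object { tag: u8, ptr: i64 },
}

impl LoxValue {
    /// Rebuilds a typed value from the raw pair passed by generated code.
    pub fn from_raw(tag: u8, payload: i64) -> LoxValue {
        match tag {
            TAG_NIL => LoxValue::Nil,
            TAG_BOOL => LoxValue::Bool(payload != 0),
            TAG_NUMBER => LoxValue::Number(f64::from_bits(payload as u64)),
            _ => LoxValue::Object { tag, ptr: payload },
        }
    }

    /// Flattens a typed value back into the raw pair.
    pub fn to_raw(self) -> (u8, i64) {
        match self {
            LoxValue::Nil => (TAG_NIL, 0),
            LoxValue::Bool(b) => (TAG_BOOL, b as i64),
            LoxValue::Number(n) => (TAG_NUMBER, n.to_bits() as i64),
            LoxValue::Object { tag, ptr } => (tag, ptr),
        }
    }
}

/// Branches on the variant, the way a print-style runtime function would.
pub fn describe(tag: u8, payload: i64) -> String {
    match LoxValue::from_raw(tag, payload) {
        LoxValue::Nil => "nil".to_string(),
        LoxValue::Bool(b) => b.to_string(),
        LoxValue::Number(n) => n.to_string(),
        LoxValue::Object { tag, .. } => format!("<object tag {}>", tag),
    }
}
```

A function like lox_instance_get_field skips from_raw on its inputs because the tag was already checked before the call, but it still uses to_raw on the way out so the caller receives both halves of the pair.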

Safety note: These unsafe blocks dereference raw pointers. The MLIR code must guarantee that both pointers are valid and that the instance tag is INSTANCE_TAG before calling this. Without that guarantee, the function is a crash waiting to happen. In a production runtime, you’d add tag-check assertions in debug builds.

The Runtime as a C Library

An alternative approach, and one that’s more common in production compilers, is to write the runtime in C and link it as a shared library. This gives you a stable ABI (no Rust mangling concerns), language independence (the same runtime works with any frontend), and simpler linking (add -llox_runtime to the linker flags, matching the liblox_runtime.so built below).

The tutorial includes a compilable C runtime in runtime/lox_runtime.h and runtime/lox_runtime.c. Here’s the key idea — every Lox value is a (tag, payload) pair, and the C runtime knows how to print each tag:

// runtime/lox_runtime.h (excerpt)
#define TAG_NIL        0
#define TAG_BOOL       1
#define TAG_NUMBER     2
#define TAG_STRING     3
#define TAG_CLOSURE    4   /* matches compiled value tags from Part 7 */
#define TAG_INSTANCE   5   /* matches compiled value tags from Part 7 */
#define TAG_CLASS      6   /* matches compiled value tags from Part 7 */
#define TAG_BOUND      7   /* matches compiled value tags from Part 7 */

void lox_print(int8_t tag, int64_t payload);
double lox_clock(void);
int64_t lox_alloc_string(const char *data, int64_t length);

// runtime/lox_runtime.c (excerpt)
void lox_print(int8_t tag, int64_t payload) {
    switch (tag) {
        case TAG_NIL:    printf("nil\n"); break;
        case TAG_BOOL:   printf("%s\n", payload ? "true" : "false"); break;
        case TAG_NUMBER: {
            double value;
            memcpy(&value, &payload, sizeof(double));  // reinterpret i64 as f64
            printf("%g\n", value);
            break;
        }
        case TAG_STRING: {
            /*
             * ┌─────────────────────────────────────────────────────────┐
             * │ ⚠️  PAYLOAD FORMAT DIFFERS FROM THE RUST RUNTIME        │
             * │                                                         │
             * │ Rust runtime: payload = pointer to GcString on GC heap  │
             * │ C runtime:    payload = const char* (malloc, null-term)  │
             * │                                                         │
             * │ Same TAG_STRING value, DIFFERENT payload format.        │
             * │ Do NOT mix Rust-allocated strings with C-allocated ones.│
             * │ The C runtime is a standalone test harness only.        │
             * └─────────────────────────────────────────────────────────┘
             */
            const char *str = (const char *)payload;
            if (str) printf("%s\n", str);
            break;
        }
        /* long long is at least 64 bits everywhere; long is only 32 bits on some platforms. */
        case TAG_INSTANCE: printf("<instance at %lld>\n", (long long)payload); break;
        case TAG_CLOSURE:  printf("<closure at %lld>\n", (long long)payload); break;
        case TAG_CLASS:    printf("<class at %lld>\n", (long long)payload); break;
        case TAG_BOUND:    printf("<bound method at %lld>\n", (long long)payload); break;
        default: printf("<unknown tag %d>\n", tag); break;
    }
}

The C runtime uses null-terminated C strings allocated with malloc — no GC, no GcString wrapper. The lox_alloc_string function creates these strings so the MLIR code generator can store string data. See runtime/README.md for build instructions and a comparison with the Rust runtime.

Build and link:

gcc -shared -fPIC -o liblox_runtime.so runtime/lox_runtime.c

#![allow(unused)]
fn main() {
// The engine borrows the module; the argument list can differ between Melior versions.
let engine = ExecutionEngine::new(&module, 2, &["liblox_runtime.so"], false, false);
}

The C approach trades the safety of Rust for simplicity and universality. We’ll stick with the Rust runtime for the rest of this tutorial, but the C approach is worth knowing about — many production compilers (like the LLVM project’s own runtimes) use it.

Pulling It All Together

Let’s see the full picture of how a Lox program goes from source to execution, with the runtime in the loop:

// src/main.rs

use anyhow::Result;
use melior::Context;

mod ast;
mod lexer;
mod parser;
mod codegen;
mod runtime;
mod jit;
mod gc;
mod value;

fn main() -> Result<()> {
    let source = r#"
        var start = clock();
        print "Computing...";
        for (var i = 0; i < 1000; i = i + 1) {
            print i;
        }
        var elapsed = clock() - start;
        print elapsed;
    "#;

    // 1. Parse
    let tokens = lexer::tokenize(source)?;
    let ast = parser::Parser::new(tokens).parse()?;

    // 2. Create MLIR context and module
    let context = Context::new();
    let module = codegen::compile(&context, &ast)?;

    // 3. Create execution engine with runtime symbols registered
    let engine = jit::create_engine(&module)?;

    // 4. Invoke the compiled main function
    //
    // Note: `invoke_packed`'s return type and parameter signature
    // vary between Melior versions. In some versions it returns (),
    // in others it returns Result. The `?` handles the Result case;
    // if your version returns (), call without the `?` operator.
    // Note: `invoke_packed` is unsafe — it runs JIT-compiled code with
    // no bounds checks. Part 11 uses `result.map_err(...)` to turn
    // JIT failures into `anyhow` errors; we use `?` here for simplicity.
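    // Note: `@main` returns `!llvm.struct<(i8, i64)>` (the tagged-union result).
    // MLIR's packed calling convention stores a non-void result through the
    // last pointer in the argument array, so we pass a result slot even though
    // the top-level value is discarded afterwards. A real REPL would display it.
    #[repr(C)]
    struct LoxResult { tag: u8, payload: i64 }
    let mut result = LoxResult { tag: 0, payload: 0 };
    unsafe { engine.invoke_packed("main", &mut [&mut result as *mut LoxResult as *mut ()])?; }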

    Ok(())
}

When this runs:

clock() calls our lox_clock_wrapper, which returns the current time
print "Computing..." calls lox_print_wrapper with tag=TAG_STRING and a pointer to the string data
print i calls lox_print_wrapper with tag=TAG_NUMBER and the f64 reinterpreted as i64
The elapsed time is computed in MLIR as a float subtraction, then printed

The generated MLIR for this program would look something like:

module {
  func.func private @lox_print(i8, i64)
  func.func private @lox_clock() -> f64

  func.func @main() -> !llvm.struct<(i8, i64)> {
    %clock_result = func.call @lox_clock() : () -> f64
    %start_tag = arith.constant 2 : i8
    %start_payload = llvm.bitcast %clock_result : f64 to i64

    %str_tag = arith.constant 3 : i8
    // String constants use llvm.mlir.global for storage and llvm.mlir.addressof
    // to get a pointer. Shown here as a placeholder; the actual mechanism
    // is covered in the "String Constants" section of Part 1.
    %str_ptr = llvm.mlir.addressof @"str_Computing" : !llvm.ptr
    // lox_print takes the payload as i64, so convert the pointer first.
    %str_payload = llvm.ptrtoint %str_ptr : !llvm.ptr to i64
    func.call @lox_print(%str_tag, %str_payload) : (i8, i64) -> ()

    // ... loop body ...

    %clock_result2 = func.call @lox_clock() : () -> f64
    %elapsed = arith.subf %clock_result2, %clock_result : f64
    %elapsed_tag = arith.constant 2 : i8
    %elapsed_payload = llvm.bitcast %elapsed : f64 to i64
    func.call @lox_print(%elapsed_tag, %elapsed_payload) : (i8, i64) -> ()

    // Construct the nil return value: tag = TAG_NIL (0), payload = 0
    %nil_tag = arith.constant 0 : i8
    %nil_payload = arith.constant 0 : i64
    %nil_undef = llvm.undef : !llvm.struct<(i8, i64)>
    %nil_tagged1 = llvm.insertvalue %nil_tag, %nil_undef[0] : !llvm.struct<(i8, i64)>
    %nil_tagged2 = llvm.insertvalue %nil_payload, %nil_tagged1[1] : !llvm.struct<(i8, i64)>
    func.return %nil_tagged2 : !llvm.struct<(i8, i64)>
  }
}

Notice the structure: func.func private @lox_print is a module-level declaration (no body), while func.call @lox_print(...) is the actual call inside a function. Every func.call to a lox_* function is resolved at JIT time to our registered wrappers, which delegate to the Runtime struct. The MLIR code never knows about Rust, the GC, or LoxValue — it passes raw integers and floats.

What We Built

Across nine parts, we’ve gone from zero to a working Lox compiler. We built an AST and parser faithful to Crafting Interpreters. We generated MLIR through a Lox → MLIR dialect pipeline, then added tagged unions for proper dynamic typing with (tag, payload) pairs. Source locations came via MLIR’s first-class location tracking, and a lowering pipeline carried us from the Lox dialect through standard dialects to LLVM IR. JIT execution let us compile and run in the same process. We added garbage collection with mark-and-sweep and root tracking, closures with captured variables and upvalues, and classes and instances as GC-managed objects with methods and fields. Finally, we added a standard library with print and clock, plus a pattern for extending it.

That’s the core. Two more pieces make it practical for everyday use: error reporting that points to the source line (Part 10), and cross-module linking so programs can span multiple files (Part 11).

Where to Go Next

If you want to keep building, here are some directions.

Cross-file programs — Part 11 shows how to compile and link multiple Lox files into a single program.

More built-ins like len(), str(), input(), and sqrt().

Error reporting with runtime type errors and line numbers (now covered in Part 10).

AOT compilation — write the LLVM IR to an object file instead of JIT.

Optimization passes between lowering stages.

A standard library in Lox — write list.lox, math.lox, etc. in Lox itself.

Debugging support — generate DWARF debug info from the source locations.

The runtime pattern we established here — declare once at module level, register with JIT, write a wrapper, call from the block — scales to any number of native functions. Each new built-in follows the same recipe. The hard part isn’t the runtime; it’s making sure the GC stays correct and the type tags are handled consistently. But you already know how to do that.

Next: Part 10 — Error Reporting and Debugging — Our compiler works — until it doesn’t. When a runtime error crashes the program, the user sees Segmentation fault and nothing else. No line number, no function name, no hint about what went wrong. We’ll thread source locations through the MLIR pipeline so runtime errors can say error on line 7: undefined variable 'x' — the difference between a usable compiler and a frustrating one.

MLIR for Lox: Part 10 — Errors That Point to the Line — Source Locations from AST to Runtime

You’ve built a compiler. It works — most of the time. When it doesn’t, you get one of two things: a panic from the runtime, or silent wrong output. Neither helps the programmer figure out what went wrong.

Let’s fix that. We’ll add two things:

Compile-time error reporting — when the parser or code generator hits a problem, the programmer gets a message that points to the exact line of source code.

Runtime error reporting — when the compiled code does something wrong (type mismatch, undefined variable), the error includes the source location, not only “something broke”.

The foundation for both already exists. Part 1 gave us source locations on every AST node, and MLIR’s first-class Location type carries them through the compilation pipeline. Now we use them.

The Problem with Bare Errors

Here’s what a runtime type error looks like today:

cannot use String in arithmetic

Which variable? Which line? Which function? The programmer has no idea. Compare:

error: cannot use String in arithmetic
  ┌─ test.lox:3:14
  │
3 │   var x = name + 1;
  │           ^^^^ String

The second version tells you where and what. The first version is barely better than a segfault. The difference is source locations.

Compile-Time Errors

Parser Errors

Our parser already has error recovery from Crafting Interpreters — it synchronizes on statement boundaries after an error. What it doesn’t have is good error messages. The ParseError type is a bare string:

#![allow(unused)]
fn main() {
// What we have
return Err(ParseError("Expected expression"));
}

We can do better. A parse error should include the token’s location:

#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct ParseError {
    pub message: String,
    pub line: usize,
    pub column: usize,
    pub source_file: String,
}

impl std::fmt::Display for ParseError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(
            f,
            "{}:{}:{}: error: {}",
            self.source_file, self.line, self.column, self.message
        )
    }
}
}

Every Token already carries a line number from the lexer; the code below assumes it records a column as well. Use both:

#![allow(unused)]
fn main() {
fn expect(&mut self, expected: TokenType) -> Result<Token, ParseError> {
    if self.check(expected) {
        Ok(self.advance())
    } else {
        let current = self.peek();
        Err(ParseError {
            message: format!("Expected {}, found {}", expected, current.token_type),
            line: current.line,
            column: current.column,
            source_file: self.source_file.clone(),
        })
    }
}
}

This gives parse errors like:

test.lox:7:5: error: Expected ';', found 'print'

Not beautiful, but a huge improvement over “Expected semicolon.”

Pretty Error Formatting

The Display impl gives us test.lox:7:5: error: Expected ';', found 'print' — machine-readable but hard to scan. A formatted error that shows the source line and a caret pointing to the problem is more useful:

This is a simplified first version. The Reporter version below (“Building the Error Reporter”) is the canonical one — it adds context lines, collects multiple errors, and handles edge cases. Read this version for the core idea, then use the Reporter for real code.

#![allow(unused)]
fn main() {
// Simplified first version: formats a single parse error.
pub fn format_error(source: &str, error: &ParseError) -> String {
    let line_text = source.lines().nth(error.line.saturating_sub(1)).unwrap_or("");

    // Columns are 1-based, so the caret sits after `column - 1` spaces.
    let pad = " ".repeat(error.column.saturating_sub(1));

    // Every gutter has the same width ("{:>4} │ " or "     │ ") so the
    // caret line stays aligned with the source line above it.
    format!(
        "error: {}\n     ┌─ {}:{}:{}\n     │\n{:>4} │ {}\n     │ {}^\n",
        error.message,
        error.source_file,
        error.line,
        error.column,
        error.line,
        line_text,
        pad,
    )
}
}

This produces:

error: Expected ';', found 'print'
     ┌─ test.lox:7:5
     │
   7 │     print x;
     │     ^

The caret points to exactly where the parser gave up. That’s useful. The Reporter version below adds one line of context before the error and collects multiple errors in a single pass.
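The runtime error mock-up at the start of this part underlines a whole token and attaches a label (`^^^^ String`). A single-caret formatter extends to that shape with two more inputs. The sketch below is a hypothetical `format_span` function, separate from the Reporter; the `length` and `label` parameters are assumptions, since the ParseError shown here carries neither.

```rust
/// Renders one source line with a caret underline and a short label.
/// `line` and `column` are 1-based; `length` is the token width in characters.
pub fn format_span(
    source: &str,
    file: &str,
    line: usize,
    column: usize,
    length: usize,
    message: &str,
    label: &str,
) -> String {
    let line_text = source.lines().nth(line.saturating_sub(1)).unwrap_or("");
    // Pad up to the token, then draw at least one caret under it.
    let pad = " ".repeat(column.saturating_sub(1));
    let carets = "^".repeat(length.max(1));
    format!(
        "error: {message}\n     ┌─ {file}:{line}:{column}\n     │\n{line:>4} │ {line_text}\n     │ {pad}{carets} {label}\n"
    )
}
```

Called with line 3, column 9, and length 4 on a source whose third line is `var x = name + 1;`, the carets land under `name`. The padding counts characters, so a line containing tabs would need the tabs expanded first.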

Runtime Errors with Source Locations

Runtime errors are harder because by the time we’re executing JIT-compiled code, the original source locations aren’t directly available. The MLIR IR has them (every operation carries a Location), but the JIT doesn’t surface them on error.

There are two approaches:

Approach 1: Pass Locations Through to the Runtime

The obvious approach: extract the line and column from MLIR’s Location type and pass them as extra arguments to every runtime function that can fail. Here’s what it looks like — and why it’s harder than it sounds.

#![allow(unused)]
fn main() {
// MLIR signature: lox_type_check(i8, i64, i32_line, i32_col) -> i1
#[no_mangle]
pub extern "C" fn lox_type_check_wrapper(tag: u8, payload: i64, line: i32, col: i32) -> u8 {
    if tag != TAG_NUMBER {
        // Include the column as well as the line; the file name is hard-coded in this sketch.
        eprintln!("{}:{}:{}: error: cannot use {:?} in arithmetic",
            "test.lox", line, col, tag_to_type_name(tag));
        return 0; // false — type check failed
    }
    1 // true — type check passed
}
}
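The wrapper above calls tag_to_type_name, which this part does not define. A plausible shape for it, together with a testable version of the message formatting, is sketched below. Both functions are assumptions: the display names are invented apart from "String" (which appears in the sample error), the tag numbers follow the C header from Part 9, and `arithmetic_type_error` is a hypothetical helper that returns the message instead of printing it.

```rust
/// Hypothetical helper: maps a value tag to a display name.
/// Tag numbers follow the C header shown in Part 9.
pub fn tag_to_type_name(tag: u8) -> &'static str {
    match tag {
        0 => "Nil",
        1 => "Bool",
        2 => "Number",
        3 => "String",
        4 => "Closure",
        5 => "Instance",
        6 => "Class",
        7 => "BoundMethod",
        _ => "Unknown",
    }
}

/// Returns None when the operand is a number, otherwise the diagnostic
/// text in file:line:column form.
pub fn arithmetic_type_error(file: &str, line: i32, col: i32, tag: u8) -> Option<String> {
    if tag == 2 {
        return None; // TAG_NUMBER: nothing to report
    }
    Some(format!(
        "{}:{}:{}: error: cannot use {} in arithmetic",
        file, line, col, tag_to_type_name(tag)
    ))
}
```

Keeping the formatting in a function that returns a String makes the message easy to unit-test; the extern "C" wrapper then only has to print it and return the status flag.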

In the code generator, pass the current source location as arguments:

#![allow(unused)]
fn main() {
fn compile_arithmetic_check(&self, tag: Value<'c, 'c>, payload: Value<'c, 'c>, block: &Block<'c>, location: Location<'c>) {
    // ⚠️ Extracting line/column from MLIR's Location type is version-
    // dependent. The TaggedValue approach below stores locations before
    // they enter MLIR, which avoids this problem entirely.
    let line = /* extract line from location */;
    let col = /* extract column from location */;

    let line_val = block.append_operation(arith::constant(
        self.context,
        IntegerAttribute::new(Type::parse(self.context, "i32").unwrap(), line as i64).into(),
        Location::unknown(self.context),
    ));
    let col_val = block.append_operation(arith::constant(
        self.context,
        IntegerAttribute::new(Type::parse(self.context, "i32").unwrap(), col as i64).into(),
        Location::unknown(self.context),
    ));

    // Operand order must match the declaration: (i8, i64, i32, i32) -> i1.
    let check_result = block.append_operation(func::call(
        self.context,
        FlatSymbolRefAttribute::new(self.context, "lox_type_check"),
        &[tag, payload, line_val.result(0).unwrap().into(), col_val.result(0).unwrap().into()],
        &[Type::parse(self.context, "i1").unwrap().into()],
        Location::unknown(self.context),
    ));

    // If type check failed, we could either:
    // (a) continue with an Error value, or
    // (b) branch to a runtime error handler
}
}

This works but adds overhead — every potentially-failing operation gets two extra arguments. There’s also the version problem flagged in the code above: extracting line/column from MLIR’s Location type requires API calls that change between Melior releases. The TaggedValue approach below sidesteps this by storing locations before they go into MLIR, when the code generator still has the raw AST data. We’ll use that approach going forward. But it’s worth understanding this approach first — the runtime function signature (lox_type_check(i8, i64, i32_line, i32_col)) is the same either way. The difference is where the i32_line and i32_col values come from.

Approach 2: Stack Trace via MLIR Locations

A better approach for production: extract a stack trace from the JIT when an error occurs. MLIR operations carry locations, and the LLVM ORC JIT can map program counters back to source locations using debug info.

The setup:

When generating MLIR, every operation gets a real Location (not Location::unknown)
When the LLVM dialect is translated to LLVM IR, FileLineColLocation values on operations become LLVM DILocation metadata. There is no separate “DebugInfo pass” for the locations themselves, but the translation only emits them for functions that carry a debug-info scope; upstream MLIR provides the ensure-debug-info-scope-on-llvm-func pass to attach one.
The JIT compiles this debug info into DWARF
On error, use addr2line or LLVM’s symbolizer to map the PC to a source location

This is the approach real compilers use. It’s more complex to set up but produces proper stack traces with zero runtime overhead for the non-error path.

For this tutorial, we’ll use a variant of Approach 1 — the TaggedValue approach below. The code generator stores source locations alongside generated values instead of extracting them back from MLIR, but the runtime API is the same: line and column integers passed to error functions.

Extracting Location Data from MLIR

MLIR locations are one of several types:

Location Type	What It Holds	Example
`NameLocation`	A name string + child location	Function name + body location
`FileLineColLocation`	File name, line, column	`"test.lox:3:14"`
`FusedLocation`	Multiple locations fused together	Inlined-from + call-site
`UnknownLocation`	No information	Fallback

The one we care about is FileLineColLocation. In theory, you’d check if a location is file-line-col and extract its filename, line, and column. In practice, Melior’s location introspection API (is_file_line_col(), filename(), line(), column()) changes between versions — the methods may not exist on your version’s Location type.

There’s a simpler approach that sidesteps the version problem entirely. The code generator creates locations from AST data — it already knows the line and column when it creates each operation. Store that data alongside the generated values instead of extracting it back from MLIR.

Storing Locations in the Code Generator

Concretely, the code generator already has the AST node’s location when it creates MLIR operations, so it keeps the line and column next to each value it produces:

#![allow(unused)]
fn main() {
/// Tracks source location for error reporting during codegen.
#[derive(Debug, Clone, Copy)]
struct SourceLoc {
    line: usize,
    column: usize,
}

/// A value produced by the code generator, with its source location attached.
struct TaggedValue<'c, 'a> {
    tag: Value<'c, 'a>,
    payload: Value<'c, 'a>,
    loc: SourceLoc,
}
}

When the code generator creates an operation, it records the AST node’s location on the resulting TaggedValue. When a runtime call needs the location, the code generator passes it as arguments. No need to extract it back from MLIR’s Location type.

This is the approach we’ll use. It’s straightforward, works with any Melior version, and keeps the location data where it’s needed.
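As a rough sketch of how the location travels, the snippet below copies a SourceLoc from an AST node onto the value it produces and later turns it into the two integers a runtime error call receives. NumberLiteral, compile_literal, and error_call_args are hypothetical names used only for illustration, and the generic parameter V stands in for Melior's Value so the sketch compiles without MLIR.

```rust
/// Source position copied from the AST (1-based line and column).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SourceLoc {
    line: usize,
    column: usize,
}

/// Hypothetical AST node: a number literal that remembers where it was parsed.
struct NumberLiteral {
    value: f64,
    loc: SourceLoc,
}

/// Same shape as the tutorial's TaggedValue, but generic over the value
/// handle `V` so the sketch does not depend on melior.
struct TaggedValue<V> {
    tag: V,
    payload: V,
    loc: SourceLoc,
}

const TAG_NUMBER: u64 = 2;

/// Stand-in for codegen: "emits" a tag and payload and copies the AST location.
fn compile_literal(lit: &NumberLiteral) -> TaggedValue<u64> {
    TaggedValue {
        tag: TAG_NUMBER,
        payload: lit.value.to_bits(),
        loc: lit.loc,
    }
}

/// The (line, column) pair a runtime error call would receive as i32 arguments.
fn error_call_args(value: &TaggedValue<u64>) -> (i32, i32) {
    (value.loc.line as i32, value.loc.column as i32)
}
```

The point of the design is that nothing ever asks MLIR for the location: by the time an error call is emitted, the line and column are plain Rust integers sitting on the operand.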

Building the Error Reporter

Let’s put it all together. We’ll create a Reporter that collects errors and formats them:

#![allow(unused)]
fn main() {
// src/reporter.rs

use std::io::Write;

/// A compile-time or runtime error with source location.
#[derive(Debug, Clone)]
pub struct Error {
    pub message: String,
    pub source_file: String,
    pub line: usize,
    pub column: usize,
}

/// Collects and formats errors.
pub struct Reporter {
    errors: Vec<Error>,
    source_file: String,
    source_text: String,
}

impl Reporter {
    pub fn new(source_file: String, source_text: String) -> Self {
        Reporter {
            errors: Vec::new(),
            source_file,
            source_text,
        }
    }

    /// Report a compile-time error at the given source location.
    pub fn error(&mut self, message: String, line: usize, column: usize) {
        self.errors.push(Error {
            message,
            source_file: self.source_file.clone(),
            line,
            column,
        });
    }

    /// How many errors have been reported?
    pub fn error_count(&self) -> usize {
        self.errors.len()
    }

    /// Did any errors occur?
    pub fn has_errors(&self) -> bool {
        !self.errors.is_empty()
    }

    /// Format a single error with source context.
    fn format_error(&self, error: &Error) -> String {
        let line_text = self.source_text.lines()
            .nth(error.line.saturating_sub(1))
            .unwrap_or("<no source>");

        let caret_padding = " ".repeat(error.column.saturating_sub(1));
        let line_num = error.line;

        // Show one line of context before the error, if available.
        let before = if error.line > 1 {
            self.source_text.lines()
                .nth(error.line - 2)
                .map(|l| format!("{:>4} │ {}\n", error.line - 1, l))
                .unwrap_or_default()
        } else {
            String::new()
        };

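        // Every gutter is the same width ("   2 │ " or "     │ "), so the
        // bars line up and the caret sits under the reported column.
        format!(
            "error: {}\n     ┌─ {}:{}:{}\n     │\n{}{:>4} │ {}\n     │ {}^\n",
            error.message,
            error.source_file,
            error.line,
            error.column,
            before,
            line_num,
            line_text,
            caret_padding,
        )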
    }

    /// Print all collected errors to stderr.
    pub fn print_errors(&self) {
        for error in &self.errors {
            eprintln!("{}", self.format_error(error));
        }
        if self.has_errors() {
            eprintln!("{} error(s) found.", self.errors.len());
        }
    }
}

/// Report a runtime error. Called from the runtime wrappers.
///
/// This is a free function (not a method on `Reporter`) because runtime
/// errors are immediate: they're printed as soon as they happen instead
/// of being collected. There's no batch of runtime failures to hold, so
/// there's no `Reporter` involved.
pub fn runtime_error(message: &str, source_file: &str, line: usize, column: usize) {
    // Runtime errors don't have source text in scope — we can't show
    // the source line. Instead, we show the location and a shorter format
    // that doesn't pretend to display the code.
    eprintln!(
        "error: {}\n  ┌─ {}:{}:{}",
        message,
        source_file,
        line,
        column,
    );
}
}
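To see the gutter arithmetic in isolation, here is a small standalone sketch of the same rendering idea. render_diagnostic is a hypothetical helper, not part of the Reporter above. It assumes 1-based lines and columns and a source line without tabs.

```rust
/// Render one diagnostic with a single line of source context.
/// `line` and `column` are 1-based.
fn render_diagnostic(
    source: &str,
    file: &str,
    line: usize,
    column: usize,
    message: &str,
) -> String {
    let text = source
        .lines()
        .nth(line.saturating_sub(1))
        .unwrap_or("<no source>");
    let padding = " ".repeat(column.saturating_sub(1));
    // The numbered gutter ("   2 │ ") and the blank gutter ("     │ ")
    // are both seven characters wide, so column N of the source text and
    // the caret end up at the same horizontal position.
    format!(
        "error: {}\n     ┌─ {}:{}:{}\n{:>4} │ {}\n     │ {}^\n",
        message, file, line, column, line, text, padding,
    )
}
```

Calling it with the two-line program from the walkthrough later in this part, at line 2 and column 9, puts the caret under the `n` of `name`.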

Using the Reporter

In the main compilation function:

#![allow(unused)]
fn main() {
use anyhow::{Result, anyhow};

fn compile(source: &str, source_file: &str) -> Result<()> {
    let mut reporter = Reporter::new(source_file.to_string(), source.to_string());

    // Parse
    let tokens = match lexer::tokenize(source) {
        Ok(t) => t,
        Err(e) => {
            reporter.error(format!("lexer error: {}", e), e.line, e.column);
            reporter.print_errors();
            return Err(anyhow!("lexing failed"));
        }
    };

    let ast = match parser::Parser::new(tokens).parse() {
        Ok(a) => a,
        Err(e) => {
            reporter.error(e.message.clone(), e.line, e.column);
            reporter.print_errors();
            return Err(anyhow!("parsing failed"));
        }
    };

    // Code generation
    let context = Context::new();
    let module = codegen::compile(&context, &ast, &mut reporter)?;

    if reporter.has_errors() {
        reporter.print_errors();
        // Continue to lower and run, but the programmer knows about the issues.
        // Some errors are warnings — the code may still work.
    }

    // ... JIT compilation and execution ...

    Ok(())
}
}

Runtime Error Handler

For runtime errors, we need a way for the JIT-compiled code to signal “something went wrong” and have the host print a useful message. The simplest approach: a runtime function that takes a message and a location.

#![allow(unused)]
fn main() {
// In the runtime module

/// Report a runtime error. Called from MLIR when a type check fails
/// or an undefined variable is accessed.
///
/// MLIR signature: lox_runtime_error(i32_error_kind, i32_line, i32_col) -> void
pub fn lox_runtime_error(error_kind: i32, line: i32, col: i32) {
    let message = match error_kind {
        1 => "undefined variable",
        2 => "type error in arithmetic",
        3 => "type error in comparison",
        4 => "cannot call non-function",
        5 => "undefined property",
        _ => "unknown error",
    };
    eprintln!("error: {} at line {} column {}", message, line, col);
}
}

And the wrapper:

#![allow(unused)]
fn main() {
#[no_mangle]
pub extern "C" fn lox_runtime_error_wrapper(error_kind: i32, line: i32, col: i32) {
    lox_runtime_error(error_kind, line, col);
}
}
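The integer codes have to agree between the code generator, which emits them as i32 constants, and the runtime, which maps them back to messages. One way to keep the two in sync is a shared enum. This is a sketch under that assumption: ErrorKind, from_code, and format_runtime_error are hypothetical names, and the numbering simply mirrors the match above.

```rust
/// Error codes shared by the code generator and the runtime.
#[repr(i32)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    UndefinedVariable = 1,
    ArithmeticType = 2,
    ComparisonType = 3,
    NotCallable = 4,
    UndefinedProperty = 5,
}

impl ErrorKind {
    /// Decode the i32 that JIT-compiled code passes to the runtime.
    fn from_code(code: i32) -> Option<ErrorKind> {
        match code {
            1 => Some(ErrorKind::UndefinedVariable),
            2 => Some(ErrorKind::ArithmeticType),
            3 => Some(ErrorKind::ComparisonType),
            4 => Some(ErrorKind::NotCallable),
            5 => Some(ErrorKind::UndefinedProperty),
            _ => None,
        }
    }

    /// Human-readable message, matching the strings used above.
    fn message(self) -> &'static str {
        match self {
            ErrorKind::UndefinedVariable => "undefined variable",
            ErrorKind::ArithmeticType => "type error in arithmetic",
            ErrorKind::ComparisonType => "type error in comparison",
            ErrorKind::NotCallable => "cannot call non-function",
            ErrorKind::UndefinedProperty => "undefined property",
        }
    }
}

/// Build the line the runtime prints; unknown codes fall back to a generic message.
fn format_runtime_error(code: i32, line: i32, col: i32) -> String {
    let message = ErrorKind::from_code(code).map_or("unknown error", ErrorKind::message);
    format!("error: {} at line {} column {}", message, line, col)
}
```

On the codegen side, `ErrorKind::ArithmeticType as i32` would supply the constant, so a renumbering cannot silently change which message is printed.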

Generating Runtime Error Calls

When an operation can fail at runtime (like arithmetic on mismatched types), the code generator doesn’t silently produce wrong code. It emits a tag check, puts a call to lox_runtime_error on the failing path, and then continues with an Error value:

#![allow(unused)]
fn main() {
// In compile_arithmetic — after type tag check fails

fn compile_arithmetic(&self, left: &TaggedValue<'c, 'c>, right: &TaggedValue<'c, 'c>, block: &Block<'c>) -> TaggedValue<'c, 'c> {
    let location = Location::unknown(self.context);

    // Check: are both operands numbers?
    let left_is_number = self.emit_tag_check(left, TAG_NUMBER, block);
    let right_is_number = self.emit_tag_check(right, TAG_NUMBER, block);

    // Both checks must pass (logical AND on i1 results).
    // Requires `use melior::dialect::ods::arith;` (not `melior::dialect::arith`,
    // which only has constant/cmpf/cmpi/select). See the version note in
    // Part 1 for the full explanation of both import paths.
    let both_number = block.append_operation(
        arith::andi(left_is_number, right_is_number, location)
    );

    // If not both numbers, report a runtime error
    //
    // The scf::if_ API varies between Melior versions. The conceptual
    // pattern is consistent: create an scf.if that branches on the
    // tag check, report the error in the else region, and compute
    // the result in the then region.
    //
    // Here's what the generated MLIR should look like:
    //
    //   scf.if %both_number -> (i8, i64) {
    //     // then: both are numbers — extract payloads and add
    //     %lhs_f64 = llvm.bitcast %lhs_payload : i64 to f64
    //     %rhs_f64 = llvm.bitcast %rhs_payload : i64 to f64
    //     %sum = arith.addf %lhs_f64, %rhs_f64 : f64
    //     %sum_i64 = llvm.bitcast %sum : f64 to i64
    //     scf.yield %c_num, %sum_i64 : i8, i64
    //   } else {
    //     // else: type error — report and return Error tag
    //     func.call @lox_runtime_error(%err_type, %line, %col)
    //         : (i32, i32, i32) -> ()
    //     scf.yield %c_err, %zero : i8, i64
    //   }
    //
    // For the Melior API call itself, check the docs for your version.
    // The pattern is:
    //   scf::if_(context, result_types, condition,
    //       |then_region| { ... scf.yield ... },
    //       |else_region| { ... scf.yield ... },
    //       location)
    // but parameter order and builder signatures differ across releases.

    // After the scf.if, extract the (tag, payload) results:
    // let result_tag = if_op.result(0).unwrap().into();
    // let result_payload = if_op.result(1).unwrap().into();
    // TaggedValue { tag: result_tag, payload: result_payload, loc: left.loc }
    //
    // The %line and %col arguments in the MLIR above come from the
    // TaggedValue's stored source location. The codegen pattern is the
    // same arith::constant + IntegerAttribute we used for tag values:
    //
    //     let line_val = block.append_operation(arith::constant(
    //         context,
    //         IntegerAttribute::new(
    //             Type::parse(context, "i32").unwrap(),
    //             left.loc.line as i64,
    //         ).into(),
    //         location,
    //     ));
    //     let col_val = block.append_operation(arith::constant(
    //         context,
    //         IntegerAttribute::new(
    //             Type::parse(context, "i32").unwrap(),
    //             left.loc.column as i64,
    //         ).into(),
    //         location,
    //     ));
    //
    // Then pass line_val and col_val as the 2nd and 3rd arguments to
    // the func.call for lox_runtime_error.
}
}

Honest simplification: This is verbose. Real compilers use a different strategy: they generate the “happy path” code (arithmetic without checks) and only insert tag checks when the optimizer can’t prove the type. The scf.if approach above is correct but produces a lot of branches. A production compiler would use a “guard and deopt” pattern — check the type, branch to a deoptimization path if the check fails, and let the optimizer remove checks that are provably unnecessary.

Error Recovery: Continuing After an Error

A good error reporter doesn’t stop at the first error. It collects as many errors as possible before reporting. This is why our Reporter collects errors in a Vec instead of returning immediately.

There are two levels of recovery:

Compile-time recovery — the parser synchronizes and keeps going; the code generator emits placeholder values for broken expressions.

Runtime recovery — after a runtime error, the program continues execution with an Error value.

For compile-time recovery, the code generator uses an Error-tagged value (TAG_ERROR, defined below) as a sentinel:

#![allow(unused)]
fn main() {
use std::collections::HashMap;

// When codegen hits an error, produce an Error value instead of panicking

fn compile_expression(&self, expr: &Expr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> TaggedValue<'c, 'c> {
    match expr {
        // ... normal cases (Binary, Unary, Literal, etc.) ...
        //
        // If you add error nodes to the AST (a common recovery strategy
        // where the parser produces a placeholder instead of aborting),
        // you'd handle them here. Our parser uses ParseError +
        // synchronization instead of AST error nodes, so this arm
        // wouldn't appear in our Expr enum. The pattern is the same
        // either way: report the error, emit an Error-tagged value,
        // and continue.
        //
        // Example with a hypothetical Expr::Error variant:
        //
        // Expr::Error(e) => {
        //     self.reporter.error(
        //         format!("cannot compile expression: {}", e),
        //         expr.line(),
        //         expr.column(),
        //     );
        //     self.emit_error_value(block, expr.location())
        // }
    }
}

fn emit_error_value(&self, block: &Block<'c>, loc: SourceLoc) -> TaggedValue<'c, 'c> {
    let location = Location::unknown(self.context);
    let tag = block.append_operation(arith::constant(
        self.context,
        IntegerAttribute::new(Type::parse(self.context, "i8").unwrap(), TAG_ERROR as i64).into(),
        location,
    ));
    let payload = block.append_operation(arith::constant(
        self.context,
        IntegerAttribute::new(Type::parse(self.context, "i64").unwrap(), 0).into(),
        location,
    ));
    TaggedValue {
        tag: tag.result(0).unwrap().into(),
        payload: payload.result(0).unwrap().into(),
        loc,
    }
}
}

The TAG_ERROR tag is a special tag value that indicates “something went wrong producing this value.” Any operation that receives an Error-tagged value propagates the error instead of producing a new one. This prevents error cascading — one error doesn’t produce a cascade of spurious errors downstream.

TAG_ERROR isn’t in the ObjType enum from Part 7 (which uses its own numbering: BoundMethod = 6). It’s a compiled value tag: it only exists in the i8 tag byte of the tagged union, not in any heap object’s header. The runtime library never produces it. It only appears in generated code, either as a placeholder for an expression the code generator couldn’t compile or as the result of a failed runtime type check. The compiled value tags from Part 7 use 0–7 (Nil through BoundMethod), so TAG_ERROR is 8.

#![allow(unused)]
fn main() {
// Extended from compiled value tags in Part 7 — compiler-only, not a runtime object type.
// Tags 0–7 are used: Nil(0), Bool(1), Number(2), String(3), Closure(4),
// Instance(5), Class(6), BoundMethod(7). TAG_ERROR is the next available value.
const TAG_ERROR: u8 = 8;
}

This tag value will never appear in a heap object’s ObjType field. It only exists in the compiled tagged union’s i8 tag byte, where it tells downstream operations “this value is invalid — skip your logic and propagate.”

#![allow(unused)]
fn main() {
// In compile_arithmetic — check for Error values first

fn compile_arithmetic(&self, left: &TaggedValue<'c, 'c>, right: &TaggedValue<'c, 'c>, block: &Block<'c>) -> TaggedValue<'c, 'c> {
    // If either operand is an Error, propagate it silently.
    // No need to report another error — the source was already reported.
    //
    // ⚠️ `left.tag` and `right.tag` are MLIR `Value<'c>` (SSA values),
    // not Rust integers. You emit MLIR-level operations, not Rust `if`.
    // The pattern is the same `scf.if` shown above for the type check —
    // with a different condition (checking for TAG_ERROR instead
    // of TAG_NUMBER) and different branches:
    //
    //   %is_left_error  = arith.cmpi eq, %left_tag,  %TAG_ERROR : i8
    //   %is_right_error = arith.cmpi eq, %right_tag, %TAG_ERROR : i8
    //   %any_error      = arith.ori %is_left_error, %is_right_error : i1
    //   %result = scf.if %any_error -> (i8, i64) {
    //       scf.yield %error_tag, %zero_payload : i8, i64
    //   } else {
    //       // ... normal type-checking and arithmetic ...
    //       scf.yield %result_tag, %result_payload : i8, i64
    //   }
    //
    // See Part 1's compile_if for the original scf.if implementation.

    // ... normal type-checking and arithmetic code ...
}
}

This is the same pattern we used in the Lua type checker tutorial (Chapter 3, type compatibility rules): Error is compatible with everything (to prevent cascading), but it propagates silently.
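The propagation rule is easier to check in plain Rust than in nested scf.if regions. The following is a host-side model only, not generated code: Tagged, number, and add are hypothetical names, and the reports counter stands in for calls to the runtime error handler. The tag numbers follow the constants listed above.

```rust
const TAG_NUMBER: u8 = 2;
const TAG_STRING: u8 = 3;
const TAG_ERROR: u8 = 8;

/// Host-side stand-in for the (i8, i64) tagged pair.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Tagged {
    tag: u8,
    payload: u64,
}

fn number(n: f64) -> Tagged {
    Tagged { tag: TAG_NUMBER, payload: n.to_bits() }
}

/// Models what the generated checks compute for `+`.
/// `reports` counts how often the error handler would be called.
fn add(left: Tagged, right: Tagged, reports: &mut u32) -> Tagged {
    let error = Tagged { tag: TAG_ERROR, payload: 0 };

    // Outer check: an operand is already an Error, so propagate silently.
    if left.tag == TAG_ERROR || right.tag == TAG_ERROR {
        return error;
    }

    // Inner check: a fresh type error is reported exactly once.
    if left.tag != TAG_NUMBER || right.tag != TAG_NUMBER {
        *reports += 1;
        return error;
    }

    let sum = f64::from_bits(left.payload) + f64::from_bits(right.payload);
    number(sum)
}
```

Feeding the result of a failed addition into a second addition leaves the report count at one, which is exactly the no-cascade behavior the Error tag exists to provide.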

Putting It Together: An Error in Action

Let’s trace a real error through the system. Given:

var name = "Alice";
var x = name + 1;
print x;

Parser: parses successfully. No errors.
Code generator: compiles var name = "Alice" → string constant with TAG_STRING. Compiles name + 1 by emitting the Error check, the TAG_NUMBER check, and an scf.if with both branches. Nothing is reported yet, because the tags are only known when the program runs.
Runtime: the generated code for name + 1 executes:
- Left operand: TAG_STRING (from name variable lookup)
- Right operand: TAG_NUMBER (from 1 literal)
- Error check: neither is Error, so proceed to tag check
- scf.if branch: left is not TAG_NUMBER → else branch fires
- Else branch: calls lox_runtime_error(2, 2, 9), “type error in arithmetic” at line 2, column 9
- Result: TAG_ERROR value

The program prints:

error: type error in arithmetic at line 2 column 9

print x: receives an Error value, prints <error> instead of crashing

Not a beautiful diagnostic, but it’s correct — it tells the programmer exactly what went wrong and where. The next step would be to add the source line display (like the format_error function above), but that requires the runtime to have access to the source text, which means threading it through the Runtime struct or looking it up from a global.

A Note on Production Error Reporting

Real compilers do much more than what we’ve shown here. A compiler shouldn’t report the same error twice, or report errors that are consequences of earlier errors — the Error-tag propagation pattern helps, but it’s not perfect. Production compilers assign numeric codes to errors (E0308: mismatched types in Rust) so users can look them up; our error_kind integer is a step in this direction. Suggestions like “did you mean add instead of aad?” require fuzzy matching on identifiers and are worth adding for a real tool. Terminal colors make errors easier to read — the termcolor crate is the standard approach in Rust. And IDEs and language servers need errors as structured data (JSON), not formatted text — the Reporter could implement a to_json() method for this use case.

These are all incremental improvements on the same foundation: source locations + error collection + formatted output. The pattern doesn’t change; the polish does.
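As one example of that polish, a “did you mean” suggestion needs only an edit-distance function and a list of names in scope. This is a minimal sketch, not part of the tutorial's compiler: edit_distance and suggest are hypothetical helpers, and the cutoff of two edits is an arbitrary assumption.

```rust
/// Levenshtein edit distance between two identifiers, counted in chars.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut current = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != *cb);
            let insertion = current[j] + 1;
            let deletion = prev[j + 1] + 1;
            current.push(substitution.min(insertion).min(deletion));
        }
        prev = current;
    }
    prev[b_chars.len()]
}

/// Pick the closest known name within two edits, if there is one.
fn suggest<'a>(unknown: &str, known: &[&'a str]) -> Option<&'a str> {
    known
        .iter()
        .copied()
        .map(|name| (edit_distance(unknown, name), name))
        .filter(|(distance, _)| *distance <= 2)
        .min_by_key(|(distance, _)| *distance)
        .map(|(_, name)| name)
}
```

An undefined-variable error could call suggest with the names currently in scope and append the result to its message when one is found.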

Next: Part 11 — Cross-Module Linking — Every program we’ve compiled lives in a single file. Real programs don’t. We’ll compile two Lox files to two MLIR modules, then merge them into one executable — and see why the linker needs to resolve symbol references across module boundaries.

MLIR for Lox: Part 11 — Two Files, One Program — Cross-Module Linking in MLIR

Every program in this tutorial has been a single file. print 1 + 2; — parse it, generate MLIR, JIT it, done. Real programs don’t work that way. You write math.lox with a square function, then main.lox calls square(4). Two files. Two modules. One program.

This part shows how that works in MLIR. The good news: MLIR was built for this. The symbol model, the module structure, the JIT’s symbol resolver — they all support multiple modules out of the box. The trick is understanding how the pieces connect.

The Problem

Here’s our goal. We have two Lox files:

Note on function signatures: Part 6 uses func.func @main() with no return type (the GC-aware model with numbers-only values). Part 7 uses func.func @main() -> !llvm.struct<(i8, i64)> (tagged-union model). This part uses func.func @main() -> i32 because a multi-file compiler needs a well-defined exit status — the i32 return value is the process exit code.

The IR examples in this part use the numbers-only model (f64 arithmetic) for clarity — the linking concepts are the same regardless of whether you’re passing raw floats or tagged (i8, i64) pairs. In the tagged-union model, every function call passes (i8, i64) pairs instead of bare f64 values, and @lox_print takes two arguments instead of one. The linking, mangling, and merging steps don’t change — only the value representation does.

// math.lox
fun square(n) {
  return n * n;
}

fun add(a, b) {
  return a + b;
}

// main.lox
print square(3);

When we compile main.lox, the code generator produces:

module {
  func.func private @lox_print(f64)

  func.func @main() -> i32 {
    // ... compute square(3) ...
    func.call @square(%three) : (f64) -> f64
    // ... print the result ...
    %zero = arith.constant 0 : i32
    func.return %zero : i32
  }
}

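This module can’t run on its own: @square has no definition here. MLIR’s verifier rejects a func.call whose target isn’t in the module’s symbol table, and even if verification were skipped, the JIT would fail with a symbol-not-found error, because we never compiled math.lox.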

We need two things:

Compile each file to its own MLIR module. math.lox becomes a module with @square and @add, main.lox becomes a module with @main.

Link the modules together. When main.lox calls @square, the JIT needs to find it in math.lox’s module.

Let’s start with how MLIR thinks about symbols.

How MLIR Symbols Work

MLIR has a built-in symbol system. It works like you’d expect if you’ve used linkers before, but it’s worth spelling out because MLIR’s version has some specific rules.

Symbol Tables and Visibility

Every builtin.module has a symbol table. Operations that live in a module’s symbol table — func.func, llvm.mlir.global, llvm.func — are symbols. They have a name (like @square, @lox_print, or @main) and a visibility — public, private, or nested.

module {
  func.func @main() -> i32 { ... }       // public by default
  func.func private @helper() -> f64 { ... }  // private — not visible outside this module
}

The default visibility is public — any module can reference a public symbol. private means “this module only.” We also use private for runtime declarations like func.func private @lox_print(f64) — but that’s a different use: it’s a declaration (no body), not a private implementation. The JIT resolves @lox_print from registered host symbols, not from other Lox modules. We’ll explain the distinction shortly.

Why private for runtime declarations? A func.func private @lox_print(f64) with no body is a declaration: it says “this symbol exists, but I’m not defining it here.” The JIT resolves it from the registered runtime symbols. The private keyword isn’t optional here. MLIR’s verifier rejects a symbol declaration with public visibility, so a func.func without a body must be marked private. It also signals that these are implementation details, not part of the Lox program’s public API. After merging, all functions end up in the same module, so private doesn’t restrict access: any function in the merged module can call @lox_print. If no host symbol is registered for a declaration, the failure comes from the JIT at runtime, not from the verifier.

Nested Modules

MLIR allows modules inside modules:

module {
  module @math {
    func.func @square(%arg0: f64) -> f64 { ... }
    func.func @add(%arg0: f64, %arg1: f64) -> f64 { ... }
  }
  module @main_mod {
    func.func @main() -> i32 {
      // A plain func.call @square does NOT resolve here:
      // @square lives in @math's symbol table, not this one.
    }
  }
}

Nested modules have their own symbol tables. A function in @main_mod can’t write func.call @square(...) directly — that name only exists in @math’s symbol table, not @main_mod’s.

You can reference symbols across modules using MLIR’s @module::@symbol syntax, but in practice, most compilers take a simpler approach: separate modules, link at JIT time.

That’s the approach we’ll use. Each Lox file compiles to a standalone builtin.module, and we merge those modules into one just before handing it to the JIT, so symbol resolution works the same way it does for a single file. This mirrors how real compilers work: each .c file compiles to a .o file, and the linker connects them.

Compiling Multiple Files

The compilation pipeline needs one change: instead of compiling one file, we compile a list of files, then feed all the resulting modules to the JIT.

The Multi-File Compiler

#![allow(unused)]
fn main() {
// src/compiler.rs

use anyhow::Result;
use melior::{
    ir::Module,
    Context,
};
use std::path::Path;

/// Compile a Lox source file into an MLIR module.
///
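/// This is the same compilation pipeline from Parts 1–10. The only
/// difference is that we return a Module instead of immediately
/// running it through the JIT.
///
/// The explicit `'c` lifetime ties the returned Module to the Context.
/// With several reference parameters, Rust can't elide it.
fn compile_file<'c>(context: &'c Context, source: &str, filename: &str) -> Result<Module<'c>> {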
    let tokens = lexer::tokenize(source);
    let ast = parser::parse(tokens)?;
    let module = codegen::generate(context, &ast, filename)?;
    Ok(module)
}

/// Compile multiple Lox files and return all MLIR modules.
///
/// Each file gets its own module. All modules are later merged into one
/// before JIT execution (see `link_and_run`), so compilation order doesn't
/// affect symbol resolution — every symbol ends up in the same table
/// regardless of which module was compiled first. Order *would* matter if
/// we used per-module JIT linking (ORC multi-JITDylib) instead of merging.
pub fn compile_files<'c>(
    context: &'c Context,
    files: &[(String, String)],  // (filename, source)
) -> Result<Vec<Module<'c>>> {
    let mut modules = Vec::new();

    for (filename, source) in files {
        let module = compile_file(context, source, filename)?;
        modules.push(module);
    }

    Ok(modules)
}
}
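compile_files takes (filename, source) pairs, so something has to read the files first. A minimal loader might look like the sketch below. load_sources is a hypothetical helper, and using the displayed path as the filename is an assumption about what the code generator wants to see in its locations.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Read each path into a (filename, source) pair, the shape that
/// compile_files expects. Stops at the first file that can't be read.
fn load_sources(paths: &[PathBuf]) -> io::Result<Vec<(String, String)>> {
    paths
        .iter()
        .map(|path| -> io::Result<(String, String)> {
            let source = fs::read_to_string(path)?;
            Ok((path.display().to_string(), source))
        })
        .collect()
}
```

The resulting vector can be passed straight to compile_files as its `files` argument.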

We compile each file independently and collect the modules. The key question is what each module looks like.

What `math.lox` Compiles To

module {
  func.func @square(%arg0: f64) -> f64 {
    %result = arith.mulf %arg0, %arg0 : f64
    func.return %result : f64
  }

  func.func @add(%arg0: f64, %arg1: f64) -> f64 {
    %result = arith.addf %arg0, %arg1 : f64
    func.return %result : f64
  }
}

Both functions are public (default visibility). No @main function — this module is a library.

What `main.lox` Compiles To

module {
  func.func private @lox_print(f64)

  func.func @main() -> i32 {
    %three = arith.constant 3.0 : f64
    %result = func.call @square(%three) : (f64) -> f64
    // ... call lox_print with result ...
    %zero = arith.constant 0 : i32
    func.return %zero : i32
  }
}

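@main calls @square, but @square isn’t defined in this module. On its own, this module fails verification, which is why single-file mode rejects it. In multi-file mode the reference only becomes valid after merging, when @square sits in the same symbol table, so any verification has to run on the merged module.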

This is a simplification. A real compiler would verify that @square exists somewhere in the linked modules before running the JIT. We’re relying on the JIT’s symbol resolver to catch missing definitions at runtime. For a tutorial, this is fine — the error message is clear (“symbol not found: @square”). A production compiler would want a separate linking/verification step.
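A separate check of that kind does not need MLIR at all if the code generator records which names each module defines and which it calls. The sketch below assumes such a summary exists. ModuleSymbols and unresolved_symbols are hypothetical and not part of Melior or of the compiler built so far.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-module summary: the names it defines and the names it calls.
struct ModuleSymbols<'a> {
    defined: &'a [&'a str],
    referenced: &'a [&'a str],
}

/// Return every referenced symbol that no module defines and the runtime
/// does not provide, sorted so error output is stable.
fn unresolved_symbols<'a>(
    modules: &[ModuleSymbols<'a>],
    runtime: &[&'a str],
) -> Vec<&'a str> {
    // Everything the linker could resolve a call against.
    let available: BTreeSet<&'a str> = modules
        .iter()
        .flat_map(|m| m.defined.iter().copied())
        .chain(runtime.iter().copied())
        .collect();

    // Referenced names that are missing from that set, deduplicated.
    let missing: BTreeSet<&'a str> = modules
        .iter()
        .flat_map(|m| m.referenced.iter().copied())
        .filter(|name| !available.contains(name))
        .collect();

    missing.into_iter().collect()
}
```

Running this before the JIT turns a late symbol-not-found failure into an ordinary compile-time diagnostic that can name the file containing the dangling call.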

Linking at JIT Time

MLIR’s JIT uses LLVM’s ORC JIT underneath. ORC maintains a symbol table: when the JIT compiles @main and encounters func.call @square(...), it looks up @square there. If the definition is present (because math.lox’s functions are part of what we handed to the JIT), the call works. If not, the lookup fails with symbol-not-found.

The linking strategy is straightforward: merge every module into one, then JIT the merged module. Because everything ends up in a single symbol table, the order in which the files were compiled doesn’t matter.

#![allow(unused)]
fn main() {
// src/jit.rs

use anyhow::{Result, anyhow};
use melior::{
    execution_engine::ExecutionEngine,
    ir::Module,
    Context,
};
use melior::ir::RegionLike;

/// Link multiple MLIR modules and execute @main.
///
/// The approach: merge all modules into one, then JIT the merged module.
/// After merging, every `func.call` has its target in the same symbol
/// table — the JIT resolves symbols the same way it does for a
/// single-file program.
///
/// A more sophisticated approach would use ORC's multi-JITDylib API
/// to add each module to the JIT separately and let the symbol
/// resolver connect them. Merging is simpler and works for our use case.
///
/// This is the same pattern as object-file linking: compile .c files
/// to .o files, then link them into a single executable. Our merge step
/// is the "link" — we do it at the MLIR level, not the object-file level.
pub fn link_and_run(modules: Vec<Module<'_>>, context: &Context) -> Result<()> {
    // All modules share one execution engine. Rather than adding
    // modules to a JIT session one at a time, we merge them into a
    // single module first and create the engine from that. This is
    // simpler than managing multiple JIT sessions and works for our
    // use case.
    //
    // Note: Melior's ExecutionEngine API varies between versions.
    // In Melior 0.27, you typically:
    //   1. Create the engine from the merged module
    //   2. Register runtime symbols (lox_print, lox_clock, etc.)
    //   3. Invoke @main

    let merged = merge_modules(&modules, context);

    let engine = ExecutionEngine::new(&merged, 2, &[], false, false);
    //                               module  opt   libs  dump  pic ^
    //
    // ExecutionEngine::new takes five parameters in Melior 0.27:
    //   - module: the MLIR module to JIT
    //   - optimization_level: 0, 1, or 2
    //   - shared_library_paths: paths to shared libraries for the JIT
    //   - enable_object_dump: whether to allow dumping to object files
    //   - enable_pic: whether to enable position-independent code
    //
    // The API changes between Melior versions — verify against your version's docs.
    unsafe { register_runtime_symbols(&engine); }

    // Invoke @main
    //
    // The MLIR declares @main() -> i32 (exit status). invoke_packed takes
    // one pointer per argument followed by a pointer to a result slot, so
    // a function that returns a value needs that slot even if the caller
    // ignores the value. We pass a pointer to `exit_code` and only check
    // whether the invocation itself succeeded. As with the rest of the
    // ExecutionEngine API, verify this against your Melior version's docs.
    let mut exit_code: i32 = 0;
    let result = unsafe {
        engine.invoke_packed("main", &mut [&mut exit_code as *mut i32 as *mut ()])
    };

    result.map_err(|e| anyhow!("JIT execution failed: {:?}", e))?;
    Ok(())
}
}

The register_runtime_symbols function wraps the individual register_symbol calls we showed in Part 9. It registers each runtime function that the JIT needs to resolve — lox_print, gc_push_frame, gc_pop_frame, and the allocator — so that func.call @lox_print(...) in our MLIR can find the actual Rust implementation at execution time:

#![allow(unused)]
fn main() {
/// Register runtime symbols with the execution engine.
///
/// Each `register_symbol` call maps a symbol name (like "lox_print")
/// to a Rust function pointer. When the JIT encounters
/// `func.call @lox_print`, it looks up the symbol here. Without
/// this registration, the JIT throws a symbol-not-found error at
/// runtime.
///
/// In Melior 0.27, the method is `register_symbol` and it takes
/// `*mut ()` — a raw mutable pointer. The `unsafe` block is
/// required because registering a symbol makes the pointer
/// accessible to JIT'd code. If the pointer is invalid or
/// misaligned, this is undefined behavior.
///
/// See Part 9 (Standard Library and Runtime) for the full details
/// on each runtime function and how the wrapper types work.
unsafe fn register_runtime_symbols(engine: &ExecutionEngine) {
    engine.register_symbol("lox_print", lox_print_wrapper as *mut ());
    engine.register_symbol("gc_push_frame", gc_push_frame_wrapper as *mut ());
    engine.register_symbol("gc_pop_frame", gc_pop_frame_wrapper as *mut ());
    engine.register_symbol("lox_runtime_alloc", lox_runtime_alloc_wrapper as *mut ());
}
}

The wrapper functions (lox_print_wrapper, etc.) are thin extern "C" functions that bridge between the LLVM calling convention and our Rust runtime. Part 9 shows how to define them. The key point for cross-module linking: every module’s runtime calls resolve through the same symbol table. Whether @lox_print appears in math.lox’s MLIR or main.lox’s MLIR, it finds the same lox_print_wrapper — because the merged module shares one engine, one symbol table.

Merging Modules

The simplest linking approach: merge all modules into one. Take every func.func from every module and put them in a single builtin.module:

#![allow(unused)]
fn main() {
use std::collections::HashSet;

/// Merge multiple MLIR modules into one.
///
/// This is the "poor man's linker" — we concatenate all the function
/// definitions into a single module. The MLIR verifier is happy because
/// every symbol reference now has a definition in the same symbol table.
///
/// A production compiler might use MLIR's symbol resolution or ORC's
/// proper multi-JITDylib linking. For a tutorial, merging is simpler
/// and achieves the same result.
fn merge_modules<'c>(modules: &[Module<'c>], context: &'c Context) -> Module<'c> {
    // Note: Location is from melior::ir::Location
    let merged = Module::new(Location::unknown(context));

    // Track which symbols we've already added to the merged module.
    // MLIR's verifier rejects duplicate symbols — even duplicate
    // *declarations* — in the same module. Every source module
    // has its own @lox_print declaration, so we'd get duplicates
    // if we blindly appended everything.
    let mut seen_symbols: HashSet<String> = HashSet::new();

    for module in modules {
        // Module::body() returns a Region, and iterating a Region
        // yields Blocks, not Operations. A builtin.module always has
        // exactly one block in its body region — we iterate the
        // operations *within* that block.
        for op in module.body().first_block().unwrap().iter() {
            // Check if this operation defines a symbol.
            // In Melior, OperationLike::name() returns the operation's
            // dialect.name (e.g., "func.func"). If the operation has a
            // symbol name attribute, we read it via
            //   op.as_operation().attribute("sym_name")
            // which returns the symbol name as a string attribute. Its
            // printed form is the quoted name, like "\"lox_print\""; the @
            // prefix appears only in func.func's pretty-printed syntax and
            // in symbol references. We strip the quotes (and any stray @)
            // to get the bare name for deduplication.
            //
            // Not all operations in a module body are symbols, but
            // in practice, the only operations at module level in our
            // compiler are func.func definitions and declarations.
            // Both have sym_name attributes.
            if let Some(sym_attr) = op.as_operation().attribute("sym_name") {
                let sym_name = sym_attr.to_string();
                // sym_name comes back quoted; normalize it for the HashSet key.
                let bare_name = sym_name.trim_matches(|c: char| c == '"' || c == '@');
                if seen_symbols.contains(bare_name) {
                    // Skip this operation — we already have a symbol
                    // with this name in the merged module.
                    //
                    // This is correct for declarations (they're interchangeable)
                    // and for definitions (name mangling should prevent duplicate
                    // definitions — see the "Duplicate Symbol Errors" section).
                    continue;
                }
                seen_symbols.insert(bare_name.to_string());
            }

            // Operation::clone() creates a deep copy (via mlirOperationClone)
            // so we can insert the copy into the merged module. The original
            // stays in the source module. This is safe but not free — each
            // clone allocates a new MLIR operation. For a tutorial compiler,
            // this is fine.
            merged.body().append_operation(op.clone());
        }
    }

    merged
}
}

After merging, our two modules become one:

module {
  // From math.lox:
  func.func @square(%arg0: f64) -> f64 {
    %result = arith.mulf %arg0, %arg0 : f64
    func.return %result : f64
  }
  func.func @add(%arg0: f64, %arg1: f64) -> f64 {
    %result = arith.addf %arg0, %arg1 : f64
    func.return %result : f64
  }

  // From main.lox:
  func.func private @lox_print(f64)
  func.func @main() -> i32 {
    %three = arith.constant 3.0 : f64
    %result = func.call @square(%three) : (f64) -> f64
    // ... print result ...
    %zero = arith.constant 0 : i32
    func.return %zero : i32
  }
}

Now @main’s call to @square resolves — the definition is in the same symbol table. The JIT compiles this merged module and everything works.

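About private declarations after merging. The @lox_print declaration is marked private. For a func.func without a body this is required, not stylistic: MLIR's verifier rejects a symbol declaration with public visibility. It also signals to the reader that the runtime functions are implementation details, not part of the Lox program's public API. After merging, all functions are in the same module, so private doesn't restrict which functions can call @lox_print.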

Before merging, each source module has its own @lox_print declaration. MLIR allows the same declaration to appear in different modules — having func.func private @lox_print(f64) in both math.lox’s module and main.lox’s module is fine because they’re in separate symbol tables.

After merging, those separate symbol tables become one. MLIR’s verifier rejects duplicate symbols in a single module — even duplicate declarations. That’s why merge_modules uses a HashSet to track which symbols it has already added. The second @lox_print declaration is silently skipped — the first one is sufficient, and the JIT resolves it from the registered runtime symbols.

For runtime declarations that every module needs, public visibility is not an alternative: a body-less func.func must be private to pass verification. Function definitions such as @square can stay public.
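One subtlety of first-wins deduplication is worth spelling out. If your per-file modules ever carry a forward declaration for a function that another file defines, and the declaration is seen first, skipping later duplicates would drop the definition. The sketch below models that policy in plain Rust, under the assumption that each top-level op can be reduced to a name plus a "has a body" flag. `SymbolOp` and `merge_symbol_ops` are hypothetical names for illustration, not Melior types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a top-level `func.func`: a symbol name plus
/// whether the op has a body. Not a Melior type.
#[derive(Debug, Clone, PartialEq)]
struct SymbolOp {
    name: String,
    has_body: bool,
}

/// Merge per-file op lists, keeping one op per symbol name.
/// A definition replaces an earlier declaration of the same name;
/// two definitions of the same name are reported as an error.
fn merge_symbol_ops(modules: &[Vec<SymbolOp>]) -> Result<Vec<SymbolOp>, String> {
    let mut merged: Vec<SymbolOp> = Vec::new();
    // Maps symbol name -> index of the op we kept in `merged`.
    let mut index_of: HashMap<String, usize> = HashMap::new();

    for module in modules {
        for op in module {
            match index_of.get(&op.name).copied() {
                None => {
                    index_of.insert(op.name.clone(), merged.len());
                    merged.push(op.clone());
                }
                Some(i) => {
                    if merged[i].has_body && op.has_body {
                        return Err(format!("duplicate definition of @{}", op.name));
                    }
                    if op.has_body {
                        // Upgrade the earlier declaration to this definition.
                        merged[i] = op.clone();
                    }
                    // Otherwise `op` is a redundant declaration: skip it.
                }
            }
        }
    }
    Ok(merged)
}
```

The same three-way decision (new symbol, redundant declaration, definition replacing a declaration) could be applied inside merge_modules by checking whether the cloned operation has a non-empty body region.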

Handling Name Collisions

What happens when two files define a function with the same name?

// utils.lox
fun process(x) {
  return x + 1;
}

// main.lox
fun process(x) {
  return x * 2;
}

print process(3);

Both files define @process. After merging, we’d have two func.func @process definitions in the same module — MLIR’s verifier rejects this. Duplicate symbols.

Real compilers solve this with name mangling — each source file’s symbols get a unique prefix:

module {
  func.func @utils__process(%arg0: f64) -> f64 { ... }
  func.func @main__process(%arg0: f64) -> f64 { ... }
  func.func @main() -> i32 {
    // Calls main's process, not utils's
    func.call @main__process(%three) : (f64) -> f64
  }
}

The mangling scheme is up to you. Common patterns:

Scheme	Example	Pros	Cons
File prefix	`@math__square`	Simple, readable	Doesn’t handle nested modules
Encoded (C++ Itanium)	`@_Z6square`	Handles all cases	Unreadable, C++ ABI baggage
Path-based	`@src_math_square`	Unambiguous	Verbose

For Lox, where we don’t have namespaces or imports, file-prefix mangling is the right starting point:

#![allow(unused)]
fn main() {
use std::path::Path;

/// Mangle a function name with its source file.
///
/// "square" from "math.lox" becomes "math__square".
/// "main" from any file stays "main" (the JIT entry point).
/// MLIR's printer adds the `@` prefix when rendering the IR — the function
/// returns bare names because `FlatSymbolRefAttribute` expects them without `@`.
fn mangle_name(function_name: &str, source_file: &str) -> String {
    if function_name == "main" {
        return "main".to_string();  // Entry point is always @main in the rendered IR
    }
    // Strip extension and path, use only the base filename.
    // This means src/math.lox and test/math.lox would produce the same
    // mangled name — fine for a flat directory of Lox files, but a real
    // compiler would need to include the directory path to avoid collisions.
    let file_stem = Path::new(source_file)
        .file_stem()
        .and_then(|s| s.to_str())
        .unwrap_or("unknown");

    format!("{}__{}", file_stem, function_name)
}
}
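The comment in mangle_name notes that src/math.lox and test/math.lox would produce the same mangled name. A minimal sketch of a path-aware variant follows, assuming source files are given as relative paths. `mangle_name_with_path` and `sanitize` are hypothetical helpers, not part of the tutorial's compiler.

```rust
use std::path::{Component, Path};

/// Mangle a function name using the file's relative path, not just its stem.
/// "square" from "src/math.lox" becomes "src_math__square".
/// "main" stays "main", matching the original `mangle_name`.
fn mangle_name_with_path(function_name: &str, source_file: &str) -> String {
    if function_name == "main" {
        return "main".to_string();
    }
    // Drop the extension, then keep only the ordinary path segments.
    let path = Path::new(source_file).with_extension("");
    let parts: Vec<String> = path
        .components()
        .filter_map(|c| match c {
            Component::Normal(part) => part.to_str().map(sanitize),
            _ => None, // skip root, ".", ".." and drive prefixes
        })
        .collect();
    let prefix = if parts.is_empty() {
        "unknown".to_string()
    } else {
        parts.join("_")
    };
    format!("{}__{}", prefix, function_name)
}

/// Replace characters outside [A-Za-z0-9] so the prefix stays a simple identifier.
fn sanitize(part: &str) -> String {
    part.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '_' })
        .collect()
}
```

This is still not collision-free: src/math.lox and a top-level file named src_math.lox map to the same prefix, which is one reason production compilers use an encoded scheme instead.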

⚠️ The main name is reserved. If a Lox program defines fun main() { ... }, it collides with the compiler-generated @main entry point. The mangling function silently lets this happen: it returns "main" for any function named main, regardless of the source file. The duplicate check in collect_symbols (see "Duplicate Symbol Errors" below) only catches the case where two files both define main, reporting "duplicate function 'main': defined in both 'main.lox' and 'math.lox'". A single user-defined main is not caught there, because the generated entry point never passes through collect_symbols. Depending on whether the two end up in the same file, the collision surfaces as an MLIR duplicate-symbol error or as merge_modules silently keeping whichever @main it sees first. A production compiler would either reserve the name (and report a user error if the Lox program tries to define it), or use a different entry point convention (e.g., @__lox_main).

When main.lox calls square(3), the code generator needs to know which file defines square. This is the symbol resolution problem — and it’s where things get interesting.

Symbol Resolution: How `main.lox` Finds `square`

In Lox, there’s no import statement. Every function is visible to every other function in the same program. This is the same as C’s model: all functions share a single global namespace.

When the code generator for main.lox encounters square(3), it needs to emit func.call @math__square(...). But at compile time, main.lox doesn’t know that square lives in math.lox. We need a resolution step.

Two-Pass Approach

First pass — collect declarations. Scan all source files. For each function, record its name and source file. Build a symbol table:

#![allow(unused)]
fn main() {
use anyhow::{Result, anyhow};
use std::collections::HashMap;
use std::path::Path;

use crate::ast::Stmt;

/// A resolved symbol — a function name plus the file it's defined in.
struct ResolvedSymbol {
    mangled_name: String,   // "math__square" (no @ prefix — FlatSymbolRefAttribute adds it)
    source_file: String,    // "math.lox"
    original_name: String,  // "square"
}

/// Build a symbol table from all source files.
///
/// This is the compiler's "header" phase — we don't compile anything,
/// we learn what functions exist and where they live.
///
/// We parse each file twice — once here to collect symbols, once in
/// compile_file_with_symbols to generate code. A production compiler
/// would cache the AST from the first pass instead of reparsing.
fn collect_symbols(files: &[(String, String)]) -> Result<HashMap<String, ResolvedSymbol>> {
    let mut symbols = HashMap::new();

    for (filename, source) in files {
        let tokens = lexer::tokenize(source);
        let ast = parser::parse(tokens)?;

        // Walk the top-level statements for function declarations.
        // We only need function names — no bodies, no type info.
        for stmt in &ast {
            if let Stmt::Function(f) = stmt {
                // Duplicate check omitted for clarity — see the
                // "Duplicate Symbol Errors" section below for the
                // version that reports duplicate function names.
                let name = &f.name;
                let mangled = mangle_name(name, filename);
                symbols.insert(name.clone(), ResolvedSymbol {
                    mangled_name: mangled,
                    source_file: filename.clone(),
                    original_name: name.clone(),
                });
            }
        }
    }

    Ok(symbols)
}
}

Second pass — compile with resolved names. Each call to square(3) becomes func.call @math__square(%three) : (f64) -> f64 because the code generator knows square is in math.lox.

#![allow(unused)]
fn main() {
use anyhow::Result;

/// Compile all files with symbol resolution.
///
/// The flow:
///   1. Collect symbols from all files (what functions exist, where)
///   2. Compile each file, using the symbol table to resolve call targets
///   3. Merge all modules and run
pub fn compile_and_run(files: &[(String, String)]) -> Result<()> {
    let context = Context::new();

    // Step 1: Collect symbols
    let symbols = collect_symbols(files)?;

    // Step 2: Compile each file with symbol resolution
    let mut modules = Vec::new();
    for (filename, source) in files {
        let module = compile_file_with_symbols(&context, source, filename, &symbols)?;
        modules.push(module);
    }

    // Step 3: Merge and run
    link_and_run(modules, &context)
}
}

The code generator needs one change: when emitting a func.call, look up the function name in the symbol table instead of using the raw name.

#![allow(unused)]
fn main() {
// Before (single-file mode):
fn emit_call(&self, name: &str, args: &[Value<'c, '_>], block: &Block<'c>) -> Value<'c, '_> {
    let callee = FlatSymbolRefAttribute::new(self.context, name);
    // func.call @square(...)
}

// After (multi-file mode) — emit_call uses self.symbols from the CodeGenerator struct:
fn emit_call(&self, name: &str, args: &[Value<'c, '_>], block: &Block<'c>) -> Value<'c, '_> {
    // mangle_name returns names WITHOUT the @ prefix —
    // FlatSymbolRefAttribute::new expects bare names like "math__square",
    // and MLIR's printer adds the @ when rendering the IR.
    let resolved = self.symbols
        .and_then(|sym| sym.get(name))
        .map(|s| s.mangled_name.as_str())
        .unwrap_or(name);  // Safe in single-file mode (symbols is None,
                           // so and_then short-circuits). In multi-file mode,
                           // a missing entry means the function wasn't found
                           // in any compiled file; the unresolved call is
                           // rejected later, by module verification or by
                           // the JIT's symbol lookup.
    let callee = FlatSymbolRefAttribute::new(self.context, resolved);
    // func.call @math__square(...)
}
}

That’s it. The code generator emits mangled names, the merged module has mangled definitions, and everything lines up.

The `compile_file_with_symbols` Function

The compile_and_run function above calls compile_file_with_symbols, which is the same as compile_file from earlier but passes the symbol table to the code generator:

#![allow(unused)]
fn main() {
use std::collections::HashMap;
use anyhow::Result;

/// Compile a single Lox source file with symbol resolution.
///
/// Same as compile_file, but the code generator uses the symbol
/// table to resolve function names to their mangled equivalents.
fn compile_file_with_symbols<'c>(
    context: &'c Context,
    source: &str,
    filename: &str,
    symbols: &HashMap<String, ResolvedSymbol>,
) -> Result<Module<'c>> {
    let tokens = lexer::tokenize(source);
    let ast = parser::parse(tokens)?;
    let module = codegen::generate_with_symbols(context, &ast, filename, symbols)?;
    Ok(module)
}
}

The only difference from compile_file: the code generator receives the symbol table and uses it in emit_call. Here’s how the CodeGenerator struct changes:

#![allow(unused)]
fn main() {
use anyhow::Result;
use std::collections::HashMap;

/// The code generator — unchanged from earlier parts, except for
/// the new `symbols` field for multi-file compilation.
struct CodeGenerator<'c, 'a> {
    context: &'c Context,
    symbols: Option<&'a HashMap<String, ResolvedSymbol>>,
    // ... other fields from earlier parts ...
}

impl<'c, 'a> CodeGenerator<'c, 'a> {
    /// Create a code generator for single-file mode (no symbol resolution).
    fn new(context: &'c Context) -> Self {
        Self { context, symbols: None }
    }

    /// Create a code generator with symbol resolution for multi-file mode.
    fn with_symbols(context: &'c Context, symbols: &'a HashMap<String, ResolvedSymbol>) -> Self {
        Self { context, symbols: Some(symbols) }
    }
}

/// The public entry point: compile an AST to MLIR with optional symbol resolution.
pub fn generate_with_symbols<'c>(
    context: &'c Context,
    ast: &[Stmt],
    filename: &str,
    symbols: &HashMap<String, ResolvedSymbol>,
) -> Result<Module<'c>> {
    let mut generator = CodeGenerator::with_symbols(context, symbols);
    generator.compile(ast, filename)
}

/// Single-file mode: same as before, no symbol table.
pub fn generate<'c>(
    context: &'c Context,
    ast: &[Stmt],
    filename: &str,
) -> Result<Module<'c>> {
    let mut generator = CodeGenerator::new(context);
    generator.compile(ast, filename)
}
}

The emit_call method checks self.symbols: when it's Some, it resolves the function name through the symbol table. When it's None (single-file mode), it falls back to the raw function name. What happens if self.symbols is Some but the function name isn't in the table? The unwrap_or(name) fallback produces the raw, unmangled name, which won't match any symbol in the merged module. The failure then surfaces late: func.call requires its callee to resolve to a function in the enclosing module, so the merged module fails verification, or, if verification is skipped, the JIT reports a symbol-not-found error at runtime. This is intentional: a missing symbol table entry in multi-file mode means the function wasn't defined in any compiled file, and failing with a clear error is better than silently producing incorrect code. If you'd rather catch this at compile time, with the calling file in the message, replace the unwrap_or with expect or return a Result.
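The Result-returning alternative can be sketched as a small standalone resolver. This is an illustration under assumptions, not the tutorial's actual code: `resolve_call_target` is a hypothetical helper, and `ResolvedSymbol` is trimmed to the one field the lookup needs.

```rust
use std::collections::HashMap;

/// Mirrors the tutorial's ResolvedSymbol, trimmed to the field used here.
struct ResolvedSymbol {
    mangled_name: String,
}

/// Resolve a call target.
/// - `symbols == None` (single-file mode): use the raw name.
/// - `symbols == Some(..)` (multi-file mode): the name must be in the table,
///   otherwise we return an error instead of falling back.
fn resolve_call_target<'a>(
    symbols: Option<&'a HashMap<String, ResolvedSymbol>>,
    name: &'a str,
) -> Result<&'a str, String> {
    match symbols {
        None => Ok(name),
        Some(table) => table
            .get(name)
            .map(|s| s.mangled_name.as_str())
            .ok_or_else(|| format!("undefined function '{}'", name)),
    }
}
```

With this shape, emit_call would propagate the error with `?` and the compiler could attach the calling file and line. One caveat: built-ins that are called by name (such as clock) would need to be in the table, or checked before the lookup, so they are not reported as undefined.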

The Complete Flow

Let’s trace through our example end-to-end.

Input files:

// math.lox
fun square(n) { return n * n; }
fun add(a, b) { return a + b; }

// main.lox
print square(3);

Step 1: Collect symbols

"square" → ResolvedSymbol { mangled_name: "math__square", source_file: "math.lox", original_name: "square" }
"add"    → ResolvedSymbol { mangled_name: "math__add",    source_file: "math.lox", original_name: "add" }

Step 2: Compile each file

math.lox →:

module {
  func.func @math__square(%arg0: f64) -> f64 {
    %result = arith.mulf %arg0, %arg0 : f64
    func.return %result : f64
  }
  func.func @math__add(%arg0: f64, %arg1: f64) -> f64 {
    %result = arith.addf %arg0, %arg1 : f64
    func.return %result : f64
  }
}

main.lox →:

module {
  func.func private @lox_print(f64)
  func.func @main() -> i32 {
    %three = arith.constant 3.0 : f64
    %result = func.call @math__square(%three) : (f64) -> f64
    func.call @lox_print(%result) : (f64) -> ()
    %zero = arith.constant 0 : i32
    func.return %zero : i32
  }
}

Step 3: Merge and run

The merged module has @math__square, @math__add, @lox_print, and @main. The JIT resolves all symbols. Output: 9.

What About the Garbage Collector?

One thing to watch: the GC’s shadow stack. Each function pushes a frame on entry and pops it on exit. When main.lox calls math__square, the call crosses module boundaries — but the GC doesn’t care about modules. It cares about the call stack.

The shadow stack operations (lox.push_frame, lox.pop_frame) are already in each function’s MLIR. After lowering, they become func.call @gc_push_frame(...) and func.call @gc_pop_frame(...) — the connection between the Lox dialect operations and the lowered runtime calls is covered in Part 4’s lowering pipeline. These resolve to the same global RUNTIME we set up in Part 9. Cross-module calls work the same as intra-module calls — the GC frame management is per-function, not per-module.

func.func @math__square(%arg0: f64) -> f64 {
  %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
  // ... compute n * n ...
  func.call @gc_pop_frame() : () -> ()
  func.return %result : f64
}

func.func @main() -> i32 {
  %frame = func.call @gc_push_frame(%root_count) : (i32) -> !llvm.ptr
  %result = func.call @math__square(%three) : (f64) -> f64
  func.call @gc_pop_frame() : () -> ()
  // ...
}

gc_push_frame returns a roots array pointer (!llvm.ptr) — the same pointer the compiler uses with lox.set_root to store root values via GEP + store (see Part 4 for the lowering details). Even though math__square doesn’t need to root any values in this simplified example (its single f64 argument lives in a register), the function still pushes a frame — the GC needs to know about all live frames, even ones without roots, so it can walk the complete call stack during collection.

When @main calls @math__square, the runtime call stack looks like:

main → gc_push_frame
     → math__square → gc_push_frame
                     → gc_pop_frame  (square's frame removed)
     → gc_pop_frame  (main's frame removed)

Two separate frames on the shadow stack, two separate pops. The GC sees the full call chain. No special handling needed.
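To make the frame discipline concrete, here is a small model of the shadow stack in plain Rust. `ShadowStack`, `math_square`, and `lox_main` are hypothetical stand-ins for the runtime and the two compiled functions; the real runtime from the earlier parts stores root pointers, not the `u64` placeholders used here.

```rust
/// Hypothetical model of the GC shadow stack: one entry per live frame,
/// each holding that frame's root slots. Not the tutorial's actual runtime.
struct ShadowStack {
    frames: Vec<Vec<u64>>,
    max_depth: usize,
}

impl ShadowStack {
    fn new() -> Self {
        Self { frames: Vec::new(), max_depth: 0 }
    }

    /// Models gc_push_frame: reserve `root_count` slots for a new frame.
    fn push_frame(&mut self, root_count: usize) {
        self.frames.push(vec![0; root_count]);
        self.max_depth = self.max_depth.max(self.frames.len());
    }

    /// Models gc_pop_frame: drop the innermost frame.
    fn pop_frame(&mut self) {
        self.frames.pop().expect("pop_frame without matching push_frame");
    }
}

/// Stands in for @math__square: pushes a frame even with zero roots.
fn math_square(stack: &mut ShadowStack, n: f64) -> f64 {
    stack.push_frame(0);
    let result = n * n;
    stack.pop_frame();
    result
}

/// Stands in for @main calling a function that came from another file.
fn lox_main(stack: &mut ShadowStack) -> f64 {
    stack.push_frame(0);
    let result = math_square(stack, 3.0);
    stack.pop_frame();
    result
}
```

Nothing in the model knows which file a function came from. The stack depth peaks at two while math_square runs and returns to zero afterwards, which is the per-function behavior described above.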

Duplicate Symbol Errors

What if two files define the same function? The symbol table catches it during the collection phase — here’s the collect_symbols function from earlier, with the duplicate check added:

#![allow(unused)]
fn main() {
use anyhow::{Result, anyhow};
use std::collections::HashMap;
use crate::ast::Stmt;

fn collect_symbols(files: &[(String, String)]) -> Result<HashMap<String, ResolvedSymbol>> {
    let mut symbols = HashMap::new();

    for (filename, source) in files {
        let tokens = lexer::tokenize(source);
        let ast = parser::parse(tokens)?;

        for stmt in &ast {
            if let Stmt::Function(f) = stmt {
                let name = &f.name;
                if let Some(existing) = symbols.get(name) {
                    return Err(anyhow!(
                        "duplicate function '{}': defined in both '{}' and '{}'",
                        name, existing.source_file, filename
                    ));
                }
                let mangled = mangle_name(name, filename);
                symbols.insert(name.clone(), ResolvedSymbol {
                    mangled_name: mangled,
                    source_file: filename.clone(),
                    original_name: name.clone(),
                });
            }
        }
    }

    Ok(symbols)
}
}

This is better than letting MLIR’s verifier catch it: the error names both source files (and could include line numbers if the parser records them), not only “duplicate symbol @process.”
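If the parser records a line number for each function declaration, the duplicate report can point at both locations. A minimal sketch, assuming such a line number is available; `Definition` and `record_definition` are hypothetical names, and the message format is only one possibility.

```rust
use std::collections::HashMap;

/// Where a function was declared. `line` is assumed to come from the parser.
#[derive(Debug, Clone, PartialEq)]
struct Definition {
    source_file: String,
    line: usize,
}

/// Record `name`, or describe the clash with the earlier definition.
fn record_definition(
    symbols: &mut HashMap<String, Definition>,
    name: &str,
    source_file: &str,
    line: usize,
) -> Result<(), String> {
    if let Some(existing) = symbols.get(name) {
        return Err(format!(
            "duplicate function '{}': first defined at {}:{}, redefined at {}:{}",
            name, existing.source_file, existing.line, source_file, line
        ));
    }
    symbols.insert(
        name.to_string(),
        Definition { source_file: source_file.to_string(), line },
    );
    Ok(())
}
```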

This is a simplification. Lox doesn’t have namespaces or a module system, so every function shares one global namespace. Real languages have import, pub, or visibility modifiers. The two-pass approach still works — you need richer symbol entries that carry visibility and namespace information alongside the name. The core idea is the same: collect names first, then compile with resolution.

What We Changed

The multi-file compiler adds three things on top of the single-file pipeline:

Symbol collection — scan all files for function names before compiling.

Name mangling — prefix each function with its source file to avoid collisions.

Module merging — combine all compiled modules into one before JIT execution.

Here’s the updated pipeline:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐     ┌─────────┐
│ Source files │────▶│  Collect     │────▶│  Compile     │────▶│  Merge  │
│ *.lox        │     │  symbols     │     │  each file   │     │  modules│
└──────────────┘     └──────────────┘     └──────────────┘     └────┬────┘
                                                                    │
                                            ┌───────────────────────┘
                                            ▼
                                     ┌──────────────┐     ┌──────┐
                                     │  Lower + JIT │────▶│ Run  │
                                     └──────────────┘     └──────┘

The single-file pipeline was:

┌──────────────┐     ┌──────────────┐     ┌──────┐
│ Source file  │────▶│  Compile +   │────▶│ Run  │
│ *.lox        │     │  Lower + JIT │     │      │
└──────────────┘     └──────────────┘     └──────┘

Same destination, more steps. The extra steps are all about connecting symbols across files — the compilation within each file hasn’t changed.

Going Further

This is the last planned part of the tutorial. Here are directions you could take from here:

AOT compilation. We’ve used the JIT throughout, but MLIR supports ahead-of-time compilation too. The lowering pipeline already produces LLVM dialect MLIR — but you can’t pipe that directly to llc. First, you need to translate LLVM dialect MLIR to actual LLVM IR using mlir-translate --mlir-to-llvmir (or the Melior translation API). Then compile the resulting LLVM IR with llc and link it into a standalone executable. The runtime symbols (lox_print, gc_push_frame, etc.) become ordinary C functions you link against. This is how a real Lox compiler would ship.

Before you start: mlir-opt and mlir-translate are command-line tools from the MLIR project. If you installed MLIR from source, they’re in your build directory. If you’re using Melior only, you may not have them: Melior embeds the MLIR C library but doesn’t ship the command-line tools. In that case, use the Melior API to run the passes and translation in-process.

On Ubuntu 24.04+, you can install them from the LLVM project’s apt repository: apt install mlir-22-tools (adjust the version number to match your LLVM). On macOS, brew install llvm includes them in Homebrew’s LLVM bin directory ($(brew --prefix llvm)/bin/mlir-opt and $(brew --prefix llvm)/bin/mlir-translate), which is not on your PATH by default. If neither option works, build from source; the MLIR Getting Started guide has step-by-step instructions.

The command-line pipeline looks like this:

# 1. Lower the Lox dialect MLIR to LLVM dialect MLIR
#    (same passes we run in the JIT: scf-to-cf, cf-to-llvm, arith-to-llvm, func-to-llvm)
mlir-opt --convert-scf-to-cf --convert-cf-to-llvm \
  --convert-arith-to-llvm --convert-func-to-llvm \
  output.mlir -o lowered.mlir

# 2. Translate LLVM dialect MLIR to LLVM IR
mlir-translate --mlir-to-llvmir lowered.mlir -o output.ll

# 3. Compile LLVM IR to an object file
llc -filetype=obj output.ll -o output.o

# 4. Link with the runtime
clang output.o liblox_runtime.a -o lox_program

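Or, if you’re using Melior’s Rust API instead of command-line tools, step 2 can run in-process. The MLIR C API exposes this translation as mlirTranslateModuleToLLVMIR; check your melior version’s documentation for the corresponding wrapper (this tutorial refers to it as melior::dialect::llvm::translate_module). Steps 3–4 stay the same: llc and the linker are external tools regardless of how you produce the LLVM IR.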
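If you want the compiler driver to invoke the command-line tools itself, the four steps map directly onto std::process::Command. The sketch below only assembles the commands from the pipeline shown above. `aot_pipeline` and `run_pipeline` are hypothetical helpers, and the intermediate file names (lowered.mlir, output.ll, output.o) are arbitrary choices carried over from the shell example.

```rust
use std::process::Command;

/// Build (without running) the four AOT steps described above.
/// The intermediate file names are arbitrary choices for this sketch.
fn aot_pipeline(input_mlir: &str, runtime_lib: &str, executable: &str) -> Vec<Command> {
    // 1. Lower to LLVM dialect MLIR.
    let mut lower = Command::new("mlir-opt");
    lower.args([
        "--convert-scf-to-cf",
        "--convert-cf-to-llvm",
        "--convert-arith-to-llvm",
        "--convert-func-to-llvm",
        input_mlir,
        "-o",
        "lowered.mlir",
    ]);

    // 2. Translate LLVM dialect MLIR to LLVM IR.
    let mut translate = Command::new("mlir-translate");
    translate.args(["--mlir-to-llvmir", "lowered.mlir", "-o", "output.ll"]);

    // 3. Compile LLVM IR to an object file.
    let mut compile = Command::new("llc");
    compile.args(["-filetype=obj", "output.ll", "-o", "output.o"]);

    // 4. Link with the runtime library.
    let mut link = Command::new("clang");
    link.args(["output.o", runtime_lib, "-o", executable]);

    vec![lower, translate, compile, link]
}

/// Run the steps in order; stop and report the first one that fails.
fn run_pipeline(steps: Vec<Command>) -> Result<(), String> {
    for mut step in steps {
        let status = step
            .status()
            .map_err(|e| format!("{:?}: {}", step.get_program(), e))?;
        if !status.success() {
            return Err(format!("{:?} exited with {}", step.get_program(), status));
        }
    }
    Ok(())
}
```

Calling run_pipeline(aot_pipeline("output.mlir", "liblox_runtime.a", "lox_program")) would execute the steps, assuming all four tools are on your PATH.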

Optimization passes. We run the standard conversion passes (scf-to-cf, cf-to-llvm, arith-to-llvm, func-to-llvm) but no optimization passes. MLIR has a rich set: common subexpression elimination (--cse), canonicalization with dead code elimination (--canonicalize), loop-invariant code motion (--loop-invariant-code-motion), and more. MLIR has no built-in -O1 or -O2 pipeline, so you choose the passes yourself; running a few of them between the Lox dialect and the lowering pipeline can make the generated code noticeably faster.

Type inference. Part 10 inserted runtime tag checks before every arithmetic operation — scf.if branches that call a type error function when the tag isn’t TAG_NUMBER. That’s a guard, not an analysis. The next step is a type inference pass that eliminates unnecessary checks: if a branch already tested the tag (e.g., if type(x) == "number"), the code inside that branch doesn’t need to check again. This is a flow-sensitive analysis — the type of each variable depends on the control-flow path that reached it. A simple version tracks tag sets per variable per block; a more sophisticated version uses abstract interpretation to infer types across loop boundaries.
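The "tag sets per variable" idea can be sketched with a bitmask per variable: a guard narrows the set, a control-flow merge takes the union, and a check is redundant when the set is exactly the number tag. The tag constants and helper names below are assumptions for illustration; the tutorial's real TAG_* values and IR data structures will differ.

```rust
use std::collections::HashMap;

/// Hypothetical tag bits; the tutorial's real TAG_* values may differ.
const TAG_NUMBER: u8 = 0b001;
const TAG_STRING: u8 = 0b010;
const TAG_NIL: u8 = 0b100;
const TAG_ANY: u8 = TAG_NUMBER | TAG_STRING | TAG_NIL;

/// Possible tags for each variable at one program point.
type TagEnv = HashMap<String, u8>;

/// Tags a variable may have; variables we know nothing about may be anything.
fn tags_of(env: &TagEnv, var: &str) -> u8 {
    env.get(var).copied().unwrap_or(TAG_ANY)
}

/// Environment inside the "then" branch of a guard that tested `var` for `tag`.
fn narrow(env: &TagEnv, var: &str, tag: u8) -> TagEnv {
    let mut out = env.clone();
    out.insert(var.to_string(), tags_of(env, var) & tag);
    out
}

/// Environment at a control-flow merge: a variable may carry any tag
/// that it could carry on either incoming path.
fn join(a: &TagEnv, b: &TagEnv) -> TagEnv {
    let mut out = TagEnv::new();
    for var in a.keys().chain(b.keys()) {
        out.insert(var.clone(), tags_of(a, var) | tags_of(b, var));
    }
    out
}

/// An arithmetic op needs a runtime check unless the operand is provably a number.
fn needs_number_check(env: &TagEnv, var: &str) -> bool {
    tags_of(env, var) != TAG_NUMBER
}
```

Inside a branch guarded by a number test, narrow leaves only TAG_NUMBER, so the check can be dropped. After the branch rejoins an unguarded path, join widens the set again and the check comes back. Handling loops would additionally require iterating join to a fixed point, which this sketch does not attempt.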

Closures across modules. Part 5 showed closures within a single file. With cross-module linking, closures can capture variables defined in other modules. The capture analysis needs to work across the symbol table — when main.lox’s closure captures a variable from utils.lox, the code generator needs to resolve the variable’s location through the symbol table, not only the local scope. This is harder than function-level symbol resolution: captured variables need upvalue indices that are stable across modules, and the GC needs to know about cross-module references for rooting. A production implementation might use a module-level symbol table for functions but need a separate mechanism (e.g., cross-module upvalue indices) for captured variables.

Debugger integration. Part 10’s Approach 2 explains how debug info flows through the compilation pipeline — Location → DILocation → DWARF → addr2line. The next step is integrating with an actual debugger: producing DWARF that GDB or LLDB can consume, adding variable descriptions (DILocalVariable) so the debugger can show Lox variable names and values, and ensuring the debug info survives optimization passes (which can reorder or eliminate operations, invalidating stale location data). This turns the compiler from something that produces error messages into something you can step through interactively.

This is the final part of the MLIR for Lox tutorial series. The Review & Response section covers known limitations and potential improvements.
