REVIEW.md — MLIR for Lox Tutorial (Rust + Melior)
Reviewed by Esme, 2026-04-21
Overall
This tutorial covers ambitious ground: a full Lox compiler using MLIR via the Melior crate. The scope is impressive — AST, parsing, codegen, string constants, tagged unions, source locations, lowering — all in one document. However, this breadth comes at the cost of depth and correctness. The code examples have significant issues that would prevent them from compiling, and some MLIR concepts are presented in ways that could mislead. The strongest sections are the conceptual explanations (why MLIR, tagged unions, string constants); the weakest are the codegen examples.
Part 1: Setup
[error] The melior = "0.27" version should be verified. Melior's API changes frequently between minor versions, and the code examples use APIs that may not exist in 0.27. The arith::addf, arith::cmpf, func::func, etc. helper functions have changed signatures across versions.
[error] LLVM 22 is very recent. Most readers won't have it installed, and the Linux instructions ("You may need to build from source or use a PPA") are unhelpful. Consider targeting LLVM 18 or 19, which are more widely available, or provide actual build-from-source instructions.
[suggestion] The nom = "7.1" dependency is listed as optional but then never used. The parser in Part 3 is hand-written. Remove it or explain when it would be used.
Part 2: The Lox AST
[clarity] The AST definition is thorough and well-structured. The Location on every node is a good practice that pays off in Part 5. However, this is ~250 lines of struct definitions with no explanation of design choices. Why separate BinaryExpr, UnaryExpr, etc. instead of an enum with inline variants? (Answer: because each variant needs named fields and location tracking.) A brief note would help.
[suggestion] The LoxValue enum has String(String) — heap-allocated. For a compiler that's generating MLIR, you'd want to use interned strings or string references. This is fine for the AST level but worth a note.
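A minimal sketch of what interning could look like, should the tutorial adopt it. The names Interner and Symbol are hypothetical and do not come from the tutorial; the idea is only that each distinct string is stored once and referred to by a small copyable handle.

```rust
use std::collections::HashMap;

/// Small copyable handle standing in for an interned string (hypothetical name).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Symbol(u32);

/// Stores each distinct string once and hands out `Symbol` handles.
#[derive(Debug, Default)]
pub struct Interner {
    ids: HashMap<String, Symbol>,
    strings: Vec<String>,
}

impl Interner {
    pub fn new() -> Self {
        Interner::default()
    }

    /// Returns the existing handle for `text`, or stores it and creates one.
    pub fn intern(&mut self, text: &str) -> Symbol {
        if let Some(&sym) = self.ids.get(text) {
            return sym;
        }
        let sym = Symbol(self.strings.len() as u32);
        self.strings.push(text.to_string());
        self.ids.insert(text.to_string(), sym);
        sym
    }

    /// Looks the text back up from its handle.
    pub fn resolve(&self, sym: Symbol) -> &str {
        &self.strings[sym.0 as usize]
    }

    /// Number of distinct strings stored so far.
    pub fn len(&self) -> usize {
        self.strings.len()
    }
}
```

With this shape, the AST could carry a Symbol instead of an owned String, and the code generator could emit one constant per distinct symbol.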
Part 3: The Parser
[error] The parser references Token, TokenType, and ParseError types that are never defined, so it cannot compile as shown. A lexer is referenced (lexer::tokenize in Part 6) but never shown.
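For reference, a minimal sketch of the three missing types. The variant list, the field names (kind, lexeme, line, column), and the ParseError::at helper are assumptions for illustration, not the tutorial's definitions; a full scanner would cover every Lox token.

```rust
use std::fmt;

/// Token kinds for a small subset of Lox (assumed list; the real one is longer).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TokenType {
    LeftParen,
    RightParen,
    Plus,
    Minus,
    Star,
    Slash,
    Number,
    StringLit,
    Identifier,
    Eof,
}

/// One lexeme plus its source position (hypothetical field layout).
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub kind: TokenType,
    pub lexeme: String,
    pub line: usize,
    pub column: usize,
}

/// Error type that parser methods can return through `?`.
#[derive(Debug, Clone, PartialEq)]
pub struct ParseError {
    pub message: String,
    pub line: usize,
    pub column: usize,
}

impl ParseError {
    /// Builds an error positioned at the offending token.
    pub fn at(token: &Token, message: &str) -> Self {
        ParseError {
            message: message.to_string(),
            line: token.line,
            column: token.column,
        }
    }
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}:{}] {}", self.line, self.column, self.message)
    }
}

impl std::error::Error for ParseError {}
```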
[error] The as_number() and as_string() methods on LoxValue are used in primary() but never defined.
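One plausible shape for the missing accessors is sketched here. The Nil and Bool variants and the Option return types are assumptions; the tutorial may intend different signatures (for example, panicking accessors).

```rust
/// Literal values in the AST. `Nil` and `Bool` are assumed variants.
#[derive(Debug, Clone, PartialEq)]
pub enum LoxValue {
    Nil,
    Bool(bool),
    Number(f64),
    String(String),
}

impl LoxValue {
    /// Returns the number if this value is a `Number`, otherwise `None`.
    pub fn as_number(&self) -> Option<f64> {
        match self {
            LoxValue::Number(n) => Some(*n),
            _ => None,
        }
    }

    /// Returns the text if this value is a `String`, otherwise `None`.
    pub fn as_string(&self) -> Option<&str> {
        match self {
            LoxValue::String(s) => Some(s.as_str()),
            _ => None,
        }
    }
}
```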
[clarity] The parser is a nearly line-for-line port of Crafting Interpreters' Java parser. This is fine for familiarity, but a note saying "This follows Crafting Interpreters Chapter 6 — we're not covering the scanner here" would set expectations.
[suggestion] The if_statement and while_statement methods wrap single statements in vec![] (then_branch: vec![self.statement()?]). This differs from Crafting Interpreters, which stores a single statement for each branch. It's fine, but it means then_branch and else_branch are always statement lists, which affects codegen later.
Part 4: MLIR Code Generation (THE BIG ONE)
This is the core of the tutorial and has the most issues.
[error — CRITICAL] The CodeGenerator struct holds current_block: Option<Block<'c>> and attempts to append operations to blocks that are already inserted into regions. In Melior's ownership model, blocks are moved into regions via region.append_block(). After that, you can't hold a reference to the block separately — you access it through the region. The code does things like self.current_block = Some(block) and then later region.append_block(block) — but the block is already stored in self.current_block. This creates a double-use of the same moved value, which won't compile in Rust. The entire current_block pattern needs rethinking to work with Melior's ownership model.
[error — CRITICAL] The compile_function method does:
    let block = Block::new(...);
    // ... add operations to block via self.current_block ...
    self.current_block = Some(block);
    // ... compile body ...
    region.append_block(block); // ERROR: block was already moved into self.current_block
This won't compile. The block is moved into self.current_block, then the code tries to move it again into region.append_block().
[error] The compile_if method creates then_region and appends a block to it, then tries to use that block via self.current_block. But after then_region.append_block(then_block), the then_block is consumed and can't be referenced later. The pattern of "create block, insert into region, then reference via Option" doesn't work with Melior's move semantics.
[error] The compile_while has the same block-ownership issues as compile_if.
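One way out of the ownership errors above is to insert the block into its region first and keep only a handle as the insertion point. The sketch below illustrates that pattern with stand-in types (Op, Block, Region, BlockId are all hypothetical, written so the example runs without LLVM). It assumes Melior's append_block hands back some reference to the inserted block, which should be checked against the pinned version.

```rust
/// Stand-in for an MLIR operation: just its name.
#[derive(Debug, Clone, PartialEq)]
pub struct Op(pub String);

/// Stand-in for an owned block that has not been inserted yet.
#[derive(Debug, Default)]
pub struct Block {
    ops: Vec<Op>,
}

impl Block {
    pub fn new() -> Self {
        Block::default()
    }
    pub fn append_operation(&mut self, op: Op) {
        self.ops.push(op);
    }
}

/// Handle to a block that a region already owns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockId(usize);

/// Stand-in for a region, which owns its blocks.
#[derive(Debug, Default)]
pub struct Region {
    blocks: Vec<Block>,
}

impl Region {
    pub fn new() -> Self {
        Region::default()
    }
    /// Takes ownership of `block` and returns a handle to it.
    pub fn append_block(&mut self, block: Block) -> BlockId {
        self.blocks.push(block);
        BlockId(self.blocks.len() - 1)
    }
    pub fn block_mut(&mut self, id: BlockId) -> &mut Block {
        &mut self.blocks[id.0]
    }
    pub fn ops(&self, id: BlockId) -> &[Op] {
        &self.blocks[id.0].ops
    }
}

/// Tracks the insertion point as a handle, never as an owned `Block`.
pub struct CodeGenerator {
    region: Region,
    current_block: Option<BlockId>,
}

impl CodeGenerator {
    pub fn new() -> Self {
        CodeGenerator { region: Region::new(), current_block: None }
    }

    /// Inserts the entry block first; the region owns it from here on.
    pub fn begin_function(&mut self) -> BlockId {
        let id = self.region.append_block(Block::new());
        self.current_block = Some(id);
        id
    }

    /// Appends an operation at the current insertion point.
    pub fn emit(&mut self, name: &str) {
        let id = self.current_block.expect("no insertion point set");
        self.region.block_mut(id).append_operation(Op(name.to_string()));
    }

    pub fn region(&self) -> &Region {
        &self.region
    }
}
```

The same ordering would apply to the then/else regions in compile_if and the loop regions in compile_while: append the block, then emit through the handle.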
[error] The compile_logical method uses arith::andi and arith::ori for short-circuit logical operators. This is WRONG — bitwise AND/OR do NOT short-circuit. Lox's and and or must use scf.if for short-circuit evaluation. The code even has a comment saying "Logical operations short-circuit, so we need scf.if" but then uses bitwise ops instead!
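To make the required semantics concrete, here is a small sketch of Lox's and/or in plain Rust, with the right operand passed as a closure so it runs only when needed. The Value type and function names are hypothetical. In generated IR, the if in each function would correspond to an scf.if whose region contains the code for the right operand.

```rust
/// Runtime value for the sketch (hypothetical; strings omitted for brevity).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Nil,
    Bool(bool),
    Number(f64),
}

impl Value {
    /// In Lox, only nil and false are falsy; everything else is truthy.
    pub fn is_truthy(&self) -> bool {
        !matches!(self, Value::Nil | Value::Bool(false))
    }
}

/// `left and right`: evaluates `right` only when `left` is truthy.
pub fn logical_and(left: Value, right: impl FnOnce() -> Value) -> Value {
    if left.is_truthy() {
        right()
    } else {
        left
    }
}

/// `left or right`: evaluates `right` only when `left` is falsy.
pub fn logical_or(left: Value, right: impl FnOnce() -> Value) -> Value {
    if left.is_truthy() {
        left
    } else {
        right()
    }
}
```

A bitwise andi/ori has no equivalent of the untaken branch: both operands are already computed by the time the operation runs, so any side effect in the right operand always happens.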
[error] The compile_unary method for Not creates a constant true value and uses arith::xori. But the operand might be an i1 (boolean) or an f64, and xori needs both operands to share one integer type, so the types don't match. Lox's ! operator needs to work on the tagged union, not on raw i1 values.
[error] The code generator treats all values as f64 but then introduces a tagged union !llvm.struct<(i8, i64)> in Part 4's types section. These two approaches are inconsistent. The codegen uses arith::addf (float add) but a dynamically-typed language needs type-tag checking before arithmetic. This is a fundamental architectural problem.
[clarity] The "String Constants" section is conceptually great — explaining that string literals go in the data section, not the heap. But the compile_string method in the generator just returns a placeholder i64 constant. The string constant infrastructure (create_string_constant) is never connected to the code generator.
[suggestion] The compile_print method emits a lox.print custom operation. This requires defining a Lox dialect, but no dialect definition is shown. In Melior, you'd need to either (a) define a custom dialect, or (b) use func.call to an external runtime function. The tutorial should pick one approach and show it.
Part 5: Source Locations
[error] The Location::new(context, filename, line, column) call shown needs checking against Melior 0.27. Melior creates file locations through a constructor of this general shape, but the exact signature varies by version, so the tutorial should confirm it against the pinned release.
[error] Location::callsite and Location::fused may not exist in Melior 0.27's API. The demonstrate_locations function is pseudocode.
[clarity] The concept is well-explained: "Unlike LLVM where debug info is optional, in MLIR locations are core to the IR." This is a valuable insight that distinguishes MLIR from LLVM.
[suggestion] The "Before/After" IR comparison showing locations is excellent. More before/after IR examples throughout would help the reader see what they're building.
Part 6: Complete Example
[error] The example uses pass_manager.add_pass(pass::convert_scf_to_cf()) etc., but these pass functions aren't imported and may not exist in the Melior API with those exact names. The pass module path also looks wrong: Melior's README registers conversion passes as pass::conversion::create_*(), for example pass::conversion::create_func_to_llvm(). Check the exact names against the pinned version.
[error] The compile_to_llvm function references lexer::tokenize which is never defined.
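[error] The assert!(module.as_operation().verify()) usage assumes verify() returns a bool. Melior's README uses the same assertion, so this may be fine, but the return type should be confirmed for 0.27; if it is a Result there, the assertion needs to unwrap or match on it instead.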
Part 7: Project Structure
[suggestion] The project structure shows runtime/print.c but no C runtime code is provided. For a dynamically-typed language, the runtime needs at least: print, type-tag checking, and string comparison functions. A minimal runtime would make the tutorial complete.
Cross-Cutting Issues
[error — CRITICAL] The fundamental problem: the codegen treats Lox as if it were statically typed with all-f64 values, but Lox is dynamically typed. The tagged union type (!llvm.struct<(i8, i64)>) is introduced but never used in codegen. Every arithmetic operation should check the type tag, but none do. This means the generated code would produce wrong results for any non-trivial Lox program. The tutorial needs to either:
- (a) Commit to the tagged union approach and show the tag-checking code (verbose but correct), or
- (b) Start with a "numbers only" subset of Lox and explicitly state the simplification, then add dynamic typing later
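To give a sense of what option (a) involves, the sketch below models the (i8, i64) tagged value in plain Rust and shows a tag-checked add and a tag-aware logical not. The tag numbering, the Tagged struct, and the function names are all hypothetical; string concatenation for + is left out. Generated MLIR would perform the same comparisons on the tag field before touching the payload.

```rust
/// Stand-in for !llvm.struct<(i8, i64)>: a type tag plus a 64-bit payload.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Tagged {
    pub tag: u8,
    pub payload: u64,
}

// Tag numbering is an assumption made for this sketch.
pub const TAG_NIL: u8 = 0;
pub const TAG_BOOL: u8 = 1;
pub const TAG_NUMBER: u8 = 2;

impl Tagged {
    pub fn nil() -> Self {
        Tagged { tag: TAG_NIL, payload: 0 }
    }
    pub fn boolean(b: bool) -> Self {
        Tagged { tag: TAG_BOOL, payload: b as u64 }
    }
    /// Numbers store the raw f64 bit pattern in the payload.
    pub fn number(n: f64) -> Self {
        Tagged { tag: TAG_NUMBER, payload: n.to_bits() }
    }
    /// Reads the payload as f64 only if the tag says it is a number.
    pub fn as_number(self) -> Option<f64> {
        if self.tag == TAG_NUMBER {
            Some(f64::from_bits(self.payload))
        } else {
            None
        }
    }
}

/// Numeric `+`: checks both tags before doing the float add.
pub fn add(lhs: Tagged, rhs: Tagged) -> Result<Tagged, String> {
    match (lhs.as_number(), rhs.as_number()) {
        (Some(a), Some(b)) => Ok(Tagged::number(a + b)),
        _ => Err("Operands must be numbers.".to_string()),
    }
}

/// Lox `!`: true for nil and false, false for every other value.
pub fn logical_not(v: Tagged) -> Tagged {
    let falsy = v.tag == TAG_NIL || (v.tag == TAG_BOOL && v.payload == 0);
    Tagged::boolean(falsy)
}
```

Every arithmetic and comparison operator would need a check of this kind, which is why option (b) is a reasonable way to keep a first version of the tutorial short.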
[clarity] The "Why MLIR for Lox?" section is compelling. The table comparing "What LLVM doesn't know" vs "What MLIR lets you do" is effective.
[clarity] The "Differences from C++ MLIR" table is helpful. The "No TableGen — Melior builds dialects directly in Rust" point is a key selling feature that deserves more emphasis.
[style] The tutorial is structured as a single long document with "Parts" as sections. This makes it hard to navigate. Consider splitting into separate files like the other tutorials.
[suggestion] The "Quick Reference: Lox → MLIR Mapping" table is excellent. More reference tables like this throughout would help.
[suggestion] No example shows the actual MLIR output of compiling a non-trivial Lox program. Showing the IR at each stage (Lox source → Lox dialect MLIR → Standard dialect MLIR → LLVM dialect MLIR) would make the lowering pipeline tangible.
What's Working
- The conceptual explanations are strong: why MLIR, tagged unions, string constants, source locations
- The AST is well-designed with proper location tracking
- The parser is a faithful port of Crafting Interpreters
- The before/after IR comparison in Part 5 is effective
- The quick reference tables are useful
- The C++ comparison table helps readers coming from the official MLIR tutorial
Priority Fixes
- Fix block ownership: the current_block: Option<Block<'c>> pattern doesn't work with Melior's move semantics. This is the single biggest correctness issue.
- Fix logical operators: andi/ori don't short-circuit; use scf.if.
- Commit to one typing strategy: either use tagged unions throughout or explicitly scope the tutorial to "numbers only" Lox.
- Provide missing definitions: Token, TokenType, lexer, LoxValue methods.
- Verify all Melior API calls against the actual 0.27 API.