MLIR for Lox: A Rust Guide (using Melior)
This guide shows how to build a Lox compiler using Rust and the Melior crate instead of C++. If you know Crafting Interpreters, this should feel familiar — just with MLIR instead of tree-walk interpretation.
Why MLIR for Lox?
LLVM is powerful but low-level. It doesn't know about:
- Variable scoping rules
- Closure captures
- Dynamic typing
- Lox-specific optimizations
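MLIR lets you define a dialect that represents Lox semantics directly, then progressively lower it to LLVM IR. The same layered idea, a language-specific intermediate representation sitting above LLVM IR, is what Swift (SIL), Rust (MIR), and Julia use, although those compilers have their own IR infrastructure rather than MLIR.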
Why Rust + Melior?
- Memory safety without garbage collection
- Pattern matching for clean AST traversal
- No TableGen — Melior builds dialects directly in Rust
- Excellent for compiler toolchains
Part 1: Setup
Dependencies
# Cargo.toml
[package]
name = "lox-mlir"
version = "0.1.0"
edition = "2021"
[dependencies]
melior = "0.27"
Install MLIR
# macOS
brew install llvm@22
# Linux (Ubuntu/Debian)
# You may need to build from source or use a PPA for LLVM 22
# See: https://apt.llvm.org/
# Add to your shell config:
export LLVM_SYS_220_PREFIX=/opt/homebrew/opt/llvm@22 # macOS
# or
export LLVM_SYS_220_PREFIX=/usr/lib/llvm-22 # Linux
Part 2: The Lox AST
Hand-written Rust — not generated. This is exactly what you'd write following Crafting Interpreters.
// src/ast.rs
use std::fmt;

/// Source location for error messages
#[derive(Debug, Clone, Copy)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

/// A Lox value (dynamically typed)
#[derive(Debug, Clone)]
pub enum LoxValue {
    Nil,
    Bool(bool),
    Number(f64),
    String(String),
}

// ========================================================================
// Expressions
// ========================================================================

#[derive(Debug, Clone)]
pub enum Expr {
    Binary(BinaryExpr),
    Unary(UnaryExpr),
    Literal(LiteralExpr),
    Grouping(GroupingExpr),
    Variable(VariableExpr),
    Assign(AssignExpr),
    Call(CallExpr),
    Logical(LogicalExpr),
}

impl Expr {
    pub fn location(&self) -> Location {
        match self {
            Expr::Binary(e) => e.location,
            Expr::Unary(e) => e.location,
            Expr::Literal(e) => e.location,
            Expr::Grouping(e) => e.location,
            Expr::Variable(e) => e.location,
            Expr::Assign(e) => e.location,
            Expr::Call(e) => e.location,
            Expr::Logical(e) => e.location,
        }
    }
}

#[derive(Debug, Clone)]
pub struct BinaryExpr {
    pub location: Location,
    pub left: Box<Expr>,
    pub op: BinaryOp,
    pub right: Box<Expr>,
}

#[derive(Debug, Clone, Copy)]
pub enum BinaryOp {
    Add,
    Sub,
    Mul,
    Div,
    Less,
    LessEqual,
    Greater,
    GreaterEqual,
    Equal,
    NotEqual,
}

#[derive(Debug, Clone)]
pub struct UnaryExpr {
    pub location: Location,
    pub op: UnaryOp,
    pub right: Box<Expr>,
}

#[derive(Debug, Clone, Copy)]
pub enum UnaryOp {
    Negate,
    Not,
}

#[derive(Debug, Clone)]
pub struct LiteralExpr {
    pub location: Location,
    pub value: LoxValue,
}

#[derive(Debug, Clone)]
pub struct GroupingExpr {
    pub location: Location,
    pub expr: Box<Expr>,
}

#[derive(Debug, Clone)]
pub struct VariableExpr {
    pub location: Location,
    pub name: String,
}

#[derive(Debug, Clone)]
pub struct AssignExpr {
    pub location: Location,
    pub name: String,
    pub value: Box<Expr>,
}

#[derive(Debug, Clone)]
pub struct CallExpr {
    pub location: Location,
    pub callee: Box<Expr>,
    pub arguments: Vec<Expr>,
}

#[derive(Debug, Clone)]
pub struct LogicalExpr {
    pub location: Location,
    pub left: Box<Expr>,
    pub op: LogicalOp,
    pub right: Box<Expr>,
}

#[derive(Debug, Clone, Copy)]
pub enum LogicalOp {
    And,
    Or,
}

// ========================================================================
// Statements
// ========================================================================

#[derive(Debug, Clone)]
pub enum Stmt {
    Function(FunctionStmt),
    Return(ReturnStmt),
    Var(VarStmt),
    If(IfStmt),
    While(WhileStmt),
    Print(PrintStmt),
    Block(BlockStmt),
    Expression(ExpressionStmt),
}

#[derive(Debug, Clone)]
pub struct FunctionStmt {
    pub location: Location, // Location of 'fun' keyword
    pub name: String,
    pub name_location: Location, // Location of the function name
    pub params: Vec<String>,
    pub param_locations: Vec<Location>, // Location of each parameter name
    pub body: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct ReturnStmt {
    pub location: Location, // Location of 'return' keyword
    pub value: Option<Expr>,
}

#[derive(Debug, Clone)]
pub struct VarStmt {
    pub location: Location, // Location of 'var' keyword
    pub name: String,
    pub name_location: Location, // Location of the variable name
    pub init: Expr,
}

#[derive(Debug, Clone)]
pub struct IfStmt {
    pub location: Location, // Location of 'if' keyword
    pub condition: Expr,
    pub then_branch: Vec<Stmt>,
    pub else_branch: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct WhileStmt {
    pub location: Location, // Location of 'while' keyword
    pub condition: Expr,
    pub body: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct PrintStmt {
    pub location: Location, // Location of 'print' keyword
    pub value: Expr,
}

#[derive(Debug, Clone)]
pub struct BlockStmt {
    pub location: Location, // Location of opening '{'
    pub statements: Vec<Stmt>,
}

#[derive(Debug, Clone)]
pub struct ExpressionStmt {
    pub location: Location, // Start of the expression
    pub expr: Expr,
}

// ========================================================================
// Program
// ========================================================================

/// A Lox program is a list of top-level statements
#[derive(Debug, Clone)]
pub struct Program {
    pub statements: Vec<Stmt>,
}

// ========================================================================
// Helper trait for getting locations
// ========================================================================

impl Stmt {
    /// Get the primary location of this statement
    pub fn location(&self) -> Location {
        match self {
            Stmt::Function(f) => f.location,
            Stmt::Return(r) => r.location,
            Stmt::Var(v) => v.location,
            Stmt::If(i) => i.location,
            Stmt::While(w) => w.location,
            Stmt::Print(p) => p.location,
            Stmt::Block(b) => b.location,
            Stmt::Expression(e) => e.location,
        }
    }
}
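To make the nesting concrete, here is a small sketch that builds the tree for 1 + 2 * 3 by hand and prints it in prefix form. It uses a trimmed-down, hypothetical copy of the types above (only binary expressions and number literals, flattened into one enum) so that it compiles on its own. The show and sample helpers are illustrations and not part of the guide's code.

```rust
// Trimmed copy of the AST from src/ast.rs, just enough for this example.
#[derive(Debug, Clone, Copy)]
pub struct Location {
    pub line: usize,
    pub column: usize,
}

#[derive(Debug, Clone, Copy)]
pub enum BinaryOp {
    Add,
    Mul,
}

#[derive(Debug, Clone)]
pub enum Expr {
    Binary { location: Location, left: Box<Expr>, op: BinaryOp, right: Box<Expr> },
    Number { location: Location, value: f64 },
}

/// Hypothetical helper: render an expression in prefix (Lisp-like) form.
pub fn show(expr: &Expr) -> String {
    match expr {
        Expr::Number { value, .. } => format!("{}", value),
        Expr::Binary { left, op, right, .. } => {
            let symbol = match op {
                BinaryOp::Add => "+",
                BinaryOp::Mul => "*",
            };
            format!("({} {} {})", symbol, show(left), show(right))
        }
    }
}

/// Build the tree for `1 + 2 * 3`. Multiplication binds tighter, so it
/// becomes the right child of the addition.
pub fn sample() -> Expr {
    let at = |column| Location { line: 1, column };
    Expr::Binary {
        location: at(3),
        left: Box::new(Expr::Number { location: at(1), value: 1.0 }),
        op: BinaryOp::Add,
        right: Box::new(Expr::Binary {
            location: at(7),
            left: Box::new(Expr::Number { location: at(5), value: 2.0 }),
            op: BinaryOp::Mul,
            right: Box::new(Expr::Number { location: at(9), value: 3.0 }),
        }),
    }
}

fn main() {
    println!("{}", show(&sample())); // prints "(+ 1 (* 2 3))"
}
```

Each operator node records the location of its operator token, which is the same convention the parser in Part 3 follows for binary expressions.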
Part 3: The Parser
The parser follows Crafting Interpreters Chapter 6 closely. It expects a stream of Token values from a scanner. Here are the token types — the scanner itself is a standard lexical analysis exercise (see Crafting Interpreters Chapter 4), and not the focus of this tutorial.
// src/lexer.rs
use crate::ast::Location;

/// A single token produced by the scanner
#[derive(Debug, Clone)]
pub struct Token {
    pub token_type: TokenType,
    pub lexeme: String,
    pub literal: Option<LexValue>,
    pub location: Location,
}

/// The category of a token
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenType {
    // Single-character tokens
    LeftParen, RightParen, LeftBrace, RightBrace,
    Comma, Dot, Minus, Plus, Semicolon, Slash, Star,

    // One or two character tokens
    Bang, BangEqual,
    Equal, EqualEqual,
    Greater, GreaterEqual,
    Less, LessEqual,

    // Literals
    Identifier, String, Number,

    // Keywords (`True` is required by the parser's `primary` rule)
    And, Class, Else, False, Fun, For, If, Nil, Or,
    Print, Return, Super, This, True, Var, While,

    // Special
    Eof,
}

/// A literal value from the source
#[derive(Debug, Clone)]
pub enum LexValue {
    Boolean(bool),
    F64(f64),
    Str(String),
}

impl LexValue {
    /// Returns the number, or 0.0 if this is not a numeric literal
    pub fn as_number(&self) -> f64 {
        match self {
            LexValue::F64(n) => *n,
            _ => 0.0,
        }
    }

    /// Returns the string, or an empty string if this is not a string literal
    pub fn as_string(&self) -> String {
        match self {
            LexValue::Str(s) => s.clone(),
            _ => String::new(),
        }
    }
}

impl TokenType {
    /// Human-readable name, used in error messages
    pub fn name(&self) -> &'static str {
        match self {
            TokenType::LeftParen => "(",
            TokenType::RightParen => ")",
            TokenType::LeftBrace => "{",
            TokenType::RightBrace => "}",
            TokenType::Comma => ",",
            TokenType::Dot => ".",
            TokenType::Minus => "-",
            TokenType::Plus => "+",
            TokenType::Semicolon => ";",
            TokenType::Slash => "/",
            TokenType::Star => "*",
            TokenType::Bang => "!",
            TokenType::BangEqual => "!=",
            TokenType::Equal => "=",
            TokenType::EqualEqual => "==",
            TokenType::Greater => ">",
            TokenType::GreaterEqual => ">=",
            TokenType::Less => "<",
            TokenType::LessEqual => "<=",
            TokenType::Identifier => "identifier",
            TokenType::String => "string",
            TokenType::Number => "number",
            TokenType::And => "and",
            TokenType::Class => "class",
            TokenType::Else => "else",
            TokenType::False => "false",
            TokenType::Fun => "fun",
            TokenType::For => "for",
            TokenType::If => "if",
            TokenType::Nil => "nil",
            TokenType::Or => "or",
            TokenType::Print => "print",
            TokenType::Return => "return",
            TokenType::Super => "super",
            TokenType::This => "this",
            TokenType::True => "true",
            TokenType::Var => "var",
            TokenType::While => "while",
            TokenType::Eof => "eof",
        }
    }
}
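The scanner is out of scope here, but a minimal sketch may help show the kind of stream the parser expects to receive. The version below is a toy under stated assumptions: it uses a reduced, hypothetical Tok enum rather than the full Token struct above, handles only numbers, identifiers, and a few single-character tokens, and skips location tracking entirely.

```rust
/// Reduced token type for this sketch (the guide's `Token` carries more).
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Number(f64),
    Identifier(String),
    Plus,
    Minus,
    Star,
    Slash,
    LeftParen,
    RightParen,
    Semicolon,
    Eof,
}

/// Scan `source` into tokens, always ending with `Tok::Eof`.
/// Unknown characters are reported as an error string.
pub fn scan(source: &str) -> Result<Vec<Tok>, String> {
    let chars: Vec<char> = source.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;

    while i < chars.len() {
        let c = chars[i];

        // Whitespace separates tokens but produces none.
        if c.is_whitespace() {
            i += 1;
            continue;
        }

        // Single-character tokens.
        let single = match c {
            '+' => Some(Tok::Plus),
            '-' => Some(Tok::Minus),
            '*' => Some(Tok::Star),
            '/' => Some(Tok::Slash),
            '(' => Some(Tok::LeftParen),
            ')' => Some(Tok::RightParen),
            ';' => Some(Tok::Semicolon),
            _ => None,
        };
        if let Some(tok) = single {
            tokens.push(tok);
            i += 1;
            continue;
        }

        let start = i;
        if c.is_ascii_digit() {
            // Number literal: digits with an optional fractional part.
            while i < chars.len() && (chars[i].is_ascii_digit() || chars[i] == '.') {
                i += 1;
            }
            let text: String = chars[start..i].iter().collect();
            let value = text
                .parse::<f64>()
                .map_err(|e| format!("bad number '{}': {}", text, e))?;
            tokens.push(Tok::Number(value));
        } else if c.is_alphabetic() || c == '_' {
            // Identifier: letters, digits, and underscores.
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                i += 1;
            }
            tokens.push(Tok::Identifier(chars[start..i].iter().collect()));
        } else {
            return Err(format!("unexpected character '{}'", c));
        }
    }

    tokens.push(Tok::Eof);
    Ok(tokens)
}

fn main() {
    println!("{:?}", scan("x + 42;"));
}
```

The trailing Eof token matters: the parser's is_at_end check relies on it instead of comparing an index against the vector length.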
Now the parser, adapted to produce the AST from Part 2:
// src/parser.rs
use crate::ast::*;
use crate::lexer::{Token, TokenType};

#[derive(Debug)]
pub struct ParseError {
    pub message: String,
    pub location: Location,
}

impl ParseError {
    pub fn new(message: &str, location: Location) -> Self {
        Self { message: message.to_string(), location }
    }
}

pub struct Parser {
    tokens: Vec<Token>,
    current: usize,
}

impl Parser {
    pub fn new(tokens: Vec<Token>) -> Self {
        Self { tokens, current: 0 }
    }

    pub fn parse(&mut self) -> Result<Program, ParseError> {
        let mut statements = Vec::new();
        while !self.is_at_end() {
            statements.push(self.declaration()?);
        }
        Ok(Program { statements })
    }

    // ========================================================================
    // Statement parsing
    // ========================================================================

    fn declaration(&mut self) -> Result<Stmt, ParseError> {
        if self.match_token(TokenType::Fun) {
            return self.function_declaration();
        }
        if self.match_token(TokenType::Var) {
            return self.var_declaration();
        }
        self.statement()
    }

    fn function_declaration(&mut self) -> Result<Stmt, ParseError> {
        let location = self.previous().location; // Location of 'fun'
        let name = self.consume(TokenType::Identifier, "Expect function name.")?;
        let name_location = name.location;
        let name_str = name.lexeme.clone();

        self.consume(TokenType::LeftParen, "Expect '(' after function name.")?;
        let mut params = Vec::new();
        let mut param_locations = Vec::new();
        if !self.check(TokenType::RightParen) {
            loop {
                let param = self.consume(TokenType::Identifier, "Expect parameter name.")?;
                params.push(param.lexeme.clone());
                param_locations.push(param.location);
                if !self.match_token(TokenType::Comma) {
                    break;
                }
            }
        }
        self.consume(TokenType::RightParen, "Expect ')' after parameters.")?;

        self.consume(TokenType::LeftBrace, "Expect '{' before function body.")?;
        let body = self.block()?;

        Ok(Stmt::Function(FunctionStmt {
            location,
            name: name_str,
            name_location,
            params,
            param_locations,
            body,
        }))
    }

    fn var_declaration(&mut self) -> Result<Stmt, ParseError> {
        let location = self.previous().location; // Location of 'var'
        let name = self.consume(TokenType::Identifier, "Expect variable name.")?;
        let name_location = name.location;
        let name_str = name.lexeme.clone();

        // A declaration without an initializer defaults to nil
        let init = if self.match_token(TokenType::Equal) {
            self.expression()?
        } else {
            Expr::Literal(LiteralExpr { location, value: LoxValue::Nil })
        };

        self.consume(TokenType::Semicolon, "Expect ';' after variable declaration.")?;
        Ok(Stmt::Var(VarStmt { location, name: name_str, name_location, init }))
    }

    fn statement(&mut self) -> Result<Stmt, ParseError> {
        if self.match_token(TokenType::Print) {
            return self.print_statement();
        }
        if self.match_token(TokenType::If) {
            return self.if_statement();
        }
        if self.match_token(TokenType::While) {
            return self.while_statement();
        }
        if self.match_token(TokenType::Return) {
            return self.return_statement();
        }
        if self.match_token(TokenType::LeftBrace) {
            let location = self.previous().location;
            return Ok(Stmt::Block(BlockStmt { location, statements: self.block()? }));
        }
        self.expression_statement()
    }

    fn print_statement(&mut self) -> Result<Stmt, ParseError> {
        let location = self.previous().location; // Location of 'print'
        let value = self.expression()?;
        self.consume(TokenType::Semicolon, "Expect ';' after value.")?;
        Ok(Stmt::Print(PrintStmt { location, value }))
    }

    fn if_statement(&mut self) -> Result<Stmt, ParseError> {
        let location = self.previous().location; // Location of 'if'
        self.consume(TokenType::LeftParen, "Expect '(' after 'if'.")?;
        let condition = self.expression()?;
        self.consume(TokenType::RightParen, "Expect ')' after if condition.")?;

        let then_branch = vec![self.statement()?];
        let else_branch = if self.match_token(TokenType::Else) {
            vec![self.statement()?]
        } else {
            Vec::new()
        };

        Ok(Stmt::If(IfStmt { location, condition, then_branch, else_branch }))
    }

    fn while_statement(&mut self) -> Result<Stmt, ParseError> {
        let location = self.previous().location; // Location of 'while'
        self.consume(TokenType::LeftParen, "Expect '(' after 'while'.")?;
        let condition = self.expression()?;
        self.consume(TokenType::RightParen, "Expect ')' after while condition.")?;
        let body = vec![self.statement()?];
        Ok(Stmt::While(WhileStmt { location, condition, body }))
    }

    fn return_statement(&mut self) -> Result<Stmt, ParseError> {
        let location = self.previous().location; // Location of 'return'
        let value = if !self.check(TokenType::Semicolon) {
            Some(self.expression()?)
        } else {
            None
        };
        self.consume(TokenType::Semicolon, "Expect ';' after return value.")?;
        Ok(Stmt::Return(ReturnStmt { location, value }))
    }

    fn expression_statement(&mut self) -> Result<Stmt, ParseError> {
        let expr = self.expression()?;
        let location = expr.location(); // Start of the expression
        self.consume(TokenType::Semicolon, "Expect ';' after expression.")?;
        Ok(Stmt::Expression(ExpressionStmt { location, expr }))
    }

    fn block(&mut self) -> Result<Vec<Stmt>, ParseError> {
        let mut statements = Vec::new();
        while !self.check(TokenType::RightBrace) && !self.is_at_end() {
            statements.push(self.declaration()?);
        }
        self.consume(TokenType::RightBrace, "Expect '}' after block.")?;
        Ok(statements)
    }

    // ========================================================================
    // Expression parsing (precedence climbing)
    // ========================================================================

    fn expression(&mut self) -> Result<Expr, ParseError> {
        self.assignment()
    }

    fn assignment(&mut self) -> Result<Expr, ParseError> {
        let expr = self.or_expr()?;

        if self.match_token(TokenType::Equal) {
            let location = self.previous().location;
            let value = self.assignment()?;

            if let Expr::Variable(var) = expr {
                return Ok(Expr::Assign(AssignExpr {
                    location,
                    name: var.name,
                    value: Box::new(value),
                }));
            }
            return Err(ParseError::new("Invalid assignment target.", location));
        }

        Ok(expr)
    }

    fn or_expr(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.and_expr()?;
        while self.match_token(TokenType::Or) {
            let location = self.previous().location;
            let right = self.and_expr()?;
            expr = Expr::Logical(LogicalExpr {
                location,
                left: Box::new(expr),
                op: LogicalOp::Or,
                right: Box::new(right),
            });
        }
        Ok(expr)
    }

    fn and_expr(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.equality()?;
        while self.match_token(TokenType::And) {
            let location = self.previous().location;
            let right = self.equality()?;
            expr = Expr::Logical(LogicalExpr {
                location,
                left: Box::new(expr),
                op: LogicalOp::And,
                right: Box::new(right),
            });
        }
        Ok(expr)
    }

    fn equality(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.comparison()?;
        while self.match_any(&[TokenType::EqualEqual, TokenType::BangEqual]) {
            let location = self.previous().location;
            let op = match self.previous().token_type {
                TokenType::EqualEqual => BinaryOp::Equal,
                TokenType::BangEqual => BinaryOp::NotEqual,
                _ => unreachable!(),
            };
            let right = self.comparison()?;
            expr = Expr::Binary(BinaryExpr {
                location,
                left: Box::new(expr),
                op,
                right: Box::new(right),
            });
        }
        Ok(expr)
    }

    fn comparison(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.term()?;
        while self.match_any(&[
            TokenType::Greater,
            TokenType::GreaterEqual,
            TokenType::Less,
            TokenType::LessEqual,
        ]) {
            let location = self.previous().location;
            let op = match self.previous().token_type {
                TokenType::Greater => BinaryOp::Greater,
                TokenType::GreaterEqual => BinaryOp::GreaterEqual,
                TokenType::Less => BinaryOp::Less,
                TokenType::LessEqual => BinaryOp::LessEqual,
                _ => unreachable!(),
            };
            let right = self.term()?;
            expr = Expr::Binary(BinaryExpr {
                location,
                left: Box::new(expr),
                op,
                right: Box::new(right),
            });
        }
        Ok(expr)
    }

    fn term(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.factor()?;
        while self.match_any(&[TokenType::Plus, TokenType::Minus]) {
            let location = self.previous().location;
            let op = match self.previous().token_type {
                TokenType::Plus => BinaryOp::Add,
                TokenType::Minus => BinaryOp::Sub,
                _ => unreachable!(),
            };
            let right = self.factor()?;
            expr = Expr::Binary(BinaryExpr {
                location,
                left: Box::new(expr),
                op,
                right: Box::new(right),
            });
        }
        Ok(expr)
    }

    fn factor(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.unary()?;
        while self.match_any(&[TokenType::Star, TokenType::Slash]) {
            let location = self.previous().location;
            let op = match self.previous().token_type {
                TokenType::Star => BinaryOp::Mul,
                TokenType::Slash => BinaryOp::Div,
                _ => unreachable!(),
            };
            let right = self.unary()?;
            expr = Expr::Binary(BinaryExpr {
                location,
                left: Box::new(expr),
                op,
                right: Box::new(right),
            });
        }
        Ok(expr)
    }

    fn unary(&mut self) -> Result<Expr, ParseError> {
        if self.match_any(&[TokenType::Bang, TokenType::Minus]) {
            let location = self.previous().location;
            let op = match self.previous().token_type {
                TokenType::Bang => UnaryOp::Not,
                TokenType::Minus => UnaryOp::Negate,
                _ => unreachable!(),
            };
            let right = Box::new(self.unary()?);
            return Ok(Expr::Unary(UnaryExpr { location, op, right }));
        }
        self.call()
    }

    // Call expressions, following the `call` rule in Crafting Interpreters.
    // Without this rule the parser would never produce Expr::Call.
    fn call(&mut self) -> Result<Expr, ParseError> {
        let mut expr = self.primary()?;
        while self.match_token(TokenType::LeftParen) {
            let location = self.previous().location; // Location of '('
            let mut arguments = Vec::new();
            if !self.check(TokenType::RightParen) {
                loop {
                    arguments.push(self.expression()?);
                    if !self.match_token(TokenType::Comma) {
                        break;
                    }
                }
            }
            self.consume(TokenType::RightParen, "Expect ')' after arguments.")?;
            expr = Expr::Call(CallExpr { location, callee: Box::new(expr), arguments });
        }
        Ok(expr)
    }

    fn primary(&mut self) -> Result<Expr, ParseError> {
        let token = self.peek();
        let location = token.location;

        if self.match_token(TokenType::False) {
            return Ok(Expr::Literal(LiteralExpr { location, value: LoxValue::Bool(false) }));
        }
        if self.match_token(TokenType::True) {
            return Ok(Expr::Literal(LiteralExpr { location, value: LoxValue::Bool(true) }));
        }
        if self.match_token(TokenType::Nil) {
            return Ok(Expr::Literal(LiteralExpr { location, value: LoxValue::Nil }));
        }
        if self.match_token(TokenType::Number) {
            let value = token.literal.as_ref().unwrap().as_number();
            return Ok(Expr::Literal(LiteralExpr { location, value: LoxValue::Number(value) }));
        }
        if self.match_token(TokenType::String) {
            let value = token.literal.as_ref().unwrap().as_string();
            return Ok(Expr::Literal(LiteralExpr { location, value: LoxValue::String(value) }));
        }
        if self.match_token(TokenType::Identifier) {
            return Ok(Expr::Variable(VariableExpr { location, name: token.lexeme.clone() }));
        }
        if self.match_token(TokenType::LeftParen) {
            let expr = self.expression()?;
            self.consume(TokenType::RightParen, "Expect ')' after expression.")?;
            return Ok(Expr::Grouping(GroupingExpr { location, expr: Box::new(expr) }));
        }

        Err(ParseError::new("Expect expression.", location))
    }

    // ========================================================================
    // Helper methods
    // ========================================================================

    fn match_token(&mut self, token_type: TokenType) -> bool {
        if self.check(token_type) {
            self.advance();
            return true;
        }
        false
    }

    fn match_any(&mut self, types: &[TokenType]) -> bool {
        for t in types {
            if self.check(*t) {
                self.advance();
                return true;
            }
        }
        false
    }

    fn check(&self, token_type: TokenType) -> bool {
        !self.is_at_end() && self.peek().token_type == token_type
    }

    fn advance(&mut self) -> Token {
        if !self.is_at_end() {
            self.current += 1;
        }
        self.previous()
    }

    fn is_at_end(&self) -> bool {
        self.peek().token_type == TokenType::Eof
    }

    fn peek(&self) -> Token {
        self.tokens[self.current].clone()
    }

    // Only valid after at least one token has been consumed
    fn previous(&self) -> Token {
        self.tokens[self.current - 1].clone()
    }

    fn consume(&mut self, token_type: TokenType, message: &str) -> Result<Token, ParseError> {
        if self.check(token_type) {
            return Ok(self.advance());
        }
        Err(ParseError::new(message, self.peek().location))
    }
}
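The chain from term down to primary is what encodes operator precedence: each level parses its operands by calling the next, tighter-binding level, and each loop builds a left-associative tree. A reduced sketch shows that effect in isolation. It assumes a hypothetical character-based input with single-digit numbers instead of the Token stream, and it produces a prefix string rather than the AST from Part 2.

```rust
/// Minimal recursive-descent parser over single digits and + - * /.
pub struct MiniParser {
    chars: Vec<char>,
    current: usize,
}

impl MiniParser {
    pub fn new(source: &str) -> Self {
        Self {
            chars: source.chars().filter(|c| !c.is_whitespace()).collect(),
            current: 0,
        }
    }

    fn peek(&self) -> Option<char> {
        self.chars.get(self.current).copied()
    }

    /// Consume and return the next character if it is one of `options`.
    fn match_any(&mut self, options: &[char]) -> Option<char> {
        match self.peek() {
            Some(c) if options.contains(&c) => {
                self.current += 1;
                Some(c)
            }
            _ => None,
        }
    }

    /// term → factor ( ("+" | "-") factor )*
    pub fn term(&mut self) -> Result<String, String> {
        let mut expr = self.factor()?;
        while let Some(op) = self.match_any(&['+', '-']) {
            let right = self.factor()?;
            expr = format!("({} {} {})", op, expr, right);
        }
        Ok(expr)
    }

    /// factor → primary ( ("*" | "/") primary )*
    fn factor(&mut self) -> Result<String, String> {
        let mut expr = self.primary()?;
        while let Some(op) = self.match_any(&['*', '/']) {
            let right = self.primary()?;
            expr = format!("({} {} {})", op, expr, right);
        }
        Ok(expr)
    }

    /// primary → DIGIT | "(" term ")"
    fn primary(&mut self) -> Result<String, String> {
        match self.peek() {
            Some(c) if c.is_ascii_digit() => {
                self.current += 1;
                Ok(c.to_string())
            }
            Some('(') => {
                self.current += 1;
                let inner = self.term()?;
                if self.match_any(&[')']).is_none() {
                    return Err("Expect ')' after expression.".to_string());
                }
                Ok(inner)
            }
            _ => Err("Expect expression.".to_string()),
        }
    }
}

fn main() {
    println!("{:?}", MiniParser::new("1 + 2 * 3").term());   // Ok("(+ 1 (* 2 3))")
    println!("{:?}", MiniParser::new("1 - 2 - 3").term());   // Ok("(- (- 1 2) 3)")
    println!("{:?}", MiniParser::new("(1 + 2) * 3").term()); // Ok("(* (+ 1 2) 3)")
}
```

Because term calls factor for both operands, a multiplication is always fully parsed before the surrounding addition sees it. The full parser repeats the same shape once per precedence level.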
Part 4: MLIR Code Generation with Melior
This part is the core of the compiler: it walks the AST and emits MLIR.
Note on the codegen model: The codegen uses a simplified current_block: Option<Block> pattern for exposition. In production Melior code, blocks are created and immediately appended to regions (the "build then insert" pattern). The current_block ownership issue, where a block is moved into a region and can't be accessed again, is real, and production code avoids it by building regions first, then appending. For this tutorial, the simplified pattern helps you follow the logic without getting lost in ownership boilerplate. See the review notes for the production approach.
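The ownership issue can be reproduced without Melior. The sketch below uses mock Block and Region types (hypothetical stand-ins that mirror the ownership shape described in the note, not Melior's actual API) and shows one reading of the pattern: fill the block completely, then move it into its region. The commented-out line marks the access that the borrow checker rejects once the block has been moved.

```rust
/// Mock block: just a list of operation names.
#[derive(Debug, Default)]
pub struct Block {
    pub ops: Vec<String>,
}

impl Block {
    pub fn append_operation(&mut self, op: &str) {
        self.ops.push(op.to_string());
    }
}

/// Mock region: owns its blocks once they are appended.
#[derive(Debug, Default)]
pub struct Region {
    pub blocks: Vec<Block>,
}

impl Region {
    /// Takes ownership of `block`; the caller can no longer use it.
    pub fn append_block(&mut self, block: Block) {
        self.blocks.push(block);
    }
}

/// Finish the block first, then move it into the region.
pub fn build_then_insert() -> Region {
    let mut block = Block::default();
    block.append_operation("arith.constant");
    block.append_operation("func.return");

    let mut region = Region::default();
    region.append_block(block);

    // Uncommenting the next line fails to compile, because `block`
    // was moved into `region` by `append_block`:
    // block.append_operation("late.op");

    region
}

fn main() {
    let region = build_then_insert();
    println!("{:?}", region.blocks[0].ops);
}
```

The tutorial's current_block field sidesteps this ordering constraint for readability, which is why parts of the generator below should be read as exposition rather than as code that passes the borrow checker unchanged.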
Scope note: In this part, we compile a subset of Lox that only supports numbers and arithmetic. This isn't a limitation of MLIR; it's a pedagogical choice. Dynamic typing with tagged unions adds 3-4x more code for every operation (check the tag, dispatch, unbox, compute, re-box). We'll cover dynamic typing in the "Tagged Unions" section below. For now, every value is an f64.
What this means in practice:
- All values are f64: true is 1.0, false is 0.0, nil is 0.0
- Arithmetic operations use arith.addf, arith.mulf, etc.
- Comparisons use arith.cmpf
- Logical operators use scf.if for short-circuit evaluation
- lox.print calls a runtime function that prints a float
This is a simplification. A production Lox compiler would use tagged unions. But starting with "everything is a float" lets us focus on MLIR concepts without drowning in type-tag boilerplate.
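As a sanity check on this encoding, the following sketch models it in plain Rust with no MLIR involved. The names (Literal, encode, is_truthy, lox_not, lox_and, lox_or) are hypothetical and exist only for this illustration. The closures stand in for the lazily evaluated right-hand operand that scf.if provides in the generated code, and the and/or results follow the rules used later in the generator: "a and b" yields b or false, "a or b" yields true or b.

```rust
/// The literal values supported in this part of the guide.
pub enum Literal {
    Nil,
    Bool(bool),
    Number(f64),
}

/// Encode a literal as the single f64 representation.
pub fn encode(value: &Literal) -> f64 {
    match value {
        Literal::Nil => 0.0,
        Literal::Bool(false) => 0.0,
        Literal::Bool(true) => 1.0,
        Literal::Number(n) => *n,
    }
}

/// Nonzero means true in this model.
pub fn is_truthy(x: f64) -> bool {
    x != 0.0
}

/// `!x`: 1.0 when x is 0.0, otherwise 0.0.
pub fn lox_not(x: f64) -> f64 {
    if x == 0.0 { 1.0 } else { 0.0 }
}

/// `a and b`: evaluate `b` only when `a` is truthy, else yield false (0.0).
pub fn lox_and(a: f64, b: impl FnOnce() -> f64) -> f64 {
    if is_truthy(a) { b() } else { 0.0 }
}

/// `a or b`: yield true (1.0) when `a` is truthy, else evaluate `b`.
pub fn lox_or(a: f64, b: impl FnOnce() -> f64) -> f64 {
    if is_truthy(a) { 1.0 } else { b() }
}

fn main() {
    let mut evaluated = false;
    // The right-hand side is never run because the left side is false.
    let result = lox_and(encode(&Literal::Bool(false)), || {
        evaluated = true;
        42.0
    });
    println!("result = {}, right side evaluated = {}", result, evaluated);
}
```

One consequence worth noticing is that nil, false, and the number 0 all encode to 0.0 and become indistinguishable, which is part of what the tagged-union representation fixes.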
Basic Code Generator
// src/codegen/mod.rs
mod generator;

pub use generator::generate_module;
// src/codegen/generator.rs
use crate::ast::*;
use melior::{
    Context,
    dialect::{arith, func, scf, DialectRegistry},
    ir::{
        attribute::{StringAttribute, TypeAttribute, FloatAttribute, IntegerAttribute},
        r#type::FunctionType,
        Location, Module, Region, Block, Type, Value,
        operation::OperationBuilder,
    },
    utility::register_all_dialects,
};

/// State for code generation
pub struct CodeGenerator<'c> {
    context: &'c Context,
    module: Module<'c>,
    current_block: Option<Block<'c>>,
    // Variable storage: maps variable names to their SSA values
    variables: std::collections::HashMap<String, Value<'c>>,
}

impl<'c> CodeGenerator<'c> {
    pub fn new(context: &'c Context) -> Self {
        let location = Location::unknown(context);
        let module = Module::new(location);
        Self {
            context,
            module,
            current_block: None,
            variables: std::collections::HashMap::new(),
        }
    }

    // ========================================================================
    // Entry point
    // ========================================================================

    pub fn generate(mut self, program: &Program) -> Module<'c> {
        for stmt in &program.statements {
            self.compile_statement(stmt);
        }
        self.module
    }

    // ========================================================================
    // Statement compilation
    // ========================================================================

    fn compile_statement(&mut self, stmt: &Stmt) {
        match stmt {
            Stmt::Function(f) => self.compile_function(f),
            Stmt::Return(r) => self.compile_return(r),
            Stmt::Var(v) => self.compile_var(v),
            Stmt::If(i) => self.compile_if(i),
            Stmt::While(w) => self.compile_while(w),
            Stmt::Print(p) => self.compile_print(p),
            Stmt::Block(b) => self.compile_block(b),
            Stmt::Expression(e) => {
                self.compile_expression(&e.expr);
            }
        }
    }

    fn compile_function(&mut self, func: &FunctionStmt) {
        let location = Location::unknown(self.context);
        let float_type = Type::float64(self.context);

        // Create parameter types (all f64 for now - dynamic typing)
        let param_types: Vec<Type> = func.params.iter().map(|_| float_type).collect();
        let return_type = float_type;

        // Create the function type
        let function_type = FunctionType::new(self.context, &param_types, &[return_type]);

        // Create the function body region
        let region = Region::new();
        let block = Block::new(
            &param_types.iter().map(|&t| (t, location)).collect::<Vec<_>>()
        );

        // Store parameters as variables
        for (i, param_name) in func.params.iter().enumerate() {
            let arg = block.argument(i).unwrap();
            self.variables.insert(param_name.clone(), arg.into());
        }

        // Set current block for body compilation
        self.current_block = Some(block);

        // Compile the function body
        for stmt in &func.body {
            self.compile_statement(stmt);
        }

        // Add implicit return nil if no return at end
        let nil_value = self.compile_nil();
        if let Some(block) = &self.current_block {
            block.append_operation(func::r#return(&[nil_value], location));
        }

        // Append block to region
        if let Some(block) = self.current_block.take() {
            region.append_block(block);
        }

        // Add function to module
        self.module.body().append_operation(func::func(
            self.context,
            StringAttribute::new(self.context, &func.name),
            TypeAttribute::new(function_type.into()),
            region,
            &[],
            location,
        ));

        // Clear variables after function
        self.variables.clear();
    }

    fn compile_return(&mut self, ret: &ReturnStmt) {
        let location = Location::unknown(self.context);
        let value = match &ret.value {
            Some(expr) => self.compile_expression(expr),
            None => self.compile_nil(),
        };
        if let Some(block) = &self.current_block {
            block.append_operation(func::r#return(&[value], location));
        }
    }

    fn compile_var(&mut self, var: &VarStmt) {
        let value = self.compile_expression(&var.init);
        self.variables.insert(var.name.clone(), value);
    }

    fn compile_if(&mut self, if_stmt: &IfStmt) {
        let location = Location::unknown(self.context);
        let condition = self.compile_expression(&if_stmt.condition);

        // Create scf.if operation
        let then_region = Region::new();
        let then_block = then_region.append_block(Block::new(&[]));

        // Compile then branch
        let prev_block = self.current_block.replace(then_block);
        for stmt in &if_stmt.then_branch {
            self.compile_statement(stmt);
        }

        // Handle else branch
        let else_region = if !if_stmt.else_branch.is_empty() {
            let else_region = Region::new();
            let else_block = else_region.append_block(Block::new(&[]));
            self.current_block = Some(else_block);
            for stmt in &if_stmt.else_branch {
                self.compile_statement(stmt);
            }
            // Restore block reference (Melior ownership is tricky here)
            // In a real impl, we'd need to handle this more carefully
            Some(else_region)
        } else {
            None
        };

        self.current_block = prev_block;

        // Append scf.if to current block
        // Note: MLIR scf.if requires BOTH then and else regions.
        // An empty else region is used for single-branch if statements;
        // otherwise the compiled else region is attached.
        if let Some(block) = &self.current_block {
            let if_op = OperationBuilder::new("scf.if", location)
                .add_operand(condition)
                .add_region(then_region)
                .add_region(else_region.unwrap_or_else(Region::new))
                .build()
                .unwrap();
            block.append_operation(if_op);
        }
    }

    fn compile_while(&mut self, while_stmt: &WhileStmt) {
        let location = Location::unknown(self.context);

        // For scf.while, we need:
        // 1. A "before" region that computes the condition
        // 2. An "after" region that is the loop body
        let before_region = Region::new();
        let before_block = before_region.append_block(Block::new(&[]));

        // Compile condition in before block
        let prev_block = self.current_block.replace(before_block);
        let condition = self.compile_expression(&while_stmt.condition);

        // Add scf.condition (the before block is now the current block)
        let condition_op = OperationBuilder::new("scf.condition", location)
            .add_operand(condition)
            .build()
            .unwrap();
        if let Some(block) = &self.current_block {
            block.append_operation(condition_op);
        }

        // Create after region (loop body)
        let after_region = Region::new();
        let after_block = after_region.append_block(Block::new(&[]));
        self.current_block = Some(after_block);
        for stmt in &while_stmt.body {
            self.compile_statement(stmt);
        }

        // Add scf.yield
        let yield_op = OperationBuilder::new("scf.yield", location)
            .build()
            .unwrap();
        if let Some(block) = &self.current_block {
            block.append_operation(yield_op);
        }

        self.current_block = prev_block;

        // Create scf.while
        if let Some(block) = &self.current_block {
            let while_op = OperationBuilder::new("scf.while", location)
                .add_region(before_region)
                .add_region(after_region)
                .build()
                .unwrap();
            block.append_operation(while_op);
        }
    }

    fn compile_print(&mut self, print: &PrintStmt) {
        let location = Location::unknown(self.context);
        let value = self.compile_expression(&print.value);

        // Create a call to a runtime print function
        // For simplicity, we'll use a placeholder operation
        let print_op = OperationBuilder::new("lox.print", location)
            .add_operand(value)
            .build()
            .unwrap();
        if let Some(block) = &self.current_block {
            block.append_operation(print_op);
        }
    }

    fn compile_block(&mut self, block: &BlockStmt) {
        for stmt in &block.statements {
            self.compile_statement(stmt);
        }
    }

    // ========================================================================
    // Expression compilation
    // ========================================================================

    fn compile_expression(&mut self, expr: &Expr) -> Value<'c> {
        let location = Location::unknown(self.context);
        let op = match expr {
            Expr::Binary(b) => self.compile_binary(b),
            Expr::Unary(u) => self.compile_unary(u),
            Expr::Literal(l) => return self.compile_literal(l),
            Expr::Grouping(g) => return self.compile_expression(&g.expr),
            Expr::Variable(v) => return self.compile_variable(v),
            Expr::Assign(a) => return self.compile_assign(a),
            Expr::Call(c) => self.compile_call(c),
            Expr::Logical(l) => self.compile_logical(l),
        };
        // Get the result from the operation
        op.result(0).unwrap().into()
    }

    fn compile_binary(&mut self, binary: &BinaryExpr) -> melior::ir::Operation<'c> {
        let location = Location::unknown(self.context);
        let lhs = self.compile_expression(&binary.left);
        let rhs = self.compile_expression(&binary.right);

        let op = match binary.op {
            BinaryOp::Add => arith::addf(lhs, rhs, location),
            BinaryOp::Sub => arith::subf(lhs, rhs, location),
            BinaryOp::Mul => arith::mulf(lhs, rhs, location),
            BinaryOp::Div => arith::divf(lhs, rhs, location),
            BinaryOp::Less => arith::cmpf(self.context, arith::CmpfPredicate::Olt, lhs, rhs, location),
            BinaryOp::LessEqual => arith::cmpf(self.context, arith::CmpfPredicate::Ole, lhs, rhs, location),
            BinaryOp::Greater => arith::cmpf(self.context, arith::CmpfPredicate::Ogt, lhs, rhs, location),
            BinaryOp::GreaterEqual => arith::cmpf(self.context, arith::CmpfPredicate::Oge, lhs, rhs, location),
            BinaryOp::Equal => arith::cmpf(self.context, arith::CmpfPredicate::Oeq, lhs, rhs, location),
            BinaryOp::NotEqual => arith::cmpf(self.context, arith::CmpfPredicate::Une, lhs, rhs, location),
        };

        // Append to current block
        if let Some(block) = &self.current_block {
            block.append_operation(op.clone());
        }
        op
    }

    fn compile_unary(&mut self, unary: &UnaryExpr) -> melior::ir::Operation<'c> {
        let location = Location::unknown(self.context);
        let operand = self.compile_expression(&unary.right);

        match unary.op {
            UnaryOp::Negate => {
                let op = arith::negf(operand, location);
                if let Some(block) = &self.current_block {
                    block.append_operation(op.clone());
                }
                op
            }
            UnaryOp::Not => {
                // In our "numbers only" model, `not x` is:
                //   if x == 0.0 { 1.0 } else { 0.0 }
                // We use scf.if since there's no direct float negation of boolean sense.
                // NOTE: a result-producing scf.if also needs an scf.yield terminator
                // in each region; it is omitted here to keep the example short.
                let zero_const = arith::constant(
                    self.context,
                    FloatAttribute::new(self.context, 0.0, Type::float64(self.context)).into(),
                    Location::unknown(self.context),
                );

                // Build then region (x == 0.0 → result = 1.0)
                let then_region = {
                    let r = Region::new();
                    let b = r.append_block(Block::new(&[]));
                    let one = arith::constant(
                        self.context,
                        FloatAttribute::new(self.context, 1.0, Type::float64(self.context)).into(),
                        Location::unknown(self.context),
                    );
                    b.append_operation(one);
                    r
                };

                // Build else region (x != 0.0 → result = 0.0)
                let else_region = {
                    let r = Region::new();
                    let b = r.append_block(Block::new(&[]));
                    let zero = arith::constant(
                        self.context,
                        FloatAttribute::new(self.context, 0.0, Type::float64(self.context)).into(),
                        Location::unknown(self.context),
                    );
                    b.append_operation(zero);
                    r
                };

                // Compare operand to 0.0
                if let Some(block) = &self.current_block {
                    block.append_operation(zero_const.clone());
                    let is_zero = arith::cmpf(
                        self.context,
                        arith::CmpfPredicate::Oeq,
                        operand,
                        zero_const.result(0).unwrap().into(),
                        Location::unknown(self.context),
                    );
                    // Clone before appending so the op is still usable below
                    block.append_operation(is_zero.clone());

                    // scf.if(is_zero) { then_region } { else_region } -> f64
                    let if_op = OperationBuilder::new("scf.if", location)
                        .add_operand(is_zero.result(0).unwrap().into())
                        .add_result(Type::float64(self.context))
                        .add_region(then_region)
                        .add_region(else_region)
                        .build()
                        .unwrap();
                    block.append_operation(if_op.clone());
                    if_op
                } else {
                    // Fallback if no current block (shouldn't happen in practice)
                    zero_const
                }
            }
        }
    }

    fn compile_literal(&mut self, literal: &LiteralExpr) -> Value<'c> {
        let location = Location::unknown(self.context);
        let op = match &literal.value {
            LoxValue::Nil => {
                return self.compile_nil();
            }
            LoxValue::Bool(b) => {
                // In our "numbers only" model, booleans are 1.0 and 0.0
                arith::constant(
                    self.context,
                    FloatAttribute::new(
                        self.context,
                        if *b { 1.0 } else { 0.0 },
                        Type::float64(self.context),
                    ).into(),
                    location,
                )
            }
            LoxValue::Number(n) => {
                arith::constant(
                    self.context,
                    FloatAttribute::new(self.context, *n, Type::float64(self.context)).into(),
                    location,
                )
            }
            LoxValue::String(s) => {
                // String constants are global - no heap allocation!
                // See the String Constants section below
                return self.compile_string(s, location);
            }
        };
        if let Some(block) = &self.current_block {
            block.append_operation(op.clone());
        }
        op.result(0).unwrap().into()
    }

    fn compile_nil(&mut self) -> Value<'c> {
        // In our "numbers only" subset, nil is represented as 0.0 f64.
        // This is consistent with the simplified typing model.
        // A full implementation would use tagged unions.
        let location = Location::unknown(self.context);
        let op = arith::constant(
            self.context,
            FloatAttribute::new(self.context, 0.0, Type::float64(self.context)).into(),
            location,
        );
        if let Some(block) = &self.current_block {
            block.append_operation(op.clone());
        }
        op.result(0).unwrap().into()
    }

    fn compile_variable(&mut self, var: &VariableExpr) -> Value<'c> {
        // Look up the variable in the current scope
        self.variables.get(&var.name)
            .copied()
            .unwrap_or_else(|| self.compile_nil())
    }

    fn compile_assign(&mut self, assign: &AssignExpr) -> Value<'c> {
        let value = self.compile_expression(&assign.value);
        self.variables.insert(assign.name.clone(), value);
        value
    }

    fn compile_call(&mut self, call: &CallExpr) -> melior::ir::Operation<'c> {
        let location = Location::unknown(self.context);

        // Compile arguments
        let args: Vec<Value> = call.arguments.iter()
            .map(|arg| self.compile_expression(arg))
            .collect();

        // For now, assume callee is a direct function call
        if let Expr::Variable(var) = call.callee.as_ref() {
            let call_op = func::call(
                self.context,
                melior::ir::attribute::FlatSymbolRefAttribute::new(self.context, &var.name),
                &args,
                &[Type::float64(self.context)],
                location,
            );
            if let Some(block) = &self.current_block {
                block.append_operation(call_op.clone());
            }
            return call_op;
        }

        // Indirect call (first-class function) - not implemented
        unimplemented!("Indirect function calls not yet supported")
    }

    fn compile_logical(&mut self, logical: &LogicalExpr) -> melior::ir::Operation<'c> {
        let location = Location::unknown(self.context);

        // Logical operations short-circuit in Lox, so we MUST use scf.if.
        // Using arith::andi/ori would be WRONG; those are bitwise, not short-circuit.
        //
        // `a and b` → if a { b } else { false }
        // `a or b`  → if a { true } else { b }
        let left = self.compile_expression(&logical.left);

        // Convert left to i1 for the condition (nonzero = true)
        let zero = arith::constant(
            self.context,
            FloatAttribute::new(self.context, 0.0, Type::float64(self.context)).into(),
            location,
        );
        if let Some(block) = &self.current_block {
            block.append_operation(zero.clone());
        }
        let cond = arith::cmpf(
            self.context,
            arith::CmpfPredicate::One, // ordered not-equal (nonzero = true)
            left,
            zero.result(0).unwrap().into(),
            location,
        );
        if let Some(block) = &self.current_block {
            block.append_operation(cond.clone());
        }

        match logical.op {
            LogicalOp::And => {
                // if left { right } else { 0.0 }
                let then_block = Block::new(&[]);
                let else_block = Block::new(&[]);

                let prev = self.current_block.replace(then_block);
                let right = self.compile_expression(&logical.right);
                let then_block = self.current_block.take().unwrap();

                self.current_block = Some(else_block);
                let false_val = arith::constant(
                    self.context,
                    FloatAttribute::new(self.context, 0.0, Type::float64(self.context)).into(),
                    location,
                );
                if let Some(block) = &self.current_block {
                    block.append_operation(false_val.clone());
                }
                let else_block = self.current_block.take().unwrap();

                self.current_block = prev;

                // This function returns an Operation, not a Result,
                // so the build error is unwrapped rather than propagated with `?`
                let if_op = OperationBuilder::new("scf.if", location)
                    .add_operand(cond.result(0).unwrap().into())
                    .add_result(Type::float64(self.context))
                    .add_region({
                        let mut region = Region::new();
                        region.append_block(then_block);
                        region
                    })
                    .add_region({
                        let mut region = Region::new();
                        region.append_block(else_block);
                        region
                    })
                    .build()
                    .unwrap();
                if let Some(block)
= &self.current_block { block.append_operation(if_op.clone()); } if_op } LogicalOp::Or => { // if left { 1.0 } else { right } let then_block = Block::new(&[]); let else_block = Block::new(&[]); let prev = self.current_block.replace(then_block); let true_val = arith::constant( self.context, FloatAttribute::new(self.context, 1.0, Type::float64(self.context)).into(), location, ); if let Some(block) = &self.current_block { block.append_operation(true_val.clone()); } let then_block = self.current_block.take().unwrap(); self.current_block = Some(else_block); let right = self.compile_expression(&logical.right); let else_block = self.current_block.take().unwrap(); self.current_block = prev; let if_op = OperationBuilder::new("scf.if", location) .add_operand(cond.result(0).unwrap().into()) .add_result(Type::float64(self.context)) .add_region({ let mut region = Region::new(); region.append_block(then_block); region }) .add_region({ let mut region = Region::new(); region.append_block(else_block); region }) .build()?; if let Some(block) = &self.current_block { block.append_operation(if_op.clone()); } if_op } } } /// Compile a string literal to a global constant fn compile_string(&mut self, value: &str, location: Location<'c>) -> Value<'c> { // Placeholder - see String Constants section below for full implementation let op = arith::constant( self.context, IntegerAttribute::new(0, Type::integer(self.context, 64)).into(), location, ); if let Some(block) = &self.current_block { block.append_operation(op.clone()); } op.result(0).unwrap().into() } } /// Main entry point for code generation pub fn generate_module(context: &Context, program: &Program) -> Module { let generator = CodeGenerator::new(context); generator.generate(program) } }
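Before debugging generated IR, it can help to pin down the intended semantics in plain Rust. The sketch below is a reference model only, not part of the generator: `truthy`, `lox_and`, and `lox_or` are hypothetical names, and the right operand is passed as a closure so a test can observe whether it was evaluated. It mirrors the "numbers only" model used above, where the condition is an ordered not-equal comparison against 0.0.

```rust
use std::cell::Cell;

/// Truthiness in the "numbers only" model: nonzero and not NaN.
/// This matches `arith.cmpf one, %x, %zero` (ordered not-equal).
fn truthy(x: f64) -> bool {
    x != 0.0 && !x.is_nan()
}

/// `a and b` → if a { b } else { 0.0 }; `right` runs only when needed.
fn lox_and(left: f64, right: impl FnOnce() -> f64) -> f64 {
    if truthy(left) { right() } else { 0.0 }
}

/// `a or b` → if a { 1.0 } else { b }; `right` runs only when needed.
fn lox_or(left: f64, right: impl FnOnce() -> f64) -> f64 {
    if truthy(left) { 1.0 } else { right() }
}

fn main() {
    let evaluated = Cell::new(false);
    let result = lox_and(0.0, || {
        evaluated.set(true);
        42.0
    });
    println!("0 and 42 = {result}, right evaluated: {}", evaluated.get());
    println!("0 or 7 = {}", lox_or(0.0, || 7.0));
}
```

Note that this simplified model differs from full Lox, where only `nil` and `false` are falsey and the logical operators return one of the operands rather than 1.0 or 0.0.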
String Constants (No Allocation Needed!)
String literals are constants, not heap allocations. They live in the binary's data section, just like in C.
// src/codegen/strings.rs
use melior::{
    Context,
    dialect::llvm,
    ir::{
        attribute::StringAttribute,
        operation::OperationBuilder,
        Location, Module, Type,
    },
};

/// Create a global string constant in the LLVM dialect
///
/// This creates something like:
///     llvm.mlir.global constant @str_0("hello")
///
/// No heap allocation - the string lives in the data section.
///
/// Note: whether an `llvm::r#const` helper is available depends on your
/// Melior version. If it is not, the same `llvm.mlir.global` operation can
/// be built by name with `OperationBuilder`.
pub fn create_string_constant(
    module: &Module,
    context: &Context,
    name: &str, // e.g., "str_0"
    value: &str,
    location: Location,
) {
    module.body().append_operation(llvm::r#const(
        context,
        name,
        // The array length is the byte length of the string
        Type::parse(context, &format!("!llvm.array<{} x i8>", value.len())).unwrap(),
        StringAttribute::new(context, value),
        location,
    ));
}
Why This Works
| Approach | Memory Location | Allocation? |
|---|---|---|
| llvm.mlir.global constant | Data section | No (static) |
| Heap allocation (malloc) | Heap | Yes (runtime) |
| Stack allocation (alloca) | Stack | No, but per-call |
String literals are static data — they exist for the lifetime of the program, embedded in the binary. No runtime cost.
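Because each literal becomes a named global, the generator needs some way to hand out symbol names and to avoid emitting the same string twice. The following is a minimal sketch of one way to do that with the standard library only; `StringTable`, `intern`, and `globals` are hypothetical names, and the `str_N` naming simply follows the `str_0` example above.

```rust
use std::collections::HashMap;

/// Hands out one symbol name per distinct string literal.
#[derive(Default)]
struct StringTable {
    /// Literal contents → symbol name, used for deduplication.
    names: HashMap<String, String>,
    /// (symbol, contents) pairs in creation order, ready to be emitted.
    order: Vec<(String, String)>,
}

impl StringTable {
    /// Returns the symbol for `value`, creating `str_N` on first use.
    fn intern(&mut self, value: &str) -> String {
        if let Some(name) = self.names.get(value) {
            return name.clone();
        }
        let name = format!("str_{}", self.order.len());
        self.names.insert(value.to_string(), name.clone());
        self.order.push((name.clone(), value.to_string()));
        name
    }

    /// Globals to emit, in creation order.
    fn globals(&self) -> &[(String, String)] {
        &self.order
    }
}

fn main() {
    let mut table = StringTable::default();
    let a = table.intern("hello");
    let b = table.intern("world");
    let c = table.intern("hello"); // reuses the first symbol
    println!("{a} {b} {c}");
    for (symbol, contents) in table.globals() {
        println!("@{symbol}: {} bytes", contents.len());
    }
}
```

Each entry of `globals()` would then correspond to one call to `create_string_constant`.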
Dynamic Typing with Tagged Unions
Lox is dynamically typed, so a function parameter can receive any type:
fun printValue(x) {
print x; // x could be number, string, bool, nil, or object
}
We need a tagged union type:
// src/codegen/types.rs
use melior::ir::Type;
use melior::Context;

/// A Lox value is a tagged union: struct { tag: i8, data: i64 }
pub fn lox_value_type(context: &Context) -> Type {
    Type::parse(context, "!llvm.struct<(i8, i64)>").unwrap()
}

/// Tag values for each Lox type
pub const TAG_NIL: i8 = 0;
pub const TAG_BOOL: i8 = 1;
pub const TAG_NUMBER: i8 = 2;
pub const TAG_STRING: i8 = 3;
pub const TAG_OBJECT: i8 = 4;
pub const TAG_CLOSURE: i8 = 5;
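To make the `{ tag: i8, data: i64 }` layout concrete, here is a plain-Rust sketch of how values could be packed into and unpacked from the 64-bit payload. It is an illustration under assumptions, not the guide's runtime: `Packed`, `Value`, `pack`, and `unpack` are hypothetical names, only nil, booleans, and numbers are covered, and numbers are stored by reinterpreting the f64 bit pattern (the role a bitcast would play in generated IR).

```rust
const TAG_NIL: i8 = 0;
const TAG_BOOL: i8 = 1;
const TAG_NUMBER: i8 = 2;

/// Runtime representation: mirrors !llvm.struct<(i8, i64)>.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Packed {
    tag: i8,
    data: i64,
}

/// The subset of Lox values handled by this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Value {
    Nil,
    Bool(bool),
    Number(f64),
}

fn pack(value: Value) -> Packed {
    match value {
        Value::Nil => Packed { tag: TAG_NIL, data: 0 },
        Value::Bool(b) => Packed { tag: TAG_BOOL, data: b as i64 },
        // Reinterpret the f64 bits as an integer; no numeric conversion.
        Value::Number(n) => Packed { tag: TAG_NUMBER, data: n.to_bits() as i64 },
    }
}

/// Returns None for tags this sketch does not know about.
fn unpack(packed: Packed) -> Option<Value> {
    match packed.tag {
        TAG_NIL => Some(Value::Nil),
        TAG_BOOL => Some(Value::Bool(packed.data != 0)),
        TAG_NUMBER => Some(Value::Number(f64::from_bits(packed.data as u64))),
        _ => None,
    }
}

fn main() {
    let packed = pack(Value::Number(-1.5));
    println!("{packed:?} -> {:?}", unpack(packed));
}
```

Strings, objects, and closures would store a pointer in `data` instead; that part is left out here.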
Part 5: Source Locations in MLIR
Every operation in MLIR has a source location. Unlike LLVM where debug info is optional, in MLIR locations are core to the IR.
The Location API
MLIR operations carry source locations for error messages and debug output. Melior provides several ways to create them:
// src/location.rs
use melior::ir::Location;
use melior::Context;

pub fn demonstrate_locations(context: &Context) {
    // Unknown location: for generated code with no source mapping
    let unknown = Location::unknown(context);

    // File location: specific file, line, column
    let file_loc = Location::new(context, "test.lox", 10, 5);

    // Name location: a descriptive name for generated code.
    // A name location wraps a child location; use `unknown` when there is
    // no source position to attach.
    let name_loc = Location::name(context, "implicit_return", unknown);
}
Updated Code Generator with Proper Locations
pub struct CodeGenerator<'c> {
    context: &'c Context,
    module: Module<'c>,
    current_block: Option<Block<'c>>,
    variables: std::collections::HashMap<String, Value<'c>>,
    filename: String,
}

impl<'c> CodeGenerator<'c> {
    pub fn new(context: &'c Context, filename: &str) -> Self {
        let location = Location::new(context, filename, 1, 1);
        let module = Module::new(location);
        Self {
            context,
            module,
            current_block: None,
            variables: std::collections::HashMap::new(),
            filename: filename.to_string(),
        }
    }

    /// Convert an AST location to an MLIR location
    fn loc(&self, ast_loc: crate::ast::Location) -> Location<'c> {
        Location::new(self.context, &self.filename, ast_loc.line, ast_loc.column)
    }

    /// Get a location for generated/implicit code
    fn generated_loc(&self, description: &str) -> Location<'c> {
        // Name locations wrap a child location; generated code has none.
        Location::name(self.context, description, Location::unknown(self.context))
    }
}
What the IR Looks Like With Locations
Before (using Location::unknown):
module {
func.func @add(%arg0: f64, %arg1: f64) -> f64 {
%0 = arith.addf %arg0, %arg1 : f64
return %0 : f64
}
}
After (with proper locations, shown with -mlir-print-debuginfo):
module {
  func.func @add(%arg0: f64, %arg1: f64) -> f64 {
    %0 = arith.addf %arg0, %arg1 : f64 loc("test.lox":2:14)
    return %0 : f64 loc("test.lox":2:3)
  } loc("test.lox":1:1)
} loc("test.lox":1:1)
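When writing integration tests against printed IR, it is handy to compute the expected location text from an AST position. The helper below is a small sketch for that purpose; `AstLocation` and `render_loc` are hypothetical names, and the output format simply follows the `loc("file":line:column)` form shown in the listing above. It does not escape quotes in file names.

```rust
/// Stand-in for the AST location type from src/ast.rs.
#[derive(Debug, Clone, Copy)]
struct AstLocation {
    line: usize,
    column: usize,
}

/// Render a location the way it appears in printed IR with debug info.
fn render_loc(filename: &str, loc: AstLocation) -> String {
    format!("loc(\"{}\":{}:{})", filename, loc.line, loc.column)
}

fn main() {
    let loc = AstLocation { line: 2, column: 14 };
    println!("{}", render_loc("test.lox", loc));
}
```

A test can then assert that the printed module contains `render_loc(...)` for the expression it compiled.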
Part 6: A Complete Example
// examples/simple_add.rs
use melior::{
    Context,
    dialect::{arith, func, DialectRegistry},
    ir::{
        attribute::{StringAttribute, TypeAttribute},
        r#type::FunctionType,
        Block, Location, Module, Region, Type,
    },
    utility::register_all_dialects,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let registry = DialectRegistry::new();
    register_all_dialects(&registry);

    let context = Context::new();
    context.append_dialect_registry(&registry);
    context.load_all_available_dialects();

    let location = Location::unknown(&context);
    let module = Module::new(location);

    // Create function type: (f64, f64) -> f64
    let float_type = Type::float64(&context);
    let function_type = FunctionType::new(&context, &[float_type, float_type], &[float_type]);

    // Create function body: one block with two f64 arguments
    let region = Region::new();
    let block = region.append_block(Block::new(&[
        (float_type, location),
        (float_type, location),
    ]));

    // %sum = arith.addf %arg0, %arg1 : f64
    let sum = block.append_operation(arith::addf(
        block.argument(0).unwrap().into(),
        block.argument(1).unwrap().into(),
        location,
    ));

    // return %sum : f64
    block.append_operation(func::r#return(
        &[sum.result(0).unwrap().into()],
        location,
    ));

    // Create the function
    module.body().append_operation(func::func(
        &context,
        StringAttribute::new(&context, "add"),
        TypeAttribute::new(function_type.into()),
        region,
        &[],
        location,
    ));

    // Verify and print
    assert!(module.as_operation().verify());
    println!("{}", module.as_operation());
    Ok(())
}
Output:
module {
func.func @add(%arg0: f64, %arg1: f64) -> f64 {
%0 = arith.addf %arg0, %arg1 : f64
return %0 : f64
}
}
Part 7: Lowering to LLVM IR
After generating MLIR, lower it to LLVM IR and compile to machine code:
// src/lib.rs
pub mod ast;
pub mod lexer;
pub mod parser;
pub mod codegen;

use melior::{
    Context,
    dialect::DialectRegistry,
    pass::PassManager,
    utility::register_all_dialects,
};

// `CompileError` is assumed to be defined elsewhere in the crate.
pub fn compile_to_llvm(source: &str) -> Result<String, CompileError> {
    // 1. Parse
    let tokens = lexer::tokenize(source)?;
    let program = parser::Parser::new(tokens).parse()?;

    // 2. Generate MLIR
    let registry = DialectRegistry::new();
    register_all_dialects(&registry);

    let context = Context::new();
    context.append_dialect_registry(&registry);
    context.load_all_available_dialects();

    let mut module = codegen::generate_module(&context, &program);

    // 3. Run lowering passes (arith/func/scf → LLVM dialect)
    // The exact constructor names depend on your Melior version.
    // The underlying MLIR passes are convert-scf-to-cf,
    // convert-arith-to-llvm, and convert-func-to-llvm.
    let pass_manager = PassManager::new(&context);
    // pass_manager.add_pass(melior::pass::conversion::create_scf_to_control_flow());
    // pass_manager.add_pass(melior::pass::conversion::create_arith_to_llvm());
    // pass_manager.add_pass(melior::pass::conversion::create_func_to_llvm());
    // pass_manager.run(&mut module)?;

    // The result is textual MLIR; mlir-translate turns the LLVM dialect
    // into LLVM IR.
    Ok(module.as_operation().to_string())
}
Using the CLI
# Compile Lox to MLIR
cargo run -- compile input.lox --emit-mlir -o output.mlir
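# Lower the arith/func/scf dialects to the LLVM dialect
# (needed if the emitted MLIR has not already been lowered)
mlir-opt output.mlir --convert-scf-to-cf --convert-cf-to-llvm --convert-arith-to-llvm --convert-func-to-llvm --reconcile-unrealized-casts -o lowered.mlir
# Translate the LLVM dialect to LLVM IR
mlir-translate lowered.mlir --mlir-to-llvmir -o output.ll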
# Compile to executable
clang output.ll -o output
Part 8: Project Structure
lox-mlir/
├── Cargo.toml
├── src/
│ ├── lib.rs # Library entry point
│ ├── main.rs # CLI entry point
│ ├── ast.rs # AST definitions
│ ├── lexer.rs # Tokenizer
│ ├── parser.rs # Parser
│ ├── codegen/
│ │ ├── mod.rs
│ │ ├── generator.rs # MLIR code generator
│ │ ├── types.rs # Tagged union types
│ │ └── strings.rs # String constant handling
│ └── runtime/
│ ├── mod.rs
│ └── print.c # Runtime print implementation
├── examples/
│ ├── simple_add.rs
│ └── *.lox
└── tests/
└── integration.rs
Quick Reference: Lox → MLIR Mapping
| Lox Construct | Rust Enum | MLIR Operation |
|---|---|---|
| a + b | BinaryOp::Add | arith.addf |
| a - b | BinaryOp::Sub | arith.subf |
| a * b | BinaryOp::Mul | arith.mulf |
| a / b | BinaryOp::Div | arith.divf |
| a < b | BinaryOp::Less | arith.cmpf olt |
| a == b | BinaryOp::Equal | arith.cmpf oeq |
| var x = v | VarStmt | Store in HashMap |
| x | VariableExpr | Load from HashMap |
| if (c) {...} | IfStmt | scf.if |
| while (c) {...} | WhileStmt | scf.while |
| fun f(...) {...} | FunctionStmt | func.func |
| f(args) | CallExpr | func.call |
| return v | ReturnStmt | func.return |
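The binary-operator rows of the table can be written down as a lookup function, which is convenient for tests and for printing expected IR. This is a standalone sketch: the `BinaryOp` enum is copied from the AST so the example compiles on its own, and `mlir_op` and `render` are hypothetical helper names. The predicates are the ones used by the code generator above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum BinaryOp {
    Add, Sub, Mul, Div,
    Less, LessEqual, Greater, GreaterEqual,
    Equal, NotEqual,
}

/// Returns the MLIR operation name and, for comparisons, the predicate.
fn mlir_op(op: BinaryOp) -> (&'static str, Option<&'static str>) {
    match op {
        BinaryOp::Add => ("arith.addf", None),
        BinaryOp::Sub => ("arith.subf", None),
        BinaryOp::Mul => ("arith.mulf", None),
        BinaryOp::Div => ("arith.divf", None),
        BinaryOp::Less => ("arith.cmpf", Some("olt")),
        BinaryOp::LessEqual => ("arith.cmpf", Some("ole")),
        BinaryOp::Greater => ("arith.cmpf", Some("ogt")),
        BinaryOp::GreaterEqual => ("arith.cmpf", Some("oge")),
        BinaryOp::Equal => ("arith.cmpf", Some("oeq")),
        BinaryOp::NotEqual => ("arith.cmpf", Some("une")),
    }
}

/// Renders the textual form of the operation on two f64 operands.
fn render(op: BinaryOp, lhs: &str, rhs: &str) -> String {
    match mlir_op(op) {
        (name, None) => format!("{name} {lhs}, {rhs} : f64"),
        (name, Some(pred)) => format!("{name} {pred}, {lhs}, {rhs} : f64"),
    }
}

fn main() {
    println!("{}", render(BinaryOp::Add, "%arg0", "%arg1"));
    println!("{}", render(BinaryOp::Less, "%arg0", "%arg1"));
}
```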
Differences from C++ MLIR
| Aspect | C++ MLIR | Melior (Rust) |
|---|---|---|
| Dialect definition | TableGen (.td) | Rust code directly |
| Operations | Generated from ODS | Built with OperationBuilder |
| Ownership | Manual / raw pointers | RAII with lifetimes |
| Pattern rewriting | C++ classes | Closures / Rust traits |
| Error handling | LogicalResult | Result<T, Error> |
Next Steps
- Start small: Just numbers and arithmetic. Get print 1 + 2; working.
- Add variables: Implement local variables with SSA values.
- Add control flow: if and while with the scf dialect.
- Add functions: func.func and func.call.
- Add closures: Environment capture with heap allocation.
- Add classes/objects: The full Lox experience.
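For the variables step, the single `HashMap` used by the code generator does not model nested blocks. One possible shape for a scoped symbol table is sketched below, using only the standard library. `Scopes` and its methods are hypothetical names, and the value type is generic so the sketch runs without Melior; in the generator it would hold SSA values.

```rust
use std::collections::HashMap;

/// A stack of scopes; the last entry is the innermost block.
struct Scopes<V> {
    stack: Vec<HashMap<String, V>>,
}

impl<V: Copy> Scopes<V> {
    fn new() -> Self {
        Self { stack: vec![HashMap::new()] }
    }

    /// Enter a `{ ... }` block.
    fn push(&mut self) {
        self.stack.push(HashMap::new());
    }

    /// Leave a block; the global scope is never popped.
    fn pop(&mut self) {
        if self.stack.len() > 1 {
            self.stack.pop();
        }
    }

    /// `var name = value;` always binds in the innermost scope.
    fn declare(&mut self, name: &str, value: V) {
        self.stack
            .last_mut()
            .expect("at least one scope")
            .insert(name.to_string(), value);
    }

    /// `name = value;` rebinds the nearest existing declaration.
    /// Returns false if the name was never declared.
    fn assign(&mut self, name: &str, value: V) -> bool {
        for scope in self.stack.iter_mut().rev() {
            if let Some(slot) = scope.get_mut(name) {
                *slot = value;
                return true;
            }
        }
        false
    }

    /// Resolves a name from the innermost scope outwards.
    fn lookup(&self, name: &str) -> Option<V> {
        self.stack.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}

fn main() {
    let mut scopes = Scopes::new();
    scopes.declare("x", 1);
    scopes.push();
    scopes.declare("x", 2); // shadows the outer x
    println!("inner x = {:?}", scopes.lookup("x"));
    scopes.pop();
    println!("outer x = {:?}", scopes.lookup("x"));
}
```

Rebinding a name to a new SSA value like this is enough for straight-line code. Values assigned inside an `scf.if` or `scf.while` body additionally have to be passed out through the operation's results, which this sketch does not attempt.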
Melior provides a safe, idiomatic Rust interface to MLIR. The ownership model takes some getting used to (regions/blocks are moved rather than borrowed), but the type system prevents most common errors.