MLIR for Lox: Part 10 — Errors That Point to the Line — Source Locations from AST to Runtime
You’ve built a compiler. It works — most of the time. When it doesn’t, you get one of two things: a panic from the runtime, or silent wrong output. Neither helps the programmer figure out what went wrong.
Let’s fix that. We’ll add two things:
- Compile-time error reporting: when the parser or code generator hits a problem, the programmer gets a message that points to the exact line of source code.
- Runtime error reporting: when the compiled code does something wrong (type mismatch, undefined variable), the error includes the source location, not only “something broke”.
The foundation for both already exists. Part 1 gave us source locations on every AST node, and MLIR’s first-class Location type carries them through the compilation pipeline. Now we use them.
The Problem with Bare Errors
Here’s what a runtime type error looks like today:
cannot use String in arithmetic
Which variable? Which line? Which function? The programmer has no idea. Compare:
error: cannot use String in arithmetic
┌─ test.lox:3:14
│
3 │ var x = name + 1;
│ ^^^^ String
The second version tells you where and what. The first version is barely better than a segfault. The difference is source locations.
Compile-Time Errors
Parser Errors
Our parser already has error recovery from Crafting Interpreters — it synchronizes on statement boundaries after an error. What it doesn’t have is good error messages. The ParseError type is a bare string:
#![allow(unused)]
fn main() {
// What we have
return Err(ParseError("Expected expression"));
}
We can do better. A parse error should include the token’s location:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone)]
pub struct ParseError {
pub message: String,
pub line: usize,
pub column: usize,
pub source_file: String,
}
impl std::fmt::Display for ParseError {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
write!(
f,
"{}:{}:{}: error: {}",
self.source_file, self.line, self.column, self.message
)
}
}
}
Every Token already carries a line and column from the lexer. Use them:
#![allow(unused)]
fn main() {
fn expect(&mut self, expected: TokenType) -> Result<Token, ParseError> {
if self.check(expected) {
Ok(self.advance())
} else {
let current = self.peek();
Err(ParseError {
message: format!("Expected {}, found {}", expected, current.token_type),
line: current.line,
column: current.column,
source_file: self.source_file.clone(),
})
}
}
}
This gives parse errors like:
test.lox:7:5: error: Expected ';', found 'print'
Not beautiful, but a huge improvement over “Expected semicolon.”
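The column in these messages has to be computed somewhere. As a rough sketch, assuming the lexer records each token's starting byte offset, a helper like the one below could turn that offset into a 1-based line and column. The name `line_col` and the idea of storing byte offsets are illustrative assumptions, not part of the lexer from earlier parts.

```rust
/// Hypothetical helper: convert a byte offset into a 1-based (line, column).
/// Columns count characters rather than bytes, so multi-byte UTF-8 text
/// still lines up with what the programmer sees in an editor.
fn line_col(source: &str, offset: usize) -> (usize, usize) {
    let mut line = 1;
    let mut column = 1;
    for (idx, ch) in source.char_indices() {
        if idx >= offset {
            break;
        }
        if ch == '\n' {
            // A newline starts the next line and resets the column.
            line += 1;
            column = 1;
        } else {
            column += 1;
        }
    }
    (line, column)
}
```

For the two-line source `var a = 1;` followed by `print a`, byte offset 11 (the start of `print`) maps to line 2, column 1. Scanning from the start of the file on every call is linear in the offset; a real lexer would more likely update the line and column counters as it advances.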
Pretty Error Formatting
The Display impl gives us test.lox:7:5: error: Expected ';', found 'print' — machine-readable but hard to scan. A formatted error that shows the source line and a caret pointing to the problem is more useful:
This is a simplified first version. The Reporter version below (“Building the Error Reporter”) is the canonical one: it adds context lines, collects multiple errors, and handles edge cases. Read this version for the core idea, then use the Reporter for real code.
#![allow(unused)]
fn main() {
// Simplified first version — works for a single parse error.
// The Reporter version below ("Building the Error Reporter") is the
// canonical one: it adds context lines and collects multiple errors.
pub fn format_error(source: &str, error: &ParseError) -> String {
let line_text = source.lines().nth(error.line.saturating_sub(1)).unwrap_or("");
let line_num = error.line;
let column = error.column;
let pad = " ".repeat(column.saturating_sub(1));
format!(
"error: {}\n ┌─ {}:{}:{}\n │\n{:>4} │ {}\n │ {}^\n",
error.message,
error.source_file,
error.line,
error.column,
line_num,
line_text,
pad,
)
}
}
This produces:
error: Expected ';', found 'print'
┌─ test.lox:7:5
│
7 │ var x = 1
│ ^
The caret points to exactly where the parser gave up. That’s useful. The Reporter version below adds one line of context before the error and collects multiple errors in a single pass.
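The example at the top of this part underlines the whole token (`^^^^`), while `format_error` draws a single caret. A minimal sketch of the wider marker, assuming the error also carries the token's length in characters (a hypothetical extra value that `ParseError` does not have in this part):

```rust
/// Hypothetical helper: build the marker line that goes under a source line.
/// `column` is 1-based; `length` is the token width in characters.
fn underline(column: usize, length: usize) -> String {
    let pad = " ".repeat(column.saturating_sub(1));
    // Always draw at least one caret, even for zero-width tokens such as EOF.
    let carets = "^".repeat(length.max(1));
    format!("{}{}", pad, carets)
}
```

With this helper, `underline(9, 4)` yields eight spaces followed by `^^^^`, which lines up under a four-character token that starts at column 9.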
Runtime Errors with Source Locations
Runtime errors are harder because by the time we’re executing JIT-compiled code, the original source locations aren’t directly available. The MLIR IR has them (every operation carries a Location), but the JIT doesn’t surface them on error.
There are two approaches:
Approach 1: Pass Locations Through to the Runtime
The obvious approach: extract the line and column from MLIR’s Location type and pass them as extra arguments to every runtime function that can fail. Here’s what it looks like — and why it’s harder than it sounds.
#![allow(unused)]
fn main() {
// MLIR signature: lox_type_check(i8, i64, i32_line, i32_col) -> i1
#[no_mangle]
pub extern "C" fn lox_type_check_wrapper(tag: u8, payload: i64, line: i32, col: i32) -> u8 {
if tag != TAG_NUMBER {
// The file name is hard-coded here for brevity.
eprintln!("{}:{}:{}: error: cannot use {:?} in arithmetic",
"test.lox", line, col, tag_to_type_name(tag));
return 0; // false — type check failed
}
1 // true — type check passed
}
}
In the code generator, pass the current source location as arguments:
#![allow(unused)]
fn main() {
fn compile_arithmetic_check(&self, tag: Value<'c, 'c>, payload: Value<'c, 'c>, block: &Block<'c>, location: Location<'c>) {
// ⚠️ Extracting line/column from MLIR's Location type is version-
// dependent. The TaggedValue approach below stores locations before
// they enter MLIR, which avoids this problem entirely.
let line = /* extract line from location */;
let col = /* extract column from location */;
let line_val = block.append_operation(arith::constant(
self.context,
IntegerAttribute::new(Type::parse(self.context, "i32").unwrap(), line as i64).into(),
Location::unknown(self.context),
));
let col_val = block.append_operation(arith::constant(
self.context,
IntegerAttribute::new(Type::parse(self.context, "i32").unwrap(), col as i64).into(),
Location::unknown(self.context),
));
let check_result = block.append_operation(func::call(
self.context,
FlatSymbolRefAttribute::new(self.context, "lox_type_check"),
// Four arguments, matching lox_type_check(i8, i64, i32_line, i32_col).
&[tag, payload, line_val.result(0).unwrap().into(), col_val.result(0).unwrap().into()],
&[Type::parse(self.context, "i1").unwrap().into()],
Location::unknown(self.context),
));
// If type check failed, we could either:
// (a) continue with an Error value, or
// (b) branch to a runtime error handler
}
}
This works but adds overhead — every potentially-failing operation gets two extra arguments. There’s also the version problem flagged in the code above: extracting line/column from MLIR’s Location type requires API calls that change between Melior releases. The TaggedValue approach below sidesteps this by storing locations before they go into MLIR, when the code generator still has the raw AST data. We’ll use that approach going forward. But it’s worth understanding this approach first — the runtime function signature (lox_type_check(i8, i64, i32_line, i32_col)) is the same either way. The difference is where the i32_line and i32_col values come from.
Approach 2: Stack Trace via MLIR Locations
A better approach for production: extract a stack trace from the JIT when an error occurs. MLIR operations carry locations, and the LLVM ORC JIT can map program counters back to source locations using debug info.
The setup:
- When generating MLIR, every operation gets a real Location (not Location::unknown).
- When lowering to LLVM IR, MLIR’s LLVM conversion pass translates FileLineColLocation values on operations into LLVM DILocation metadata. This isn’t a separate “DebugInfo pass”; it happens automatically during the convert-to-llvm pass when Location values are present, although depending on the MLIR version a function may also need a debug-info scope attached before its locations are emitted.
- The JIT compiles this debug info into DWARF.
- On error, use addr2line or LLVM’s symbolizer to map the PC to a source location.
This is the approach real compilers use. It’s more complex to set up but produces proper stack traces with zero runtime overhead for the non-error path.
For this tutorial, we’ll use a variant of Approach 1 — the TaggedValue approach below. The code generator stores source locations alongside generated values instead of extracting them back from MLIR, but the runtime API is the same: line and column integers passed to error functions.
Extracting Location Data from MLIR
MLIR locations are one of several types:
| Location Type | What It Holds | Example |
|---|---|---|
| NameLocation | A name string + child location | Function name + body location |
| FileLineColLocation | File name, line, column | "test.lox:3:14" |
| FusedLocation | Multiple locations fused together | Inlined-from + call-site |
| UnknownLocation | No information | Fallback |
The one we care about is FileLineColLocation. In theory, you’d check if a location is file-line-col and extract its filename, line, and column. In practice, Melior’s location introspection API (is_file_line_col(), filename(), line(), column()) changes between versions — the methods may not exist on your version’s Location type.
There’s a simpler approach that sidesteps the version problem entirely. The code generator creates locations from AST data — it already knows the line and column when it creates each operation. Store that data alongside the generated values instead of extracting it back from MLIR.
Storing Locations in the Code Generator
Concretely, the code generator keeps the AST node’s line and column next to each value it produces:
#![allow(unused)]
fn main() {
/// Tracks source location for error reporting during codegen.
#[derive(Debug, Clone, Copy)]
struct SourceLoc {
line: usize,
column: usize,
}
/// A value produced by the code generator, with its source location attached.
struct TaggedValue<'c, 'a> {
tag: Value<'c, 'a>,
payload: Value<'c, 'a>,
loc: SourceLoc,
}
}
When the code generator creates an operation, it records the AST node’s location on the resulting TaggedValue. When a runtime call needs the location, the code generator passes it as arguments. No need to extract it back from MLIR’s Location type.
This is the approach we’ll use. It’s straightforward, works with any Melior version, and keeps the location data where it’s needed.
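One detail the struct leaves open: `SourceLoc` stores `usize` values, while the runtime functions take `i32`. The sketch below shows one way to bridge that gap. It re-declares `SourceLoc` so it stands alone, and the `runtime_args` method is a hypothetical helper, not something defined elsewhere in this series.

```rust
/// Mirrors the `SourceLoc` struct above so this sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SourceLoc {
    line: usize,
    column: usize,
}

impl SourceLoc {
    /// Hypothetical helper: the (line, column) pair as the i32 values that
    /// the runtime error functions expect. Values that do not fit saturate
    /// at i32::MAX instead of wrapping around to a negative number.
    fn runtime_args(self) -> (i32, i32) {
        let clamp = |n: usize| {
            if n > i32::MAX as usize {
                i32::MAX
            } else {
                n as i32
            }
        };
        (clamp(self.line), clamp(self.column))
    }
}
```

The code generator would then feed the two integers into `arith::constant` operations, as in the Approach 1 listing. Saturating is a deliberate choice here: a plain `as i32` cast would silently wrap on absurdly large inputs and print a negative line number.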
Building the Error Reporter
Let’s put it all together. We’ll create a Reporter that collects errors and formats them:
#![allow(unused)]
fn main() {
// src/reporter.rs
use std::io::Write;
/// A compile-time or runtime error with source location.
#[derive(Debug, Clone)]
pub struct Error {
pub message: String,
pub source_file: String,
pub line: usize,
pub column: usize,
}
/// Collects and formats errors.
pub struct Reporter {
errors: Vec<Error>,
source_file: String,
source_text: String,
}
impl Reporter {
pub fn new(source_file: String, source_text: String) -> Self {
Reporter {
errors: Vec::new(),
source_file,
source_text,
}
}
/// Report a compile-time error at the given source location.
pub fn error(&mut self, message: String, line: usize, column: usize) {
self.errors.push(Error {
message,
source_file: self.source_file.clone(),
line,
column,
});
}
/// How many errors have been reported?
pub fn error_count(&self) -> usize {
self.errors.len()
}
/// Did any errors occur?
pub fn has_errors(&self) -> bool {
!self.errors.is_empty()
}
/// Format a single error with source context.
fn format_error(&self, error: &Error) -> String {
let line_text = self.source_text.lines()
.nth(error.line.saturating_sub(1))
.unwrap_or("<no source>");
let caret_padding = " ".repeat(error.column.saturating_sub(1));
let line_num = error.line;
// Show one line of context before the error, if available.
let before = if error.line > 1 {
self.source_text.lines()
.nth(error.line - 2)
.map(|l| format!("{:>4} │ {}\n", error.line - 1, l))
.unwrap_or_default()
} else {
String::new()
};
format!(
"error: {}\n ┌─ {}:{}:{}\n │\n{}{:>4} │ {}\n │ {}^\n",
error.message,
error.source_file,
error.line,
error.column,
before,
line_num,
line_text,
caret_padding,
)
}
/// Print all collected errors to stderr.
pub fn print_errors(&self) {
for error in &self.errors {
eprintln!("{}", self.format_error(error));
}
if self.has_errors() {
eprintln!("{} error(s) found.", self.errors.len());
}
}
}
/// Report a runtime error. Called from the runtime wrappers.
///
/// This is a free function (not a method on `Reporter`) because runtime
/// errors are immediate — they're printed and the program exits. There's
/// no error collection for runtime failures, so there's no `Reporter` to
/// hold them.
pub fn runtime_error(message: &str, source_file: &str, line: usize, column: usize) {
// Runtime errors don't have source text in scope — we can't show
// the source line. Instead, we show the location and a shorter format
// that doesn't pretend to display the code.
eprintln!(
"error: {}\n ┌─ {}:{}:{}",
message,
source_file,
line,
column,
);
}
}
Using the Reporter
In the main compilation function:
#![allow(unused)]
fn main() {
use anyhow::{Result, anyhow};
fn compile(source: &str, source_file: &str) -> Result<()> {
let mut reporter = Reporter::new(source_file.to_string(), source.to_string());
// Parse
let tokens = match lexer::tokenize(source) {
Ok(t) => t,
Err(e) => {
reporter.error(format!("lexer error: {}", e), e.line, e.column);
reporter.print_errors();
return Err(anyhow!("lexing failed"));
}
};
let ast = match parser::Parser::new(tokens).parse() {
Ok(a) => a,
Err(e) => {
reporter.error(e.message.clone(), e.line, e.column);
reporter.print_errors();
return Err(anyhow!("parsing failed"));
}
};
// Code generation
let context = Context::new();
let module = codegen::compile(&context, &ast, &mut reporter)?;
if reporter.has_errors() {
reporter.print_errors();
// Continue to lower and run, but the programmer knows about the issues.
// Some errors are warnings — the code may still work.
}
// ... JIT compilation and execution ...
Ok(())
}
}
Runtime Error Handler
For runtime errors, we need a way for the JIT-compiled code to signal “something went wrong” and have the host print a useful message. The simplest approach: a runtime function that takes a message and a location.
#![allow(unused)]
fn main() {
// In the runtime module
/// Report a runtime error. Called from MLIR when a type check fails
/// or an undefined variable is accessed.
///
/// MLIR signature: lox_runtime_error(i32_error_kind, i32_line, i32_col) -> void
pub fn lox_runtime_error(error_kind: i32, line: i32, col: i32) {
let message = match error_kind {
1 => "undefined variable",
2 => "type error in arithmetic",
3 => "type error in comparison",
4 => "cannot call non-function",
5 => "undefined property",
_ => "unknown error",
};
eprintln!("error: {} at line {} column {}", message, line, col);
}
}
And the wrapper:
#![allow(unused)]
fn main() {
#[no_mangle]
pub extern "C" fn lox_runtime_error_wrapper(error_kind: i32, line: i32, col: i32) {
lox_runtime_error(error_kind, line, col);
}
}
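The bare integers 1 through 5 have to agree between the code generator (which emits them) and the runtime (which decodes them). One possible way to keep them in sync is a shared enum, sketched below. The `ErrorKind` type, its variant names, and `message_for` are assumptions made for illustration; only the numeric codes and message strings come from `lox_runtime_error` above.

```rust
/// Hypothetical enum shared by the code generator and the runtime, so the
/// integer codes are defined in exactly one place.
#[derive(Debug, Clone, Copy, PartialEq)]
#[repr(i32)]
enum ErrorKind {
    UndefinedVariable = 1,
    ArithmeticType = 2,
    ComparisonType = 3,
    NotCallable = 4,
    UndefinedProperty = 5,
}

impl ErrorKind {
    /// Decode the i32 that crossed the FFI boundary; unknown codes give None.
    fn from_code(code: i32) -> Option<ErrorKind> {
        match code {
            1 => Some(ErrorKind::UndefinedVariable),
            2 => Some(ErrorKind::ArithmeticType),
            3 => Some(ErrorKind::ComparisonType),
            4 => Some(ErrorKind::NotCallable),
            5 => Some(ErrorKind::UndefinedProperty),
            _ => None,
        }
    }

    /// Message text, matching the strings used by `lox_runtime_error`.
    fn message(self) -> &'static str {
        match self {
            ErrorKind::UndefinedVariable => "undefined variable",
            ErrorKind::ArithmeticType => "type error in arithmetic",
            ErrorKind::ComparisonType => "type error in comparison",
            ErrorKind::NotCallable => "cannot call non-function",
            ErrorKind::UndefinedProperty => "undefined property",
        }
    }
}

/// Message for any code, including ones the runtime does not recognize.
fn message_for(code: i32) -> &'static str {
    ErrorKind::from_code(code)
        .map(ErrorKind::message)
        .unwrap_or("unknown error")
}
```

On the codegen side, `ErrorKind::ArithmeticType as i32` produces the constant to emit, so adding a new kind means touching one enum rather than two unrelated match statements.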
Generating Runtime Error Calls
A type mismatch is only known at runtime, so the code generator can’t reject it up front. Instead of silently producing wrong code, it emits a tag check; on the failing branch, the generated code calls lox_runtime_error and then continues with an Error value:
#![allow(unused)]
fn main() {
// In compile_arithmetic — after type tag check fails
fn compile_arithmetic(&self, left: &TaggedValue<'c, 'c>, right: &TaggedValue<'c, 'c>, block: &Block<'c>) -> TaggedValue<'c, 'c> {
let location = Location::unknown(self.context);
// Check: are both operands numbers?
let left_is_number = self.emit_tag_check(left, TAG_NUMBER, block);
let right_is_number = self.emit_tag_check(right, TAG_NUMBER, block);
// Both checks must pass (logical AND on i1 results).
// Requires `use melior::dialect::ods::arith;` (not `melior::dialect::arith`,
// which only has constant/cmpf/cmpi/select). See the version note in
// Part 1 for the full explanation of both import paths.
let both_number = block.append_operation(
arith::andi(left_is_number, right_is_number, location)
);
// If not both numbers, report a runtime error
//
// The scf::if_ API varies between Melior versions. The conceptual
// pattern is consistent: create an scf.if that branches on the
// tag check, report the error in the else region, and compute
// the result in the then region.
//
// Here's what the generated MLIR should look like:
//
// scf.if %both_number -> (i8, i64) {
// // then: both are numbers — extract payloads and add
// %lhs_f64 = llvm.bitcast %lhs_payload : i64 to f64
// %rhs_f64 = llvm.bitcast %rhs_payload : i64 to f64
// %sum = arith.addf %lhs_f64, %rhs_f64 : f64
// %sum_i64 = llvm.bitcast %sum : f64 to i64
// scf.yield %c_num, %sum_i64 : i8, i64
// } else {
// // else: type error — report and return Error tag
// func.call @lox_runtime_error(%err_type, %line, %col)
// : (i32, i32, i32) -> ()
// scf.yield %c_err, %zero : i8, i64
// }
//
// For the Melior API call itself, check the docs for your version.
// The pattern is:
// scf::if_(context, result_types, condition,
// |then_region| { ... scf.yield ... },
// |else_region| { ... scf.yield ... },
// location)
// but parameter order and builder signatures differ across releases.
// After the scf.if, extract the (tag, payload) results:
// let result_tag = if_op.result(0).unwrap().into();
// let result_payload = if_op.result(1).unwrap().into();
// TaggedValue { tag: result_tag, payload: result_payload, loc: left.loc }
//
// The %line and %col arguments in the MLIR above come from the
// TaggedValue's stored source location. The codegen pattern is the
// same arith::constant + IntegerAttribute we used for tag values:
//
// let line_val = block.append_operation(arith::constant(
// context,
// IntegerAttribute::new(
// Type::parse(context, "i32").unwrap(),
// left.loc.line as i64,
// ).into(),
// location,
// ));
// let col_val = block.append_operation(arith::constant(
// context,
// IntegerAttribute::new(
// Type::parse(context, "i32").unwrap(),
// left.loc.column as i64,
// ).into(),
// location,
// ));
//
// Then pass line_val and col_val as the 2nd and 3rd arguments to
// the func.call for lox_runtime_error.
}
}
Honest simplification: This is verbose. Real compilers use a different strategy: they generate the “happy path” code (arithmetic without checks) and only insert tag checks when the optimizer can’t prove the type. The scf.if approach above is correct but produces a lot of branches. A production compiler would use a “guard and deopt” pattern: check the type, branch to a deoptimization path if the check fails, and let the optimizer remove checks that are provably unnecessary.
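Because the Melior calls are left as comments, it can help to have a plain-Rust reference model of what the generated `scf.if` is supposed to compute. The sketch below is a host-side model only, not generated code. It assumes the tag values listed later in this part (Number = 2, TAG_ERROR = 8), and the names `checked_add` and `number` are made up for this illustration.

```rust
const TAG_NUMBER: u8 = 2;
const TAG_ERROR: u8 = 8;

/// Reference model of the generated `scf.if`. Returns the (tag, payload)
/// pair and records a (kind, line, col) report on the failure path.
fn checked_add(
    lhs: (u8, i64),
    rhs: (u8, i64),
    loc: (i32, i32),
    reports: &mut Vec<(i32, i32, i32)>,
) -> (u8, i64) {
    if lhs.0 == TAG_NUMBER && rhs.0 == TAG_NUMBER {
        // then-region: reinterpret the i64 payloads as f64, add, pack again.
        let sum = f64::from_bits(lhs.1 as u64) + f64::from_bits(rhs.1 as u64);
        (TAG_NUMBER, sum.to_bits() as i64)
    } else {
        // else-region: stands in for the call to lox_runtime_error(2, line, col).
        reports.push((2, loc.0, loc.1));
        (TAG_ERROR, 0)
    }
}

/// Pack an f64 into a Number-tagged value, the way the codegen bitcasts it.
fn number(n: f64) -> (u8, i64) {
    (TAG_NUMBER, n.to_bits() as i64)
}
```

A model like this is handy as a test oracle: run the same inputs through the JIT-compiled function and through `checked_add`, and compare the resulting tag and payload.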
Error Recovery: Continuing After an Error
A good error reporter doesn’t stop at the first error. It collects as many errors as possible before reporting. This is why our Reporter collects errors in a Vec instead of returning immediately.
There are two levels of recovery:
- Compile-time recovery: the parser synchronizes and keeps going; the code generator emits placeholder values for broken expressions.
- Runtime recovery: after a runtime error, the program continues execution with an Error value.
For compile-time recovery, the code generator uses an Error-tagged value (TAG_ERROR) as a sentinel:
#![allow(unused)]
fn main() {
use std::collections::HashMap;
// When codegen hits an error, produce an Error value instead of panicking
fn compile_expression(&self, expr: &Expr, block: &Block<'c>, variables: &mut HashMap<String, Value<'c, 'c>>) -> TaggedValue<'c, 'c> {
match expr {
// ... normal cases (Binary, Unary, Literal, etc.) ...
//
// If you add error nodes to the AST (a common recovery strategy
// where the parser produces a placeholder instead of aborting),
// you'd handle them here. Our parser uses ParseError +
// synchronization instead of AST error nodes, so this arm
// wouldn't appear in our Expr enum. The pattern is the same
// either way: report the error, emit an Error-tagged value,
// and continue.
//
// Example with a hypothetical Expr::Error variant:
//
// Expr::Error(e) => {
// self.reporter.error(
// format!("cannot compile expression: {}", e),
// expr.line(),
// expr.column(),
// );
// self.emit_error_value(block, expr.location())
// }
}
}
fn emit_error_value(&self, block: &Block<'c>, loc: SourceLoc) -> TaggedValue<'c, 'c> {
let location = Location::unknown(self.context);
let tag = block.append_operation(arith::constant(
self.context,
IntegerAttribute::new(Type::parse(self.context, "i8").unwrap(), TAG_ERROR as i64).into(),
location,
));
let payload = block.append_operation(arith::constant(
self.context,
IntegerAttribute::new(Type::parse(self.context, "i64").unwrap(), 0).into(),
location,
));
TaggedValue {
tag: tag.result(0).unwrap().into(),
payload: payload.result(0).unwrap().into(),
loc,
}
}
}
The TAG_ERROR tag is a special tag value that indicates “something went wrong producing this value.” Any operation that receives an Error-tagged value propagates the error instead of producing a new one. This prevents error cascading — one error doesn’t produce a cascade of spurious errors downstream.
TAG_ERROR isn’t in the ObjType enum from Part 7 (which uses its own numbering: BoundMethod = 6). It’s a compiled value tag — it only exists in the i8 tag byte of the tagged union, not in any heap object’s header. The runtime never produces it; only the code generator writes it when it can’t compile an expression. The compiled value tags from Part 7 use 0–7 (Nil through BoundMethod), so TAG_ERROR is 8.
#![allow(unused)]
fn main() {
// Extended from compiled value tags in Part 7 — compiler-only, not a runtime object type.
// Tags 0–7 are used: Nil(0), Bool(1), Number(2), String(3), Closure(4),
// Instance(5), Class(6), BoundMethod(7). TAG_ERROR is the next available value.
const TAG_ERROR: u8 = 8;
}
This tag value will never appear in a heap object’s ObjType field. It only exists in the compiled tagged union’s i8 tag byte, where it tells downstream operations “this value is invalid — skip your logic and propagate.”
#![allow(unused)]
fn main() {
// In compile_arithmetic — check for Error values first
fn compile_arithmetic(&self, left: &TaggedValue<'c, 'c>, right: &TaggedValue<'c, 'c>, block: &Block<'c>) -> TaggedValue<'c, 'c> {
// If either operand is an Error, propagate it silently.
// No need to report another error — the source was already reported.
//
// ⚠️ `left.tag` and `right.tag` are MLIR `Value<'c>` (SSA values),
// not Rust integers. You emit MLIR-level operations, not Rust `if`.
// The pattern is the same `scf.if` shown above for the type check —
// with a different condition (checking for TAG_ERROR instead
// of TAG_NUMBER) and different branches:
//
// %is_left_error = arith.cmpi eq, %left_tag, %TAG_ERROR : i8
// %is_right_error = arith.cmpi eq, %right_tag, %TAG_ERROR : i8
// %any_error = arith.ori %is_left_error, %is_right_error : i1
// %result = scf.if %any_error -> (i8, i64) {
// scf.yield %error_tag, %zero_payload : i8, i64
// } else {
// // ... normal type-checking and arithmetic ...
// scf.yield %result_tag, %result_payload : i8, i64
// }
//
// See Part 1's compile_if for the original scf.if implementation.
// ... normal type-checking and arithmetic code ...
}
}
This is the same pattern we used in the Lua type checker tutorial (Chapter 3, type compatibility rules): Error is compatible with everything (to prevent cascading), but it propagates silently.
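To see why silent propagation stops cascades, here is a small host-side model of the tag logic alone, with payloads left out. It is a sketch under the tag numbering given above (Number = 2, String = 3, TAG_ERROR = 8); `add_tags` and the `reported` counter are hypothetical names standing in for the two nested `scf.if` operations and the call to `lox_runtime_error`.

```rust
const TAG_NUMBER: u8 = 2;
const TAG_STRING: u8 = 3;
const TAG_ERROR: u8 = 8;

/// Host-side model of the tag logic only (payloads omitted for brevity).
/// Returns the result tag and bumps `reported` when a new error is reported.
fn add_tags(left: u8, right: u8, reported: &mut u32) -> u8 {
    if left == TAG_ERROR || right == TAG_ERROR {
        // Outer scf.if: an operand is already an Error, so propagate silently.
        return TAG_ERROR;
    }
    if left == TAG_NUMBER && right == TAG_NUMBER {
        TAG_NUMBER
    } else {
        // Inner scf.if else-region: the first failure, reported exactly once.
        *reported += 1;
        TAG_ERROR
    }
}
```

Modeling `(name + 1) + 2 + 3` as three chained `add_tags` calls, starting from a String tag, ends with TAG_ERROR and a `reported` count of 1. Without the outer check, each of the two later additions would see a non-number operand and report again.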
Putting It Together: An Error in Action
Let’s trace a real error through the system. Given:
var name = "Alice";
var x = name + 1;
print x;
- Parser: parses successfully. No errors.
- Code generator: compiles var name = "Alice" → string constant with TAG_STRING. Compiles name + 1:
  - Left operand: TAG_STRING (from name variable lookup)
  - Right operand: TAG_NUMBER (from 1 literal)
  - Type check: neither is Error, so proceed to tag check
  - scf.if branch: left is not TAG_NUMBER → else branch fires
  - Else branch: calls lox_runtime_error(2, 2, 9), “type error in arithmetic” at line 2, column 9
  - Result: TAG_ERROR value
- Runtime: when the program runs, the output is:
  error: type error in arithmetic at line 2 column 9
  - print x receives an Error value and prints <error> instead of crashing
Not a beautiful diagnostic, but it’s correct — it tells the programmer exactly what went wrong and where. The next step would be to add the source line display (like the format_error function above), but that requires the runtime to have access to the source text, which means threading it through the Runtime struct or looking it up from a global.
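As one possible shape for the "look it up from a global" option, the sketch below stores the program text in a `std::sync::OnceLock` (stable since Rust 1.70) before the JIT-compiled code runs. Everything named here (`SOURCE_TEXT`, `install_source`, `source_line`, `format_runtime_error`) is hypothetical, and the output layout only approximates the Reporter's format.

```rust
use std::sync::OnceLock;

/// Hypothetical global holding the program text for runtime diagnostics.
static SOURCE_TEXT: OnceLock<String> = OnceLock::new();

/// Call once before running JIT-compiled code. Later calls are ignored.
fn install_source(text: &str) {
    let _ = SOURCE_TEXT.set(text.to_string());
}

/// Look up a 1-based source line. Returns None if no source was installed
/// or the line number is out of range.
fn source_line(line: i32) -> Option<&'static str> {
    if line < 1 {
        return None;
    }
    SOURCE_TEXT.get()?.lines().nth((line - 1) as usize)
}

/// Format a runtime error, adding the source line and a caret when the
/// source text is available, and falling back to the short form otherwise.
fn format_runtime_error(message: &str, line: i32, col: i32) -> String {
    let mut out = format!("error: {} at line {} column {}", message, line, col);
    if let Some(text) = source_line(line) {
        let pad = " ".repeat((col.max(1) - 1) as usize);
        out.push_str(&format!("\n{:>4} │ {}\n     │ {}^", line, text, pad));
    }
    out
}
```

The trade-off is the usual one for globals: it keeps the `extern "C"` signatures unchanged, but it ties the runtime to a single program per process. Threading the text through the Runtime struct avoids that at the cost of one more pointer to pass around.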
A Note on Production Error Reporting
Real compilers do much more than what we’ve shown here. A compiler shouldn’t report the same error twice, or report errors that are consequences of earlier errors — the Error-tag propagation pattern helps, but it’s not perfect. Production compilers assign numeric codes to errors (E0308: mismatched types in Rust) so users can look them up; our error_kind integer is a step in this direction. Suggestions like “did you mean add instead of aad?” require fuzzy matching on identifiers and are worth adding for a real tool. Terminal colors make errors easier to read — the termcolor crate is the standard approach in Rust. And IDEs and language servers need errors as structured data (JSON), not formatted text — the Reporter could implement a to_json() method for this use case.
These are all incremental improvements on the same foundation: source locations + error collection + formatted output. The pattern doesn’t change; the polish does.
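To make the “did you mean add instead of aad?” idea concrete, here is a minimal sketch of identifier suggestions using Levenshtein edit distance. It is one common way to do fuzzy matching, not necessarily what any particular compiler uses, and the `suggest` helper and its `max_distance` threshold are assumptions for illustration.

```rust
/// Classic Levenshtein edit distance over characters, using two rolling rows.
fn edit_distance(a: &str, b: &str) -> usize {
    let b_chars: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b_chars.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b_chars.iter().enumerate() {
            let substitute = prev[j] + if ca == *cb { 0 } else { 1 };
            let insert = curr[j] + 1;
            let delete = prev[j + 1] + 1;
            curr.push(substitute.min(insert).min(delete));
        }
        prev = curr;
    }
    prev[b_chars.len()]
}

/// Hypothetical helper: the closest known name within `max_distance` edits,
/// or None if nothing is close enough to be worth suggesting.
fn suggest<'a>(unknown: &str, known: &[&'a str], max_distance: usize) -> Option<&'a str> {
    known
        .iter()
        .map(|name| (edit_distance(unknown, name), *name))
        .filter(|(dist, _)| *dist <= max_distance)
        .min_by_key(|(dist, _)| *dist)
        .map(|(_, name)| name)
}
```

The code generator could call `suggest` with the names currently in scope when it reports an undefined variable, and append the result to the message only when it returns `Some`. Keeping `max_distance` small (1 or 2) avoids suggesting names that are unrelated to what the programmer typed.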
Next: Part 11 — Cross-Module Linking — Every program we’ve compiled lives in a single file. Real programs don’t. We’ll compile two Lox files to two MLIR modules, then merge them into one executable — and see why the linker needs to resolve symbol references across module boundaries.