Chapter 2: Parsing Lua — From Source Text to a Queryable AST
You have a Lua file. You want to type-check it. First problem: you need to parse it. But Lua’s grammar has 80+ production rules, operator precedence tables, and ambiguous constructs like a:b(c) vs a.b(c). Writing a Lua parser from scratch would be its own tutorial.
Good news: the analisar crate already parses Lua. We’ll use it. The real work isn’t parsing — it’s wiring the parser into Salsa so that changing the source text automatically re-parses and re-type-checks.
Why Separate Parsing from Type-Checking?
Here’s a question you might be asking: why not type-check directly from the source text? Why have a parsing step at all?
Parsing is expensive, type-checking is more expensive. If you mash them together, you lose the ability to cache them independently. A change that only affects one function’s body shouldn’t require re-parsing the entire file — and it shouldn’t require re-type-checking unrelated functions either.
Different invalidation granularity. The parser runs on the whole source text. The type checker runs on individual expressions. By separating them, Salsa can skip the type checker when only the parse result changes but the types stay the same (this happens less often, but it can happen — think comments or whitespace that don’t affect the AST structure).
The pipeline so far:
SourceFile (input)
│
▼
parse() ← tracked, cached
│
▼
LuaAst ← owned data
│
▼
derived queries (type checking, name resolution, etc.)
Step 1: The Same Input
We reuse the SourceFile input from Chapter 1:
#![allow(unused)]
fn main() {
use std::path::PathBuf;
#[salsa::input]
pub struct SourceFile {
#[returns(ref)]
pub path: PathBuf,
#[returns(ref)]
pub text: String,
}
}
Nothing new here. The source text is our input, and Salsa tracks when it changes.
Step 2: An Owned AST
Analisar’s AST uses borrowed strings (Cow<'a, str> for identifiers). This makes sense for analisar — it avoids allocating copies. But Salsa tracked functions need to return owned data. Why? Because the cached result has to survive across revisions. If the result borrows from the source text, and the source text changes, you’ve got a dangling reference.
So we define our own owned AST types:
#![allow(unused)]
fn main() {
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub struct LuaAst {
pub statements: Vec<Statement>,
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub enum Statement {
/// `local x = expr` or `x = expr`
Assignment {
local: bool,
targets: Vec<Expression>,
values: Vec<Expression>,
},
/// A bare expression statement (usually a function call)
Expression(Expression),
/// `return expr, ...`
Return(Vec<Expression>),
/// `function name(args) block end`
Function {
local: bool,
name: String,
params: Vec<String>,
body: Vec<Statement>,
},
/// `if cond then block [elseif cond then block]* [else block] end`
If {
test: Expression,
then_block: Vec<Statement>,
else_block: Option<Vec<Statement>>,
},
/// `while cond do block end`
While {
condition: Expression,
body: Vec<Statement>,
},
}
#[derive(Debug, Clone, PartialEq, Eq, Hash, salsa::Update)]
pub enum Expression {
Nil,
True,
False,
Number(String),
StringLiteral(String),
Name(String),
BinaryOp {
left: Box<Expression>,
op: BinOp,
right: Box<Expression>,
},
UnaryOp {
op: UnaryOp,
expr: Box<Expression>,
},
FunctionCall {
func: Box<Expression>,
args: Vec<Expression>,
},
FieldAccess {
object: Box<Expression>,
field: String,
},
Index {
object: Box<Expression>,
index: Box<Expression>,
},
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, salsa::Update)]
pub enum BinOp {
Add, Sub, Mul, Div, Mod, Pow,
Concat, Lt, Gt, Le, Ge, Eq, Ne,
And, Or,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, salsa::Update)]
pub enum UnaryOp {
Negate, Not, Length,
}
}
That’s a lot of types. But notice the pattern — every type derives salsa::Update, every string is owned (String, not Cow<'a, str>), and recursive structures use Box. These aren’t arbitrary choices. salsa::Update is required for any type used as a tracked function’s return value — Salsa caches the results of tracked functions, and when inputs change, it needs to compare old and new values to decide whether downstream queries need to re-run. For simple types (numbers, strings), it’s a straightforward equality check; for nested types, it recurses into fields. If you forget this derive, the compiler will remind you. Owned Strings are necessary because Salsa caches the result across revisions — if the AST borrowed from the source text and the source text changed, you’d have dangling references. The conversion from analisar’s Cow<'a, str> to our String happens in the convert_ast function; that’s the boundary where borrowed data becomes owned data. And Box for recursive types is Rust’s rule: a type can’t contain itself by value, so Expression::BinaryOp { left: Box<Expression>, ... } compiles, but left: Expression wouldn’t.
But wait — isn’t cloning expensive? Yes, for large files. That’s why in Chapter 3 we’ll use #[salsa::interned] for strings, and in Chapter 4 we’ll use #[salsa::tracked] for AST nodes. Both avoid cloning. For now, we keep it simple with owned Strings.
Step 3: The Parse Query
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn parse(db: &dyn salsa::Database, source: SourceFile) -> LuaAst {
let text = source.text(db);
convert_ast(text)
}
}
This is the key tracked function. It takes a SourceFile (input) and returns a LuaAst (owned data). Salsa caches this — if the source text hasn’t changed since the last revision, the cached AST is returned without re-parsing.
The convert_ast function is where we bridge analisar’s borrowed world into our owned world. We walk analisar’s AST node by node and build our own representation. It’s mechanical — mapping one enum variant to another — but it’s important to understand why we do it. analisar gives us a correct but borrowed AST; our AST is owned, hashable, and Salsa-compatible; and the conversion is the boundary between “external library” and “Salsa system.”
This boundary pattern — converting external data into Salsa-owned data at the query boundary — is common in Salsa projects. rust-analyzer does the same thing with the Rust parser.
The Conversion in Practice
The pattern is simple: each analisar variant maps to one of our variants. Here are three representative examples that show the key ideas:
#![allow(unused)]
fn main() {
// 1. Direct mapping — change the type, keep the structure
ast::Expression::Name(n) => Expression::Name(n.name.into_owned()),
// 2. Recursive wrapping — Box the children and recurse
ast::Expression::BinOp { left, op, right } => Expression::BinaryOp {
left: Box::new(convert_expr(*left)),
op: convert_binop(op),
right: Box::new(convert_expr(*right)),
},
// 3. Splitting a combined variant — analisar's Suffixed becomes FieldAccess or Index
ast::Expression::Suffixed(s) => {
let subject = convert_expr(s.subject);
// `s.property` is analisar's name for the suffix part — it could be
// a field name (an Expression::Name) or an index expression.
if !s.computed && !s.method {
// Field access: a.b — the property is always a Name in this case
if let Expression::Name(field) = convert_expr(s.property) {
Expression::FieldAccess { object: Box::new(subject), field }
} else {
// Shouldn't happen for a.b, but handle gracefully
Expression::FieldAccess { object: Box::new(subject), field: "<unknown>".to_string() }
}
} else if s.computed {
// Index: a[b]
Expression::Index { object: Box::new(subject), index: Box::new(convert_expr(s.property)) }
} else {
// Method calls (a:b()) — not modeled, placeholder keeps things compiling
Expression::FieldAccess { object: Box::new(subject), field: "<method-call>".to_string() }
}
}
}
n.name.into_owned() is where Cow<'a, str> becomes String. The analisar parser uses Cow<'a, str> for name fields — it can borrow from the source text during parsing, avoiding allocations. We need owned String for our AST (no lifetime parameter), so we call .into_owned(), a standard method on Cow. If you haven’t met Cow before: it’s a clone-on-write smart pointer that holds either a borrowed &str or an owned String. Calling .into_owned() extracts the owned String (or clones the borrowed &str into one) — either way, you get a String with no lifetime attached.
The Suffixed split is where analisar’s combined representation diverges from our simpler model. Analisar represents a.b, a[b], and a:b() with a single Suffixed node, and we split them into FieldAccess and Index. Method calls (a:b()) get a <method-call> placeholder — our AST doesn’t model them, but using a descriptive string instead of "?" means a type checker would report “unknown variable <method-call>” rather than the confusing “unknown variable ?”. In a production compiler you’d add a proper MethodCall variant; for the tutorial, the placeholder keeps things compiling and makes the simplification visible.
The _ => Expression::Nil fallback handles analisar variants we haven’t modeled yet (like table constructors). In a real project you’d add those over time. For the tutorial, silently falling back to Nil keeps things compiling without requiring every Lua construct up front.
The rest of the conversion follows the same pattern: match on the analisar variant, call convert_expr / convert_statement on children, wrap Box::new around recursive references, and call .into_owned() on Cow fields. Here’s the complete code — every variant, every helper function:
Complete Conversion Code
#![allow(unused)]
fn main() {
fn convert_ast(lua_text: &str) -> LuaAst {
let mut parser = LuaParser::new(lua_text.as_bytes());
let mut statements = Vec::new();
while let Some(Ok(stmt)) = parser.next() {
statements.push(convert_statement(stmt));
}
LuaAst { statements }
}
}
convert_ast creates an analisar parser, pulls statements one at a time, converts each, and wraps them in our LuaAst struct.
#![allow(unused)]
fn main() {
fn convert_statement(stmt: ast::Statement) -> Statement {
match stmt {
ast::Statement::Assignment { local, targets, values } => Statement::Assignment {
local,
targets: targets.into_iter().map(convert_expr).collect(),
values: values.into_iter().map(convert_expr).collect(),
},
ast::Statement::Expression(expr) => Statement::Expression(convert_expr(expr)),
ast::Statement::Return(ret) => {
Statement::Return(ret.0.into_iter().map(convert_expr).collect())
}
ast::Statement::Function { local, name, body } => Statement::Function {
local,
name: convert_func_name(&name),
params: body.par_list.names.0.into_iter().map(|n| n.name.into_owned()).collect(),
body: convert_block(body.block),
},
ast::Statement::If(if_stmt) => Statement::If {
test: convert_expr(if_stmt.test),
then_block: convert_block(if_stmt.block),
else_block: if_stmt.catch_all.map(convert_block),
},
ast::Statement::While { exp, block } => Statement::While {
condition: convert_expr(exp),
body: convert_block(block),
},
// Skip less common statement types for now
_ => Statement::Expression(Expression::Nil),
}
}
}
The Function variant shows a wrinkle: analisar gives us a FuncName type (with dot-separated names and optional method syntax), and we convert it to a plain String via the convert_func_name helper. The body.par_list.names field contains the parameter names — each one has a .name field that’s a Cow<'a, str>, so we call .into_owned() to get our owned String.
#![allow(unused)]
fn main() {
fn convert_block(block: ast::Block) -> Vec<Statement> {
block.0.into_iter().map(convert_statement).collect()
}
fn convert_func_name(name: &ast::FuncName) -> String {
let base: String = name.dot_separated.iter()
.map(|n| n.name.as_ref())
.collect::<Vec<_>>()
.join(".");
match &name.method {
Some(m) => format!("{}:{}", base, m.name),
None => base,
}
}
}
convert_func_name handles Lua’s function name syntax: foo.bar:baz becomes the string "foo.bar:baz". The dot_separated field is a vec of name segments; the optional method field adds the colon-separated method name.
#![allow(unused)]
fn main() {
fn convert_expr(expr: ast::Expression) -> Expression {
match expr {
ast::Expression::Nil => Expression::Nil,
ast::Expression::True => Expression::True,
ast::Expression::False => Expression::False,
ast::Expression::Numeral(n) => Expression::Number(n.0.into_owned()),
ast::Expression::LiteralString(s) => Expression::StringLiteral(s.0.to_string()),
ast::Expression::Name(n) => Expression::Name(n.name.into_owned()),
ast::Expression::BinOp { left, op, right } => Expression::BinaryOp {
left: Box::new(convert_expr(*left)),
op: convert_binop(op),
right: Box::new(convert_expr(*right)),
},
ast::Expression::UnaryOp { op, exp } => Expression::UnaryOp {
op: convert_unaryop(op),
expr: Box::new(convert_expr(*exp)),
},
ast::Expression::FuncCall(call) => Expression::FunctionCall {
func: Box::new(convert_expr(*call.prefix)),
args: convert_args(call.args),
},
ast::Expression::Suffixed(s) => {
let subject = convert_expr(s.subject);
if !s.computed && !s.method {
// Field access: a.b
if let Expression::Name(field) = convert_expr(s.property) {
Expression::FieldAccess {
object: Box::new(subject),
field,
}
} else {
// Shouldn't happen for a.b, but handle gracefully
Expression::FieldAccess {
object: Box::new(subject),
field: "<unknown>".to_string(),
}
}
} else if s.computed {
// Index: a[b]
Expression::Index {
object: Box::new(subject),
index: Box::new(convert_expr(s.property)),
}
} else {
// Method calls (a:b()) — placeholder keeps things compiling
Expression::FieldAccess {
object: Box::new(subject),
field: "<method-call>".to_string(),
}
}
}
_ => Expression::Nil,
}
}
}
Most variants are direct mappings — change the type, preserve the structure. The Suffixed case is the interesting one (we covered it in the examples above). The _ => Expression::Nil fallback catches table constructors and other Lua constructs we’re not modeling.
#![allow(unused)]
fn main() {
fn convert_args(args: ast::Args) -> Vec<Expression> {
match args {
ast::Args::ExpList(exprs) => exprs.into_iter().map(convert_expr).collect(),
ast::Args::String(s) => vec![Expression::StringLiteral(s.0.to_string())],
ast::Args::Table(_) => {
// Table arguments not supported in this simplified tutorial
vec![]
}
}
}
fn convert_binop(op: ast::BinaryOperator) -> BinOp {
match op {
ast::BinaryOperator::Add => BinOp::Add,
ast::BinaryOperator::Subtract => BinOp::Sub,
ast::BinaryOperator::Multiply => BinOp::Mul,
ast::BinaryOperator::Divide => BinOp::Div,
ast::BinaryOperator::Modulo => BinOp::Mod,
ast::BinaryOperator::Power => BinOp::Pow,
ast::BinaryOperator::Concatenate => BinOp::Concat,
ast::BinaryOperator::LessThan => BinOp::Lt,
ast::BinaryOperator::GreaterThan => BinOp::Gt,
ast::BinaryOperator::LessThanEqual => BinOp::Le,
ast::BinaryOperator::GreaterThanEqual => BinOp::Ge,
ast::BinaryOperator::Equal => BinOp::Eq,
ast::BinaryOperator::NotEqual => BinOp::Ne,
ast::BinaryOperator::And => BinOp::And,
ast::BinaryOperator::Or => BinOp::Or,
_ => todo!("bitwise binary operator not yet supported"),
}
}
fn convert_unaryop(op: ast::UnaryOperator) -> UnaryOp {
match op {
ast::UnaryOperator::Negate => UnaryOp::Negate,
ast::UnaryOperator::Not => UnaryOp::Not,
ast::UnaryOperator::Length => UnaryOp::Length,
ast::UnaryOperator::BitwiseNot => todo!("bitwise not not yet supported"),
}
}
}
convert_binop and convert_unaryop are straightforward mappings — with one important detail. The _ => catch-all and the BitwiseNot arm used to silently map to BinOp::Add and UnaryOp::Negate respectively. That’s worse than it sounds: ~0xFF would silently become -256, and 5 & 3 would silently become 5 + 3. No compilation error, no panic — a wrong answer with no indication that the conversion was the culprit.
We use todo!() instead. This panics at runtime if you hit an unsupported operator, which is loud and obvious. That’s the right trade-off for a tutorial: you want to know immediately when you’ve reached a construct the code doesn’t handle, not discover it later through subtly wrong results. The _ => Expression::Nil fallback earlier follows the same principle — Nil is obviously wrong in context, so a reader would spot the problem immediately. Silent remaps to plausible values are the dangerous case.
Step 4: Query Chaining
Now that we have parse(), we can build derived queries on top of it:
#![allow(unused)]
fn main() {
#[salsa::tracked]
pub fn top_level_names(db: &dyn salsa::Database, source: SourceFile) -> Vec<String> {
let ast = parse(db, source); // depends on parse
let mut names = Vec::new();
for stmt in &ast.statements {
match stmt {
Statement::Assignment { local: true, targets, .. } => {
for target in targets {
if let Expression::Name(name) = target {
names.push(name.clone());
}
}
}
Statement::Function { local: true, name, .. } => {
names.push(name.clone());
}
_ => {}
}
}
names
}
}
top_level_names calls parse(db, source), so it depends on the parse result. If the source changes:
parse()re-runs (its input changed)top_level_names()re-runs (its dependency — parse — produced a new result)
But here’s a subtlety: if parse() returns a different LuaAst (the source changed in a way that affects the AST), top_level_names() re-runs. If parse() returns the same LuaAst (e.g., you changed a comment that analisar doesn’t include in the AST), top_level_names() returns its cached value.
Salsa checks the value of dependencies, not only whether they were re-executed. This is called verify re-execution: Salsa re-runs parse(), compares the result to the old cache, and only propagates the change if the value is actually different.
Running
cargo run --bin ch02-parsing-lua
A Note on Code Structure
Each chapter in this tutorial is a standalone binary with its own copy of the AST types, environment, and inference logic. This is intentional — it means you can open any chapter and see the complete, working code without chasing imports across multiple files.
The downside is obvious: any bug found in one chapter exists in all of them. In production code, you’d extract shared types into a library crate. For a tutorial, self-containedness wins — you never have to wonder “which version of this function am I looking at?”
Next: Chapter 3: Interned Symbols — We’ll use #[salsa::interned] for identifiers, which lets us compare names by ID instead of by string comparison. This is how compilers handle identifiers efficiently.